ADVANCES IN QUANTITATIVE STRUCTURE-PROPERTY RELATIONSHIPS Volume 2
1999
ADVANCES IN QUANTITATIVE STRUCTURE-PROPERTY RELATIONSHIPS
Editors:
MARVIN CHARTON, Department of Chemistry, Pratt Institute, Brooklyn, New York
BARBARA I. CHARTON, St. John's University Science Library, New York, New York
VOLUME 2
1999
JAI PRESS INC. Stamford, Connecticut
Copyright © 1999 by JAI PRESS INC. 100 Prospect Street, Stamford, Connecticut 06904-0811. All rights reserved. No part of this publication may be reproduced, stored on a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, filming, recording, or otherwise without prior permission in writing from the publisher. ISBN: 0-7623-0067-1
Manufactured in the United States of America
CONTENTS
LIST OF CONTRIBUTORS
PREFACE
Marvin Charton and Barbara Charton
EXPLORING THE ENERGETICS OF BINDING IN CHROMATOGRAPHY AND RELATED EVENTS
Philip S. Magee
STRUCTURAL EFFECTS ON GAS-PHASE REACTIVITIES
Gabriel Chuchani, Masaaki Mishima, Rafael Notario, and José-Luis M. Abboud
THE PREDICTION OF MELTING POINT
John C. Dearden
THE APPLICATION OF THE INTERMOLECULAR FORCE MODEL TO PEPTIDE AND PROTEIN QSAR
Marvin Charton
INDEX
LIST OF CONTRIBUTORS
José-Luis M. Abboud
Instituto de Química Física Rocasolano, C.S.I.C., Madrid, Spain
Marvin Charton
Department of Chemistry Pratt Institute Brooklyn, New York
Gabriel Chuchani
Centro de Química, Instituto Venezolano de Investigaciones Científicas, Caracas, Venezuela
John C. Dearden
School of Pharmacy and Chemistry, John Moores University, Liverpool, England
Philip S. Magee
BIOSAR Research Project Vallejo, California
Masaaki Mishima
Institute for Fundamental Research of Organic Chemistry Kyushu University Fukuoka, Japan
Rafael Notario
Instituto de Química Física Rocasolano, C.S.I.C., Madrid, Spain
PREFACE

Quantitative structure-property relationships (QSPR) have become a major method of chemical research. In the course of this development the field has suffered from fragmentation. Applications of QSPR are found in all major chemical disciplines including physical organic, physical, medicinal, agricultural, biological, environmental, and polymer chemistry. Frequently workers in one area are unaware of parameterizations and models used in other areas which they might well find useful.

There is a common thread which runs through these widely diverse areas. The basic principles, parameterizations, and methodology are the same or similar throughout. The object of this series is to provide interesting and timely reviews covering all aspects of the field. It is our hope that this will encourage the transfer of new methods, techniques, and parameterizations from the area in which they were developed to other areas that can make good use of them. In view of the widespread use of QSPR we believe that this is an important objective. We hope that this series will provide the cross-fertilization which we believe to be so sorely needed.

Marvin and Barbara I. Charton
Editors
EXPLORING THE ENERGETICS OF BINDING IN CHROMATOGRAPHY AND RELATED EVENTS
Philip S. Magee
I. Introduction to Adsorption Binding
II. The Modeling of Intermolecular Binding Forces
III. Binding of Organic Compounds on Inorganic Polymers
   A. Heats of Adsorption on Clay, Silica, and Alumina
   B. Adsorption Chromatography on Silica and Alumina
IV. Binding of Organic Compounds to Organic Polymers
   A. Heats of Adsorption on Cellulose and Activated Carbon
   B. Adsorption Chromatography on Cellulose and Paper
V. Binding of Organic Compounds on Bioorganic Polymers
VI. Conclusions
Note
References

Advances in Quantitative Structure Property Relationships Volume 2, pages 1-33. Copyright © 1999 by JAI Press Inc. All rights of reproduction in any form reserved. ISBN: 0-7623-0067-1
I. INTRODUCTION TO ADSORPTION BINDING

Binding between organic compounds and both simple and complex inorganic and organic polymers is a common event in chemistry, biochemistry, medicine, and agriculture. Experimentally, each event is accompanied by a measurable heat of adsorption, equilibrium constant, or physical retardation, as in chromatography. Under a given set of conditions, these measures will depend intimately on some function of the intermolecular forces expressed by the interaction of the known molecular structure with the generally unknown surface structure(s). As the molecular structure of the adsorbate varies, so also will the binding measure, and when these differences are expressed in energetic terms, the opportunity for mechanistic insight presents itself.

Correlating the differences in binding energetics with parameters derived from molecular structure can be accomplished in many ways, perhaps too many ways. Only the careful selection of descriptors that clearly reveal the nature of the intermolecular forces (imf) has any chance of revealing the underlying mechanisms of the binding process. Successful correlations with mechanistic descriptors should identify and quantify the contribution of each imf and leave a residual that roughly matches the experimental error of the binding measure. Ideally, the information contained in the QSPR equation should reveal a plausible mechanism of binding for the organic structure and some information about the nature of the surface structure(s). Although correlation is never clearly causal, an examination of many different binding QSPR's can lead to a high degree of mechanistic confidence that can be applied to new experimental designs and the crafting of either stronger or weaker adsorbents.

This review will explore directly measured heats of adsorption and equilibrium constants as well as retardation (Rf) in chromatography, where the energetics of adsorption are balanced against the solvation forces of the eluent. In all cases, the data from each source will be reduced to energetic differences and correlated with descriptors that clearly define the possible imf interactions. From these equations, conclusions will be drawn or inferred about the intimate relations between the adsorbate and the polymeric surface.
II. THE MODELING OF INTERMOLECULAR BINDING FORCES

Adsorption binding is complex and may express a full menu of intermolecular forces (imf). These range from simple ion pairing to several forms of van der Waals forces: (1) dipole-dipole or Keesom interactions, (2) dipole-induced dipole or Debye interactions, and (3) induced dipole-induced dipole or London interactions. Beyond these interactions, which decrease progressively in specificity from ion pairing to London forces, most polar groups are capable of forming specific hydrogen bonds with suitably located donor and acceptor atoms. To confound the unraveling of the mechanism even more, each of these binding interactions is
subject to attenuation by strategically located groups capable of exerting a steric effect. Assessing the weighted mix of these diverse forces is the goal of QSPR modeling, with the frequent benefit of illuminating the binding mechanism to the point of a good working hypothesis. Let us clarify once more that structure-property modeling is not unambiguously causal. The relations are probabilistic in nature and, depending on the set under study, may contain accidental colinearities that conceal or oversimplify the picture. However, consistency in correlation over a substantial number of sets can provide very strong insights that lead to virtual mechanistic certainty.

A vast number of descriptors are favored by various researchers, and this range needs to be simplified for the purpose of this chapter. As the author has analyzed many of the most interesting data sets, it seems appropriate to cast all of the studies into a consistent descriptor format to facilitate cross comparisons. When citing completed work by other authors, an effort will be made to translate their description into compatible terms. In most cases, alternate descriptors for particular effects are colinear and the information derived from the QSPR is equivalent. The descriptors used by the author to model many of the data sets in this review will now be described and justified.

One of the most frequent forces expressed in binding events is the mutual polarization of complementary induced dipole moments (London forces). This interaction depends on polarizable volume and is well modeled by molar refraction (MR), Bondi's volume (Vw), or molecular surface area (Aw), all of which are highly colinear and can be considered in general as "bulk" descriptors. The choice of this author is MR for ease of calculation of both simple and complex structures and concordance with a major portion of the QSPR literature.
Polarizable volume is considered to be a fundamental property at the foundation of most intimate binding events. In addition to London forces, it should also model Debye interactions when the dipole moment is induced in the organic structure by preexisting dipoles on the polymer surface. For nonpolar compounds, MR is colinear with the extrathermodynamic measure, log P (octanol/water), and is commonly used to model nonspecific binding events, e.g., binding to bovine serum albumin. In data sets containing a preponderance of polar compounds, log P will frequently correlate better than MR and give the appearance of greater mechanistic meaning. Log P, however, is a highly composite and constitutive measure based on the difference in free energy of creating cavities in octanol saturated with water and water saturated with octanol. In addition to analyzable factors, such as polarizable volume, inductive effects, and all forms of hydrogen bonding, conformational changes and complex entropic effects accompany the phase transfer. Far from being a passive "shake flask" descriptor as many believe, Guy and Honda have shown that the interfacial transfer of methyl nicotinate from water to octanol is accompanied by an activation energy of 10.3 kcal/mol, which strongly suggests a mechanistic process. Nevertheless, log P can be a useful diagnostic when dealing with nonspecific binding to complex polymers such as
organic soils. It has recently been found that factoring log P into lipophilic (PL) and hydrophilic (PH) descriptors (PL + PH = log P) can lead to better correlations and stronger insight when combined with other intermolecular descriptors. Thus, many "simple" log P correlations reveal, on factoring, both a preference for either PL or PH substructures and the need for inclusion of supplementary MR and hydrogen-bond descriptors to complete the picture.

All of the dipolar interactions are subject to modification by the electronic properties of substructures and substituents. This applies particularly to Keesom and Debye interactions, where dipoles are built into the substructures. Effects on London interactions would necessarily be weaker and possibly undetectable. In addition, both nucleophilic and electrophilic sites in adsorbed compounds, as well as potential sites for hydrogen bonding, are directly impacted by changes in substructure and substituents. A full range of electrical effect substituent constants as described by Charton, Hansch and Leo, and Magee is appropriate for defining substituent effects in binding. Factoring of the composite sigma's into inductive and resonance contributions can provide further insight into the mechanism.

One of the characteristic features of polar molecules binding to polar surfaces is the formation of one or more specific hydrogen bonds between acceptor and donor sites. Essentially electrostatic in nature, these bonds are flexibly directional and can contribute 2-10 kcal/mol to the total binding energy. Electron pairs on oxygen and nitrogen form the majority of H-bond acceptors, while donor groups are typically O-H, N-H, S-H, and specially activated C-H bonds. There is more than one way to describe the hydrogen bond in QSAR and QSPR relationships, with honest differences of opinion.
Hydrogen bonds can be described as continuous strength parameters based on experimental equilibria measured in nonpolar solvents or as simple chemical potentials based on the count of acceptor pairs on N and O, and on the count of X-H donors. A tremendous effort has been made to support the continuous H-bond model, which has many distinguished adherents. However, both methods are effective in correlation and no definitive study has been made to determine which of the two approaches is more effective. The question of H-bond potential or H-bond strength remains unresolved at this writing. As the author has had unusual success with the simple hydrogen bond potentials, this approach is used exclusively in the new work presented here.

All of the intermolecular forces are additionally subject to modification by steric interactions. For the purpose of mechanistic description, there are two equivalent measures of the steric effect, namely Taft's Es and Charton's upsilon (υ), both of which are derived from the kinetics of acid esterification and the hydrolysis of acid derivatives. Charton's observation that symmetrical and spinning-top substituents (H, X, CH3, CX3) show a linear correlation between Es and the van der Waals radii (rv) has led to a convenient rescaling of steric effects, with υ directly and positively scaled to van der Waals projection. The projective size of substituents is visually clear and correlations with upsilon (υ) are substantially easier to interpret
than with the negatively scaled Es. The use of MR as a steric descriptor for substituents is seriously flawed as MR is a volume function of rv.

We can now summarize the descriptors used in this review in a general imf relation:

Binding Energy = f(MR, log P, PL, PH, sigma's, υ, HBA, HBD, I's)

Sigma's refer to composite and factored electronic effects, while I's refer to the occasional use of indicator variables to reset the intercept for discontinuous changes in molecular structure. As most of the descriptors are on a free energy scale (RT log X), the experimental binding energy must be expressed on a related scale to maximize correlation and allow direct comparisons between QSPR's.
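The requirement that binding data sit on a free energy scale can be illustrated with a short conversion. The sketch below assumes the standard thermodynamic relation ΔG = -2.303 RT log K; the function names, the default temperature of 25 °C, and the example value of log K are illustrative assumptions and are not taken from this chapter.

```python
import math

R_KCAL = 1.98720e-3  # gas constant in kcal/(mol K)


def rt_ln10(temp_k: float = 298.15) -> float:
    """Return 2.303*RT in kcal/mol, the energy equivalent of one log unit."""
    return math.log(10.0) * R_KCAL * temp_k


def log_k_to_free_energy(log_k: float, temp_k: float = 298.15) -> float:
    """Convert log10 of an equilibrium constant to a free energy in kcal/mol.

    Assumes the sign convention dG = -2.303 RT log K (hypothetical helper).
    """
    return -rt_ln10(temp_k) * log_k


if __name__ == "__main__":
    print(f"2.303 RT at 25 C = {rt_ln10():.3f} kcal/mol")
    print(f"log K = 2.0 corresponds to {log_k_to_free_energy(2.0):.2f} kcal/mol")
```

Under these assumptions one log unit is worth roughly 1.36 kcal/mol at room temperature, which gives a feel for the size of the coefficients in the regression equations that follow.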
III. BINDING OF ORGANIC COMPOUNDS ON INORGANIC POLYMERS

The binding of organic compounds to inorganic polymers, where suitable data exist, is limited mainly to clays, alumina, and silica. Moreover, the data derive from two principal sources, namely the direct measurement of heats of adsorption and the indirect measurement of binding energy through retention in adsorption chromatography. These measures are sufficiently different to require separate treatment.

A. Heats of Adsorption on Clay, Silica, and Alumina

Clay

Despite the broad uses of clays in catalytic and adsorptive processes, there are very few references to comparative adsorption studies, although a vast literature of experimental studies must reside in company files. In general, clays have alternating layers of hydrated alumina and silica with a broad assortment of Group IA and IIA cations to balance the negative charge of the layers. Intercalation and a variety of bonding mechanisms have been studied by FT infrared and by differential scanning calorimetry. Intercalation is proposed for nitrobenzene and trichloroethylene, while phenol, 4-chloroaniline, and triethanolamine show clear evidence of hydrogen bonding. As will be seen from studies of silica and alumina, it is highly probable that these alternating layers in clay participate in additional modes of complex binding.

Silica

Pure silica has a great number of different crystalline and amorphous forms based on the tetrahedral SiO4 unit that is internally constructed of polymeric Si-O-Si bonds. At the surface, however, these bonds are broken by varying degrees of hydration, with silicon preserving a tetrahedral structure by forming single SiOH and gem-Si(OH)2 structures, the density of which depends on the degree of hydration. Under vacuum, water is slowly removed in the following approximate stages: physically adsorbed water at temperatures up to 400 K, hydrogen-bonded water up to 500 K, hydrogen-bridged geminal silanols and vicinal silanols up to 900-1000 K, and isolated silanols up to 1000-1200 K. Hydrogen-bonding opportunities for adsorbates to SiOH groups are generally classified as isolated, geminal, and vicinal. Moreover, the surface silanol groups appear to function exclusively as H-bond donors.

A series of simple compounds of varying shape, basicity, and dipole moment were selected for adsorption onto hydroxylated and dehydroxylated silica. Heats of adsorption from hexane on silica were measured by microcalorimetry for n-butylamine, n-butanol, n-butyraldehyde, n-butyric acid, n-nitropropane, and DMF. The dominant descriptor on both silicas was basicity (pKa of the adsorbate conjugate acid), with no measurable dependence on shape or dipole moment. Heats of adsorption were lower for the dehydroxylated silica, and the weaker mechanism of Lewis acid binding is thought to replace silanol H-bonding.

Binding of alcohols from the gas phase and from carbon tetrachloride on silica has been studied by infrared spectroscopy to measure the weak H-bonding to surface silanol groups. Most of the alcohols studied showed evidence of reactive chemisorption by cleavage of Si-O-Si groups. The strongest H-bonds were formed between single alcohol molecules adsorbed on adjacent silanol groups. This type of double H-bridge has also been observed for methyl mercaptan on porous silica gel. In other binding studies on partially dehydroxylated silica, the geminal hydroxyl groups are shown to provide the most stable adsorption sites for ammonia and pyridine, both of which form weak hydrogen bonds.

The study of aromatic adsorbates provides additional detail, as substituent effects provide insight into the details of the binding process.
Spectroscopic shifts of adsorbed ester carbonyl groups show linear relations with the pKa of the parent acids in binding from carbon tetrachloride onto activated silica. Of particular interest is the finding that silanol binding can be single to the carbonyl oxygen or the aromatic π system and, in the case of benzyl benzoate, both types of H-bonds are formed by adjacent silanol groups. Silica immersed in heptane solutions of six substituted anisoles shows H-bonding interactions between surface silanols and methoxy groups as well as ring π systems. A plot of the H-bonded SiOH frequencies is linear with the Hammett sigma of the anisole substituents. In the same study, surface silanol groups were found to form hydrogen bonds to the nitro groups of 4-nitroanisole, nitrobenzene, and 4-nitrotoluene. In direct comparison with this study, the H-bond interactions of triphenylsilanol in carbon tetrachloride with the same set of substituted anisoles and nitro compounds show nearly identical behavior. Substituent sigma plots of anisoles, nitrobenzenes, and other substituted benzenes against the shift in silanol frequency show very similar slopes for both surface and solution silanols, clearly dispelling the need for a special surface effect. Studies of simple aromatics on silica and chlorinated silica confirm π system
adsorption binding and also detect H-bonding of silanol groups to fluorine in fluorobenzenes. Phenols in heptane bind to silica through H-bonds formed by silanol donors to either the hydroxyl group electron pairs or the π system of the ring. There is no evidence for phenol as an H-bond donor. The eleven phenols in this study show a complex plot of sigma against the SiOH frequency, reflecting changes in binding mechanism with the variation in substituents. For phenols with large 2,6-groups, the predominant mode of adsorption is silanol binding to the aromatic π system, with no evidence of steric hindrance for 2,6-di-Nbutylphenol. For pentamethylphenol, however, adsorption is exclusively through silanol binding to the hydroxy group electron pairs.

Alumina
Although the alumina surface is in some ways analogous to that of silica, there are significant differences. While silanol groups function only as H-bond donor groups, the surface oxygen groups on alumina have higher ionic character and can also function as H-bond acceptors. In addition, alumina activated at 500 °C is also populated with incompletely coordinated aluminum ions with strong Lewis acid properties.

The free energies of adsorption of 22 phenylarenes and 29 fused aromatic hydrocarbons in pentane on alumina (1.3% water) have been studied by Snyder. The data show that phenylarenes are all nonplanar in solution but tend to adsorb in the plane of the adsorbent surface. The adsorption energies correlate with the physical size (carbon count, minimum width, maximum length) of the phenylarenes. An analogous relation holds for the fused aromatic hydrocarbons, the list of which extends from benzene to coronene. The author has correlated the free energies of adsorption of the first 15 hydrocarbons (benzene to perylene) with polarizable volume (MR) in order to relate size to energy in a more mechanistic relation. One outlier binding more strongly than predicted (naphthacene) was deleted. This simple correlation is suggestive of London forces due to polarizable volume with binding to Lewis acid sites.

Fused aromatic hydrocarbons: benzene, naphthalene, azulene, acenaphthalene, phenanthrene, anthracene, pyrene, fluoranthene, triphenylene, 1,2-benzanthracene, chrysene, naphthacene, 1,2- and 1,3-benzpyrene, and perylene.

$$\Delta F\ (\text{kcal/mol}) = -0.0849\,\mathrm{MR}\ (-22.42) + 0.238 \qquad (1)$$

$n = 14$, $s = 0.230$, $r^2 = 0.977$, $F = 502.85$ (Student t value in parentheses)
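As a consistency check on the statistics quoted with Eq. 1, the sketch below uses two standard identities of least-squares regression: r² = kF/(kF + n - k - 1) for k descriptors and, for a single descriptor, F = t². The helper names are hypothetical; only the numbers come from Eq. 1.

```python
def r2_from_f(f_stat: float, n: int, k: int) -> float:
    """Explained variance implied by the F statistic for n points and k descriptors."""
    return k * f_stat / (k * f_stat + (n - k - 1))


def std_error_from_t(coefficient: float, t_value: float) -> float:
    """Standard error of a coefficient, using t = coefficient / standard error."""
    return abs(coefficient / t_value)


# Values quoted with Eq. 1 (one descriptor, MR)
n, k = 14, 1
coef_mr, t_mr, f_stat = -0.0849, -22.42, 502.85

if __name__ == "__main__":
    print(f"r^2 implied by F: {r2_from_f(f_stat, n, k):.3f}")
    print(f"t^2 = {t_mr ** 2:.1f}, reported F = {f_stat}")
    print(f"standard error of the MR coefficient: {std_error_from_t(coef_mr, t_mr):.4f}")
```

Agreement between the implied and reported values to within rounding indicates that the coefficient, t value, F, and r² are mutually consistent.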
In Eq. 1 and subsequent equations, n indicates the number of data points in the set, s is the standard error from the regression line, r² is the explained variance, and F is the Fisher statistic that measures the overall strength of the relation. The number
in parentheses after the descriptor is the Student t value. It represents the ratio of the regression coefficient to the statistical error in the coefficient. Minimal values for significance are t = 2.00 and F = 4.00.

In another revealing study, Snyder measured the free energies of adsorption of 66 nitrogen compounds (pyridines, anilines, and pyrroles) from pentane on alumina (3.6% water). Excellent linear plots with Hammett's sigma are obtained for non-ortho-substituted pyridines, quinolines, anilines, and indoles. All have negative slopes except the indoles, the slope of which is strongly positive. Snyder concludes that all but the indoles are binding by nucleophilic transfer. The indoles/pyrroles bind by proton transfer to the alumina surface (H-bonding). The largest and most diverse group is the 3,4-substituted pyridines, and these were regressed by the author (Eq. 2, Table 1). This relation clearly shows that nucleophilic binding dominates, with electron donor groups lowering the binding free energy.

$$\Delta F\ (\text{kcal/mol}) = -0.0480\,\mathrm{MR}\ (-2.74) + 2.06\,\sigma\ (12.18) - 1.14\,\mathrm{HB}\ (-10.27) - 5.44 \qquad (2)$$

$n = 14$, $s = 0.185$, $r^2 = 0.949$, $F = 62.47$
Table 1. Adsorption Energies of Pyridines from Pentane on Alumina (3.6% Water), 24 °C

Pyridines    ΔF (kcal/mol)   Yest(a)   MR      σ       HB
3,4-DiMe     -6.48           -6.48     11.30   -0.24   0
4-Me         -6.05           -6.06      5.65   -0.17   0
4-Et         -6.19           -6.24     10.27   -0.15   0
3-NH2        -6.84           -7.17      5.42   -0.16   1
3-Me         -5.99           -5.86      5.65   -0.07   0
Pyridine     -5.57           -5.49      1.03    0.00   0
4-Cl         -5.40           -5.26      6.03    0.23   0
3-Acetyl     -6.69           -6.34     11.18    0.38   1
3-Formyl     -6.27           -6.19      6.88    0.35   1
3-Cl         -4.94           -4.97      6.03    0.37   0
3-Br         -5.02           -5.07      8.88    0.39   0
4-CN         -5.43           -5.53      6.33    0.66   1
3-CN         -5.74           -5.74      6.33    0.56   1
3,5-DiCl     -4.29           -4.50     12.06    0.74   0

Note: (a) Equation 2.
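The statistics reported with Eq. 2 can be approximately recovered from the entries of Table 1. The following is a minimal sketch that evaluates the published equation on the tabulated descriptors and recomputes s, r², and F from the residuals. The function and variable names are hypothetical, and because the published coefficients are rounded, the recomputed values agree only to within rounding.

```python
import math

# (compound, observed dF in kcal/mol, MR, sigma, HB), transcribed from Table 1
TABLE_1 = [
    ("3,4-DiMe", -6.48, 11.30, -0.24, 0),
    ("4-Me", -6.05, 5.65, -0.17, 0),
    ("4-Et", -6.19, 10.27, -0.15, 0),
    ("3-NH2", -6.84, 5.42, -0.16, 1),
    ("3-Me", -5.99, 5.65, -0.07, 0),
    ("Pyridine", -5.57, 1.03, 0.00, 0),
    ("4-Cl", -5.40, 6.03, 0.23, 0),
    ("3-Acetyl", -6.69, 11.18, 0.38, 1),
    ("3-Formyl", -6.27, 6.88, 0.35, 1),
    ("3-Cl", -4.94, 6.03, 0.37, 0),
    ("3-Br", -5.02, 8.88, 0.39, 0),
    ("4-CN", -5.43, 6.33, 0.66, 1),
    ("3-CN", -5.74, 6.33, 0.56, 1),
    ("3,5-DiCl", -4.29, 12.06, 0.74, 0),
]


def eq2(mr: float, sigma: float, hb: int) -> float:
    """Estimated free energy of adsorption (kcal/mol) from Eq. 2."""
    return -0.0480 * mr + 2.06 * sigma - 1.14 * hb - 5.44


def fit_statistics(rows, n_descriptors: int = 3):
    """Return (n, s, r2, F) for Eq. 2 evaluated on the given rows."""
    observed = [row[1] for row in rows]
    predicted = [eq2(*row[2:]) for row in rows]
    n = len(rows)
    mean_obs = sum(observed) / n
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))  # residual sum of squares
    sst = sum((o - mean_obs) ** 2 for o in observed)              # total sum of squares
    dof = n - n_descriptors - 1                                   # residual degrees of freedom
    s = math.sqrt(sse / dof)
    r2 = 1.0 - sse / sst
    f_stat = ((sst - sse) / n_descriptors) / (sse / dof)
    return n, s, r2, f_stat


if __name__ == "__main__":
    n, s, r2, f_stat = fit_statistics(TABLE_1)
    print(f"n = {n}, s = {s:.3f}, r^2 = {r2:.3f}, F = {f_stat:.1f}")
```

The same routine also reproduces the Yest column when `eq2` is applied row by row, which is a convenient check on transcription of the table.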
However, other intermolecular forces are also at work, with both London forces (MR) and the H-bonding of polar substituent groups (amino, acetyl, formyl, and cyano) making a contribution.

The binding of phenols to an aluminum oxide surface was studied by Holmes-Farley using a novel procedure. Thermally evaporated aluminum deposited on clean glass slides was exposed to oxygen to generate an aluminum/aluminum oxide surface for adsorption of a broad selection of 2-, 3-, and 4-substituted phenols in competition with acetic acid. Adsorption was measured by evaluating water contact angles. Plots of the binding constants (log 1/K) against phenol pKa were linear for the 3,4-substituents and roughly linear against substituent radius for the 2-substituents. Binding of the 3,4-substituted phenols gives a reasonable linear plot against pKa when three more strongly binding acids are included (acetic, benzoic, and 4-trifluoromethylbenzoic). These observations provide strong evidence for a binding event dominated by the formation of H-bonds. As binding increases with pKa, the phenols are acting as H-bond donors to aluminum oxide groups, in contrast to binding on silica.

In comparative studies, Glass and Ross have explored the differences in binding of hydrogen sulfide, methanethiol, ethanethiol, and dimethyl sulfide on silica gels and alumina. Heat-treated silica gel (20 h each at 240, 550, and 700 °C) and heat-treated γ-alumina (20 h at 700 °C) were used in the adsorption experiments. Limiting heats of adsorption were substantially greater for alumina than for silica, though both display the same order, with ΔH(ads) increasing with methyl substitution (Al/Si): H2S (16.0/5.5 kcal/mol), MeSH (16.5/7.0), EtSH (18.4/10.0), Me2S (20.7/12.2). In each case, the data are consistent with donor H-bonds from AlOH and SiOH surface hydroxyl groups. The difference in binding strength is consistent with the greater acidity of the AlOH groups.

B. Adsorption Chromatography on Silica and Alumina

Adsorption chromatography is an indirect measure of binding energy, depending on the careful selection of eluent to separate the compounds from both the origin and the solvent front. The observed Rf (compound/solvent travel) can be cast as a relative binding energy by the following transformation:

$$R_M = \log\left(\frac{1}{R_f} - 1\right)$$

The resulting scale extends from diminishing positive values through zero at Rf = 0.5 to increasing negative values as Rf increases and approaches 1.0. The scale is therefore positively related to binding energy. Reproducibility is a major problem with adsorption chromatography, as discussed by Dallas in reference to thin layer techniques. Moreover, the error of observation increases as the spot nears either the origin or solvent front, a fact which can lead to unusual residuals in a QSPR analysis. Despite these problems, the data remain consistently analyzable with excellent results because the order of Rf and RM is never in doubt. Whether or not
two successive TLC plates are exact duplicates is irrelevant if the relative RM values lead to correlations that differ only in the intercept. A large number of TLC studies have been analyzed by Magee in mechanistic QSPR terms. In addition, Magee has found that rank transform regression on ranked RM's and descriptors can be substantially stronger than regression on real values. This supports the concept that TLC order as visually observed is absolute, while the measured spot positions increase in error as they recede from the midpoint in either direction.

Silica
The general theory for the correlation and prediction of Rf values in TLC has been extensively reviewed by Snyder, one of the pioneers of binding energetics for adsorption of organic compounds on silica and alumina. Snyder was the first to relate Rf to an equilibrium distribution coefficient, K, and to model K as a function of adsorbent, solvent, and solute properties. The parameter S° is a dimensionless adsorption energy of the solute from pentane solution onto an adsorbent of standard activity. The value is positively scaled to increasing binding energy. It is a function only of the solute with respect to silica or alumina and can be calculated by additivity for a vast number of additional solutes from those experimentally measured. Although Snyder also considered other descriptors such as the molecular area of the solute, As, and the eluent strength of the solvent, it is S° that largely determines the variation of Rf with solute structure. His descriptors and concepts have been widely used by others in systematizing TLC observations. An excellent example is the work of Vernin and Vernin in applying Snyder's theory to the linear adsorption chromatography of 100 thiazoles on silica and alumina. They were able to separate polarization and steric effects in addition to demonstrating the additive nature of the interactions.

Analysis of a diverse set of aromatic hydrocarbons developed on silica gel G with diisobutylene as eluent is reported by Magee (Eq. 3, Table 2). Many of the aromatics are substituted with C1-C18 alkyl chains. The observed Rf values are transformed to the energy-scaled RM for correlation and found to cover a broad range from -0.45 to 0.39. The only reasonable descriptor for hydrocarbon adsorption is the polarizable volume, MR, which proved to be uncorrelated with RM (r = 0.09). Factoring MR into contributions from the aromatic rings, MRAr, and the aliphatic side chains, MRAl, led to the dramatic discovery of opposing volume effects.
Only the aromatic groups are bound to silica, while the aliphatic side chains are strongly repelled. A further improvement was realized with an indicator variable, ICh, for aliphatic chains longer than C5, which could be reasonably assumed to lose contact with the silica surface through flexibility. The resulting correlation accommodates the entire set and clearly reveals the nature of the binding process.

$$R_M = 0.0078\,\mathrm{MR_{Ar}}\ (8.48) - 0.0027\,\mathrm{MR_{Al}}\ (-2.31) - 0.190\,I_{Ch}\ (-3.36) - 0.205 \qquad (3)$$

$n = 36$, $s = 0.077$, $r^2 = 0.910$, $F = 107.9$
Table 2. TLC of Aromatic Hydrocarbons on Silica Gel G (eluent = diisobutylene)

Aromatic HC                 RM      Yest(a)   MRAr    MRAl    ICh
n-C15-phenyl                -0.45   -0.40     25.36   74.95   1.0
n-C12-phenyl                -0.37   -0.35     25.36   56.47   1.0
n-C8-phenyl                 -0.35   -0.30     25.36   37.99   1.0
n-C6-phenyl                 -0.27   -0.27     25.36   28.75   1.0
1,3,5-TriEt-phenyl          -0.27   -0.11     23.30   30.81   0.0
Cyclo-C6-phenyl             -0.19   -0.27     25.36   26.89   1.0
1,2,4-TriMe-phenyl          -0.18   -0.07     23.30   16.95   0.0
1-Pr-2,4,6-TriMe-phenyl     -0.12   -0.12     22.27   31.84   0.0
PentaMe-phenyl              -0.05   -0.12     21.24   28.25   0.0
Durene                      -0.02   -0.09     22.27   22.60   0.0
TetraH-naphthalene           0.02   -0.07     24.33   18.48   0.0
2-C18-naphthalene           -0.27   -0.31     24.33   84.19   1.0
Naphthalene                  0.00    0.12     41.80    0.00   0.0
Acenaphthene                 0.03    0.19     49.70    0.00   0.0
1,4-DiMe-naphthalene         0.03    0.08     39.74   11.30   0.0
1,5-DiMe-naphthalene         0.03    0.08     39.74   11.30   0.0
2,4,6-TriMe-naphthalene      0.12    0.05     38.71   16.95   0.0
4-Me-diphenyl                0.12    0.17     49.73    5.65   0.0
2,3-DiMe-naphthalene         0.18    0.08     39.74   11.30   0.0
Diphenyl                     0.18    0.19     50.76    0.00   0.0
Fluorene                     0.21    0.17     48.70    4.62   0.0
Diphenylmethane              0.27    0.18     50.72    4.62   0.0
1-Phenylnaphthalene          0.19    0.31     66.13    0.00   0.0
9-Methylanthracene           0.21    0.22     56.18    5.65   0.0
Anthracene                   0.29    0.24     57.21    0.00   0.0
Phenanthrene                 0.29    0.24     57.21    0.00   0.0
2-Phenylnaphthalene          0.31    0.31     66.13    0.00   0.0
2,3-Benzofluorene            0.39    0.29     64.07    4.62   0.0
1,4-Diphenylbenzene          0.39    0.39     75.09    0.00   0.0
Pyrene                       0.31    0.30     64.08    0.00   0.0
Chrysene                     0.33    0.37     72.87    0.00   0.0
2,2'-Dinaphthyl              0.41    0.44     81.54    0.00   0.0
1,2-Dihydronaphthalene       0.07    0.04     34.29    9.24   0.0
9,10-Dihydrophenanthrene     0.27    0.15     48.70    9.24   0.0
Fluoranthene                 0.35    0.30     64.07    0.00   0.0
Perylene                     0.39    0.42     79.48    0.00   0.0

Note: (a) Equation 3.
12
PHILIP S. MAGEE
Klemm and coworkers report the TLC study of nitroarenes on both silica and alumina with benzene as the eluent (Eqs. 4 and 5, Table 3). The author analyzes their set of 15 nitrobenzenes substituted with 1-4 methyl groups and with MeO and a second nitro group. The variation is not great, but 7/15 have ortho substitution with the possibility of steric effects. Descriptors for the study are the summations of substituent MR, σ, and υ, and an indicator variable, HBA, for the H-bond acceptor qualities of the MeO and NO2 groups. The silica and alumina data are collinear (r² = 0.914), and both depend on the same descriptors. Strong positive binding through the nitro and methoxy acceptor groups by silanol and aluminol overcomes a small negative bulk effect. No steric or electronic effects are observed. The H-bonding is stronger on alumina, consistent with the more acidic AlOH groups, while the negative bulk effect is somewhat larger.

Silica:

$$R_M = -0.0162\,\Sigma\mathrm{MR}\,(-4.92) + 0.223\,\mathrm{HBA}\,(6.65) + 0.212 \tag{4}$$

$n = 14$ (1 outlier), $s = 0.061,\; r^2 = 0.836,\; F = 27.98$

Alumina:

$$R_M = -0.0261\,\Sigma\mathrm{MR}\,(-5.38) + 0.348\,\mathrm{HBA}\,(7.23) - 0.116 \tag{5}$$

$n = 15,\; s = 0.091,\; r^2 = 0.851,\; F = 34.17$
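The statistics reported with each equation are mutually constrained, which offers a simple way to cross-check them. The sketch below assumes the usual overall F for a least-squares fit with k descriptors plus an intercept, F = (r²/k)/((1 - r²)/(n - k - 1)). This formula is standard regression practice rather than something stated in the chapter, and `f_statistic` is a hypothetical helper name. Reading r² = 0.836 for Eq. 4 and r² = 0.851 for Eq. 5, both reported F values are reproduced to within the rounding of r².

```python
def f_statistic(r2: float, n: int, k: int) -> float:
    """Overall F for a least-squares fit with k descriptors and an intercept."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


# Eq. 4 (silica): n = 14, two descriptors; reported F = 27.98
f_eq4 = f_statistic(0.836, n=14, k=2)
# Eq. 5 (alumina): n = 15, two descriptors; reported F = 34.17
f_eq5 = f_statistic(0.851, n=15, k=2)

print(f"Eq. 4: F = {f_eq4:.2f}")
print(f"Eq. 5: F = {f_eq5:.2f}")
```

The same check can be applied to any of the later equations by supplying their n, r², and descriptor count.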
Table 3. TLC of Nitrobenzenes on Silica and Alumina (eluent = benzene), 28.3 °C

                        Silica              Alumina
Nitrobenzene            R_M     Yest(a)     R_M     Yest(b)     ΣMR    HBA
2,6-DiMe              -0.017     0.046    -0.501    -0.384    10.27     0
2,4,6-TriMe           -0.017    -0.029    -0.477    -0.505    14.89     0
4-NO2-2,3,5,6-Me4      0.000     0.239    -0.477    -0.094    25.84     2
2-Me                   0.070     0.120    -0.368    -0.263     5.65     0
2,3-DiMe               0.052     0.046    -0.348    -0.384    10.27     0
3-Me                   0.105     0.120    -0.308    -0.263     5.65     0
4-Me                   0.140     0.120    -0.231    -0.263     5.65     0
3,4-DiMe               0.140     0.046    -0.213    -0.384    10.27     0
3-NO2-2-Me             0.176     0.464    -0.176     0.267    11.98     2
4-NO2                  0.250     0.539    -0.052     0.388     7.36     2
3-NO2-4-Me             0.308     0.464     0.000     0.267    11.98     2
3-NO2                  0.327     0.539     0.017     0.388     7.36     2
4-MeO                  0.410     0.531     0.035     0.375     7.87     2
2-NO2                  0.288     0.539     0.087     0.388     7.36     2
2-MeO                  0.550(c)  0.531     0.140     0.375     7.87     2

Notes: (a) Equation 4. (b) Equation 5. (c) Outlier; deleted from Eq. 4.
Pyridines provide an interesting departure in adsorption behavior as modeled by Magee. A set of 25 pyridines developed on silica gel with acetone includes a number of 2- and 6-substitutions that permit analysis of the steric effect in addition to electronic and bulk effects. Binding is complex, and both MR and π (from log P) are supported as bulk effects. The negative electronic and steric effects clearly identify nucleophilic binding by the pyridine nitrogen, either to silanol hydroxyl groups or hypervalently to silicon. Hypervalent binding to silicon is suggested, as H-bonding to SiOH groups would be less likely to show a significant steric effect.

$$R_M = 0.00687\,\mathrm{MR}\,(2.36) - 0.139\,\pi\,(7.03) - 0.583\,\sigma\,(10.36) - 0.222\,\upsilon_{2,6}\,(4.09) - 0.0892 \tag{6}$$

$n = 25,\; s = 0.0695,\; r^2 = 0.884,\; F = 38.17$
In the review article already cited, Snyder provides experimental adsorption energies for 29 pyridines (Eq. 7, Table 5). These values are positively scaled to binding energy. Half of the substituents, 15/29, are potential H-bonding groups and are coded HB = 1. This set is analyzed by the author, and confirms the strong electronic effect supporting nucleophilic binding by pyridines on silica. The four deletions from Snyder's set are 2- and 4-hydroxy and aminopyridines, which are not true pyridines.

$$\text{Adsorption energy} = -2.66\,\sigma\,(-11.75) + 3.69\,\mathrm{HB}\,(24.98) + 7.58 \tag{7}$$

$n = 25$ (4 deleted), $s = 0.360,\; r^2 = 0.969,\; F = 339.1$
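Eq. 7 is simple enough to evaluate by hand, but a short sketch makes the roles of the two descriptors explicit. The function name `adsorption_energy_eq7` is hypothetical, the σ values are those tabulated for these pyridines in Table 5, and coding the formyl group as HB = 1 is an assumption based on the statement that potential H-bonding groups are coded 1.

```python
def adsorption_energy_eq7(sigma: float, hb: int) -> float:
    """Estimated adsorption energy of a pyridine on silica from Eq. 7."""
    return -2.66 * sigma + 3.69 * hb + 7.58


# (sigma, HB); HB = 1 for the formyl group is an assumed coding.
cases = {
    "Pyridine": (0.00, 0),
    "2-Methyl": (-0.17, 0),
    "3-Formyl": (0.35, 1),
}

energies = {name: adsorption_energy_eq7(*d) for name, d in cases.items()}
for name, e in energies.items():
    print(f"{name:>9s}  estimated adsorption energy = {e:.2f}")
```

Electron-donating substituents (negative σ) raise the estimate, consistent with nucleophilic binding by the ring nitrogen, while an H-bonding group adds a constant 3.69 units.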
Alumina

The similarity of the silanol and aluminol surfaces is revealed in Eqs. 4 and 5, which show differences in binding energetics but not in the basic mechanism of binding. It is reasonable to infer that the binding events on one will be mirrored to a significant extent on the other. These analogies are especially evident in the large comparative study on 100 thiazoles by Vernin and Vernin, where they find parallel trends in the energy of binding to alumina and silica. However, the sensitivity of the thiazoles to steric effects of alkyl groups is more important on alumina than on silica, in accord with stronger and closer binding to the surface.

Snyder has done extensive work on the retention volumes of mono- and polyhalo-substituted benzenes on slightly hydrated alumina (0.7% water) with pentane as an eluent. In addition to substituent MR, steric effects were tested for adjacent halogens, and the electronic effect (sum of sigmas) is referenced to the nearest hydrogen adjacent to the smallest halogen. Thus, the electronic effect of 1-fluoro-2-chlorobenzene is the sigma sum of p-fluoro and m-chloro, as though the compound were 1[H]-2-fluoro-3-chlorobenzene. This treatment was found by the author to be superior to methods reversing the positional effect of the halo groups. Bulk and electronic effects are strongly supported with no evidence of a steric effect.
Table 4. Substituted Pyridines Developed by Acetone on Silica Gel, TLC

Pyridine    R_M    Yest(a)    Σπ    ΣMR    Σσ    υ2,6
2-Aceto
-0.35
-0.32
-0.55
14.59
0.50
0.50
3-Amino
0.09
0.21
-1.23
8.51
-0.16
0.0
2-Benzoyl
-0.39 -0.41
-0.46 -0.37
0.95
0.50
0.23
3-Bromo
-0.31
-0.34
0.86
34.30 11.97 11.97
0.43
2-Bromo
0.65 0.00
2-Chloro
-0.37
-0.34
0.71
9.12
-0.27
-0.31
0.71
9.12
0.23 0.37
0.55
3-Chloro 2,4-Dimethyl
0.02
-0.06
1.02
13.36
-0.34
0.52
2,6-Dimethyl
-0.16
1.02
13.36
-0.34
1.04
2-Ethyl
-0.21
-0.16 -0.17
1.02
13.36
-0.15
0.56
2-Fluoro 3-Hydroxy
-0.35
-0.13
0.14
0.27
-0.01
-0.67
4.01 5.94
0.06
-0.05 -0.37
0.12
0.00
-0.35
1.12
-0.07
-0.09
0.51
17.03 8.74
0.35 -0.17
0.00 0.52
3-Methyl 4-Methyl
-0.09
-0.05
0.51
8.74
-0.07
0.00
0.03
0.51
8.74
-0.17
0.00
2-n-Propyl
1.55
18.05
-0.13
0.68
Pyridine
-0.25 -0.07
0.00 -0.27
0.00
0.00
-0.31
4.12 9.97
0.00
2-Formyl(CHO)
-O.03 -0.27
0.42
0.50
3-Formyl 4-Formyl
-0.14 -0.10
-0.14 -0.17
9.97 9.97
0.35 0.42
0.00 0.00
0.09 0.19
0.00 0.10 0.10 -0.03
10.28 10.28 10.28 17.98
0.00 0.00 0.00 -0.51
0.53 0.00 0.00 0.52
3-lodo 2-Methyl
2-Hydroxymethyl 3-Hydroxymethyl 4-Hydroxymethyl 2,4,6-Tri methyl
0.19 -0.02
0.86
-0.65 -0.65 -0.65 -1.03 -1.03 -1.03 1.53
0.39
0.00
Note: (a) Equation 6.
The level of correlation and the irregular pattern of the residuals suggest that other factors may be involved, perhaps H-bonding to the smaller halogen substituents. As log R increases with binding, the positive bulk effect is opposed by increasing electron withdrawal from the ring, consistent with binding to Lewis acid sites of this highly activated alumina.

Halobenzenes: mono-F, Cl, Br, I and all combinations of 1,2-, 1,3-, and 1,4-disubstitution; 1,2,3- and 1,2,4-triCl; 1,2,4,5-tetraCl; 1,2-diCl-4-Br; 1,2-diCl-4-I; 1,3,5-triBr; 1,2,4,5-tetraBr; hexaCl

$$\log R = 0.0376\,\Sigma\mathrm{MR}\,(9.69) - 0.614\,\Sigma\sigma\,(-6.00) + 0.366 \tag{8}$$

$n = 42,\; s = 0.152,\; r^2 = 0.711,\; F = 48.01$

Table 5. Adsorption Energies of Substituted Pyridines on Silica

Pyridine            Adsorption energy   Yest(a)       σ    HB
Pyridine                   7.7             7.6     0.00     0
2-Methyl                   8.1             8.0    -0.17     0
3-Methyl                   7.8             7.8    -0.07     0
4-Methyl                   8.2             8.0    -0.17     0
2,4-Dimethyl               8.5             8.5    -0.34     0
2,6-Dimethyl               8.1             8.5    -0.34     0
2,4,6-Trimethyl            9.1             8.9    -0.51     0
2-Ethyl                    8.0             8.0    -0.15     0
2-n-Propyl                 7.5             7.9    -0.13     0
2-Hydroxy                 12.4(b)         12.3    -0.37     1
3-Hydroxy                 10.8            11.0     0.12     1
4-Hydroxy                 15.2(b)         12.3    -0.37     1
2-Amino                   10.9(b)         13.0    -0.66     1
3-Amino                   11.3            11.7    -0.16     1
4-Amino                   12.9(b)         13.0    -0.66     1
2-Hydroxymethyl           11.7            11.3     0.00     1
3-Hydroxymethyl           12.1            11.3     0.00     1
4-Hydroxymethyl           12.7            11.3     0.00     1
2-Formyl (CHO)             9.5            10.2     0.42     1
3-Formyl                  10.2            10.3     0.35     1
4-Formyl                  10.1            10.2     0.42     1
2-Aceto                    9.8             9.9     0.50     1
2-Benzoyl                 10.3            10.1     0.43     1
2-Fluoro                   6.4            11.1     0.06     1
2-Chloro                   6.5             7.0     0.23     0
2-Bromo                    6.5             7.0     0.23     0
3-Chloro                   6.9             6.6     0.37     0
3-Bromo                    7.0             6.5     0.39     0
3-Iodo                     7.0             6.6     0.35     0

Notes: (a) Equation 7. (b) Not included in Eq. 7.
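The positional rule used for the electronic term in Eq. 8 can be illustrated with the example given in the text, 1-fluoro-2-chlorobenzene, which is treated as the sigma sum of p-fluoro and m-chloro. The sketch below is hedged accordingly: `log_r_eq8` is a hypothetical helper, the Hammett constants (0.06 for p-F, 0.37 for m-Cl) match the σ values listed for 2-fluoro and 3-chloro in Table 5, and the substituent MR values (0.92 for F, 6.03 for Cl) are commonly tabulated numbers that are assumed here rather than taken from the chapter.

```python
def log_r_eq8(sum_mr: float, sum_sigma: float) -> float:
    """Estimated log R on hydrated alumina from Eq. 8 (coefficients as printed)."""
    return 0.0376 * sum_mr - 0.614 * sum_sigma + 0.366


# Assumed substituent constants (not listed in the chapter for this set).
MR = {"F": 0.92, "Cl": 6.03}
SIGMA_P = {"F": 0.06}
SIGMA_M = {"Cl": 0.37}

# 1-Fluoro-2-chlorobenzene, referenced to the hydrogen adjacent to fluorine:
# fluorine contributes its para constant, chlorine its meta constant.
sum_mr = MR["F"] + MR["Cl"]
sum_sigma = SIGMA_P["F"] + SIGMA_M["Cl"]

estimate = log_r_eq8(sum_mr, sum_sigma)
print(f"Sum MR = {sum_mr:.2f}, sum sigma = {sum_sigma:.2f}, log R = {estimate:.3f}")
```

For this compound the bulk and electronic terms nearly cancel, so the estimate stays close to the intercept.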
Another retention study by Snyder concerns substituted phenols adsorbing from 20% i-PrOH in pentane onto hydrated alumina (3.9% water). The set is small and log R depends only on the substituent sigma values. Note that this dependence is opposite in direction to that of the halobenzenes in Eq. 8 and strongly suggests that phenol acidity as an H-bond donor is responsible for most of the binding energy. The degree of alumina hydration suggests that the surface holds sufficient AlOH groups to provide H-bond acceptor sites. This work is consistent with Holmes-Farley's study of phenols binding to oxidized aluminum surfaces. In support of sigma as a single descriptor, the residuals closely approach a normal distribution.

Phenols: phenol, 4-methyl, 3,4- and 3,5-dimethyl, 3- and 4-methoxy, 3- and 4-chloro, 3- and 4-aceto, 4-formyl, 4-nitro

$$\log R = 2.08\,\Sigma\sigma\,(6.21) + 0.464 \tag{9}$$

$n = 12,\; s = 0.373,\; r^2 = 0.794,\; F = 38.52$
The TLC development of simple mono-, di-, and triaminoanthraquinones on alumina with 3:1 hexane/acetone is analyzed by the author. Descriptors tested are summations of π (from log P), MR, σ, υ for 1,4,5,8-substituents, and an indicator variable for monoaminoanthraquinones. In this rather large set, the π values dominate over the simple bulk factor, MR. As π is a composite descriptor, additional factors such as H-bonding are implied. The second most important factor is electronic, with the positive coefficient suggesting enhancement of amino group donor bonds to alumina. The indicator variable for monoaminoanthraquinones was unexpected and may suggest a different binding geometry for this subset. No steric interactions were observed.

Substitution pattern: position 1: H, NH2, CH3; position 2: H, NH2, CH3, Br; positions 3 and 4: H, NH2, CH3, Cl, Br; positions 5 and 8: H, NH2, Cl; positions 6 and 7: H, NH2

$$R_M = -0.358\,\Sigma\pi\,(-7.02) + 0.454\,\Sigma\sigma\,(6.20) - 0.386\,\mathrm{MONO}\,(-3.61) + 0.118 \tag{10}$$

$n = 60,\; s = 0.227,\; r^2 = 0.797,\; F = 73.04$
The TLC development of substituted anilines shows complex bulk effects (MR, π), steric hindrance of amino-group binding, and, most surprising, no significant electronic effect. These observations relate to 60 anilines developed on neutral activated alumina with benzene as the eluent. All positions are mono- and poly-substituted. Interpretation is difficult. If the steric effect is blocking nucleophilic or H-binding to aluminol sites as observed with phenols, then a strong electronic effect should modify the nitrogen electron pair or the acidity of the NH groups. The only suggestion of an electronic effect is the need for an indicator variable to accommodate para-substitution by nitro, aceto, and carbomethoxy groups. These and other substituent effects were not handled by σ (t = -0.23) despite the large values for these groups.

Substitution pattern: position 2: H, CH3, Cl, OH, OCH3; position 3: H, CH3, Cl, Br, OH, acetamido; position 4: H, CH3, Cl, Br, OCH3, NH2, NO2, phenyl, acetamido, aceto, COOCH3; position 5: H, CH3, Cl, OCH3; position 6: H, Cl

$$R_M = -0.480\,\Sigma\pi\,(-11.30) + 0.028\,\Sigma\mathrm{MR}\,(4.06) - 0.360\,\upsilon_{2,6}\,(-3.69) + 0.239\,I_{NO_2}\,(4.30) + 0.121 \tag{11}$$

$n = 60,\; s = 0.201,\; r^2 = 0.790,\; F = 55.56$
IV. BINDING OF ORGANIC COMPOUNDS TO ORGANIC POLYMERS

A. Heats of Adsorption on Cellulose and Activated Carbon

Cellulose

The literature on binding of organic compounds to cellulose is strong in the area of paper and cellulose thin-layer chromatography, but very weak in direct binding studies. Some work has been stimulated by the need to understand the binding of vat dyes to cellulose fiber, and some rather specialized descriptors have been developed by Giles and Hassan. Although no regression was applied to the measured binding affinity (kcal/mol) of over 80 anthraquinone dyes to viscose rayon, plots against dye solubilities and the longest conjugate chain length were used to develop several conclusions. For high cellulose binding, dyes must have planar structures and long conjugate systems. Binding is enhanced by hydrogen bonding, and this is inhibited in the presence of water. More recent work by Timofei and coworkers quantifies and refines the work of Giles and Hassan by regression analysis and related techniques. Their work on sets of 46 and 49 anthraquinone vat dyes clearly shows the presence of steric, electronic, and hydrophobic effects in the dyeing process. Hydrogen bonding by proton donor groups of the dye molecule is also important. The main structural feature, however, is the descriptor of Giles and Hassan (number of bonds of the conjugated chain along the main axis, r² = 0.835). As this descriptor is roughly proportional to molecular size, the operation of London forces is strongly inferred. Although no one equation incorporates all of the findings of Timofei and coworkers, the considerable complexity and specificity of the dye adsorption process is revealed through a full range of mechanistic effects.

Carbon

Carbon is the ultimate degradation product of cellulose and the many woody natural products used in the manufacture of activated carbon via incomplete combustion. Unlike cellulose, the nature of the scientific data is the complete
reverse in that the majority of information is found in direct adsorption studies, rather than indirectly through adsorption chromatography. While cellulose has a relatively uniform surface composed of repeating glucose units, activated carbon presents a much more complex surface of mixed aromatic and aliphatic structures in varying states of partial oxidation, depending on the biomass used and the conditions of incomplete combustion. Adsorptive binding from solution is potentially complex in the global sense as different classes of compounds might be expected to seek different structurally compatible binding sites. That the expected complexity does not emerge from analysis of the data is a mystery that still awaits future insight. Some of the expected surface complexity of activated carbon is revealed by studies on the irreversible adsorption of phenolic compounds by Grant and King.^^ The observation that phenolic compounds react on carbon surfaces (chemisorption) and are difficult to remove was related to oxidative coupling promoted by high pH and oxygen availability. While the role of carbon in the mechanism of oxidative coupling remains speculative, it is known that carbon can catalyze oxidation reactions. The situation is much less complex in reversible adsorption as group contribution methods appear to predict well for simple adsorbates^^ and these are supported by correlations based on a count of carbon, hydrogen, halogen, nitrogen and oxygen atoms.^"^ While these methods and correlations do not directly address mechanism, the implication is that polarizable volume (MR) is the dominant descriptor and this, in turn, implies adsorption by a nonspecific mechanism. 
Kamlet and coworkers have applied their experimentally evaluated solvatochromic parameters to the binding analysis of 37 simple aliphatic compounds (alcohols, aldehydes, amines, chlorocarbons, esters, ethers, and ketones) from aqueous solution onto activated carbon. It is beyond the scope of this chapter to discuss the solvatochromic approach in detail. However, it encompasses a full mechanistic approach to intermolecular forces by including polarizable volume, dipolarity, and both types of scaled H-bonding descriptors (donor/acceptor). As such, the approach is well suited to describing a range of mechanistic contributions to any set of data based on kinetic or equilibrium measures. It has the disadvantage of being experimentally intensive and is most appropriate when the compound parameters are previously tabulated. In the present study, the partitioning between adsorbed and solution phases, log α, is found to correlate strongly (r² = 0.949) with polarizable volume, dipolarity, and the H-bond acceptor basicities of the adsorbates. This is of exceptional interest in showing the sensitivity of the carbon surface to both the dipolarity and polarizability of adsorbates, as well as revealing the presence of H-bond donor sites that the authors evaluate as somewhat stronger than that of n-octanol. While not directly related to activated carbon, Grate and coworkers use the same approach to show the essential identity of vapor adsorption on graphite and fullerene surfaces for a diverse selection of aliphatic and aromatic compounds. Of interest is the identity of descriptors for binding on graphite, pure fullerene, and on crude activated carbon.
The Freundlich equation relates the amount of solute adsorbed (X mg/g of adsorbent) to the equilibrium solute concentration (C mg/l) through two adsorption constants (k and 1/N) as follows:

$$\log X = \log k + (1/N)\,\log C$$

Abe and coworkers have shown a linear relation between 1/N and log k (r² = 0.947) for adsorption on activated carbon. The same authors measured the adsorption of 15 simple alcohols from water onto three activated carbons with greatly different pore size distribution (A = wood, B = coal, C = coconut shell) (Eqs. 12-14, Table 6). Good linear relations were obtained between the Freundlich adsorption constant, log k, and the molecular connectivity index, χ (r² = 0.973). As the connectivity index is not directly interpretable, the data have been reanalyzed in terms of polarizable volume (MR) and two indicator variables, one for branching (I_BRCH = 0, 1, 2) and one for primary, secondary, and tertiary C-OH (I_OH = 1, 2, 3). This treatment clearly shows the dominance of London forces (MR) and the negative contributions of both C-C and C-O branching. However, only Eq. 12 approaches the level of correlation shown by the single connectivity descriptor.

Table 6. Adsorption of Alcohols from Water onto Activated Carbons

Alcohol                log k_A(a)  log k_B(b)  log k_C(c)     Chi      MR   I_BRCH   I_OH
1-Butanol                -0.262       0.505       0.910     2.414   19.51      0       1
2-Butanol                   *         0.396         *       2.270   19.51      0       2
2-Me-1-Propanol          -0.600       0.439       0.609     2.270   19.51      1       1
2-Me-2-Propanol          -1.114       0.170      -0.013     2.000   19.51      1       3
1-Pentanol                0.328       1.021       1.408     2.914   24.13      0       1
2-Pentanol                  *         0.995         *       2.770   24.13      0       2
3-Pentanol                  *         0.824         *       2.808   24.13      0       2
2-Me-1-Butanol            0.025       0.953       1.228     2.808   24.13      1       1
3-Me-1-Butanol            0.188       0.981       1.241     2.770   24.13      1       1
2-Me-2-Butanol           -0.341       0.840       1.045     2.561   24.13      1       3
3-Me-2-Butanol           -0.074       0.678       0.983     2.643   24.13      1       2
2,2-DiMe-1-Propanol      -0.301       0.564       0.703     2.561   24.13      2       1
Cyclopentanol            -0.356       0.671       0.754     2.394   22.07      0       2
1-Hexanol                 0.772       1.408       1.770     3.414   28.75      0       1
Cyclohexanol              0.117       0.899       1.185     2.894   26.69      0       2

Notes: (a) A = wood, Eq. 12. (b) B = coal, Eq. 13. (c) C = coconut shell, Eq. 14.

$$\log k(A) = 0.122\,\mathrm{MR}\,(8.40) - 0.201\,I_{BRCH}\,(-3.19) - 0.256\,I_{OH}\,(-4.96) - 2.44 \tag{12}$$

$n = 12,\; s = 0.134,\; r^2 = 0.943,\; F = 44.53$

$$\log k(B) = 0.0934\,\mathrm{MR}\,(7.66) - 0.121\,I_{BRCH}\,(-2.35) - 0.096\,I_{OH}\,(-2.08) - 1.19 \tag{13}$$

$n = 15,\; s = 0.123,\; r^2 = 0.877,\; F = 26.25$

$$\log k(C) = 0.110\,\mathrm{MR}\,(5.08) - 0.201\,I_{BRCH}\,(-2.14) - 0.209\,I_{OH}\,(-2.71) - 1.12 \tag{14}$$

$n = 12,\; s = 0.200,\; r^2 = 0.857,\; F = 16.00$
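The Freundlich constants correlated in Eqs. 12-14 are themselves obtained by fitting isotherm data to the log-linear form log X = log k + (1/N) log C. A minimal sketch of that fit follows, using ordinary least squares on the logarithms. The function name `fit_freundlich` and the synthetic isotherm are hypothetical illustrations and are not data from Abe and coworkers.

```python
import math


def fit_freundlich(conc, adsorbed):
    """Least-squares fit of log X = log k + (1/N) log C; returns (k, 1/N)."""
    xs = [math.log10(c) for c in conc]
    ys = [math.log10(x) for x in adsorbed]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # this is 1/N
    intercept = mean_y - slope * mean_x    # this is log k
    return 10 ** intercept, slope


# Synthetic, noise-free isotherm generated with k = 2.0 and 1/N = 0.5.
conc = [1.0, 10.0, 100.0, 1000.0]            # C, mg/l
adsorbed = [2.0 * c ** 0.5 for c in conc]    # X, mg/g

k_fit, inv_n_fit = fit_freundlich(conc, adsorbed)
print(f"k = {k_fit:.3f}, 1/N = {inv_n_fit:.3f}")
```

With real isotherms the fitted log k is the quantity regressed against MR and the indicator variables above.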
Studies by Abe and coworkers on complex adsorbates such as local anesthetics and saccharides have led to results of surprising simplicity, as alluded to in the introduction to this section. One is generally accustomed to seeing the complexity of a correlation increase with the complexity of molecular structure. In fitting seven local anesthetics to the Freundlich equation, they find a linear relationship between 1/N and molecular weight. The correlation with MR is slightly lower in quality (r² = 0.914). As 1/N is linear in log k for adsorption on carbon, London forces appear to dominate the binding process for these moderately complex drugs. Equally surprising is their study of 13 saccharides and 4 polyhydric alcohols. The Freundlich constant, log k, correlates highly with the carbon and oxygen count and acceptably well with MR. There is no evidence for other significant descriptors that might imply a complex binding mechanism.

Local anesthetics: procaine, lidocaine, tetracaine, dibucaine, mepivacaine, chloroprocaine, benzocaine

1/N = -l,6S X 10-^MW (11.6) + 0.286    (15)

$n = 10,\; r^2 = 0.951,\; F = 135$
Saccharides: D-(+)-xylose, D-(-)-arabinose, D-(-)-2-deoxyribose, D-(+)-glucose, D-(+)-mannose, D-(-)-fructose, D-(+)-galactose, L-(+)-rhamnose, α-methyl-D-(+)-glucoside, α-methyl-D-(-)-mannoside, D-(+)-maltose, D-(+)-sucrose, D-(+)-lactose

Polyhydric alcohols: glycerol, meso-erythritol, D-xylitol, D-(-)-mannitol

$$\log k = 0.867\,N_C\,(6.46) - 0.610\,N_O\,(-4.05) - 2.31 \tag{16}$$

$n = 17,\; s = 0.232,\; r^2 = 0.949,\; F = 129.6$

$$\log k = 0.0572\,\mathrm{MR}\,(9.40) - 2.68 \tag{17}$$

$n = 17,\; s = 0.378,\; r^2 = 0.855,\; F = 88.26$
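Because Eq. 16 depends only on atom counts, it can be evaluated from molecular formulas alone. The sketch below assumes that the first count in Eq. 16 is the number of carbon atoms and the second the number of oxygen atoms, in keeping with the statement that log k correlates with the carbon and oxygen count. The function name `log_k_eq16` is hypothetical; the atom counts follow from the molecular formulas (glucose C6H12O6, sucrose C12H22O11, glycerol C3H8O3).

```python
def log_k_eq16(n_carbon: int, n_oxygen: int) -> float:
    """Estimated Freundlich log k on activated carbon from Eq. 16."""
    return 0.867 * n_carbon - 0.610 * n_oxygen - 2.31


# (carbon atoms, oxygen atoms) from the molecular formulas.
compounds = {
    "D-(+)-glucose": (6, 6),
    "D-(+)-sucrose": (12, 11),
    "glycerol": (3, 3),
}

log_k = {name: log_k_eq16(*counts) for name, counts in compounds.items()}
for name, value in log_k.items():
    print(f"{name:>14s}  log k(est) = {value:+.3f}")
```

Each added carbon raises the estimate more than each added oxygen lowers it, so the disaccharide is predicted to adsorb far more strongly than the monosaccharide or glycerol.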
B. Adsorption Chromatography on Cellulose and Paper
Paper chromatography was a highly developed art/technique long before thin-layer plates with powdered cellulose were available to simplify the procedure. Both adsorbents are predominantly cellulose with low amounts of additives to improve physical properties, and one would expect similar results in relative performance. However, the longer tank times and migration distance of paper chromatography, with less control over lateral diffusion, suggest separate treatment from TLC-cellulose studies. Accordingly, we treat the presumably more precise powdered cellulose studies before the older technique of whole paper chromatography.

Powdered Cellulose
A dramatic demonstration of the difference between binding to inorganic and organic polymer surfaces is provided by Sawicki and coworkers. Developing a set of 22 polynuclear ring-carbonyl compounds (fluorenone, coumarin, anthrone, indanone, etc.) on alumina with toluene and on cellulose with DMF-water (35:65 v/v), they observe a radically different sequence. This is, of course, consistent with the expectations of completely different mixes of intermolecular forces for the same compounds binding to very different polymeric surfaces. Powdered cellulose plates are used for much the same separations that gave paper chromatography special advantages, namely for the separation of polar compounds such as amines, acids, heterocyclics, steroids, and complex biochemicals such as nucleic acid derivatives. The technique is especially effective in separating simple aliphatic acids and amino acids. By rescaling the R_f data into the binding-energy-related log form, R_M (see Section III.B), a QSPR analysis can be performed in mechanistic terms.

One very interesting set of aliphatic acids with exceptional variation in structure has been analyzed by the author. The set of 49 acids is composed of simple aliphatic structures with hydroxy, amino, halo, and mercapto substituents. Development on cellulose plates with diethylamine-n-butanol-water (1:85:14) resulted in an R_f range of 0.07-0.97. By regression of R_M against descriptors of the aliphatic group of RCOOH, an excellent correlation is obtained. The Σf term is derived from the partial calculation of log P (octanol/water), and the negative dependence is expected, as cellulose is hydrophilic and would repel lipophilic structure. The electronic effect, Σσ_I, is that delivered to the α position of the acid to modify the acidity of RCOOH. The HB descriptor is a simple count of both types of H-bonding by substituents on RCOOH. While H-bonding is expected to the OH groups of cellulose, the effect is weak and of the wrong sign. The indicator variables, I_OH and I_NH2, suggest special behavior for hydroxy- and amino acids beyond that of the log P and inductive contributions. It seems highly probable that HB, I_OH, and I_NH2 are strongly confounded in an accidental correlation despite the size of the set. In brief, this is a classic example of a statistical disaster that might have gone unnoticed but for the incorrect sign of the HB descriptor.

Aliphatic acids: C2-C10 RCOOH with substituents - C2(α): F, Cl, Br, I, CH3, OH, SH, NH2; C3(β): Cl, CH3, OH, NH2; C4(γ): CH3, OH, NH2

$$R_M = -0.367\,\Sigma f\,(-23.24) - 0.104\,\Sigma\sigma_I\,(-2.69) - 0.392\,\mathrm{HB}\,(-5.62) + 0.265\,I_{OH}\,(3.74) + 0.811\,I_{NH_2}\,(10.86) + 0.333 \tag{18}$$

$n = 49,\; s = 0.096,\; r^2 = 0.973,\; F = 314$
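The rescaling of R_f into R_M that underlies this analysis can be sketched in a few lines. The sketch assumes the customary definition R_M = log10(1/R_f - 1), which is presumably the form given in Section III.B; the function names are hypothetical. With this definition, R_M increases as the compound is more strongly retained (smaller R_f).

```python
import math


def rm_from_rf(rf: float) -> float:
    """Convert R_f (0 < R_f < 1) to R_M = log10(1/R_f - 1)."""
    if not 0.0 < rf < 1.0:
        raise ValueError("R_f must lie strictly between 0 and 1")
    return math.log10(1.0 / rf - 1.0)


def rf_from_rm(rm: float) -> float:
    """Inverse transformation: R_f = 1 / (1 + 10**R_M)."""
    return 1.0 / (1.0 + 10.0 ** rm)


# The R_f range reported for the 49 aliphatic acids on cellulose.
rm_low_rf = rm_from_rf(0.07)    # strongly retained
rm_high_rf = rm_from_rf(0.97)   # weakly retained
print(f"R_f = 0.07 -> R_M = {rm_low_rf:+.2f}")
print(f"R_f = 0.97 -> R_M = {rm_high_rf:+.2f}")
```

Under this assumption, the reported R_f range of 0.07-0.97 corresponds to an R_M span of more than 2.5 log units.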
This set is of sufficient interest to analyze by an alternate method developed by the author. In this approach, a hypermolecule is developed and each position is analyzed by using positional descriptors to describe lipophilicity, f (from partial log P), and the electronic effect, χ, for electronegativity. The atomic electronegativity is known to be directly related to atomic sigma charge and the inductive effect. Five of the longer alkanoic acids were too branched to accommodate and were deleted from the set. It was further necessary to combine some positions into small regions to have statistically significant loading of the matrix. Positions P1, P3, P4, and P6 are sufficiently populated to retain their identity, but P22, P55, and P789 have been merged to define small regions. All positions and small regions were tested for lipophilicity (f), charge (χ), and H-bonding (HB). As the analysis is positional, no special indicators were tested for hydroxy or amino acids. The result is strikingly different from that achieved in Eq. 18. While Eq. 19 is somewhat weaker statistically, it is far more credible. The lipophilic interactions are complex, with P6 showing an unexpected positive slope, while P3 and P5 respond as expected. The electronic effect is quite interesting in being distributed over all positions rather than localized near the COOH group to influence acidity. This suggests that dipolar binding is significant at nearly every position regardless of distance from the COOH group. Finally, the H-bond term is not only strong as expected but positive, as demanded by bonding to a hydroxylic surface. While Eq. 18 is not entirely incorrect, positional analysis provides a more incisive measure of mechanistic detail.

Same RCOOH less 5 deletions. Positional diagram: 8 6 1 5.5.4.3.C-COOH 9 7 2 2

$$R_M = -0.177\,f_3\,(-3.02) - 0.156\,f_5\,(-3.21) + 0.394\,f_6\,(2.79) - 0.0619\,\chi_1\,(-2.65) - 0.0933\,\chi_2\,(-4.54) - 0.140\,\chi_4\,(-4.83) - 0.125\,\chi_6\,(-2.57) - 0.0663\,\chi_{789}\,(-2.45) + 0.934\,\mathrm{HB}\,(12.97) + 0.0190 \tag{19}$$

$n = 44,\; s = 0.176,\; r^2 = 0.920,\; F = 43.39$
The migrating species of an amino acid can be strongly affected by the basicity or acidity of the developing solvent, which consequently alters the chromatographic pattern and the binding mechanism. A set of 38 amino acids with an exceptional range of structure (keto, carboxylic acid, carboxamide, mercapto, amino, thio, imidazole, hydroxy, sulfonyl, and aromatic substituents) was developed on cellulose plates with a basic eluent (n-butanol-acetone-diethylamine-water [10:10:2:5]) and with an acidic eluent (isopropanol-formic acid-water [20:1:5]). This set provides a unique opportunity to compare binding of amino acids to cellulose under both protonation and deprotonation conditions. The results are dramatically different. The descriptors tested are Σf (excluding the amino and carboxylic group), MR (same basis), and the combined count of both types of H-bonds (HB). Equation 20 shows the set developed with the basic eluent. Neither MR nor H-bonding has any significance in this relation, which depends only on the calculated lipophilicity. Several outliers were detected and deleted. Four of the five outliers are basic (amino [3] and imidazole); the other is the only mercaptoamino acid. The correlation is of acceptable strength for a set of this diversity, and the residuals approach a symmetrical distribution with central tendency, suggesting that only experimental error remains.

Amino acids: α- and β-alanine, α-, β- and γ-butyric and isobutyric acids, ε-aminocaproic acid, α,γ-diaminobutyric acid, aspartic acid, citrulline, glutamic acid, glutamine, glycine, histidine, β-hydroxyglutamic acid, hydroxyproline, β-hydroxyvaline, leucine, isoleucine, lysine, methionine, methionine sulfone, norleucine, norvaline, ornithine, α-phenylalanine, α-phenylglycine, proline, sarcosine, serine, threonine, tryptophan, tyrosine, valine

$$R_M = -0.269\,\Sigma f\,(-11.59) + 0.196 \tag{20}$$

$n = 33,\; s = 0.186,\; r^2 = 0.812,\; F = 134.3$
The simplicity of this relation suggests that amino acid anions somehow inhibit the formation of hydrogen bonds between substituents and cellulose, perhaps by engaging in intramolecular H-bonds with the carboxylate group. Developed under acidic conditions, the same set displays a radically different binding mechanism, as shown in Eq. 21. As in Eq. 20, five outliers were detected and deleted. Only one outlier was common to each eluent, namely, the mercapto-amino acid. The others were two terminal carboxylic acid groups, one keto- and one hydroxy-substituent. The correlation is dominated by the bulk descriptor, MR, and by H-bonding to cellulose (HB). The lipophilicity descriptor, Σf, has no significance. It is interesting to note that the bulk descriptor has a negative coefficient while that of HB is positive, as expected. This binding of the side-chain polar substituents is, of course, supplementary to that of the amino and carboxylic acid groups, which are assumed to provide a constant binding energy through relatively strong H-bonds to cellulose. The correlation indicates that binding of the polar side chains is controlled by H-bonds to cellulose hydroxyl groups in opposition to the repulsion of the predominantly hydrocarbyl structures. The difference in binding mechanism may be attributable to the additional strength of the neutral carboxylic acid bond to cellulose. The strength of this bond may force the side chain into closer contact with the cellulose surface to effect specific interactions unavailable to the amino acid anion. The equation is substantially weaker than Eq. 20, but displays a similar distribution of residuals with central tendency.

Same amino acids as Eq. 20:

$$R_M = -0.0279\,\mathrm{MR}\,(-4.99) + 0.167\,\mathrm{HB}\,(6.82) + 1.05 \tag{21}$$

$n = 33,\; s = 0.269,\; r^2 = 0.665,\; F = 29.72$
Paper
The use of paper for chromatography has a longer history than that of powdered cellulose plates. It also differs in the preparation process in that powdered cellulose has suffered more physical abuse than cellulose papers. In the following examples, we begin with the chromatography of aliphatic acids, followed by studies of substituted 2-amino-1-alkanols and simply substituted chloro- and alkylphenols. Unfortunately, there is no way to obtain a direct comparison between related sets run on paper and powdered cellulose. The response of the descriptors provides the best evidence that binding in each case is essentially to cellulose and not to impurities therein. To minimize the variance, each of these studies was developed on the same paper, Whatman No. 1.

An interesting small set of diversely structured mono-, di- and tricarboxylic acids was developed on paper with an acidic eluent (phenol-water-formic acid [75:25:1 v/v]). With only 15 members, no more than 3 descriptors can be used to correlate the set (Eq. 22, Table 7). Chosen for testing were MR and Σf of the aliphatic non-carboxylic structure. As each carboxylic acid was expected to bind strongly, two indicator variables, I2 and I3, were used to distinguish the di- and tricarboxylic acids from the singly binding monoacids. In agreement with Eq. 20, Σf (partial log P) proved to be substantially stronger than the simple bulk descriptor, MR. In addition, both indicator variables show strong positive coefficients, supporting additional H-bonding by each carboxylic acid group. It is interesting that the coefficient for I3 is substantially larger than for I2.

$$R_M = -0.308\,\Sigma f\,(-6.46) + 0.753\,I_2\,(5.32) + 1.13\,I_3\,(6.32) - 0.650 \tag{22}$$

$n = 15,\; s = 0.231,\; r^2 = 0.872,\; F = 24.98$
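The effect of the two indicator variables in Eq. 22 is easiest to see by evaluating it for a few acids. In this illustrative sketch the function name `rm_eq22` is hypothetical, and the (Σf, I2, I3) values are those listed for these acids in Table 7.

```python
def rm_eq22(sum_f: float, i2: int, i3: int) -> float:
    """Estimated R_M on Whatman No. 1 paper from Eq. 22."""
    return -0.308 * sum_f + 0.753 * i2 + 1.13 * i3 - 0.650


# (sum f, I2, I3) as listed in Table 7.
acids = {
    "Oxalic": (0.00, 1, 0),           # dicarboxylic
    "Tartaric": (-2.43, 1, 0),        # dicarboxylic, two hydroxy groups
    "Tricarballylic": (1.75, 0, 1),   # tricarboxylic
}

rm_est = {name: rm_eq22(*d) for name, d in acids.items()}
for name, value in rm_est.items():
    print(f"{name:>14s}  R_M(est) = {value:+.2f}")
```

These estimates round to the Yest entries of Table 7 for the same acids (0.10, 0.85, and -0.06).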
The correlation is robust and the negative coefficient of Zf is similar to that of Eq. 20 for a much different set of acids developed on powdered cellulose. The term is basically repulsive for forcing lipophilic structures onto a hydrophilic surface. Substituted 2-amino-l-alkanols were developed on paper with n-butanol saturated with 0.1% ammonium hydroxide (Eqs. 23 and 24, Table 8).^^ The set is small but of exceptional structural variation. Descriptors selected for analysis are the MR and 71 values (aromatic partial log P) of the substituents, several of which were estimated as the groups are quite unusual (guanidylpropyl, imidazolemethyl, 4-hydroxyphenyl methyl, etc.). Binding of the 2-amino and 1-hydroxy groups is expected to be strong and to dominate orientation on the cellulose surface. As these associations are constant for all members, the analysis concerns the secondary effect on binding of the residual structure. In addition, indicator variables for aromatic structure and the capacity for forming additional H-bonds were tested. Consistent with the binding of other aliphatic structures to cellulose, the lipophilic descriptor, 71, correlates with much greater strength than the bulk descriptor, MR. Plotting
Table 7. Chromatography of Mono-, Di-, and Tricarboxylic Acids on Whatman No. 1 (eluent = phenol:water:formic acid)

Aliphatic Acid     R_M     Yest(a)    Σf     I2   I3
Aconitic           0.25     0.21     0.87    0    1
Adipic            -0.79    -0.71     2.64    1    0
Citric             0.45     0.52    -0.12    0    1
Fumaric           -0.23    -0.03     0.44    1    0
Glutaric          -0.55    -0.51     1.98    1    0
Glycolic          -0.16    -0.35    -0.98    0    0
Lactic            -0.41    -0.55    -0.32    0    0
Levulinic         -1.00    -0.75     0.31    0    0
Malic              0.14     0.31    -0.55    1    0
Malonic            0.03    -0.10     0.66    1    0
Oxalic             0.66     0.10     0.00    1    0
Succinic          -0.29    -0.31     1.32    1    0
Syringic          -1.28    -1.20     1.78    0    0
Tartaric           0.63     0.85    -2.43    1    0
Tricarballylic    -0.03    -0.06     1.75    0    1

Note: (a) Equation 22.
indicated curvature and additional strength is gained in the parabolic correlation. As $\pi$ is colinear with $\Sigma f$, the magnitude of the negative coefficient is in perfect agreement with that of other sets of mainly hydrocarbyl groups binding to cellulose (Eqs. 18, 20-22). Forced binding of aliphatic structure to cellulose is clearly repulsive in nature.

$$R_M = -0.237\,\pi\ (-5.97) + 0.421 \tag{23}$$

$n = 15$, $s = 0.260$, $r^2 = 0.733$, $F = 35.68$

$$R_M = -0.301\,\pi\ (-6.72) - 0.0428\,\pi^2\ (-2.26) + 0.549 \tag{24}$$

$n = 15$, $s = 0.227$, $r^2 = 0.813$, $F = 26.03$
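To see what the parabolic term contributes, the sketch below evaluates Eqs. 23 and 24 side by side for three aminoalcohols spanning the range of $\pi$ in Table 8. It is only an illustration using the printed (rounded) coefficients; the function names are hypothetical.

```python
def rm_eq23(pi):
    """Linear model, Eq. 23 (coefficients as printed)."""
    return -0.237 * pi + 0.421

def rm_eq24(pi):
    """Parabolic model, Eq. 24 (coefficients as printed)."""
    return -0.301 * pi - 0.0428 * pi ** 2 + 0.549

# name: (observed R_M, pi), values taken from Table 8
aminoalcohols = {
    "Argininol": (1.12, -3.80),
    "Ethanolamine": (0.75, 0.00),
    "Leucinol": (-0.23, 1.87),
}

for name, (observed, pi) in aminoalcohols.items():
    print(f"{name:13s} observed={observed:+.2f}  "
          f"Eq.23={rm_eq23(pi):+.2f}  Eq.24={rm_eq24(pi):+.2f}")
```

For the strongly hydrophilic argininol the squared term lowers the estimate from about 1.32 to about 1.07, closer to the observed 1.12, which is the curvature the text refers to.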
Data for 22 multiply substituted phenols developed on paper with xylene saturated with formamide were analyzed (Eqs. 25 and 26, Table 9). Descriptors tested were MR, $\pi$, and Hammett's $\sigma$ summed over all the substituents. Owing to the simple nature of the substituents (Cl, CH3, C2H5), there is a natural colinearity between MR and $\pi$ ($r = 0.991$) that makes precise selection of the key descriptor difficult. Consistent with other cellulose and paper correlations, $\pi$ is selected over MR. For 2,6-substituted phenols, Charton's upsilon ($\upsilon$) is selected to describe
Table 8. Chromatography of 2-Amino-1-Alcohols on Whatman No. 1 (eluent = n-butanol-0.1% ammonium hydroxide)

Aminoalcohol       R_M     Yest(a)   Yest(b)    π       π²
Alaninol           0.52     0.30      0.38     0.51    0.26
Argininol          1.12     1.32      1.08    -3.80   14.44
Aspartidol         0.52     0.60      0.76    -0.77    0.59
Ethanolamine       0.75     0.42      0.55     0.00    0.00
Glutamidiol        0.52     0.48      0.62    -0.26    0.07
Histidinol         0.87     0.32      0.41     0.43    0.18
Isoleucinol       -0.23    -0.01     -0.14     1.82    3.31
Leucinol          -0.23    -0.02     -0.16     1.87    3.50
Lysinol            1.12     1.19      1.08    -3.23   10.43
Phenylalaninol    -0.37    -0.05     -0.23     2.01    4.04
Prolinol           0.45     0.14      0.13     1.20    1.44
Serinol            0.75     0.67      0.81    -1.03    1.06
Threoninol         0.35     0.54      0.69    -0.52    0.27
Tyrosinol         -0.07     0.10      0.07     1.34    1.80
Valinol            0.02     0.09      0.04     1.40    1.96

Notes: (a) Equation 23. (b) Equation 24.
Table 9. Chromatography of Substituted Phenols on Whatman No. 1 (eluent = xylene saturated with formamide)

Phenol                      R_M      Yest(a)    Σπ     υ2,6
3-Me-4-Chloro              -0.087    -0.120    1.22    0.00
2-Me-4-Chloro              -0.288    -0.351    1.22    0.52
3-Me-6-Chloro              -0.432    -0.365    1.22    0.55
2-Me-6-Chloro (b)          -0.908    -0.596    1.22    1.07
2,3-DiMe-4-Chloro          -0.432    -0.566    1.73    0.52
2,5-DiMe-4-Chloro          -0.501    -0.566    1.73    0.52
3,5-DiMe-4-Chloro          -0.410    -0.335    1.73    0.00
3,4-DiMe-6-Chloro          -0.575    -0.580    1.73    0.55
3-Me-5-Et-4-Chloro         -0.630    -0.550    2.24    0.00
3-Methyl                    0.288     0.180    0.51    0.00
2-Methyl                    0.000    -0.052    0.51    0.52
3-Me-4,6-Dichloro          -0.720    -0.664    1.93    0.55
2-Me-4,6-Dichloro          -1.005    -0.896    1.93    1.07
3,5-DiMe-2,4-Dichloro      -1.005    -0.871    2.42    0.55
3,4-DiMe-2,6-Dichloro      -1.061    -1.116    2.42    1.10
3-Me-5-Et-2,4-Dichloro     -1.061    -1.095    2.95    0.55
2,3-Dimethyl               -0.213    -0.267    1.02    0.52
2,5-Dimethyl               -0.176    -0.267    1.02    0.52
3,4-Dimethyl                0.105    -0.035    1.02    0.00
3,5-Dimethyl               -0.158    -0.035    1.02    0.00
3-Methyl-5-Ethyl           -0.368    -0.251    1.53    0.00
3-Methyl-2,4,6-Trichloro   -0.954    -1.209    2.64    1.10

Notes: (a) Equation 25. (b) Deleted from Eq. 26.
potential effects on phenolic H-bonding to cellulose. This excellent correlation again supports repulsive binding for lipophilic substituents and presumably for the phenyl ring as well. In addition, the 2,6-steric effect clearly identifies the phenolic group H-bond to cellulose as the primary binding mechanism. It is unfortunate that electronic support for this mode of binding was not significant for the limited selection of substituents in this set. Deletion of one outlier, 2-methyl-6-chlorophenol, leads to a significant improvement in statistical strength but without change in interpretation.

$$R_M = -0.422\,\Sigma\pi\ (-9.12) - 0.445\,\upsilon_{2,6}\ (\ldots) + 0.395 \tag{25}$$

$n = 22$, $s = 0.129$, $r^2 = 0.906$, $F = 91.30$
$$R_M = -0.463\,\Sigma\pi\ (-12.03) - 0.337\,\upsilon_{2,6}\ (\ldots) + 0.429 \tag{26}$$

$n = 21$, $s = 0.102$, $r^2 = 0.941$, $F = 142.6$
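The outlier mentioned above can be located by inspecting residuals. The sketch below is an illustration only: it applies the printed coefficients of Eq. 25 to a subset of eight phenols from Table 9 and reports the member with the largest absolute residual among the rows shown. The shortened compound labels and the function name are hypothetical.

```python
def rm_eq25(sum_pi, upsilon_26):
    """Estimate R_M from Eq. 25 (coefficients as printed)."""
    return -0.422 * sum_pi - 0.445 * upsilon_26 + 0.395

# (label, observed R_M, sum pi, upsilon 2,6): a subset of Table 9
phenols = [
    ("2-Me-6-Cl", -0.908, 1.22, 1.07),
    ("2,3-DiMe-4-Cl", -0.432, 1.73, 0.52),
    ("3,5-DiMe-4-Cl", -0.410, 1.73, 0.00),
    ("3-Me", 0.288, 0.51, 0.00),
    ("2-Me", 0.000, 0.51, 0.52),
    ("3,4-DiMe", 0.105, 1.02, 0.00),
    ("3-Me-5-Et", -0.368, 1.53, 0.00),
    ("3-Me-2,4,6-Cl3", -0.954, 2.64, 1.10),
]

# residual = observed minus estimated
residuals = [(label, obs - rm_eq25(sp, u)) for label, obs, sp, u in phenols]
worst = max(residuals, key=lambda item: abs(item[1]))

for label, res in residuals:
    print(f"{label:15s} residual={res:+.3f}")
print("largest |residual| in this subset:", worst[0])
```

Within this subset, 2-methyl-6-chlorophenol stands out, consistent with its deletion in Eq. 26.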
V. BINDING OF ORGANIC COMPOUNDS ON BIOORGANIC POLYMERS

The binding of pesticides and ordinary organic chemicals to organic soils is a necessary field of research for understanding the complex process of soil binding and release in the application of chemicals to solve agricultural problems. Excellent experimental work has been performed and the physical chemistry of soils is well documented. Measured values of soil/water partitioning, K(OM/W), are corrected for the organic matter (OM) content on the reasonable assumption that nonactivated sand/clay will have little affinity for binding organic chemicals. There are some exceptions, such as the strong ionic binding of paraquat dication to clay, but such cases are rare. The usual treatment of K(OM/W) is that of a simple partitioning event consistent with high log P(o/w) correlations. The inaccuracy of this treatment was demonstrated by Magee through the application of log P factoring (see Section II). It is useful to review some of this work as a special extension of binding to organic polymers.

In some interesting work by Briggs, 21 commercial pesticides were chromatographed on thin-layer plates composed of finely divided soil (Eqs. 27 and 28, Table 10; refs. 12, 70). The $R_M$ values correlate flawlessly with measured log P values and the factoring of log P does not reveal any additional information. The coefficients of PL and PH are nearly identical with that of log P. Note also that neither $s$ nor $r^2$ has changed and that $F$ is simply halved due to the addition of a second descriptor. This is, in fact, a perfect example of a verified log P relation and of the harmlessness of factoring. It is also an excellent example of the effect of grinding the complex humic acid structures in the soil organic matter. The situation with physically intact organic matter is quite different.

$$R_M = 0.522\,\log P\ (21.52) - 0.943 \tag{27}$$

$n = 21$, $s = 0.109$, $r^2 = 0.960$, $F = 463.1$
$$R_M = 0.502\,\mathrm{PL}\ (15.56) + 0.531\,\mathrm{PH}\ (20.43) - 0.837 \tag{28}$$

$n = 21$, $s = 0.110$, $r^2 = 0.960$, $F = 230.9$
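The "harmlessness of factoring" can be illustrated numerically. In the tabulated data log P is the sum of its lipophilic (PL) and hydrophilic (PH) parts, so when the two coefficients of Eq. 28 are nearly equal the factored equation should give almost the same estimate as Eq. 27. The sketch below checks this for three pesticides from Table 10, using the printed coefficients; the function names are hypothetical.

```python
def rm_eq27(log_p):
    """Unfactored model, Eq. 27 (coefficients as printed)."""
    return 0.522 * log_p - 0.943

def rm_eq28(pl, ph):
    """Factored model, Eq. 28 (coefficients as printed)."""
    return 0.502 * pl + 0.531 * ph - 0.837

# name: (observed R_M, log P, PL, PH), values taken from Table 10
pesticides = {
    "Cycloheximide": (-0.908, 0.55, 5.79, -5.24),
    "Diuron": (0.454, 2.74, 5.10, -2.36),
    "Fluorodifen": (1.380, 4.40, 4.46, -0.06),
}

for name, (observed, log_p, pl, ph) in pesticides.items():
    # log P is recovered as PL + PH to within rounding
    assert abs((pl + ph) - log_p) < 0.005
    print(f"{name:14s} observed={observed:+.3f}  "
          f"Eq.27={rm_eq27(log_p):+.3f}  Eq.28={rm_eq28(pl, ph):+.3f}")
```

The two estimates differ by a few hundredths of an $R_M$ unit at most for these compounds, well inside the standard error of either equation.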
A smaller set of 14 pesticides was measured in equilibrium with whole soil and water by Briggs and factored by Magee (Eqs. 29 and 30, Table 11). Correlation with log P is again satisfactory, but factoring now shows substantial improvement with selectivity for the hydrophilic substructures. Note that $s$ and $r^2$ are enhanced and $F$ greatly exceeds one half of the unfactored $F$. It is also interesting to note how
Table 10. Thin-Layer Chromatography of Commercial Pesticides on Finely Divided Soil

Pesticide               R_M      Yest(a)   Log P    PL      PH
Cycloheximide          -0.908    -0.712    0.55    5.79   -5.24
Oxycarboxin            -0.432    -0.482    0.90    4.27   -3.37
Fenuron                -0.348    -0.421    0.96    3.25   -2.29
Monuron                 0.035     0.017    1.84    4.25   -2.41
Simazine                0.087     0.041    1.85    3.62   -1.77
Pyrazon                 0.105    -0.140    1.50    3.46   -1.96
Captan                  0.194     0.371    2.35    3.23   -0.78
Carbaryl                0.213     0.306    2.36    3.81   -1.45
Picloram (Me ester)     0.269     0.259    2.30    4.36   -2.06
Metobromuron            0.348     0.296    2.38    4.54   -2.16
2,4-Dichlorophenol      0.368     0.557    2.80    3.24   -0.44
Diuron                  0.454     0.471    2.74    5.10   -2.36
Amiben (Me ester)       0.477     0.535    2.80    4.00   -1.20
Propanil                0.501     0.516    2.80    4.63   -1.83
3,4-Dichloroaniline     0.550     0.530    2.78    3.78   -1.00
Linuron                 0.689     0.598    2.98    5.10   -2.12
Chlorbromuron           0.788     0.695    3.17    5.25   -2.08
Fenac (Me ester)        1.005     1.028    3.80    5.29   -1.49
Chloroxuron             1.005     1.028    3.85    6.21   -2.36
Pentanochlor            1.061     0.978    3.70    5.21   -1.51
Fluorodifen             1.380     1.370    4.40    4.46   -0.06

Note: (a) Equations 27 and 28. Yest from Eq. 27.
closely the log P coefficient of this equilibrium measure agrees with that of the thin-layer procedure (Eq. 27).

$$\log K(\mathrm{OM/W}) = 0.557\,\log P\ (14.57) + 0.525 \tag{29}$$

$n = 14$, $s = 0.239$, $r^2 = 0.947$, $F = 212.2$

$$\log K(\mathrm{OM/W}) = 0.521\,\mathrm{PL}\ (15.05) + 0.640\,\mathrm{PH}\ (14.17) + 0.831 \tag{30}$$

$n = 14$, $s = 0.197$, $r^2 = 0.966$, $F = 158.5$
While Eq. 30 is indicative of a mechanism in addition to simple partitioning, the set is too small to define any specific effects beyond the imbalance of PL and PH. For this purpose, we are fortunate to have a major study by Sabljić on the soil adsorption coefficients of 128 polar compounds. The collection is extremely diverse, with anilines, nitrobenzenes, acetanilides, ureas, and carbamates in addition
Table 11. Distribution of Pesticides Between Soil Organic Matter and Water

Pesticide          Log K(OM/W)(a)   Log P    PL      PH
Dimethoate              0.72        0.79    3.33   -2.54
Aldicarb                1.39        1.57    3.76   -2.19
Simazine                1.44        1.85    3.62   -1.77
Carbaryl                1.78        2.32    3.77   -1.45
Captan                  2.06        2.54    3.32   -0.78
Diazinon                2.12        3.49    5.98   -2.49
Chlorfenvinphos         2.23        3.17    7.24   -4.07
Fenamiphos              2.28        3.18    6.20   -3.02
Phorate                 2.58        3.59    5.14   -1.55
Parathion               2.78        3.93    4.39   -0.46
Folpet                  3.03        3.63    2.70    0.93
Captafol                3.08        3.83    4.61   -0.78
Dieldrin                3.87        6.2     7.74   -1.54
Aldrin                  4.45        7.4     7.40    0.00

Note: (a) Corrected for sand content.
to 56 commercial pesticides. This set already deviates substantially from simple partitioning as found by Magee in Eq. 31. This equation is then subjected to log P factoring as shown in Eq. 32.

Polar compounds: 56 pesticides, 32 arylureas, 14 acetanilides, 8 anilines, 7 N-phenylcarbamates, 6 nitrobenzenes, 5 miscellaneous compounds

$$\log K(\mathrm{OM/W}) = 0.365\,\log P\ (10.00) + 0.0175\,\mathrm{MR}\ (5.95) - 0.385\,\mathrm{HBD}\ (4.99) + 0.513 \tag{31}$$

$n = 128$, $s = 0.276$, $r^2 = 0.874$, $F = 288.6$

$$\log K(\mathrm{OM/W}) = 0.256\,\mathrm{PL}\ (5.31) + 0.401\,\mathrm{PH}\ (10.95) + 0.0257\,\mathrm{MR}\ (6.84) - 0.386\,\mathrm{HBD}\ (4.96) + 0.542 \tag{32}$$

$n = 128$, $s = 0.265$, $r^2 = 0.886$, $F = 231.5$
There are significant improvements in $s$, $r^2$, and $F$ (expected value = 216.4) along with a clear demonstration of selection for hydrophilic substructures. Hydrogen-bond acceptors (HBA) also appear to play a role, but were just under statistical significance (T = 1.87). Equation 32 clearly shows the mechanistic complexities of the binding of polar compounds to complex soil structures and should serve to eliminate the oversimplified concept of passive partitioning. From a statistical
viewpoint, a set of this size provides the additional opportunity to deduce completeness from the residual pattern. In this case, the residual distribution is a perfectly symmetrical gaussian, revealing that all significant information has been extracted.
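The "expected" $F$ values quoted in this section follow a simple rule of thumb: if added descriptors explain nothing new, $r^2$ is unchanged and $F$ scales roughly as the ratio of the old to the new number of descriptors (the small change in residual degrees of freedom is ignored). The sketch below applies that approximation to the three factored pairs of equations; the helper name `expected_f` is hypothetical.

```python
def expected_f(f_old, k_old, k_new):
    """Approximate F after going from k_old to k_new descriptors
    when the extra descriptors add no explanatory power."""
    return f_old * k_old / k_new

# (label, F before factoring, k before, k after, F reported after factoring)
cases = [
    ("Eq. 27 -> Eq. 28", 463.1, 1, 2, 230.9),
    ("Eq. 29 -> Eq. 30", 212.2, 1, 2, 158.5),
    ("Eq. 31 -> Eq. 32", 288.6, 3, 4, 231.5),
]

for label, f_old, k_old, k_new, f_reported in cases:
    f_exp = expected_f(f_old, k_old, k_new)
    verdict = "gain" if f_reported > f_exp else "no gain"
    print(f"{label}: expected F = {f_exp:.2f}, reported F = {f_reported} ({verdict})")
```

On this basis the ground-soil set (Eq. 28) shows no gain from factoring, whereas the whole-soil sets (Eqs. 30 and 32) exceed their expected values.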
VI. CONCLUSIONS

All of the studies reviewed in this chapter, many of which are previously unpublished, have one thing in common. Each of the binding events can be described in mechanistic terms without compromising the quality of the correlation. No additional descriptors are necessary to account for the bulk of the experimental variance. In order to account for the energetics of binding to inorganic, organic, and bioorganic polymers, nothing more than descriptors modeling known intermolecular forces is required. Within the full range of examples presented, nearly every known intermolecular force (dispersion forces, electronic and steric effects, and both common types of H-bonding, HBA and HBD) has played a critical role in dissecting the energetics of each event. Even the complex descriptor, log P(o/w), can be made to show structural selectivity by various surfaces, although the effects are still composite. It seems safe to state that any binding event for both related and unrelated compounds can now be analyzed in mechanistic terms, provided the data are well measured and the compound set is of sufficient size and diversity. While the choice of descriptors will change over time to reflect scientific advances, the key to consistency will always be selection of the best current descriptors that model each of the known intermolecular forces.
NOTE

The author recognizes that a few readers may have sufficient interest in the raw data and descriptors to wish to repeat the work or perform a variation on it. The tables included in the text (Tables 1-11) are those of manageable size (n = 14-36). The tables for Equations 8, 10, 11, 18, 19, 31, and 32 have not been included due to excessive size in length or breadth (n = 38-128). Any or all are available from the author by simple request.
REFERENCES

1. Israelachvili, J. N. Intermolecular and Surface Forces; Academic Press: London, 1985, pp 45-85.
2. Smith, D. A., Ed. Modeling the Hydrogen Bond; American Chemical Society: Washington, DC, 1994.
3. Newman, M. S., Ed. Steric Effects in Organic Chemistry; John Wiley & Sons: New York, 1956.
4. Martin, Y. C. Quantitative Drug Design; Marcel Dekker: New York, 1978, pp 80-81.
5. Bondi, A. J. Phys. Chem. 1964, 68, 441-451.
6. Moriguchi, I.; Kanada, V.; Komatsu, K. Chem. Pharm. Bull. 1976, 24, 1799-1806.
7. Magee, P. S. In Rational Approaches to Structure, Activity, and Ecotoxicology of Agrochemicals; Draber, W.; Fujita, T., Eds.; CRC Press: Boca Raton, FL, 1992, pp 79-101.
8. Charton, M.; Charton, B. I. J. Org. Chem. 1979, 44, 2284-2288.
9. Vandenbelt, J. M.; Hansch, C.; Church, C. J. Med. Chem. 1972, 15, 787-789.
10. Charton, M.; Charton, B. J. Theor. Biol. 1982, 99, 629-644.
11. Guy, R. H.; Honda, D. H. Int. J. Pharm. 1984, 19, 129-137.
12. Magee, P. S. In QSAR in Environmental Toxicology-IV; Hermens, J. L. M.; Opperhuizen, A., Eds.; Elsevier: Amsterdam, 1991, pp 155-178.
13. Charton, M. In Advances in Quantitative Structure-Property Relationships; Charton, M., Ed.; JAI Press: Greenwich, CT, 1996, pp 171-219.
14. Hansch, C.; Leo, A. Exploring QSAR; American Chemical Society: Washington, DC, 1995, Chaps. 1-2.
15. Kamlet, M. J.; Abboud, J.-L. M.; Abraham, M. H.; Taft, R. W. J. Org. Chem. 1983, 48, 2877-2887.
16. Raevsky, O. A.; Grigor'ev, V. Yu.; Kireev, D. B.; Zefirov, N. S. Quant. Struct.-Act. Relat. 1992, 11, 49-63.
17. Reference 3, Chap. 13, pp 556-675.
18. Charton, M. In Topics in Current Chemistry; Charton, M.; Motoc, I., Eds.; Springer: Berlin, 1983, pp 57-91.
19. Charton, M. J. Am. Chem. Soc. 1969, 91, 615-618.
20. Gibbons, J. J.; Soundararajan, R. American Laboratory 1988, July, 38-46.
21. Jednacak-Biscan, J.; Cukman, D. Colloids and Surfaces 1989, 41, 87-95.
22. Jeziorowski, H.; Knozinger, H.; Meye, W.; Muller, H. D. J. Chem. Soc., Faraday Trans. 1 1973, 69, 1744-1758.
23. Acosta Saracual, A. R.; Pulton, S. K.; Vicary, G. J. Chem. Soc., Faraday Trans. 1 1982, 78, 2285-2296.
24. Meyer, C.; Bastick, J. Bull. Soc. Chim. Fr. 1978, 9-10, 359-362.
25. Hirva, R.; Kakkanen, T. A. Surface Sci. 1992, 277, 530-538.
26. Cross, S. N. W.; Rochester, C. H. J. Chem. Soc., Faraday Trans. 1 1981, 77, 1027-1038.
27. Rochester, C. H.; Trebilco, D.-A. J. Chem. Soc., Faraday Trans. 1 1978, 74, 1125-1136.
28. Acosta Saracual, A. R.; Rochester, C. H. J. Chem. Soc., Faraday Trans. 1 1982, 78, 2787-2791.
29. Pohle, W. J. Chem. Soc., Faraday Trans. 1 1982, 78, 2101-2109.
30. Rochester, C. H.; Trebilco, D.-A. J. Chem. Soc., Faraday Trans. 1 1978, 74, 1137-1145.
31. Snyder, L. R. J. Phys. Chem. 1963, 67, 240-248.
32. Snyder, L. R. J. Phys. Chem. 1963, 67, 234-240.
33. Snyder, L. R. J. Phys. Chem. 1963, 67, 2344-2353.
34. Holmes-Farley, S. R. Langmuir 1988, 4, 166-11 A.
35. Glass, R. W.; Ross, R. A. J. Phys. Chem. 1973, 77, 2571-2576.
36. Glass, R. W.; Ross, R. A. J. Phys. Chem. 1973, 77, 2576-2578.
37. Bate-Smith, E. C.; Westall, R. G. Biochim. Biophys. Acta 1950, 4, 427-438.
38. Dallas, M. S. J. J. Chromatog. 1965, 17, 267-277.
39. Magee, P. S. Quant. Struct.-Act. Relat. 1986, 5, 158-165.
40. Snyder, L. R. In Advances in Chromatography; Giddings, J. C.; Keller, R. A., Eds.; Marcel Dekker: New York, 1967, pp 3-46.
41. Vernin, G.; Vernin, Mrs. G. J. Chromatog. 1970, 46, 48-65.
42. Vernin, G.; Vernin, Mrs. G. J. Chromatog. 1970, 46, 66-78.
43. Klemm, L. H.; Chia, D. S. W.; Kelly, H. R. J. Chromatog. 1978, 150, 129-134.
44. Zweig, G.; Sherma, J., Eds. Handbook of Chromatography: General Data and Principles; CRC Press: Boca Raton, FL, 1972, Table TLC 60.
45. Snyder, L. R. J. Chromatog. 1965, 20, 463-495.
46. Snyder, L. R. J. Chromatog. 1964, 16, 55-88.
47. Reference 44, Table TLC 69.
48. Reference 44, Table TLC 119.
49. Giles, C. H.; Hassan, A. S. A. J. Soc. Dyers Colour. 1958, 74, 846-857.
50. Timofei, S.; Schmidt, W.; Kurunczi, L.; Simon, Z.; Sallo, A. Dyes and Pigments 1994, 24, 267-279.
51. Timofei, S.; Kurunczi, L.; Schmidt, W.; Fabian, W. M. R.; Simon, Z. Quant. Struct.-Act. Relat. 1995, 14, 444-449.
52. Grant, T. M.; King, C. J. Ind. Eng. Chem. Res. 1990, 29, 264-271.
53. Chitra, S. P.; Govind, R. AIChE J. 1986, 32, 167-169.
54. Abe, I.; Hayashi, K.; Kitagawa, M. Kagaku to Kogyo (Osaka) 1981, 55, 441-442.
55. Kamlet, M. J.; Doherty, R. M.; Abraham, M. H.; Taft, R. W. Carbon 1985, 23, 549-554.
56. Grate, J. W.; Abraham, M. H.; Du, C. M.; McGill, R. A.; Shuely, W. J. Langmuir 1995, 11, 2125-2130.
57. Abe, I.; Hayashi, K.; Hirashima, T.; Kitagawa, M. Colloids and Surfaces 1984, 8, 315-318.
58. Abe, I.; Hayashi, K.; Hirashima, T.; Kitagawa, M. J. Colloid Interface Sci. 1983, 94, 201-206.
59. Abe, I.; Kayama, H.; Ueda, I. J. Pharm. Sci. 1990, 79, 354-358.
60. Abe, I.; Hayashi, K.; Kitagawa, M. Carbon 1983, 21, 189-191.
61. Sawicki, E.; Stanley, T. W.; Elbert, W. C.; Morgan, M. Talanta 1965, 12, 605-616.
62. Reference 44, pp 283-436.
63. Reference 44, Table TLC 29.
64. Magee, P. S. Quant. Struct.-Act. Relat. 1990, 9, 202-215.
65. Reference 44, Table TLC 13.
66. Reference 44, Table PC 13.
67. Reference 44, Table PC 57.
68. Reference 44, Table PC 31.
69. Hartley, G. S.; Graham-Bryce, I. J. Physical Principles of Pesticide Behavior; Academic Press: London, 1980, pp 236-331.
70. Briggs, G. G. J. Agric. Food Chem. 1981, 29, 1050-1059.
71. Sabljić, A. Environ. Sci. Technol. 1987, 21, 358-366.
STRUCTURAL EFFECTS ON GAS-PHASE REACTIVITIES*
Gabriel Chuchani, Masaaki Mishima, Rafael Notario, and Jose-Luis M. Abboud
I. Introduction
II. Correlation Models and Substituent Constants
III. Reactions Involving Ionic Reagents and Products
   A. Experimental Methods
   B. Complexes between Bromide Ion and Substituted Benzenes (SB)
   C. Li+ Complexes
   D. Halogen Cations as Lewis Acids in the Gas Phase
   E. The Power of LFER: Ionization of Brønsted Acids and the Discovery of "New" Substituents
   F. Structural Effects on the Stability of Carbocations
   G. SE on the Intrinsic Basicity of Carbonyl and Thiocarbonyl Compounds
   H. Solvent Effects on Selected Proton Transfer Equilibria
   I. Correlation between Carbocation Stability in the Gas Phase and Kinetics of Carbocation Formation Reactions in Solution
*Dedicated in memoriam to Prof. Robert W. Taft.

Advances in Quantitative Structure Property Relationships, Volume 2, pages 35-126. Copyright © 1999 by JAI Press Inc. All rights of reproduction in any form reserved. ISBN: 0-7623-0067-1
IV. Reactions Involving Neutral Reagents and Products
   A. Experimental Considerations
   B. Esters
   C. Halides
   D. Carbonates
   E. Carbamates
   F. Thionocarbamates
   G. β-Hydroxyolefins
   H. α-Keto Acids
   I. Methanesulfonates
   J. Alcohols
   K. Addition of Ketene to Carboxylic Acids
Acknowledgments
References
I. INTRODUCTION

The quantitative study of structural and substituent effects (SE) in organic chemistry (often by means of linear free energy relationships) may provide important clues for the assignment and interpretation of reaction mechanisms. Difficulties met in the analysis of these effects frequently arise from the involvement of solvent. In contrast, SE on gas-phase chemical reactivity (both kinetic and thermodynamic) are intrinsic, that is, free from perturbations originating in solvent-solute interactions. Comparison of SE on the same reaction taking place in solution and in the gas phase makes it possible to quantify the influence of solvation. In the case of molecules involving long alkyl chains, the situation is obviously more complicated, as the molecule can solvate itself intramolecularly in a suitable conformation.

Because of technical difficulties, there are relatively few systematic experimental studies of substituent effects on gas-phase reactivity. As to reactions of neutral species, some reviews on gas-phase pyrolysis are available, but there seems to be no monographic treatment of SE in these processes. In the case of reactions involving anions and cations, Taft and Topsom (ref. 13) and Gal and Maria (ref. 14) published two major surveys in 1987 and 1991, respectively. The first specifically addresses the quantitative study of SE; the second is more general and focuses on quantitative treatments of acid-base reactions involving neutral bases and a variety of charged electron acceptors.

Here, a survey of some recent studies of SE on gas-phase reactivity is presented. Both neutral and charged reagents/products are treated. We try to cover material not included in these reviews and, where a minor overlap does occur, the treatment of the experimental data is somewhat different from, and complementary to, that given in refs. 13 and 14. In several cases, structural effects on gas-phase and solution reactivities are compared.
Because of the highly specialized and widely different techniques used in the experimental study of charged and neutral species, we shall examine separately both groups of reactions. As we shall see, however, correlation techniques give a surprisingly unified picture of SE on these systems.
II. CORRELATION MODELS AND SUBSTITUENT CONSTANTS

Hammett's classical definition of $\sigma$ parameters through Eq. 1 is an appropriate starting point:

$$\sigma_X = \log K_X - \log K_H \tag{1}$$

$K_H$ and $K_X$, respectively, stand for the ionization constants in water at 25 °C of benzoic acid and a meta- or para-X-substituted benzoic acid. For each substituent, two families of substituent parameters, $\sigma_m$ and $\sigma_p$, are thus obtained. Beyond this, several models have been used by different groups of workers. For the sake of conceptual unity, and because of its breadth, we consider Charton's general treatment of the electrical effect $Q_X$ induced by a substituent X on closed-shell active sites ranging from cations, such as carbenium ions, to anions, such as carbanions, in systems with or without a skeletal group. According to this triparametric model, $Q_X$ is given by Eq. 2,

$$Q_X = L\sigma_{lX} + D\sigma_{dX} + R\sigma_{eX} + h \tag{2}$$

where $\sigma_l$ represents the electrical effect observed when one or more $sp^3$-hybridized carbon atoms separate the active site from the substituent. In this type of system, delocalization of substituent valence electrons is thus minimal. This constant has been called the "inductive" or "field effect" constant. Charton refers to it as the "localized electrical effect constant." The constant $\sigma_d$ represents the resonance effect of the substituent (Charton's "intrinsic delocalized" effect). The constant $\sigma_e$ reflects the electronic demand of the system under scrutiny; $h$ is a generalized intercept. It is important to notice that for a system in which the electronic demand remains constant, Eq. 2 reduces to the biparametric Eq. 3,

$$Q_X = L\sigma_{lX} + D\sigma_{DX} + h \tag{3}$$

wherein $\sigma_D$ has the form of Eq. 4,

$$\sigma_D = \eta\sigma_e + \sigma_d \tag{4}$$

and $\eta$ is determined by the electron demand. This equation reflects a very important fact: the necessity of using resonance or delocalized effects appropriate to different kinds of reaction centers. Charton takes $\sigma_l$ as identical to $\sigma_I$, defined by means of Eq. 5,
$$\sigma_I = \Delta pK_a / 1.56 \tag{5}$$

using the $pK_a$ values for the 4-X-bicyclooctane carboxylic acids. In a thorough review of SE parameters, Hansch et al. showed that these $\sigma_I$ values are very close to those, $\sigma_F$, measuring field/inductive effects and obtained by Taft and Topsom by averaging values of determinations by numerous methods. These values shall be used here, for the sake of consistency with previous studies on gas-phase reactivity of ionic species. It is a reassuring fact that the Gibbs energy changes for the ionization of 4-X-bicyclooctane carboxylic acids in the gas phase are linearly related to $\sigma_F$ to a very high degree of precision. Different $\sigma_R$ values are appropriate for situations involving different electron demand. Here, the following parameters shall be used:

1. $\sigma_R$, as determined by Taft through Eq. 6:

$$\sigma_R = \sigma_p - \sigma_I \tag{6}$$

2. $\sigma_R^{+}$, also determined by Taft, largely on the basis of the $\sigma_R^{\circ}$ parameters (in turn obtained by Taft and coworkers from the $^{13}$C NMR spectra of monosubstituted benzenes). They are appropriate for the treatment of electron-deficient systems. $\sigma_R^{+}$ values are fairly close to the corresponding $\sigma_R$ values. The main difference is that $\sigma_R^{+} = 0$ for electron-acceptor (+R) groups.
3. $\sigma_R^{-}$ parameters are appropriate for the study of electron-rich systems. They are based on SE on gas-phase acidities of neutral acids such as phenols and anilines. For electron-acceptor groups, $\sigma_R^{-}$ and $\sigma_R$ are practically indistinguishable. The differences appear in the case of strong electron-donor (-R) groups.
4. $\sigma^{\circ}$ are the "normal" substituent parameters, defined by Taft and quantifying substituent effects in systems wherein direct interaction between the substituent and the reaction center is absent.

The general Taft-Topsom treatment of substituent effects (referred to hydrogen) on a thermodynamic or kinetic property Pr in the gas phase involves the contributions of polarizability (P), field (F), and resonance (R) effects and is given by Eq. 7 or Eq. 8, depending on whether the systems involved are electron-deficient or electron-rich, respectively:

$$\delta\mathrm{Pr} = \mathrm{Pr(substituent)} - \mathrm{Pr(H)} = \rho_\alpha\sigma_\alpha + \rho_F\sigma_F + \rho_{R^+}\sigma_R^{+} \tag{7}$$

$$\delta\mathrm{Pr} = \mathrm{Pr(substituent)} - \mathrm{Pr(H)} = \rho_\alpha\sigma_\alpha + \rho_F\sigma_F + \rho_{R^-}\sigma_R^{-} \tag{8}$$

An important alternative biparametric model used in this work is that developed by Yukawa and Tsuno (Y-T) in 1959. It was originally intended to deal with the influence of para π-donor substituents on reactions that are more electron-demanding than the ionization of benzoic acid. These authors suggested that the values of $\sigma^{+} - \sigma$ would provide a scale of enhanced resonance effects and modified the Hammett equation to incorporate this feature in Eq. 9.
$$\log(K/K_0) = \rho(\sigma + r^{+}\Delta\bar{\sigma}_R^{+}) \tag{9}$$

where the enhanced resonance effect ($\sigma^{+} - \sigma$) is written as $\Delta\bar{\sigma}_R^{+}$. The $\sigma^{+}$ parameters are those defined by Brown and Okamoto on the basis of the solvolytic rates of cumyl chlorides; $r^{+}$ measures the contribution of the enhanced resonance effect of -R substituents. Later, this equation was modified and the normal substituent constant $\sigma^{\circ}$ was used instead of $\sigma$ in Eq. 10,

$$\log(K/K_0) = \rho(\sigma^{\circ} + r^{+}\Delta\bar{\sigma}_R^{+}) \tag{10}$$

where $\Delta\bar{\sigma}_R^{+}$ is now ($\sigma^{+} - \sigma^{\circ}$). This form of the equation may be held to be conceptually more correct than the original one since the $\sigma$ scale itself involves enhanced resonance effects. When $r^{+} = 0$, $\log(K/K_0) = \rho\sigma^{\circ}$, while if $r^{+} = 1$, it corresponds to straightforward correlation with $\sigma^{+}$. This modification of the parameter scale does not affect the original meaning and the applicability of the equation. The same idea leads to Eq. 11 for describing the enhanced resonance effect of +R substituents on an electron-rich reaction system such as the ionization of phenols (protonation of phenoxide anions),

$$\log(K/K_0) = \rho(\sigma^{\circ} + r^{-}\Delta\bar{\sigma}_R^{-}) \tag{11}$$

where $\Delta\bar{\sigma}_R^{-}$ equals $\sigma^{-} - \sigma^{\circ}$. The $r^{-}$ value indicates the contribution of the enhanced p-π interaction between a para π-acceptor substituent and a negative charge.

In this review the Y-T Eq. 10 is mostly applied to the study of substituent effects on the stabilities of electron-deficient systems. With this equation, the concept of varying resonance demand of reactions was introduced into the field of correlation analysis of SE. In the general application of this equation, the $r^{+}$ value has been found to change widely with the reaction, and not to be limited to values lower than unity. Indeed, values significantly higher than one are met in reactions more electron-demanding than the solvolysis of tert-cumyl chlorides. These $r^{+}$ values shed light on the nature of the transition state, and have been widely applied to the assignment and interpretation of reaction mechanisms. A thorough review on the use of the Y-T equation and the concept of varying electron demand has recently been published.

A fundamental contributor to SE in many gas-phase reactions of charged species is polarizability. Physically, it reflects the stabilization of the charge (positive or negative) by the substituent through ion-induced dipole interaction. In the Taft-Topsom scheme this effect is quantified by the parameter $\sigma_\alpha$.
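The role of $r^{+}$ in Eq. 10 is easy to see numerically. The sketch below is a minimal illustration: the values of $\rho$ and $r^{+}$ are those listed for the benzyl cation series in Table 9, whereas the substituent constants ($\sigma^{\circ} = -0.10$, $\Delta\bar{\sigma}_R^{+} = -0.70$) are assumed, illustrative numbers for a strong para π-donor and are not taken from Tables 1A and 1B. The function name is hypothetical.

```python
def yukawa_tsuno(rho, r_plus, sigma0, delta_sigma_r_plus):
    """log(K/K0) from the Y-T equation (Eq. 10)."""
    return rho * (sigma0 + r_plus * delta_sigma_r_plus)

RHO_BENZYL = -10.3      # rho for the benzyl cation series (Table 9)
R_PLUS_BENZYL = 1.29    # r+ for the benzyl cation series (Table 9)

# Assumed, illustrative constants for a strong para pi-donor substituent.
SIGMA0 = -0.10
DELTA_SIGMA_R_PLUS = -0.70

for r_plus in (0.0, 1.0, R_PLUS_BENZYL):
    value = yukawa_tsuno(RHO_BENZYL, r_plus, SIGMA0, DELTA_SIGMA_R_PLUS)
    print(f"r+ = {r_plus:4.2f}  ->  log(K/K0) = {value:6.2f}")
```

With $r^{+} = 0$ the expression collapses to $\rho\sigma^{\circ}$, with $r^{+} = 1$ it equals $\rho\sigma^{+}$, and values above unity add further resonance stabilization.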
We present in Tables 1A and 1B the values of the various parameters used in this study. Most of them are taken from refs. 15, 27, 28, and 29. It is a remarkable fact that, for effects other than polarizability, no serious differences exist between substituent constant values appropriate for gas and solution phases, except for some particular substituents that have strong specific interactions with the solvent (e.g., hydrogen bonding). This allows us to directly compare results of correlation analyses of SE in the gas phase and in solution.
Table 1A. Substituent Parameters
Substituent
^F
%
%
PhC+-(Me)2 > PhCH+Me > PhC+=CH2 > PhCH2+ > PhC+(CF3)Me > PhCH+CF3

-0.4   0.0   5.2   7.5   12.0   16.2   19.5
Figure 4 shows the plot of the relative stabilities of substituted benzyl cations against those of the corresponding α-cumyl cations. This plot can be regarded as a gas-phase $\sigma^{+}$-plot. There is neither a simple linear relationship nor a monotonic curvature as seen for the substituent effect on the solvolysis of this system. In this figure, a good linear relationship with a slope of unity is observed for meta substituents and para π-electron-withdrawing substituents, but all para π-donor substituents deviate significantly upward from the line of unit slope. The linear relationship with unit slope for nonconjugative substituents clearly suggests the same contribution of inductive/field effects to both systems. Therefore, the significant deviations of para π-donor substituents must be due to a different contribution of the resonance effect in the two systems. The same pattern of LFER can be observed for the relative stabilities of 1-aryl-1-(trifluoromethyl)ethyl cations shown in Figure 5. The upward deviations of para π-donor substituents in these figures are systematic, i.e., the stronger the para π-donor substituent, the greater the deviation, suggesting that the resonance stabilization from para π-donor substituents must be greater in the benzyl cation and 1-aryl-1-(trifluoromethyl)ethyl cation systems than in the α-cumyl cation. These trends are consistent with those observed for the gas-phase basicities of aromatic carbonyl compounds as shown below.
Figure 4. Plot of the relative chloride ion affinities of substituted benzyl cations against relative gas-phase basicities of the corresponding α-methylstyrenes. (Horizontal axis: ΔGB of α-methylstyrenes / kcal mol⁻¹.)
The Y-T equation (Eq. 10) is equally applicable to the treatment of these substituent effects, as shown in Figures 6, 7, and 8. The correlation results for the stabilities of benzylic carbocations, given by well-behaved substituents, are summarized in Table 9 (refs. 29, 59-70). The resonance demand (r⁺) value varies significantly with substitution at the benzylic carbon, from 1.00 for the stable α-cumyl cation system to 1.53 for the highly electron-deficient 2,2,2-trifluorophenylethyl cation system. It is found that r⁺ increases as the stability (ΔΔG°) of the unsubstituted member of the respective series of benzylic carbocations decreases. Including an sp-hybridized carbocation, a vinyl cation, there is an excellent linear relationship between these two quantities, with a correlation coefficient of 0.997 and a standard deviation of ±0.02 (Eq. 35 and Figure 9):
$$r^{+} = 0.0261\,\Delta\Delta G^{\circ} + 1.00 \tag{35}$$
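As a quick numerical illustration of Eq. 35, the sketch below applies the equation to the ΔΔG° values read from Table 9 and compares the result with the tabulated r⁺ values. The variable and function names are illustrative only, and the pairing of values with rows follows our reading of the table.

```python
# Rows read from Table 9: (R1, R2, ddG in kcal/mol, r_plus from the Y-T analysis).
# The last three rows are the 1-phenylvinyl cations (R2 left empty).
TABLE_9 = [
    ("CF3", "H", 19.5, 1.53),
    ("CF3", "Me", 16.2, 1.41),
    ("H", "H", 12.2, 1.29),
    ("H", "Me", 4.9, 1.14),
    ("Me", "Me", 0.0, 1.00),
    ("Me", "Et", -0.4, 1.00),
    ("=CH2", "", 7.5, 1.18),
    ("=CH-CH3", "", 5.7, 1.12),
    ("=CH-CF3", "", 14.4, 1.39),
]


def r_plus_eq35(ddg_kcal):
    """Resonance demand predicted by Eq. 35 from the relative stability."""
    return 0.0261 * ddg_kcal + 1.00


# Residual = predicted minus tabulated r+ for every series.
residuals = [(r1, r2, r_plus_eq35(ddg) - r_obs) for r1, r2, ddg, r_obs in TABLE_9]
max_abs_residual = max(abs(res) for _, _, res in residuals)

for r1, r2, res in residuals:
    print(f"{r1:8s} {r2:3s} residual = {res:+.3f}")
print(f"largest |residual| = {max_abs_residual:.3f}")
```

All residuals stay within a few hundredths of an r⁺ unit, in line with the quoted standard deviation of ±0.02.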
This correlation clearly demonstrates that the resonance demand substantially varies with the intrinsic stability of a given carbocation, showing a continuous
Figure 5. Plot of gas-phase stabilities of 1-aryl-1-(trifluoromethyl)ethyl cations against those of the corresponding α-cumyl cations: open circles, para π-donor substituents; closed circles, meta substituents. (Horizontal axis: relative stabilities of α-cumyl cations / kcal mol⁻¹.)
Table 9. Results of the Y-T Analysis for Gas-Phase Stability of Benzylic Cations ArC(R¹)R²

R¹           R²     ΔΔG° ᵃ    ρ ᵇ               r⁺
CF3          H      19.5      -10.6 (-14.2)     1.53
CF3          Me     16.2      -10.0 (-13.7)     1.41
H            H      12.2      -10.3 (-14.0)     1.29
H            Me      4.9      -10.1 (-13.8)     1.14
Me           Me      0.0       -9.5 (-13.0)     1.00
Me           Et     -0.4       -9.5 (-13.0)     1.00
=CH2 ᶜ               7.5      -10.3 (-14.0)     1.18
=CH-CH3 ᶜ            5.7       -9.7 (-13.2)     1.12
=CH-CF3 ᶜ           14.4       -9.9 (-13.5)     1.39

Notes: ᵃ In kcal mol⁻¹. Relative stabilities of the unsubstituted member of the respective series, based on free energy changes of proton-transfer or chloride ion-transfer equilibria. ᵇ Values in parentheses are obtained by multiplying the ρ of log K/K₀ by the factor 2.303RT/1000, i.e., in kcal mol⁻¹ σ⁻¹ units. ᶜ 1-Phenylvinyl cations.
Figure 6. The Y-T plot of gas-phase stabilities of substituted benzyl cations against σ⁺ (open circles), σ° (closed circles), and σ̄ with r⁺ = 1.29 (squares). (Horizontal axis: σ scale.)
spectrum of r⁺ values. This fact also suggests that the origin of the varying resonance demand is the intrinsic stability of the parent carbocations. In addition, the variation of the r⁺ value can be described by the σ° and Δσ̄_R⁺ substituent constants of the α-substituents (R¹ and R²) with satisfactory precision (r = 0.9992, SD = ±0.01), Eq. 36:
$$r^{+} = 0.45\,\Sigma\sigma^{\circ} + 0.40\,\Sigma\Delta\bar{\sigma}_{R}^{+} + 1.28 \tag{36}$$
where Σσ° = σ°(R¹) + σ°(R²) and ΣΔσ̄_R⁺ = Δσ̄_R⁺(R¹) + Δσ̄_R⁺(R²). This result indicates that the r⁺ value, as well as the intrinsic stability of the parent carbocation, is affected by both field/inductive and π-electronic effects of the R¹ and R² substituents, in spite of the variation in the central carbon from primary to tertiary character. This correlation may also be of practical use for estimating an r⁺ value for a new system of unknown resonance demand. Furthermore, it was found that the r⁺ values are correlated linearly with theoretical parameters given by ab initio molecular-orbital calculations at the RHF/6-31G(d) level, such as the charge (Mulliken population) on the para position of the phenyl ring and the Wiberg bond order or bond length of the Ph-C⁺ bond, which are associated with the concept of a resonance interaction. Thus, the r⁺ value has physical significance for characterizing the intrinsic nature of a carbocation itself. The π delocalization of the charge into the aryl π ring competes with the stabilization from the α-substituent(s). This conclusion is also consistent with the fact that the r⁺ value for the gas-phase stability of the conjugate acid of the R-substituted benzoyl (ArCOR) system decreases with increasing electron-donating ability of the R substituent, as discussed later.
Substituent Effects in Benzenium and Phenonium Cations
There are other important kinds of π-delocalized cationic systems, for example, benzenium and phenonium ions, which intervene as intermediates in the
Figure 7. The Y-T plot of the intrinsic stabilities of 1-aryl-1-(trifluoromethyl)ethyl cations against σ⁺ (open circles), σ° (closed circles), and σ̄ with r⁺ = 1.41 (squares). (Horizontal axis: σ scale.)
Figure 19. Plot of the r⁺ values against the stabilization effect of the phenyl group. (Horizontal axis: ΔSE(Ph) / kcal mol⁻¹.)
Table 13. Gas-Phase Basicities (GB) for Thiocarbonyl and Carbonyl Compounds

Substituents                            GB (kcal mol⁻¹) ᵃ
X                 Y                X(CO)Y ᵇ     X(CS)Y ᶜ
(1) N(CH3)2       N(CH3)2          214.2        218.1
(2) CH3           N(CH3)2          209.0        213.7
(3) NHCH3         NHCH3            208.3        213.2
(4) 1-C10H15      1-C10H15         205.5        209.4
(5) H             N(CH3)2          203.8        208.0 (207.9) ᵈ
(6) CH3O          N(CH3)2          201.9        205.7
(7) c-C3H5        c-C3H5           201.4        207.1
(8) NH2           NH2              201.0 ᵈ      205.1
(9) t-C4H9        t-C4H9           198.4        202.2
(10) camphor      thiocamphor      197.3        201.7
(11) CH3          OC2H5            191.4        197.0
(12) H            H                162.3        177.0
(13) F            F                142.1        159.0

Notes: ᵃ All values in kcal mol⁻¹. ᵇ Values from ref. 82. ᶜ Values from ref. 84. ᵈ Values from ref. 83.
importance. Notice that the formal analogy suggested by Figure 20 is somewhat misleading, since high-level ab initio calculations reveal that the charge redistributions undergone by protonated carbonyl and thiocarbonyl compounds are very different.
H. Solvent Effects on Selected Proton Transfer Equilibria
Benzoyl Systems
The relative basicities of the benzaldehydes and acetophenones in aqueous solution are plotted against the corresponding values in the gas phase in Figures 21 and 22, respectively. In an analysis of the solvent effects, it may be convenient to separate them into two classes, i.e., common solvent effects on the whole system and specific solvent effects arising from a specific interaction with a particular substituent. Since the latter correspond to solvent modifications of the substituents, in order to explore the solvent effects on the whole system it is reasonable to exclude particular substituents, such as the hydroxyl group, from the comparative analysis of -δΔG°(aq) versus -δΔG°(gas). In fact, by excluding these substituents we find a good linear relationship in both Figures 21 and 22.
Figure 20. -ΔG°(H⁺) (CS) vs. -ΔG°(H⁺) (CO), for reactions 40 and 41. (Horizontal axis: ΔG°(H⁺) (CO) / kcal mol⁻¹.)
The existence of such a linear relationship between the gas and aqueous solution phases suggests that there is a common set of substituent constants for the respective series in both phases. Table 14 shows the results of the analysis of these substituent effects in aqueous solution, using a larger number of substituents than that of the present comparative analysis of -δΔG°(aq) versus -δΔG°(gas). The r⁺ values for the benzaldehyde and acetophenone series in aqueous solution practically agree with those for the corresponding gas-phase basicities, consistent with the graphical analysis described above. Considering the similarity of the electron-donating ability between the hydroxyl and methoxyl groups and between the NMe2 and NH2 groups, it is likely that the r⁺ values of the benzoic acid and benzamide series in aqueous solution are also identical to those in the gas phase. Consequently, the r⁺ value is essentially the same in aqueous solution and in the gas phase. That is, the degree of stabilization of the positive charge through π delocalization into the aryl ring relative to that by an inductive/field effect is independent of the solvation of the
Figure 21. Aqueous solution versus gas-phase basicities of substituted benzaldehydes. (Horizontal axis: -δΔG°(gas) / kcal mol⁻¹.)

Figure 22. Aqueous solution versus gas-phase basicities of substituted acetophenones.
Table 14. Results of the Y-T Analysis of Basicities in Aqueous Solution ᵃ

         Benzamide         Benzoic Acid      Acetophenone      Benzaldehyde
ρ ᵇ      -1.243 (-1.67)    -1.146 (-1.56)    -2.200 (-2.99)    -1.764 (-2.37)
r⁺        0.36              0.55              0.76              1.16

Notes: ᵃ Taken from ref. 24. ᵇ Values in parentheses are obtained by multiplying the ρ of log K/K₀ by the factor 2.303RT/1000, i.e., in kcal mol⁻¹ σ⁻¹ units.
cation, and the r⁺ value is a function of the structure of the ion. In contrast, the ρ values of the solution basicities are remarkably smaller than those of the gas-phase basicities. This is easily explained by the effective dispersal of the positive charge of the ion to solvent molecules. In conclusion, the solvation of a cation reduces the central charge, and this lowers the response to substituent perturbation, essentially without changing the nature of the intramolecular charge delocalization.
Aliphatic and Alicyclic Carbonyl and Thiocarbonyl Compounds
Experimental evidence exists showing that most ketones, esters, amides, and ureas also protonate on the carbonyl oxygen in acidic solutions. The same is known to happen for the homologous thiono compounds. At variance with the gas-phase results, whenever a direct comparison can be made between the pKa values of the corresponding conjugate acids (as is the case for amides/thioamides), one finds that the thiocarbonyl compound is more basic by 1.5-2.0 pK units. This is a consequence of solvation effects (pKa values are referred to a standard state of pure water). The matter is discussed in detail in refs. 83, 85, 89.
I. Correlation between Carbocation Stability in the Gas Phase and Kinetics of Carbocation Formation Reactions in Solution
Solvolysis of Benzylic Substrates
$$\mathrm{PhC(R^{1})(R^{2})L} \xrightarrow{\text{slow}} \mathrm{PhC^{+}(R^{1})(R^{2})} \xrightarrow{\text{fast}} \text{Product}$$
The ρ and r⁺ values for the S_N1 solvolysis of a series of benzylic substrates are summarized in Table 15 (refs. 26, 72, 90-95). The ρ values for the solvolysis are significantly reduced compared with those for the gas-phase carbocation stabilities. This is reasonably interpreted in terms of solvent stabilization of the transition state and intermediate cation in the solvolysis. Most importantly, the r⁺ value for the S_N1 solvolysis is found to be in complete agreement with that for the gas-phase stabilities of the corresponding benzylic carbocations.
Table 15. Results of the Y-T Analysis for the Solvolysis of the Benzylic Substrates ArC(R¹)(R²)L

                                   Solvolysis           Gas phase
R¹           R²                    ρ         r⁺         r⁺
CF3          H                     -6.05     1.53       1.53
CF3          Me                    -6.29     1.39       1.41
H            H                     -5.20     1.30       1.29
H            Me                    -5.45     1.15       1.14
Me           Me                    -4.59     1.00       1.00
Me           Et                    -4.69     1.04       1.00
=CH2 ᵃ                             -4.10     1.20       1.18
CH2CH2- (kΔ process) ᵇ             -3.87     0.63       0.63

Notes: ᵃ 1-Phenylvinyl tosylates. ᵇ 2-Phenylethyl tosylates.
Since solvation of a cation reduces the central charge and thus lowers the response to substituent perturbation, essentially without changing the magnitude of the r⁺ value, as noted already, the identity of the r⁺ values for the carbocation stabilities and the solvolysis rates means that the degree of charge delocalization in the rate-determining transition state of the solvolysis is very close to that of the carbocation intermediate. This result provides important information for the analysis of substituent effects in solvolysis. The extremely large r⁺ value of 1.53, observed for the solvolysis of 1-aryl-2,2,2-trifluoroethyl tosylates, is not a correlational artifact, but must be the resonance demand reflecting a highly electron-deficient cationic transition state of the limiting S_N1 ionizing process, in the same manner as in the solvolysis of ordinary benzylic substrates that give relatively stable carbocations. Similarly, the exalted r⁺ value of 1.3 obtained for the solvolysis of benzyl tosylates with electron-donating substituents is not a correlational artifact arising from the nonlinearity caused by the k_c-k_s mechanistic transition, as suggested by Shorter, but must be an intrinsic feature characterizing the nature of the transition state of the k_c solvolysis of benzyl substrates. The less stable primary benzyl cation should have an inherent resonance demand distinctly higher than the value of r⁺ = 1.0 of the tertiary α-cumyl cation system. Furthermore, the r⁺ value of 0.63 for the phenonium ion is also in complete agreement with the value observed for the corresponding solvolysis via a phenonium ion intermediate. This intermediate r⁺ value is characteristic of its unique bridged structure.
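The two observations made above (ρ is attenuated in solution while r⁺ is retained) can be checked side by side with a few lines of arithmetic. The sketch below pairs the gas-phase values of Table 9 with the solvolysis values of Table 15; the dictionary layout and names are illustrative, and the ρ of -4.59 for the Me/Me series is our reading of a partly illegible table entry.

```python
# series: (rho_gas, r_gas, rho_solvolysis, r_solvolysis), read from Tables 9 and 15.
SERIES = {
    "CF3/H": (-10.6, 1.53, -6.05, 1.53),
    "CF3/Me": (-10.0, 1.41, -6.29, 1.39),
    "H/H": (-10.3, 1.29, -5.20, 1.30),
    "H/Me": (-10.1, 1.14, -5.45, 1.15),
    "Me/Me": (-9.5, 1.00, -4.59, 1.00),  # rho_solvolysis assumed to read -4.59
    "Me/Et": (-9.5, 1.00, -4.69, 1.04),
    "=CH2": (-10.3, 1.18, -4.10, 1.20),
}

# Ratio of solution to gas-phase rho, and shift in r+ on going to solution.
rho_ratio = {name: v[2] / v[0] for name, v in SERIES.items()}
r_shift = {name: v[3] - v[1] for name, v in SERIES.items()}

for name in SERIES:
    print(f"{name:7s} rho(solv)/rho(gas) = {rho_ratio[name]:.2f}   "
          f"r+(solv) - r+(gas) = {r_shift[name]:+.2f}")

max_abs_r_shift = max(abs(d) for d in r_shift.values())
```

Under these inputs the ρ values in solution are roughly 40 to 60% of the gas-phase values, whereas r⁺ changes by no more than a few hundredths.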
The agreement of the r⁺ value between the cationic transition state and the intermediate cation for all series of benzylic systems, including a phenonium ion and phenylvinyl cations, leads us to the conclusion that the geometry of the transition state in the ionizing process of the S_N1 solvolysis, which is a highly endothermic reaction, closely resembles the high-energy product, an intermediate cation. Clearly, these results confirm that the r⁺ value is an inherent characteristic of the carbocation structure itself. Thus, the intrinsic behavior of carbocations in the gas phase provides an important basis for a better understanding of the real features of the transition state of organic reactions in solution.
Acid-Catalyzed Hydration Reactions of Olefins
Acid-catalyzed hydration of a carbon-carbon double or triple bond, reaction 43, is an alternative route to generate a carbocation intermediate in solution.
$$\mathrm{PhC(R){=}CH_{2}} + \mathrm{H^{+}} \xrightarrow{\text{slow}} \mathrm{PhC^{+}(R)CH_{3}} \xrightarrow{\text{fast}} \text{Product} \tag{43}$$
The results of the Y-T analysis of the substituent effects on the acid-catalyzed hydration of styrene and phenylacetylene substrates in acidic media are summarized in Table 16. The ρ values, as large as those of the ordinary benzylic S_N1 solvolysis, are consistent with the currently accepted mechanism of rate-determining formation of a benzylic carbocation. In contrast, the r⁺ values for the
Table 16. Results of the Y-T Analysis for Acid-Catalyzed Hydration of Double Bond and Triple Bond^ System PhCH=CH2
^hyd
''hyd
rU
hya gas
1.14
-3.11^ -3.94'' -3.30'' -3.56^
0.80^ 0.70"^ 0.79'' 0.94^
-5.45^
0.6/ 0.74'^'S
1.00
0.74 0.82 0.74
0.59
PhC(Me)=CH2
-3.36''S
PhC(CF3)=CH2 PHC=CH
-4.77^
1.15^
1.41
^.30'
0.87'
1.18
-4.20^'^
0.92^'j
Notes: ^ Calculated using data in the literature. In aq H2SO4 at 25 °C unless otherwise noted. ^ Ref. 97a. ^ Ref. 97b. "^ Ref. 97c. « In HCIO4 at 25 °C, Ref. 97d. ^The addition of CF3COOH in CCI4, Ref. 97e. 8 Ref. 97f. ^ Ref. 97g. ' In acetic acid-water-sulfuric acid at 50.2 °C, Ref. 97h. i Ref. 97i.
0.83 0.61 0.69 0.82
0.78
hydration are noticeably smaller than those of the corresponding cations in the gas phase and of the solvolysis. Although the data used for the present correlation involve only a few substituents, the small r⁺ value seems unlikely to be a correlational artifact, because the reduction of the r⁺ value is observed for all substrates. The disagreement of the r⁺ value between the hydration rates and the gas-phase carbocation stabilities or solvolysis rates therefore suggests that the structure of the transition state of the acid-catalyzed hydration is appreciably different from the corresponding stable cationoid intermediates or S_N1 transition state with respect to π delocalization of the positive charge at the reaction center. These results demonstrate that the Yukawa-Tsuno equation is applicable to the gas-phase substituent effects on the intrinsic stabilities of benzylic cations in exactly the same manner as to the solution-phase substituent effects.
Solvolysis of Bridgehead Derivatives
We report in Table 17 the standardized rates of solvolysis (as Δlog k values in 80% EtOH at 70 °C, relative to 1-adamantyl p-toluenesulfonate) of the tosylates of a group of bridgehead and heavily hindered tertiary groups. The thermodynamic stabilities of the corresponding carbocations, as defined by Eq. 37, are also given. Figure 23 is a plot of Δlog k against ΔG°(37). The correlation spans 23 log units in k. Taking into account that at 70 °C one order of magnitude in rate constant corresponds to 1.57 kcal mol⁻¹ in Gibbs energy of activation, this amounts to 36.1 kcal mol⁻¹ and almost 50 kcal mol⁻¹ in Gibbs energy of bromide exchange. It covers
Table 17. Experimental Values of ΔG°(37) and Δlog k(solv)

Compound                                          ΔG°(37) ᵃ    Δlog k(solv) ᵇ
(1) 2-tert-butyl-2-bromoadamantane                 15.9          8.8
(2) 9-tert-butyl-9-bromobicyclo[3.3.1]nonane       15.1          8.6
(3) 1-bromoadamantane                               0.0          0.0
(4) 1-bromobicyclo[2.2.2]octane                    -8.1         -3.6
(5) 4-bromohomocubane                             -10.6         -5.9 ᶜ
(6) bromocubane                                   -14.5         -7.3
(7) 3-bromonoradamantane                          -15.0         -6.9
(8) 1-bromohomocubane                             -23.7        -11.0 ᶜ
(9) 1-bromonorbornane                             -24.3        -10.1
(10) 6-bromotricyclo[3.2.1.0³,⁶]octane            -29.6        -13.9 ᶜ

Notes: ᵃ In kcal mol⁻¹. ᵇ Relative to 1-bromoadamantane. ᶜ Extrapolated from triflate solvolysis.
Figure 23. Differential effects on solvolysis rates, Δlog k(solv) vs. ΔG°(37). (Horizontal axis: ΔG°(37) / kcal mol⁻¹. Regression shown in the plot: r = 0.9957; slope = 0.492 (0.016); intercept = 0.55 (0.29); sd = 0.77.)
practically the full experimental rate range for solvolytic bridgehead reactivities, including the previously inaccessible 1-homocubyl (8⁺), 1-norbornyl (9⁺), and 6-tricyclo[3.2.1.0³,⁶]octyl (10⁺) cations. To our knowledge, this is the widest range ever reported for a correlation of gas-phase data and solution kinetics. The correlation coefficient (0.996) and the standard deviation of fit (0.77 in log k) are very satisfactory. The slope of the correlation between log k and the ion stabilities (-0.49) implies that 77% of the energy difference between the bromides and the respective cations is expressed in the rates of solvolysis. This slope compares nicely with that of -0.39 relating log k with strain changes between R⁺ and R-Br. The self-consistency of all these results fully supports the basic mechanistic concepts on bridgehead solvolysis.
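The figures quoted in this discussion can be retraced with a short least-squares calculation. The sketch below is only an illustration: it fits the ten (ΔG°(37), Δlog k) pairs as read from Table 17, with the bromocubane entry entered as -14.5 kcal mol⁻¹ (an assumption about a sign lost in the printed table), and converts the slope into a fraction of the Gibbs energy using 2.303RT at 70 °C. The function and variable names are ours.

```python
import math

# (dG(37) in kcal/mol, dlog k) for entries 1-10 as read from Table 17.
# Assumption: bromocubane (entry 6) is taken as -14.5 kcal/mol.
TABLE_17 = [
    (15.9, 8.8), (15.1, 8.6), (0.0, 0.0), (-8.1, -3.6), (-10.6, -5.9),
    (-14.5, -7.3), (-15.0, -6.9), (-23.7, -11.0), (-24.3, -10.1), (-29.6, -13.9),
]


def linear_fit(points):
    """Ordinary least squares; returns slope, intercept, r and sd of fit."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    sd = math.sqrt((syy - slope * sxy) / (n - 2))
    return slope, intercept, r, sd


R_KCAL = 1.987204e-3            # gas constant, kcal mol^-1 K^-1
T_KELVIN = 70.0 + 273.15        # solvolysis temperature
kcal_per_log_unit = math.log(10) * R_KCAL * T_KELVIN   # 2.303RT

slope, intercept, r, sd = linear_fit(TABLE_17)
fraction_expressed = slope * kcal_per_log_unit

print(f"slope = {slope:.3f}, intercept = {intercept:.2f}, r = {r:.4f}, sd = {sd:.2f}")
print(f"2.303RT at 70 C = {kcal_per_log_unit:.2f} kcal/mol per log unit")
print(f"23 log units = {23 * kcal_per_log_unit:.1f} kcal/mol")
print(f"fraction expressed in rates = {fraction_expressed:.2f}")
```

With these inputs the fit gives a slope magnitude near 0.49 and r near 0.996, and the product of the slope and 2.303RT is about 0.77, which is the 77% quoted in the text.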
IV. REACTIONS INVOLVING NEUTRAL REAGENTS AND PRODUCTS
A. Experimental Considerations
Kinetic experiments on gas-phase pyrolyses or eliminations of neutral organic molecules may lead to complicated interpretations and erroneous Arrhenius parameters unless special precautions are taken, such as seasoning the reaction vessel and, in most cases, working in the presence of a free-radical inhibitor. In the following sections only homogeneous gas-phase processes are considered. The literature coverage is careful but by no means exhaustive. Previous studies are briefly reviewed and reexamined from the standpoint of the Taft-Topsom model.
B. Esters
The mechanism generally accepted for the gas-phase pyrolysis of esters of carboxylic acids may be represented as in reaction 44:
$$\mathrm{RCOO{-}C_{\alpha}{-}C_{\beta}{-}H} \longrightarrow [\text{six-membered cyclic transition state}] \longrightarrow \mathrm{RCOOH} + \mathrm{C_{\alpha}{=}C_{\beta}} \tag{44}$$
For molecular cis elimination, the presence of a β-hydrogen in the alkyl moiety of the ester is necessary. Excellent reviews have accounted for the substituent effects in several series of aliphatic and aromatic carboxylic esters.
Substituents in Aliphatic Systems
β-Substituted ethyl acetates: CH3COOCH2CH2Z. The pyrolysis of acetates with alkyl and polar substituents separated from the C_α-O bond by at least three methylene groups (Table 18, 1-16) was considered to be subject to a slight steric acceleration. The best approximate linear correlation was obtained by plotting log k/k₀ against Hancock's steric parameter, E_s^c (δ = -0.12, r = 0.916, at 400 °C). Electron-withdrawing substituents Z directly attached to the β-carbon of ethyl acetate reduced the pyrolysis rate according to their electronegative character (Table 18, 1, 2, 18-23). A linear correlation of log k/k₀ versus Taft's original inductive effect parameter, σ*, was obtained with a ρ* value of -0.19 (r = 0.961) at 400 °C. Likewise, plotting log k/k₀ against σ_I values also gave an approximately linear relationship with a slope ρ_I = -1.03 (r = 0.960) at 400 °C. Notice that although σ* essentially reflects field/inductive effects, it also includes a small but significant resonance effect. The negative slope of the lines suggested, in both cases, a transition state somewhat deficient in electrons.
Table 18. Kinetic Parameters for ZCH2CH2OAc Pyrolysis, at 400 °C

Z                        Ea, kJ mol⁻¹   log A, s⁻¹   10⁴k₁, s⁻¹   10⁴k_H, s⁻¹
(1) H                    200.4          12.55         10.00         3.33
(2) CH3                  204.1          12.77          8.51         4.26
(3) CH3CH2               199.5          12.50         10.47         5.24
(4) CH3CH2CH2            194.1          12.20         13.80         6.90
(5) (CH3)2CH             202.5          12.73         10.23         5.12
(6) CH3CH2CH2CH2         200.8          12.54          9.12         4.56
(7) (CH3)3C              194.1          12.34         19.05         9.53
(8) CH3CH2CH(CH3)        211.9          13.62         15.31         7.66
(9) (CH3)2CHCH2          203.1          12.82         11.86         5.93
(10) c-C6H11             207.4          13.20         14.44         7.22
(11) c-C5H9              208.1          13.30         13.30         6.65
(12) CH3OCH2             203.3          12.69          8.13         4.07
(13) C6H5OCH2            198.0          12.52         14.22         7.11
(14) C6H5CH2             203.1          12.80         10.91         5.46
(15) CH3COCH2            198.9          12.74         20.14        10.07
(16) CH3OCH2CH2          209.5          13.37         12.88         6.44
(17) (CH3)3SiCH2 ᵃ       189.6          12.49         58.88        29.44
(18) F                   211.2          12.68          1.95         0.98
(19) Cl                  202.0          12.14          2.88         1.44
(20) CH3O                199.9          11.96          2.82         1.41
(21) CH3CH2O             200.4          12.09          3.47         1.74
(22) C6H5O               206.6          12.50          2.95         1.48
(23) (CH3)2N             220.4          13.90          6.17         3.09
(24) CH3S                179.0          11.27         23.99        12.00
(25) CH2=CH              200.8          13.20         41.69        20.85
(26) CH≡C                197.3          13.12         64.57        32.28
(27) C6H5                191.6          12.48         41.34        20.67
(28) NC                  171.9          11.51        147.9         73.95
(29) CH3CO               153.9          10.90        912.0        456.0
(30) (CH3)3Si ᵇ          175.4          12.19        380.2        190.1
(31) (CH3CH2)3Si ᵇ       173.5          12.17        501.2        250.6
(32) (CH3CH2)3Ge ᵇ       178.0          12.35        338.8        169.4
(33) C6H5(CH3)2Si ᵇ      174.7          12.19        426.6        213.3
(34) CH3SCH2CH2          192.1          12.30         24.55        12.27

Notes: ᵃ Values taken from ref. 99b. ᵇ Values taken from ref. 101.
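The rate coefficients in Table 18 follow from the tabulated Arrhenius parameters through k = A exp(-Ea/RT). As a minimal sketch, the function below evaluates this expression for the parent ethyl acetate (Z = H; Ea = 200.4 kJ mol⁻¹, log A = 12.55) at 400 °C and divides by the three β-hydrogens to obtain the per-hydrogen value; the function name and the statistical division by three are our illustration of how the last column appears to be derived.

```python
import math

R_J = 8.314462618  # gas constant, J mol^-1 K^-1


def rate_coefficient(ea_kj_mol, log_a, temp_celsius):
    """First-order rate coefficient (s^-1) from Arrhenius parameters."""
    temp_k = temp_celsius + 273.15
    log_k = log_a - (ea_kj_mol * 1000.0) / (math.log(10) * R_J * temp_k)
    return 10.0 ** log_k


# Ethyl acetate, Table 18 entry (1): Z = H, three equivalent beta-hydrogens.
k_ethyl_acetate = rate_coefficient(200.4, 12.55, 400.0)
k_per_beta_hydrogen = k_ethyl_acetate / 3

print(f"10^4 k1  = {k_ethyl_acetate * 1e4:.2f} s^-1")
print(f"10^4 k_H = {k_per_beta_hydrogen * 1e4:.2f} s^-1")
```

The same function can be applied to any row of Tables 18 to 21 by changing the parameters and the temperature.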
π-Bonded substituents at the β-carbon caused a very large increase in rates (Table 18, 25-29), due to a resonance effect. Moreover, the β-organometallic substituents were found to strongly accelerate the elimination process (Table 18, 30-33) because of a combination of increased acidity of the β-hydrogen, stabilization of the incipient positive carbon by carbon-metal hyperconjugation, and steric acceleration (ref. 101). Given the large size of the database of substituents, it is interesting to examine their effects by using the Taft-Topsom treatment of substituent effects, Eqs. 7 and 8. In this case, Eq. 45 is obtained:
$$\log k/k_{0} = -(0.450 \pm 0.041)\,\sigma_{\alpha} - (1.29 \pm 0.11)\,\sigma_{F} \tag{45}$$
At 400 °C, r = 0.959 and sd = 0.086 (Table 18, 1-10, 12, 14, 16, 18-23). Substituent parameter values for 11, 13, 15, and 17 were not available, and the CH3S group, as already described, assists the elimination process anchimerically. Consequently, they were not included in the treatment. The negative value of ρ_α indicates that the elimination reaction is favored by the polarizability of the β-substituent Z, while the size of the negative ρ_F suggests stabilization of the transition state by the field/inductive effect. The influence of σ_R, as σ_R⁺ or σ_R⁻, is insignificant. The series of +R substituents (Table 18, 1, 26, 28-30) yielded Eq. 46:
$$\log k/k_{0} = -(1.81 \pm 0.02)\,\sigma_{\alpha} - (0.38 \pm 0.03)\,\sigma_{F} + (7.34 \pm 0.12)\,\sigma_{R^{-}} \tag{46}$$
At 400 °C, r = 0.999 and sd = 0.015. This result implies appreciable polarizability and resonance effects on the rates. The high quality of the correlation is not an artifact due to the use of three parameters with a limited set of data. Indeed, the use of two parameters (excluding the small value of ρ_F) leads to an excellent correlation (Eq. 47):
$$\log k/k_{0} = -(1.74 \pm 0.13)\,\sigma_{\alpha} + (6.70 \pm 0.74)\,\sigma_{R^{-}} \tag{47}$$
At 400 °C, r = 0.995 and sd = 0.101. Phenyl and vinyl substituents were not included due to lack of coplanarity with the reaction center. No parameters are available for substituents 31-34 of Table 18. It is interesting that these satisfactory correlations with σ_α do not contradict previous regression equations involving steric parameters, the reason being, at least for alkyl groups, that σ_α and the E_s, E_s^c, and υ parameters are significantly correlated. At this point, it is difficult to ascertain whether the physical contribution arises from one of these two effects or a combination thereof.
α-Substituted ethyl acetates: CH3COOCH(Z)CH3. α-Alkyl substitution (Table 19, 1-9) enhanced the elimination rates of these acetates, this effect being attributed to steric acceleration. The correlations obtained when plotting log k/k₀ against Taft's steric parameter E_s (δ = -0.21, r = 0.858 at 320 °C) and Charton's υ values (ψ = 0.46, r = 0.842 at 320 °C) were rather modest. They showed, however (within the experimental uncertainties of the product distribution analyses), that the greater the bulkiness of the α-alkyl group, the larger the k value. This is reasonable, because the hybridization change at both the C_α and C_β atoms from sp³ to sp² releases the steric interactions between the substituents of these C atoms. The effect of electron-withdrawing substituents directly attached to the α-carbon was believed to be electronic in nature. Thus, plots of log k/k₀ versus σ* or σ_I values approximate straight lines, indicating that the field/inductive effect has a significant influence on elimination rates (ρ* = -0.32, r = 0.878 and ρ_I = -2.18, r = 0.898, at 320 °C).
Table 19. Kinetic Parameters for ZCH(OAc)CH3 Pyrolysis, at 320 °C

Z                           Ea, kJ mol⁻¹   log A, s⁻¹   10⁴k₁, s⁻¹   10⁴k₁ (1-olefin), s⁻¹
(1) CH3                     193.8          13.42          2.24          1.12
(2) CH3CH2                  197.4          13.70          2.06          1.16
(3) (CH3)2CH                190.6          13.12          2.15          1.63
(4) (CH3)3C                 184.3          12.54          2.03          2.03
(5) (CH3)3CCH2              181.2          12.87          8.14          2.52
(6) CH2=CHCH2               178.2          12.34          4.42          1.10
(7) CH3CH2CH2               182.8          12.73          4.27          1.98
(8) CH3CH2CH(CH3)           180.7          12.60          4.84          3.37
(9) c-C3H5                  176.9          12.19          4.07          2.12
(10) CH2=CH                 174.9          11.63          1.68          1.68
(11) cis-trans-CH3CH=CH     183.6          13.11          8.70          8.70
(12) C6H5                   182.8          12.75          4.47          4.47
(13) CH3COCH2               156.4          11.88        127.4           —
(14) CH3OCH2                194.9          13.05          0.77          0.44
(15) CH3CO                  202.7          13.40          0.35          0.35
(16) COOCH3                 209.5          13.45          0.10          0.10
(17) Cl3C                   193.7          12.12          0.11          0.11
(18) ClCH2                  197.4          12.95          0.37          0.24
(19) FCH2                   197.8          12.83          0.26          0.19
(20) NC                     203.3          12.88          0.09          0.09
(21) (CH3)2NCH2             185.9          12.66          1.94          1.19
(22) C6H5CH2                180.0          12.53          4.75          1.10
(23) C6H5CH2CH2             179.8          12.33          3.12          0.44
Alkyl groups Z at the β-carbon in CH3COOCH(CH2Z)CH3 showed alkyl-alkyl interactions in the cis conformation and alkyl-hydrogen interactions in the trans conformation. In the former, the k value decreased due to steric hindrance, while in the latter the rate increased because of steric acceleration. When Z is an electron-withdrawing substituent, the rate decreased (Table 19, 13-21). When log k/k₀ is plotted against σ* and σ_I values, good linear correlations are obtained (ρ* = -0.26, r = 0.996, and ρ_I = -1.39, r = 0.995, at 320 °C). In view of the experimental difficulties in the analysis of the product distribution of α-substituted ethyl acetates, it cannot be completely ruled out that the elimination process proceeds under kinetic control with some degree of equilibration.
α-Substituted tertiary acetates: CH3COOC(CH3)2Z. Table 20 reports the kinetic parameters for the gas-phase pyrolysis of tertiary acetates, CH3COOC(CH3)2Z. The alkyl groups (Table 20, 1-7, 9, 12) affected the elimination processes, likely through steric acceleration. This was deduced by correlating log k/k₀ against the steric parameters, the E_s values of Taft (δ = -0.55, r = 0.956 at 280 °C) and the E_s^c values of Hancock (δ = -0.38, r = 0.964 at 280 °C). When considering the polar Z groups directly attached to the α-carbon, their effects were found to be electronic in nature (Table 20, 1, 13-17). This conclusion was reached when
Table 20. Kinetic Parameters for CH3COOC(CH3)2Z Pyrolysis, at 280 °C

Z                      Ea, kJ mol⁻¹   log A, s⁻¹   10⁴k₁, s⁻¹   10⁴k₁ (1-olefin), s⁻¹
(1) CH3                167.2          13.13         21.88         14.59
(2) CH2CH3             168.7          13.46         33.88         24.60
(3) CH2CH2CH3          169.8          13.85         64.57         41.20
(4) CH2CH2CH2CH3       166.1          13.35         45.97         32.00
(5) CH(CH3)2           170.2          13.59         32.33
(6) C(CH3)3            172.0          14.45        162.18        162.18
(7) CH2CH(CH3)2        154.1          12.42         73.42         50.81
(8) C6H5               170.2          13.64         37.15         37.15
(9) CH2CH2C6H5         151.5          11.97         45.85         30.54
(10) CH=CH2            169.8          13.59         35.48         35.48
(11) CH2CH=CH2         171.0          13.69         34.67         18.72
(12) c-C3H5            170.7          13.95         67.61         67.61
(13) CH2COCH3          160.6          12.30         13.49          —
(14) COCH3             180.9          13.47          2.40          2.40
(15) COOCH3            174.6          12.42          0.85          0.85
(16) CN                198.6          14.45          0.49          0.49
(17) CCl3              188.8          13.86          1.07          1.05
plotting log k/k₀ versus σ* and σ_I values (ρ* = -0.45, r = 0.950 and ρ_I = -3.11, r = 0.964, at 280 °C). This result was taken to indicate that the greater the electron-withdrawing character of the polar substituent, the slower the elimination rate. In the case of multiple or π-bonded substituents such as CH2=CH and C6H5 as Z (Table 20, 8 and 10), the rates were affected by simultaneous steric and resonance effects. The data of Table 20 give a very good correlation (Eq. 48) by means of the Taft-Topsom method:
$$\log k/k_{0} = -(3.24 \pm 0.27)\,\sigma_{\alpha} - (2.16 \pm 0.46)\,\sigma_{F} - (6.8 \pm 1.8)\,\sigma_{R^{+}} \tag{48}$$
At 280 °C, r = 0.965 and sd = 0.26. The phenyl datum is excluded due to lack of coplanarity with the reaction center. The polarizability and the field/inductive effects are significant, although the main contributor is σ_R⁺, which indicates the existence of an electron-deficient center. This fact helps explain why some neutral substrates are rather unstable even at room temperature. Tertiary acetates are more sensitive to polarizability effects than primary ones, and this is reflected when the substituent is directly attached to the reaction site. More significant are the size and the sign of ρ_R⁺, which is quite comparable to that for the correlation with ρ_R⁻ for the substituent series in ZCH2CH2OAc with +R groups. Both cases support the concept that the C_α-O bond polarization in the transition state is the limiting factor, followed by the C_β-H bond assistance in the elimination process of these esters.
Acyl-substituted carboxylic esters: ZCOOR. Data for the homogeneous unimolecular gas-phase pyrolysis of ethyl (ZCOOCH2CH3), isopropyl [ZCOOCH(CH3)2], and tert-butyl [ZCOOC(CH3)3] α-substituted carboxylic esters are given in Table 21. Correlating log k/k₀ versus σ* values yielded, for the ethyl esters, ρ* = 0.315 and r = 0.976 at 400 °C; for the isopropyl esters, ρ* = 0.464 and r = 0.963 at 330 °C; and for the tert-butyl esters, ρ* = 0.635 and r = 0.972 at 250 °C. It is important to point out that the k values for several isopropyl α-alkyl-substituted esters given in the above-mentioned work have been estimated. In this respect, the reported rate coefficient at a single temperature was used here to determine the E_a parameter by taking log A = 13.10 (Table 21). This value is believed to be reasonable for a six-membered cyclic transition state for the elimination of these isopropyl esters.
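The single-temperature estimate of E_a described above amounts to inverting the Arrhenius equation with an assumed pre-exponential factor. The following sketch shows the arithmetic for one isopropyl ester of Table 21, entry (19), Z = (CH3)2CH, for which 10⁴k₁ = 4.17 s⁻¹ at 330 °C is listed alongside log A = 13.10; the function name is ours and the choice of entry is only an example.

```python
import math

R_J = 8.314462618  # gas constant, J mol^-1 K^-1


def activation_energy_kj(k_per_s, temp_celsius, log_a=13.10):
    """Ea (kJ/mol) from one rate coefficient and an assumed log A."""
    temp_k = temp_celsius + 273.15
    return math.log(10) * R_J * temp_k * (log_a - math.log10(k_per_s)) / 1000.0


# Table 21, isopropyl series, entry (19): 10^4 k1 = 4.17 s^-1 at 330 C.
ea_isopropyl_19 = activation_energy_kj(4.17e-4, 330.0)
print(f"Ea = {ea_isopropyl_19:.1f} kJ/mol")
```

The result, about 190.3 kJ mol⁻¹, matches the tabulated value for this entry, which suggests the tabulated E_a values with log A = 13.10 were obtained in this way.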
These studies supported the general concept that electron-withdrawing groups at the acyl side of ethyl, isopropyl, and tert-butyl esters enhance the elimination rate, while electron-releasing groups appear to reduce it. In addition, the slopes of the lines for the above-mentioned esters indicated, by extrapolation to one temperature (ρ_T2/ρ_T1 ≈ T1/T2), that the negative nature of the acidic carbon and the polarity in the transition state increase slightly from primary to tertiary esters.
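The temperature extrapolation mentioned above can be made explicit. The sketch below scales each ρ* value to a common temperature with the approximate relation ρ_T2/ρ_T1 ≈ T1/T2 (temperatures in kelvin); the choice of 400 °C as the common temperature and the names used are ours.

```python
# rho* values and the temperatures (deg C) at which they were obtained.
RHO_STAR = {
    "ethyl": (0.315, 400.0),
    "isopropyl": (0.464, 330.0),
    "tert-butyl": (0.635, 250.0),
}


def rho_at(rho_measured, t_measured_c, t_target_c):
    """Scale rho to another temperature using rho_T2 / rho_T1 ~ T1 / T2."""
    return rho_measured * (t_measured_c + 273.15) / (t_target_c + 273.15)


COMMON_TEMPERATURE_C = 400.0  # arbitrary common temperature for comparison
rho_at_common_t = {
    ester: rho_at(rho, temp_c, COMMON_TEMPERATURE_C)
    for ester, (rho, temp_c) in RHO_STAR.items()
}

for ester, rho in rho_at_common_t.items():
    print(f"{ester:10s} rho* at {COMMON_TEMPERATURE_C:.0f} C = {rho:.3f}")
```

After scaling, the ρ* values still increase from the ethyl to the tert-butyl esters (roughly 0.32, 0.42, and 0.49), so the trend from primary to tertiary esters is not merely an effect of the different measurement temperatures.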
Table 21. Kinetic Parameters for ZCOOR Pyrolysis
Columns for each entry: Z; Ea, kJ mol⁻¹; log A, s⁻¹; 10⁴k₁, s⁻¹
R = CH2CH3 (ZCOOCH2CH3), at 400 °C
1) CH3
200.4
12.55
9.93
2) CH3CH2
202.9
12.72
9.40
3) CH3CH2CH2
207.1
13.04
9.27
4) (CH3)2CHCH2
202.5
12.70
9.64
5) (CH3)3CCH2
207.1
13.04
9.27
:6) {CH3)3C
184.1
11.24
8.96
7)C6H5
199.5
12.70
16.49
8) C6H5CH2
200.0
12.60
11.98
9) rrans-CH3CH=CH
195.9
12.25
11.13
10)FCH2
194.0
12.57
32.66
11)F2CH
195.5
12.81
47.86
12)F3C
184.0
12.13
70.80
13)F3CF2C
183.1
12.16
98.18
14)F3CF2CF2C
183.6
12.29
121.15
15)CICH2
197.0
12.70
25.77
16)Cl2CH
193.9
12.62
37.30
17)Cl3C
185.1
12.27
80.29
18)CICH2CH2
196.8
12.54
18.48
19)CICH2CH2CH2
198.7
12.67
17.75
20) BrCH2
195.7
12.62
27.04
21)BrCH2CH2CH2
205.2
12.83
8.95
22) HOCH2
201.4
12.75
13.17
23) NCCH2
191.8
12.29
24) CeHsNH^
169.4
13.30
189.5
12.70
25) CeHsO^
25.32 14188 97.7
COOCH((:H3)2 ), at 330 X 1)CH3
191.1
13.21
4.54
2) CH3CH2
189.9
13.06
4.08
3) CH3CH2CH2
193.7
13.39
4.07
4) (CH3)2CHCH2
189.5
13.01
3.94
5) (CH3)3CCH2
197.0
13.65
3.85 4.79
6) (CH3)3C
189.5^
13.10
7) FCH2
182.8
12.83
9.91
8) CICH2
179.0
12.63
13.34
9) BrCH2
181.1
12.84
14.26
10)ICH2
181.1
13.09
25.35
11)HOCH2
179.9
12.56
9.48
12)CH30CH2
187.8
13.04
5.93 {continued)
90
G. CHUCHANI, M. MISHIMA, R. NOTARIO, and J.-L. M. ABBOUD Table 21, Continued
(13)C6H5CH2
194.1
13.63
6.58
(14) NCCH2
180.3
13.01
24.72
(15)CICH2CH2
180.8
12.57
8.13
(16)Cl2CH
176.2
12.78
32.96
(17)Cl3C
178.3
13.55
(18)CH3CH2CH2CH2
190.9^
13.10
3.63
(19)(CH3)2CH
190.3^
13.10
4.17
(20) (CH3CH2)2CH
190.1^
13.10
4.27
(21)C6H5CH2CH2
189.8^
13.10
4.57
(22) (C6H5)2CH
187.5^
13.10
7.24
(23) CH3CH=CH
190.0^
13.10
4.37
(24) C6H5CH=CH
189.3^
13.10
5.01
(25) (CH3CH2CH2)2CH
190.2^
13.10
4.17
(26) F3C'
171.5
12.70
69.18
(27) CeHs^
187.0^
13.10
7.96
(28) CeHgNH^
166.1
12.10
51.47
180.3
13.50
76.12
(29) CeHsO^ R=C(CH3
127.6
COOC(C H3)3), at 250 °C
(1) CH3
166.0
13.06
3.04
(2) CH3CH2
160.7
12.56
3.24
(3) CH3CH2CH2
163.9
12.77
2.51
(4) (CH3)2CHCH2
170.5
13.42
2.45
(5) (CH3)3CCH2
174.3
13.77
2.29
(6) (CH3)3C
169.1
13.44
3.55
(7) (CH3)3Si
181.1
14.63
3.51
(8) C6H5CH2
164.7
13.15
5.05
(9) Q H g
165.4
13.63
6.97
(lOQHsNH
167.1
14.03
22.05
dDCeHsO
153.2
13.20
79.43
(12)CH30CH2
168.2
13.49
4.90
(13)BrCH2
154.6
12.64
15.85
(14)CICH2
153.1
12.49
15.85
(15)Cl2CH
150.0
12.67
48.98
(16)Cl3C
141.1
12.41
(17)F3C
105.4
10.58
(18)NCCH2
137.8
11.31
Notes: ^ Values taken from ref. 104b. ^The obtained f^ value by scaling log/\ = 13.10. '^ Values taken from ref. 105b. ^ Values taken from ref. 105c.
208.9 11220 35.48
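The three numerical columns of Table 21 are linked by the Arrhenius equation, log k = log A − Ea/(2.303RT). The short sketch below recomputes one rate coefficient from the tabulated Ea and log A as a consistency check; the function name `rate_coefficient` is an illustrative choice, and the input values are those listed for ethyl acetate (Z = CH3, R = CH2CH3) at 400 °C.

```python
import math

R_GAS = 8.314  # molar gas constant, J mol^-1 K^-1


def rate_coefficient(ea_kj_mol, log_a, temp_c):
    """First-order rate coefficient (s^-1) from Arrhenius parameters.

    ea_kj_mol: activation energy in kJ/mol
    log_a:     log10 of the pre-exponential factor (s^-1)
    temp_c:    temperature in degrees Celsius
    """
    temp_k = temp_c + 273.15
    log_k = log_a - (ea_kj_mol * 1000.0) / (math.log(10) * R_GAS * temp_k)
    return 10.0 ** log_k


# Ethyl acetate entry of Table 21: Ea = 200.4 kJ/mol, log A = 12.55, 400 deg C
k_ethyl_acetate = rate_coefficient(200.4, 12.55, 400.0)
print(f"10^4 k = {1e4 * k_ethyl_acetate:.2f} s^-1 (table lists 9.93)")
```

Small differences from the tabulated values are expected, since Ea and log A are rounded in the table.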
The interposition of a methylene group between the substituent and the carboxylate reaction center greatly reduces resonance interactions. Moreover, the crucial Cα···O bond is rather far from Z, which means that polarizability effects are minimal and may be neglected. Consequently, according to the Taft-Topsom treatment, the field-inductive effect, σF, appears to be the main factor affecting the elimination rates of these esters (Eqs. 49-51):

For ZCOOCH2CH3,

log k/k0 = (2.09 ± 0.11) σF    (49)

At 400 °C, r = 0.979, sd = 0.078

For ZCOOCH(CH3)2,

log k/k0 = (2.98 ± 0.22) σF    (50)

At 330 °C, r = 0.958, sd = 0.145

And for ZCOOC(CH3)3,

log k/k0 = (3.76 ± 0.23) σF    (51)

At 250 °C, r = 0.979, sd = 0.135

Estimation of ρF at a single temperature, as above, confirms the increase in the negative character of the acidic carbon in the transition state from primary to tertiary esters.

Substituents in Cyclic Systems
The sequence of relative rate coefficients for the gas-phase pyrolysis of monocyclic acetates is presented in Table 22. The pattern is analogous to that found by Sicher for amine oxide eliminations.
Table 22. Kinetic Parameters for CH3COOZ Pyrolysis, at 330 °C

Z                  Ea, kJ mol⁻¹   log A, s⁻¹   10⁴ k1, s⁻¹
(1) (CH3)2CH       191.1          13.20        4.47
(2) c-C5H9         179.8          12.68        12.88
(3) c-C6H11        203.9          14.02        2.29
(4) c-C7H13        178.3          12.62        15.14
(5) c-C8H15        177.5          12.81        26.92
(6) c-C10H19       168.7          12.55        87.10
(7) c-C12H23       181.0          13.09        25.70
(8) c-C15H29       177.1          12.46        13.18
Table 23. Ring Strain of Cycloalkyl Acetate Pyrolyses

Acetate            Ea, kJ mol⁻¹   Es, kJ mol⁻¹ ᵃ   Es′, kJ mol⁻¹ ᵇ   ΔEs, kJ mol⁻¹ ᶜ
cyclohexyl         203.9          0.0              5.9               0.0
cyclopentyl        179.8          26.4             24.7              7.6
cycloheptyl        178.3          26.8             22.6              10.1
cyclooctyl         177.5          41.4             25.1              22.2
cyclodecyl         168.7          52.7             37.5              21.1
cyclododecyl       181.0          18.4             -                 -
cyclopentadecyl    177.1          6.3              -                 -

Notes: ᵃ Es = strain energy in cycloalkane. ᵇ Es′ = strain energy in cycloalkene. ᶜ ΔEs = (Es − Es′) − [(Es − Es′) for cyclohexyl].
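The ΔEs column of Table 23 follows directly from footnote c. The sketch below applies that definition to the cycloalkane and cycloalkene strain energies; the assignment of the flattened numbers to rows is a reconstruction (it is the assignment that reproduces the tabulated ΔEs values), and the variable names are illustrative.

```python
# Sketch: Delta Es = (Es - Es') - [(Es - Es') for cyclohexyl], per Table 23, note c.
# The (Es, Es') pairs below are a reconstruction of the flattened table columns.

strain_kj_mol = {
    # acetate: (Es in cycloalkane, Es' in cycloalkene)
    "cyclohexyl": (0.0, 5.9),
    "cyclopentyl": (26.4, 24.7),
    "cycloheptyl": (26.8, 22.6),
    "cyclooctyl": (41.4, 25.1),
    "cyclodecyl": (52.7, 37.5),
}

ref_es, ref_es_ene = strain_kj_mol["cyclohexyl"]
reference = ref_es - ref_es_ene  # (Es - Es') for cyclohexyl

delta_es = {
    name: round((es - es_ene) - reference, 1)
    for name, (es, es_ene) in strain_kj_mol.items()
}

for name, value in delta_es.items():
    print(f"{name:12s} Delta Es = {value:5.1f} kJ/mol")
```

A positive ΔEs means that more strain is released on forming the cycloalkene than in the cyclohexyl reference case.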
The relatively low k value of cyclohexyl acetate probably reflects the difficulty of the six relevant atoms in assuming an optimum planar or chair conformation in the transition state. It was considered that this requirement for a cyclic array of six key atoms was the most important factor in determining the relative pyrolysis rates of the other members of the series. The strain energy data given in Table 23 indicated that the strain energy difference, ΔEs, except for cyclodecyl, increased in the same sequence as the rates given in Table 22. After a study of the rates of pyrolysis of cycloalkyl chlorides, Dakubu and Holmes concluded that ring strain may affect the rates in two ways: (1) strain raises the energy of the ground state relative to the transition state, thereby lowering the activation energy, and (2) the presence of strain in a ring system facilitates the attainment of the geometry of the transition state. That study did not report the kinetically controlled product ratio of cis- and trans-olefin, so extensive speculation concerning the reasons for the enhanced rate of pyrolysis of cyclodecyl acetate was not warranted. It was thought possible that part of both the Baeyer strain and the intra-annular repulsions is relieved in proceeding to the transition state for elimination.

Substituents in Alicyclic Systems
The results for the gas-phase unimolecular elimination of 4-substituted isobornyl acetates are given in Table 24. The schematic representation of this elimination reaction is shown in Scheme 1. The rates were followed by titration of CH3COOH. Electron-withdrawing polar substituents at C4 caused a decrease in k values, with ρI = −0.70 and r = 0.903 at 340 °C. The effect of these polar substituents on the elimination rate was found to be modest. The negative value of ρI was associated with that of β-substituted ethyl acetates. In this work, however, there is an interposition of a tertiary carbon bearing the substituent. In the application of the Taft-Topsom treatment, it is rather surprising to find the σR+ parameter to be of paramount importance in the CH3COOH elimination (Eq. 52).

[Scheme 1: elimination of CH3COOH from 4-substituted isobornyl acetates]

log k/k0 = −(0.76 ± 0.07) σF − (1.75 ± 0.26) σR+    (52)

At 340 °C, r = 0.977, sd = 0.072

This result seems to indicate that the isobornyl moiety bears an overall positive charge, which is stabilized by electron-donating groups and destabilized by the field effect.

Table 24. Kinetic Parameters for 4-Substituted Isobornyl Acetates Pyrolysis, at 340 °C

Substituent        Ea, kJ mol⁻¹   log A, s⁻¹   10⁴ k1, s⁻¹
(1) H              189.2          12.82        5.01
(2) CH3            186.7          12.72        6.50
(3) C6H5           181.5          12.57        12.76
(4) CH3CO          190.9          12.79        3.35
(5) Cl             191.8          12.89        3.53
(6) CN             192.6          12.66        1.78
(7) NO2            191.5          12.56        1.75

Substituents in Aromatic Systems

β-Aryl ethyl acetates: CH3COOCH2CH2C6H4Z. The kinetic parameters for the gas-phase thermal decomposition of β-aryl ethyl acetates are shown in Table 25. On plotting log k/k0 values against Hammett's σ, a reasonable correlation was obtained with ρ = 0.2 at 377 °C. This work suggested that electron-supplying substituents at the aromatic nucleus decreased the rate, while electron-withdrawing substituents increased it. Use of the Y-T Eq. 11 yielded a better linear relationship. Apparently, resonance interactions of the substituent with the Cβ-H bond are important for the overall reactivities. If the 4-Cl substituent of Table 25 is excluded from the correlation against the resonance parameter σR−, a good relationship is obtained (Eq. 53).

log k/k0 = (0.16 ± 0.017) σR−    (53)

At 377 °C, r = 0.978, sd = 0.011

In spite of the limited number of substituents and the small differences in k values, their influence on the β-hydrogen assistance for elimination may apparently be rationalized as above.

α-Aryl ethyl acetates: CH3COOCH(C6H4Z)CH3. The effects of a considerable number of substituents at the aromatic ring on the pyrolysis of α-aryl ethyl acetates have been reported in various papers. The pyrolysis of α-aryl ethyl acetates was thought to serve as a model reaction for determining quantitative electrophilic reactivities in the absence of solvents, catalysts, etc. Glyde and Taylor investigated the gas-phase elimination kinetics of several polymethyl- and polychloro-substituted α-aryl ethyl acetates. The methyl and chloro substituent effects were found not to be additive. In addition to these studies, several papers on the effect of heteroaromatic and heterocyclic groups at the α-position of ethyl acetates were published.
Table 25. Kinetic Parameters for CH3COOCH2CH2C6H4Z Pyrolysis, at 377 °C

Z                     Ea, kJ mol⁻¹   log A, s⁻¹   10⁴ k1,* s⁻¹
(1) 2-CF3             191.2          12.55        15.19
(2) 3-CF3             189.9          12.41        13.99
(3) 3-F               189.9          12.39        13.37
(4) 4-Cl              191.2          12.41        11.00
(5) H                 191.6          12.48        12.00
(6) 2,3,4,5,6-F5      191.6          12.46        11.46
(7) 4-F               191.6          12.46        11.46
(8) 4-CH3             192.4          12.50        10.84
(9) 4-CH3O            192.4          12.51        11.09
(10) 2-F              192.8          12.44        8.77

Note: * Our k values calculated from the parameters of this table disagree with data reported in ref. 113.
In many cases, the pyrolysis experiments were carried out at a single temperature. Collecting such information into a single large table and extrapolating to one common temperature yielded unreliable and contradictory rate coefficients. Consequently, very poor correlations were observed. However, most of these studies reached the conclusion that electron-donating substituents in the benzene ring increased the k values and electron-withdrawing substituents reduced them.

α-Aryl-α′-methyl ethyl acetates: CH3COOC(CH3)2C6H4Z. In contrast with the effect of the aryl group in α-aryl ethyl acetates, the influence of substituents in the aromatic ring of α-aryl-α′-methyl ethyl acetates was better described (Table 26). As in the α-aryl ethyl acetates, electron-donating substituents enhanced the rate while electron-withdrawing substituents decreased it. These tertiary esters showed large elimination rates because of the more positive character of the α-carbon in the transition state. A good Hammett correlation with the original σ+ values was obtained (ρ+ = −0.74 at 550 K (277 °C)). The good correlation and the corresponding interpretation of the substituent effect described above are confirmed with the σ+ values from Table 1A (Eq. 54).

log k/k0 = −(0.86 ± 0.05) σ+    (54)

At 277 °C, r = 0.993, sd = 0.03

Table 26. Kinetic Parameters for CH3COOC(CH3)2C6H4Z Pyrolysis, at 277 °C

Z                  Ea, kJ mol⁻¹   log A, s⁻¹   10⁴ k1, s⁻¹
(1) 4-CH3          156.4          12.87        103.0
(2) 3-CH3          159.8          13.03        70.79
(3) H              160.6          13.00        55.46
(4) 4-Cl           161.9          13.08        50.18
(5) 3-Cl           166.1          13.22        27.64
(6) 3-pyridyl      165.2          13.23        34.44
(7) 4-pyridyl      173.2          13.56        12.80
(8) 2-pyridyl      174.0          13.56        13.22

Substituted ethyl benzoates: ZC6H4COOCH2CH3. The relative rates of elimination of substituted ethyl benzoates were determined in a flow system at 515 °C (Table 27). Linearity of the correlation against Taft's σ° values (ρ° = 0.21 at 515 °C) was better than that against Hammett σ values. The authors claimed that this means that the build-up of negative charge in the C-O bond of the ester on going from reagents to the transition state in ester pyrolysis is much smaller than that corresponding to the ionization of benzoic acid. The relevant correlation equations are:

log k/k0 = (0.21 ± 0.02) σ°    (55)

At 515 °C, r = 0.931, sd = 0.03

log k/k0 = (0.21 ± 0.03) σ    (56)

At 515 °C, r = 0.923, sd = 0.05

The difference between the quality of fit in Eqs. 55 and 56 is not large enough to permit any conclusion to be drawn.

Table 27. Relative Rate of ZC6H4COOCH2CH3 Pyrolysis, at 515 °C (Method: Flow System)

Z                  Relative Rate
(1) 4-NH2          0.86
(2) 4-OH           0.91
(3) 4-OCH3         0.96
(4) 3-OH           0.98
(5) H              1.00
(6) 3-NH2          1.01
(7) 4-CH3          1.03
(8) 3-OCH3         1.07
(9) 3-CH3          1.08
(10) 4-Br          1.25
(11) 3-Br          1.27
(12) 3-Cl          1.28
(13) 4-Cl          1.28
(14) 3-I           1.32
(15) 4-I           1.32
(16) 4-NO2         1.42
(17) 3-NO2         1.50

Substituted isopropyl benzoates: ZC6H4COOCH(CH3)2. A series of meta- and para-substituted isopropyl benzoates was pyrolyzed at the single temperature of 337.4 °C and, as expected, the results were similar to those for substituted ethyl benzoates. The rate of formation of propene is increased by electron-withdrawing substituents and reduced by electron-releasing substituents (Table 28). The log k/k0 values correlated well with Taft's σ° values. The value ρ° = 0.33 was reported to be slightly higher than that observed for ethyl benzoates (ρ° = 0.20). The authors were surprised by the fact that ethyl benzoate pyrolyses showed a resonance-free σ° correlation, as there is no insulating methylene bridge between the reaction center and the benzene ring, especially as the transition state involves a degree of charge separation formally similar to that in the benzoate anion. In this respect, the difference in resonance stabilization between the reagent and the transition state becomes important when the carboxylate anion is fully developed; with incipient species the resonance effect may apparently be small. Among this series of aryl esters, several interesting pyrolytic eliminations of isopropyl (hetero)aryl carboxylate esters were described, in which new σ° substituent constants for hetero-substituents were defined. Since isopropyl benzoates pyrolyzed at much lower temperatures than ethyl benzoates, the transition state is more polar in nature. Therefore, substituents at the aromatic ring must show a more pronounced effect on the reaction center. The Hammett equation gives a good correlation (Eq. 57):

log k/k0 = (0.310 ± 0.010) σ    (57)

At 337.4 °C, r = 0.987, sd = 0.02

Taft's σ° values perform slightly better:

log k/k0 = (0.310 ± 0.003) σ°    (58)

At 337.4 °C, r = 0.993, sd = 0.01

Table 28. Rate Coefficients for ZC6H4COOCH(CH3)2 Pyrolysis, at 337.4 °C

Z                  10⁴ k1, s⁻¹
(1) 4-C(CH3)3      10.8
(2) 4-CH3          11.0
(3) 3-NH2          11.0
(4) 4-OCH3         11.1
(5) 3-CH3          11.9
(6) H              12.4
(7) 3-OCH3         12.6
(8) 4-F            13.8
(9) β-naphthyl     14.0
(10) 4-Cl          15.6
(11) 3-F           16.0
(12) 3-Cl          16.4
(13) 3-NO2         20.9
(14) 4-NO2         21.6
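The Hammett slope for the isopropyl benzoates can be checked with a few lines of code. The sketch below fits log k/k0 = ρσ through the origin by least squares, using rate coefficients from Table 28 and a subset of widely tabulated Hammett σ constants; these σ values are an assumption here (the chapter's own Table 1A may list slightly different numbers), and `fit_rho` is an illustrative helper, not the authors' procedure.

```python
import math

# 10^4 k (s^-1) at 337.4 deg C, taken from Table 28
k_table28 = {
    "H": 12.4, "4-CH3": 11.0, "3-CH3": 11.9, "4-OCH3": 11.1, "3-OCH3": 12.6,
    "4-F": 13.8, "3-F": 16.0, "4-Cl": 15.6, "3-Cl": 16.4,
    "3-NO2": 20.9, "4-NO2": 21.6,
}

# Commonly tabulated Hammett sigma constants (assumed values, see lead-in)
sigma = {
    "H": 0.00, "4-CH3": -0.17, "3-CH3": -0.07, "4-OCH3": -0.27, "3-OCH3": 0.12,
    "4-F": 0.06, "3-F": 0.34, "4-Cl": 0.23, "3-Cl": 0.37,
    "3-NO2": 0.71, "4-NO2": 0.78,
}


def fit_rho(rates, sigmas, reference="H"):
    """Least-squares slope of log10(k/k_ref) versus sigma, forced through the origin."""
    numerator = 0.0
    denominator = 0.0
    for substituent, s in sigmas.items():
        y = math.log10(rates[substituent] / rates[reference])
        numerator += s * y
        denominator += s * s
    return numerator / denominator


rho = fit_rho(k_table28, sigma)
print(f"rho = {rho:.3f}")
```

With these assumed σ constants the fitted slope lies close to the value of 0.310 quoted in Eq. 57.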
Substituted tert-butyl benzoates: ZC6H4COOC(CH3)3. Earlier work on the pyrolysis of substituted tert-butyl benzoates at 274.4 °C was found difficult to analyze because of nonreproducible rates. However, a later kinetic investigation of tert-butyl benzoates encountered fewer difficulties, and normal Arrhenius parameters were obtained. The log k/k0 values gave a good correlation with σ° values, with ρ° = 0.58 corrected to 600 K (327 °C) (Table 29). The magnitude of this value compared with the previously reported values (corrected to 600 K) for ethyl and isopropyl benzoates of 0.26 and 0.34, respectively, was assumed to confirm that the transition-state polarity of esters increases in the order primary < secondary < tertiary, with the biggest polarity difference occurring between secondary and tertiary esters. In the study of the pyrolysis of tert-butyl heteroaryl carboxylate esters, the Hammett correlations with the literature σ° values of heteroaryl substituents showed a reaction constant ρ° compatible with the ethanoate molecular frame rather than with the carboxylate structure. The data of Table 29 for tert-butyl benzoates lead to an excellent correlation with Taft's σ° (Eq. 59). The result with σ values is quite fair (Eq. 60).

log k/k0 = (0.62 ± 0.02) σ°    (59)

At 311.9 °C, r = 0.996, sd = 0.02

log k/k0 = (0.62 ± 0.04) σ    (60)

At 311.9 °C, r = 0.989, sd = 0.05

Table 29. Rate Coefficients for ZC6H4COOC(CH3)3 Pyrolysis

Z                  10² k, s⁻¹ at 311.9 °C   10² k, s⁻¹ at 297.8 °C
(1) 4-OCH3         3.23                     1.44
(2) 4-CH3          3.42                     1.48
(3) 3-CH3          3.58                     1.57
(4) H              3.83                     1.66
(5) 3-OCH3         4.37                     1.88
(6) 4-F            4.86                     2.23
(7) 4-Cl           5.94                     2.62
(8) 3-Cl           6.60                     3.02
(9) 3-NO2          11.17                    5.00
(10) 4-NO2         11.80                    5.08

α-Arylethyl benzoates: C6H5COOCH(CH3)C6H4Z and tert-butyl α-arylacetates: ZC6H4CH2COOC(CH3)3. Rate data for the pyrolysis of α-arylethyl benzoates, C6H5COOCH(CH3)C6H4Z, given in Table 30, gave a good correlation with σ+ values, with ρ+ = −0.68 at 641 K (368 °C). This result suggested the ρ factor to be between those for acetates and phenyl carbonates, and nearer to the value of the former. Previous work of Smith and coworkers on a series of α-arylethyl benzoates had laid major emphasis on obtaining LFER involving

Table 30. Kinetic Parameters for C6H5COOCH(CH3)C6H4Z Pyrolysis, at 641 K (368 °C)

Z                  Ea, kJ mol⁻¹   log A, s⁻¹   10⁴ k1, s⁻¹
(1) 4-CH3          167.6          12.15        308.7
(2) 3-CH3          175.4          12.64        220.7
(3) H              173.4          12.38        176.5
(4) 4-Cl           175.9          12.53        156.0
(5) 3-Cl           180.7          12.70        93.72
(6) 4-CF3          181.1          12.63        74.00
(7) 3-NO2          176.1          12.11        57.12
(8) 4-NO2          177.8          12.21        52.26
[Table residue: group-contribution calculation of a polymer melting point; the individual group entries are not recoverable.] Total Ym = 119,400; molar weight of the structural unit = 348; hence Tm = 119,400/348 = 343 K. Observed Tm = 338 K.
Tm = 414.2 (1 − 0.627/t)    (77)

They do not quote predicted melting points using this equation, but show graphically that a plot of Tm against 1/t gives a good straight line (cf. the work of Buckley and Kovacs). Mekenyan et al. used a graph-theoretical approach to the calculation of a range of physical properties, including melting point, of polymers. They developed a total of 29 equations for the prediction of melting points of various polymers. Their approach is based on the Wiener index, which is a topological index relating to the number of bonds between each pair of atoms in the molecule. Since this tends to infinity as n → ∞, the authors used a modification W of the Wiener index in their correlations. Two examples of their correlations are given below:

Polyethylene

mp = 693.675 − 6439.3 W + 11746.3 W²    (78)

n = 17    r = 0.996    s = 6.51

Polycapramide

mp = 507.751 − 1297.3 W + 3424.36 W²    (79)

n = 11    r = 0.909    s = 1.61
However, the method does not give good predictions of the melting points of infinite chain length polymers; for polyethylene it yields 123.3 °C, whereas the observed value is 138 °C. Mandelkern and Stack have given an excellent critical discussion of the theoretical and experimental basis for determining the melting temperatures of long-chain molecules. They confirmed the validity of the Flory and Vrij approach, and suggested that some earlier correlations (e.g., that of Wunderlich and Czornyj) may not be as good as claimed because of errors in experimental melting points of very long-chain alkanes. Mandelkern and Stack pointed out that the Flory and Vrij approach is the correct one where molecular crystals are formed, but for real polymer chains of finite length, molecular crystals cannot be formed and a different analysis is required. They proposed the use of Flory's equation (Eq. 72), but stated that because the parameters involved are molecular weight-dependent, it is not possible to extrapolate to the melting point of the infinite chain length polymer using solely the melting temperatures of equilibrium crystallites formed by chains of finite length. Cantor and Dill have pointed out that most liquid n-alkanes comprising 9-14 carbons freeze to a "rotator" phase a few degrees above the temperatures at which they fully crystallize. They developed a statistical mechanical theory to predict melting from the rotator phase and, although they did not tabulate results, showed graphically that experimentally observed melting points were extremely close to their predicted values over the range of chain lengths examined. Starkweather, again using the Flory and Vrij approach, was able to predict the melting points of perfluoroalkanes and poly(tetrafluoroethylene). Using differential scanning calorimetry, he calculated that a perfectly crystalline, chain-extended, monodisperse high polymer should melt at 347 °C, which compares well with an experimental value of 346 °C.
Copolymers present a rather different problem, and require a different approach. Frushour developed an equation to predict the melting points of polyacrylonitrile copolymers:

$1/T_m - 1/T_m^0 = \sum_{i=1}^{n-1} K_i X_i$    (80)

where $T_m^0$ = melting point of the homopolymer, n = copolymer order, $X_i$ = mole fraction of the i-th monomer, and $K_i$ = corresponding melting point depression constant. Using this equation, Frushour was able to predict the melting points of a range of copolymers and terpolymers to better than 1 degree in most cases. Tanaka (and references cited therein) has modeled the melting points of atactic polypropylene and propylene/ethylene copolymers with an equation (Eq. 81) whose parameters are the universal gas constant R, the heat of fusion per molar structural unit of the major component, the heat of transition per molar structural unit due to quasicrystals in the amorphous regions, the molar surface free energy at the ends of a crystal, a constant relating to the number and mean lengths of blocks composed of crystallizable units, and the crystal length. No comparisons of observed and predicted melting points were given by Tanaka. Polikarpov et al., studying polyorganocarbosilanes, devised the equation

$1/T_m = \sum_i K_i \Delta V_i / \sum_i \Delta V_i$    (82)

where $\Delta V_i$ = incremental volume of the i-th unit of the -SiR2-(CH2)3- repeat unit, $K_i = 18.5R/zD_i$, R = radius of atom in question, z = coordination number, and $D_i$ = bond length between atoms. For eight different polyorganocarbosilanes the average error of the predicted melting points was 7 degrees. Carlier et al. used a new concept, the percentage of rigid chain length (PRCL), as a means of predicting the melting points of poly(aryl ether ketone)s and poly(aryl ether sulfone)s. For example, for poly(aryl ether ketone)s they obtained:

mp = 9.7937 PRCL − 202.33    (83)

n = 10    r = 0.996    s not given
They point out that the enthalpy of melting is fairly constant due to the isomorphism of the diphenyl ether and diphenyl ketone groups, so variations in entropy of melting are largely responsible for the variation in melting point; it is this factor which is believed to be largely responsible for the rectilinear correlation observed in Eq. 83. Tan and Rode investigated the relationship between the melting points of oligomethylenes and quantum chemical properties calculated using CNDO/2. They found an excellent correlation with the sums of charges on carbon (QC) and hydrogen (QH) atoms, respectively:

$\mathrm{mp} = 484.4 + 278687.8 \sum_{i}^{n} QC_i/n + 503772.4 \sum_{i}^{2n+2} QH_i/(2n+2)$    (84)

n = 41    r = 0.999    s = 1.72

The average error in the predicted melting point for 22 compounds not in the training set was 0.60 degrees. Tan and Rode observed that the sums of charges correlated well with the number (n) of methylene groups in the oligomers, leading to Eq. 85:

mp = 141.4 + 7918.6/n − 10535.2/(n + 1)    (85)
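Equation 85 is simple enough to evaluate directly. The sketch below codes it as a function of n and evaluates it for a few chain lengths; the function name `mp_oligomethylene` is illustrative, and the units of the result are whatever units Tan and Rode used for mp, which are not stated in this passage.

```python
# Sketch: Eq. 85, mp = 141.4 + 7918.6/n - 10535.2/(n + 1).
# mp_oligomethylene is an illustrative name; units follow the source's "mp".

def mp_oligomethylene(n):
    """Melting point from Eq. 85 for an oligomethylene with n methylene groups."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return 141.4 + 7918.6 / n - 10535.2 / (n + 1)


for n in (10, 20, 50, 100, 1000):
    print(f"n = {n:5d}  mp = {mp_oligomethylene(n):7.2f}")
```

Both n-dependent terms vanish for very long chains, so the equation approaches the constant 141.4 in the infinite-chain limit.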
The authors claim that their method yields better predictions of the melting points of oligomethylenes than does that of Somayajulu. Sumpter and Noid have used an unsupervised back-propagation neural network to predict the melting points of a range of polymers using descriptors defined by a combination of molecular connectivity indices, chemical composition, and IUPAC nomenclature. For a set of 56 unspecified polymers selected from 11 different families, their method predicted melting points with a standard error of 21 degrees, compared with 26 degrees using partial least squares regression, 24 degrees using locally weighted regression, 26 degrees using ridge regression, 30 degrees using polynomial partial least squares regression, and 40 degrees using kernel regression. They reported the experimental melting point range of the polymers as 230-266 K, which seems incorrect. Burkhardt et al., using comparative molecular field analysis, were able to predict the melting points of 11 polypropylenes with steric descriptors of the metallocenes used in the polymerization process; a four-component correlation yielded r = 0.987 and s = 7.05 degrees. It is hoped that these few examples of studies concerned with the prediction of melting points of polymers will have given a reasonable, albeit brief, overview of this field which, although important, is perhaps of only peripheral interest to the environmental chemist and pharmaceutical formulator.

Inorganic Substances
In contrast to the relatively large number of papers dealing with the prediction of melting points of organic compounds, there has been very little work done concerning inorganics. Gold and Ogle examined the accuracy of three methods in predicting melting points of inorganic compounds. They reported mean percentage errors (± 95% confidence limits in degrees) as follows:

Method of Lorenz and Herz for 42 compounds: 26.63% (± 111 degrees)
Method of Benko for 35 compounds: 12.88% (± 77.94 degrees)
Method of Prud'homme for 37 compounds: 6.26% (± 63.72 degrees)
However, as commented earlier, Gold and Ogle's method of calculating percentage error is open to doubt. Wachalewski,^^ whose work on the prediction of melting point of organics has already been discussed (see Eq. 33) also applied his method to a series of 19 simple inorganic compounds, with an average error of 28.2 degrees. Sharma,^^^ starting from thermodynamic theory, developed an equation for predicting the melting point of "simple nonpolar liquids," meaning essentially the inert gases, T^ = r&(v''^-l)/V''^
(86)
where 7^ = characteristic temperature, 5 = lattice distortion parameter, and V = reduced volume. For four inert gases and nitrogen, the average error in predicted melting point was 6.5 degrees, with the error rising with atomic size. It is not known whether the method can be applied to inorganic nonpolar liquids. An interesting series of studies has been carried out by Kutolin and coworkers. Kutolin et al.^^^ developed the following equation for the melting points of binary compounds of lanthanide rare-earth elements, mp = 246.91^° + 48 E^(X) - 578.1 m/n + 2482.2
(87)
where dff = number of electrons in ^2 orbit of the lanthanide rare-earth element, Ep(X) = Fermi energy of the element X, m = number of atoms of X in molecule, and n = number of atoms of lanthanide rare-earth element in molecule. For 11 such compounds, the average error in predicted melting point was found to be 46.5 degrees. In an extension to this work, Kutolin and Kotyukov^^"^ and Kutolin et al.^^^ used Chebeychev functions (orthogonal functions analogous to principal components) derived from electronic parameters such as Fermi energies to develop a series of equations for the melting points of binary compounds and sesquioxides of rareearth elements. For 12 such compounds, the average error in predicted melting point was 100.3 degrees, suggesting that the method is not so satisfactory as that of Kutolin et al.^^^ referred to above. Kutolin et al.^^^ simplified their 1978 approach when they correlated the melting points of refractory metal dihydrides with a single electronic parameter, the Fermi energy level (£p) of the metal dication: r ^ = 17.537 £:p+1570
(88)
For seven such dihydrides, the average error in predicted melting point was found to be 80.4 degrees, with a range of 1 to 199 degrees. A number of workers have attempted to predict the melting points of superheavy elements. Keller et al. used the Lindemann equation to predict such melting points. Kazragis et al. developed several equations for the prediction of melting points of metallic elements. For example, they developed Eq. 89 from the melting points of Mo, Tc, Ru, Rh, Pd, Ag, and Cd, and used it to predict the melting points of some superheavy elements.

Tm = 2888 + 15.334 w² − 2.6124 w³    (89)

In Eq. 89, w = number of outer s and d electrons in an atom. The actual and predicted melting points for the training set elements are:

Element   Tm (observed)   Tm (predicted)
Mo        2893            2876
Tc        2473            2743
Ru        2523            2532
Rh        2236            2226
Pd        1827            1809
Ag        1235            1267
Cd        594             582
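The predicted column above can be regenerated from Eq. 89. The sketch below assumes the exponents 2 and 3 on w (the pair that reproduces the tabulated predictions to within rounding) and takes w as the count of outer s and d electrons from the usual ground-state configurations (Mo 4d5 5s1 through Cd 4d10 5s2); the function name is illustrative.

```python
# Sketch: Tm = 2888 + 15.334 w^2 - 2.6124 w^3 (Eq. 89), with w = number of
# outer s and d electrons. The exponents are a reconstruction checked against
# the tabulated predictions; tm_eq89 is an illustrative name.

def tm_eq89(w):
    return 2888 + 15.334 * w**2 - 2.6124 * w**3


# (element, w, Tm observed, Tm predicted as tabulated in the text)
training_set = [
    ("Mo", 6, 2893, 2876),
    ("Tc", 7, 2473, 2743),
    ("Ru", 8, 2523, 2532),
    ("Rh", 9, 2236, 2226),
    ("Pd", 10, 1827, 1809),
    ("Ag", 11, 1235, 1267),
    ("Cd", 12, 594, 582),
]

for element, w, observed, tabulated in training_set:
    print(f"{element}: w = {w:2d}  Eq. 89 -> {tm_eq89(w):7.1f}  "
          f"(tabulated {tabulated}, observed {observed})")
```

Differences of about one unit between the recomputed and tabulated predictions are consistent with rounding of the published coefficients.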
Another equation developed by Kazragis et al., but with no predictions given, is, 7'm = ^ e ° ' ' ' - u ' " ^ ' ' ' ' / W ' i i r
(90)
where c^ = electron density in the conduction band, r^ = ionic radius, / = internuclear distance, and W^^ = ionic charge in metallic state. Kutolin et al.^^^ used electronic parameters to calculate the melting points of elements with such high atomic numbers that they were as yet undetected or had not had their melting points determined. They reported the following equation, although no evidence was offered for its validation on elements whose melting points are known, mp = 0.80726 x-^ + 130.5505 X2X^ + 144.28273 x^x^ - 42.68037 x^x^ - 299.86699 (91) where jCj = atomic number, X2 and x^ = number of electrons in outer sublevels s and d respectively, x^ = periodic table group of element, x^ = quantum number for the M^ shell, and x^ = magnetic quantum number. They reported the predicted melting points of 16 elements from atomic numbers 104 to 160. Bonchev and Kamenska^^^ used the Shannon information index to predict the melting points of the 113-120 transactinide elements. They commented that their predictions were similar to those of Keller et al.^^^ Gomez et al.^^^ have attempted to predict the melting points of some face-centered cubic noble and transition metals using the calculated tight-binding potential. Their predictions can be described as only fair, having a mean error of 341.7 degrees for nine metals. Li et al.,^^^ using neural networks, found that five descriptors (electronegativity difference, valence electron density difference, electron-atom ratio, metallic radius ratio, and average melting point of constituent elements) could model the melting points of AB-type intermetallic compounds with a mean error of 14.8 degrees for 11 such compounds.
Reddy et al. observed that the melting points of tetrahedral semiconductors could be correlated with the arithmetic mean of the nuclear effective charge (z̄) on each of the atoms:

mp = 4322 − 971 z̄    (92)

no statistics given

For 18 such compounds Eq. 92 gave an average error of 86.6 degrees. Bosi developed a theory for predicting the melting points of alkali metal halides and alkaline earth oxides. He derived the following equation:
zV
^
(93)
where z^ and z~ are the charges on the cation and anion respectively, e^ is the dielectric constant, K is the Boltzmann constant, r^ and r^ are the anionic and cationic radii, respectively, A//f is the latent heat of fusion, and A//j is the lattice energy (latent heat of sublimation); these latent heats were obtained from the literature. For 20 alkali metal halides, the average error in prediction of melting point was 59 degrees, while for 5 alkaline earth oxides it was 136 degrees. However, the average percentage error was about the same for the two series, since the alkaline earth oxides have much higher melting points than do the alkali metal halides. Kang et al.^^^ have used artificial neural networks and pattern recognition with chemical bond parameters to predict the melting point of CsClMn04 as 677 °C, reportedly in agreement with experiment. Horvath^^'* has briefly reviewed the prediction of melting points of inorganic compounds. It is apparent that there is as yet no consistent method for the prediction of melting points of inorganic compounds, even those of relatively simple composition. Clearly there is scope for much more work in this field, perhaps through the application of molecular orbital theory.
V. CONCLUSIONS Melting point is a readily measurable property of importance in many ways. As such, its prediction has attracted much interest. Early quantitative work concentrated on hydrocarbons and homologous series, and numerous equations were developed relating melting point to chain length. The odd-even alternation in melting point was generally dealt with through the use of separate equations for odd and even chain lengths. Melting points of these compounds can now be predicted with high accuracy. Hydrogen bonding is an important factor in melting point, and must be taken into account if good predictions are to be made. Few methods so far devised have incorporated hydrogen-bonding contributions; undoubtedly the best to date is the
group contribution method of Simamora and Yalkowsky.^^^ Even their method has a rather high standard error of prediction, and further work is needed to reduce this. Homopolymers represent the ultimate extrapolation of homologous series, and several of the equations devised to predict the melting points of homologous series have been used successfully to predict the melting points of polymers. Copolymers represent a more difficult problem, but there have been several reasonably successful attempts to predict their melting points, although generally on a more empirical basis. Elements and inorganic compounds have come in for quite a lot of attention, and a number of different approaches have been used, based generally on functions relating to electronic structure. There is, however, still no general method available for the prediction of melting points of inorganics. No work appears to have been done on the estimation of melting points of metallorganic complexes.
ACKNOWLEDGMENTS
I am grateful to Prof. P. J. Duke and the late Prof. C. Silipo for translating some Russian and Italian texts, respectively, for me.
REFERENCES
1. Glasstone, S. Textbook of Physical Chemistry, 2nd edn.; D. Van Nostrand: New York, 1946, p 461.
2. Bean, V. E.; Wood, S. D. J. Chem. Phys. 1980, 72, 5838-5841.
3. Berry, R. S. Sci. Amer. 1990, 263 (2), 50-56.
4. Yalkowsky, S. H.; Valvani, S. C. J. Pharm. Sci. 1980, 69, 912-922.
5. Yalkowsky, S. H.; Banerjee, S. Aqueous Solubility: Methods of Estimation for Organic Compounds; Marcel Dekker: New York, 1992, p 62.
6. Meylan, W. H.; Howard, P. H.; Boethling, R. S. Environ. Toxicol. Chem. 1996, 15, 100-106.
7. Hansch, C.; Leo, A. Substituent Constants for Correlation Analysis in Chemistry and Biology; Wiley Interscience: New York, 1979, pp 18-43.
8. Rekker, R. F.; Mannhold, R. Calculation of Drug Lipophilicity; VCH: Weinheim, 1992.
9. Meylan, W. H.; Howard, P. H. J. Pharm. Sci. 1995, 84, 83-92.
10. Lipnick, R. L. In Practical Applications of Quantitative Structure-Activity Relationships (QSAR) in Environmental Chemistry and Toxicology; Karcher, W.; Devillers, J., Eds.; Kluwer Academic Publishers: Dordrecht, 1990, pp 281-293.
11. Mackay, D. Personal communication, 1990.
12. Partington, J. R. An Advanced Treatise on Physical Chemistry; Longmans Green & Company: London, 1952, Vol. 3, pp 545-555.
13. Trouton, F. Phil. Mag. 1884, 18, 54-57.
14. Abramowitz, R.; Yalkowsky, S. H. Pharm. Res. 1990, 7, 942-947.
15. Yalkowsky, S. H. Ind. Eng. Chem. Fundamentals 1979, 18, 108-111.
16. Chickos, J. S.; Hesse, D. G.; Liebman, J. F. J. Org. Chem. 1990, 55, 3833-3840.
17. Chickos, J. S.; Braton, C. M.; Hesse, D. G.; Liebman, J. F. J. Org. Chem. 1991, 56, 927-938.
18. Dannenfelser, R. M.; Surendran, N.; Yalkowsky, S. H. SAR QSAR Environ. Res. 1993, 1, 273-292.
19. Abramowitz, R.; Yalkowsky, S. H. Chemosphere 1990, 21, 1221-1229.
20. Tsakanikas, P. D.; Yalkowsky, S. H. Toxicol. Environ. Chem. 1988, 17, 19-33.
21. Huggins, M. L. J. Phys. Chem. 1939, 43, 1083-1098.
22. Flory, P. J.; Vrij, A. J. Am. Chem. Soc. 1963, 85, 3548-3553.
23. Dannenfelser, R.-M.; Yalkowsky, S. H. Ind. Eng. Chem. Res. 1996, 35, 1483-1486.
24. Partington, J. R. An Advanced Treatise on Physical Chemistry; Longmans Green & Company: London, 1952, Vol. 3, pp 461-462.
25. Skau, E. L.; Arthur, J. C.; Wakeham, H. In Physical Methods of Organic Chemistry, 3rd edn.; Weissberger, A., Ed.; Interscience Publishers: New York, 1959, Part 1, pp 287-334.
26. Furniss, B. S.; Hannaford, A. J.; Smith, P. W. G.; Tatchell, A. R. Vogel's Textbook of Practical Organic Chemistry, 5th edn.; Longman: Harlow, 1989, pp 240-236.
27. Ford, J. L.; Timmins, P. Pharmaceutical Thermal Analysis; Ellis Horwood: Chichester, 1989, pp 108-135.
28. Partington, J. R. An Advanced Treatise on Physical Chemistry; Longmans Green & Company: London, 1949, Vol. 1, pp 498-501.
29. Carnelley, T. Phil. Mag. Ser. 5 1882, 13, 112-130.
30. Partington, J. R. An Advanced Treatise on Physical Chemistry; Longmans Green & Company: London, 1952, Vol. 3, pp 462-463.
31. Mills, E. J. Phil. Mag. Ser. 5 1884, 17, 173-187.
32. Baeyer, A. Berichte 1877, 10, 1286-1288.
33. Kipping, F. S. J. Chem. Soc. 1894, 63, 465-468.
34. Longinescu, G. G. J. Chim. Phys. 1903, 1, 296-301.
35. Tsakalotos, D.-E. Compt. Rend. Acad. Sci. Paris II 1906, 143, 1235-1236.
36. Lindemann, F. A. Physik. Z. 1910, 11, 609-612.
37. Robertson, P. W. J. Chem. Soc. 1919, 115, 1210-1223.
38. Prud'homme, M. J. Chim. Phys. 1920, 18, 359-361.
39. Lorenz, R.; Herz, W. Z. Anorg. Allgem. Chem. 1922, 122 (2), 51-60.
40. Lyman, W. J. In Environmental Exposure from Chemicals; Neely, W. B.; Blau, G. E., Eds.; CRC Press: Boca Raton, FL, 1985, Vol. 1, pp 13-47.
41. Taft, R.; Stareck, J. J. Phys. Chem. 1930, 34, 2307-2317.
42. Malone, G. B.; Reid, E. E. J. Am. Chem. Soc. 1929, 51, 3424-3427.
43. Partington, J. R. An Advanced Treatise on Physical Chemistry; Longmans Green & Company: London, 1952, Vol. 3, pp 463-465.
44. Beacall, T. Rec. Trav. Chim. 1928, 47, 37-44.
45. Garner, W. E.; Madden, F. C.; Rushbrooke, J. E. J. Chem. Soc. 1926, 2491-2502.
46. Garner, W. E.; King, A. M. J. Chem. Soc. 1929, 1849-1861.
47. Garner, W. E.; Van Bibber, K.; King, A. E. J. Chem. Soc. 1931, 1533-1541.
48. Timmermans, J. Les Constantes Physiques des Composés Organiques Cristallisés; Masson et Cie: Paris, 1953, pp 256-273.
49. Powell, R. E.; Clark, C. R.; Eyring, H. J. Chem. Phys. 1941, 9, 268-273.
50. Mekenyan, O.; Dimitrov, S.; Bonchev, D. Eur. Polym. J. 1983, 19, 1185-1193.
51. Austin, J. B. J. Am. Chem. Soc. 1930, 52, 1049-1053.
52. Lovell, E. L.; Hibbert, H. J. Am. Chem. Soc. 1939, 61, 1916-1920.
53. Merckel, J. H. C. Proc. Roy. Acad. Amsterdam 1937, 40, 164-173.
54. Meyer, K. H.; van der Wyk, A. Helv. Chim. Acta 1937, 20, 1313-1320.
55. Moullin, E. B. Proc. Camb. Phil. Soc. 1938, 34, 459-464.
56. Seyer, W. F.; Patterson, R. F.; Keays, J. L. J. Am. Chem. Soc. 1944, 66, 179-182.
57. Etessam, A. H.; Sawyer, M. F. J. Inst. Petrol. 1939, 25, 253-262.
58. Gray, C. G. J. Inst. Petrol. 1943, 29, 226-234.
59. Smittenberg, J.; Mulder, D. Rec. Trav. Chim. 1948, 67, 813-825.
60. Fortuin, J. M. H. Rec. Trav. Chim. 1958, 77, 5-16.
61. Keyes, R. W. Phys. Rev. 1959, 115, 564-567.
62. Benko, J. Acta Chim. Hung. 1959, 27, 351-361.
63. Gold, P. I.; Ogle, G. J. Chem. Eng. 1969, 76 (1), 119-122.
64. Broadhurst, M. G. J. Res. Nat. Bur. Stds. 1962, 66A, 241-249.
65. Broadhurst, M. G. J. Res. Nat. Bur. Stds. 1966, 70A, 481-486.
66. Grigor'ev, S. M.; Pospelov, V. M. Sb. Nauchn. Tr., Ukr. Nauchn.-Issled. Uglekhim. Inst. 1965, No. 16, 153-173.
67. Eaton, E. O. Chem. Technol. 1971, 362-366.
68. Wachalewski, T. Postepy Fiz. 1970, 27, 403-412.
69. Syunyaeva, R. Z. Chem. Technol. Fuels Oils 1981, 17, 161-164.
70. Mackay, D.; Shiu, W. T.; Bobra, A.; Billington, J.; Chan, E.; Yeun, A.; Ng, C.; Szeto, F. U.S. Environmental Agency Report PB 82-230939; Athens, Georgia, 1982.
71. Seybold, P. G.; May, M. A.; Gargas, M. L. Acta Pharm. Jugosl. 1986, 36, 253-265.
72. Kier, L. B.; Hall, L. H. Molecular Connectivity in Structure-Activity Analysis; Research Studies Press: Letchworth, 1986, pp 1-24.
73. Westwell, M. S.; Searle, M. S.; Wales, D. J.; Williams, D. H. J. Am. Chem. Soc. 1995, 117, 5013-5015.
74. Hanson, M. P.; Rouvray, D. H. In Graph Theory and Topology in Chemistry; King, R. B.; Rouvray, D. H., Eds.; Elsevier: Amsterdam, 1987, pp 201-208.
75. Adler, N.; Kovačić-Beck, L. In Graph Theory and Topology in Chemistry; King, R. B.; Rouvray, D. H., Eds.; Elsevier: Amsterdam, 1987, pp 194-200.
76. Needham, D. E.; Wei, I.-C.; Seybold, P. G. J. Am. Chem. Soc. 1988, 110, 4186-4194.
77. Pogliani, L. J. Phys. Chem. 1995, 99, 925-937.
78. Somayajulu, G. R. Int. J. Thermophys. 1990, 11, 555-572.
79. Kreglewski, A. Bull. Acad. Polon. Sci., Ser. Sci. Chim. 1961, 9, 163-167.
80. Kreglewski, A.; Zwolinski, B. J. J. Phys. Chem. 1961, 65, 1050-1052.
81. Riazi, M. R.; Al-Sahhaf, T. A. Ind. Eng. Chem. Res. 1995, 34, 4145-4148.
82. Cherqaoui, D.; Villemin, D.; Kvasnicka, V. Chemom. Intell. Lab. Systems 1994, 24, 117-128.
83. Todeschini, R.; Gramatica, P.; Provenzani, R.; Marengo, E. Chemom. Intell. Lab. Systems 1995, 27, 221-229.
84. Todeschini, R.; Gramatica, P. SAR QSAR Environ. Res. 1997, 7, 89-115.
85. Marano, J. J.; Holder, G. D. Ind. Eng. Chem. Res. 1997, 36, 1895-1907.
86. Cramer, R. D. J. Am. Chem. Soc. 1980, 102, 1837-1849.
87. Cramer, R. D. J. Am. Chem. Soc. 1980, 102, 1849-1859.
88. Charton, M.; Charton, B. I. In QSAR in Design of Bioactive Compounds; Kuchar, M., Ed.; J. R. Prous: Barcelona, 1984, pp 41-51.
89. Dearden, J. C.; Rahman, M. H. Mathl. Comput. Modelling 1988, 11, 843-846.
90. Verloop, A.; Hoogenstraaten, W.; Tipker, J. In Drug Design; Ariens, E. J., Ed.; Academic Press: New York, 1976, Vol. 7, pp 165-207.
91. Dearden, J. C. Sci. Total Environ. 1991, 109/110, 59-68.
92. Abraham, M. H. Personal communication, 1990.
93. Murugan, R.; Grendze, M. P.; Toomey, J. E.; Katritzky, A. R.; Karelson, M.; Lobanov, V.; Rachwal, P. CHEMTECH 1994, 24 (9), 17-23.
94. Mason, D.; Bernstein, J. Mol. Cryst. Liq. Cryst. 1994, 242, 179-191.
95. Abramowitz, R. Ph.D. Thesis, University of Arizona, 1986.
96. Yalkowsky, S. H.; Krzyzaniak, J. F.; Myrdal, P. B. Ind. Eng. Chem. Res. 1994, 33, 1872-1877.
97. Tesconi, M.; Yalkowsky, S. H. In Estimating Chemical Properties for the Environmental and Health Sciences: a Handbook of Methods; Boethling, R. S.; Mackay, D., Eds.; Ann Arbor Press: Chelsea, MI, 1999, in press.
98. Bhattacharjee, S.; Rao, A. S.; Dasgupta, P. Computers Chem. 1991, 15, 319-322.
99. Medić-Šarić, M.; Nikolić, S.; Matijević-Sosa, J. Acta Pharm. 1992, 42, 153-167.
100. Charton, M.; Charton, B. I. Abstr. 27th M.A.R.M., Am. Chem. Soc. 1993, 129-130.
101. Charton, M.; Charton, B. J. Phys. Org. Chem. 1994, 7, 196-206.
102. Charton, M. Personal communication, 1997.
103. Pogliani, L. J. Phys. Chem. 1996, 100, 18065-18077.
104. Pogliani, L. Med. Chem. Res. 1997, 7, 380-393.
105. Todeschini, R.; Gramatica, P. Quant. Struct.-Act. Relat. 1997, 16, 120-125.
106. Chiorboli, C.; Gramatica, P.; Piazza, R.; Pino, A.; Todeschini, R. SAR QSAR Environ. Res. 1997, 7, 133-150.
107. Todeschini, R.; Vighi, M.; Finizio, A.; Gramatica, P. SAR QSAR Environ. Res. 1997, 7, 173-193.
108. Yalkowsky, S. H.; Valvani, S. C.; Roseman, T. J. J. Pharm. Sci. 1983, 72, 866-870.
109. Rubino, J. T. J. Pharm. Sci. 1989, 78, 485-489.
110. Thomas, E.; Rubino, J. Int. J. Pharm. 1996, 130, 179-183.
111. Anderson, B. D.; Conradi, R. A. J. Pharm. Sci. 1985, 74, 815-820.
112. Przezdziecki, J.; Sridhar, T. Am. Inst. Chem. Eng. J. 1985, 31, 333-335.
113. Walters, A. E.; Myrdal, P. B.; Yalkowsky, S. H. Chemosphere 1995, 31, 3001-3008.
114. Horvath, A. L. Molecular Design: Chemical Structure Generation from the Properties of Pure Organic Compounds; Elsevier: Amsterdam, 1992, pp 144-157.
115. Joback, K. G. S.M. Thesis, Massachusetts Institute of Technology, Cambridge, MA, 1984.
116. Joback, K. G.; Reid, R. C. Chem. Eng. Comm. 1987, 57, 233-243.
117. Reid, R. C.; Prausnitz, J. M.; Poling, B. E. The Properties of Gases and Liquids, 4th edn.; McGraw-Hill: New York, 1987, pp 25-26.
118. Simamora, P.; Yalkowsky, S. H. SAR QSAR Environ. Res. 1993, 1, 293-300.
119. Simamora, P.; Miller, A. H.; Yalkowsky, S. H. J. Chem. Inf. Comput. Sci. 1993, 33, 437-440.
120. Simamora, P.; Yalkowsky, S. H. Ind. Eng. Chem. Res. 1994, 33, 1405-1409.
121. Krzyzaniak, J. F.; Myrdal, P. B.; Simamora, P.; Yalkowsky, S. H. Ind. Eng. Chem. Res. 1995, 34, 2530-2535.
122. Constantinou, L.; Gani, R. Am. Inst. Chem. Eng. J. 1994, 40, 1697-1710.
123. Tu, C.-H. J. Chinese Inst. Chem. Eng. 1994, 25, 151-154.
124. Tu, C.-H.; Wu, Y.-S. J. Chinese Inst. Chem. Eng. 1996, 27, 323-328.
125. Yalkowsky, S. H.; Dannenfelser, R.-M.; Myrdal, P.; Simamora, P.; Mishra, D. Chemosphere 1994, 28, 1657-1673.
126. Yalkowsky, S. H.; Myrdal, P.; Dannenfelser, R.-M.; Simamora, P. Chemosphere 1994, 28, 1675-1688.
127. Lyman, W. J.; Reehl, W. F.; Rosenblatt, D. H., Eds. Handbook of Chemical Property Estimation Methods; McGraw-Hill: New York, 1982.
128. Lyman, W. J.; Potts, R. G.; Magil, G. C. User's Guide to CHEMEST; Arthur D. Little: Cambridge, MA, 1984, pp 4.9.1-4.9.9.
129. Grain, C. F.; Lyman, W. J. Interim Report on Task 29 of Environmental Protection Agency Contract No. 68-01-6271; U.S. Environmental Protection Agency, Office of Toxic Substances: Washington DC, 1983.
130. Boethling, R. S.; Campbell, S. E.; Lynch, D. G.; LaVeck, G. D. Ecotoxicol. Environ. Saf. 1988, 15, 21-30.
131. Lynch, D. G.; Tirado, N. F.; Boethling, R. S.; Huse, G. R.; Thom, G. C. Sci. Total Environ. 1991, 109/110, 643-648.
132. Hunter, R.; Faulkner, L.; Culver, F.; Hill, J. QSAR, Structure-Activity Based Chemical Modeling and Information Software; Montana State University: Bozeman, Montana, 1985.
133. Syracuse Research Corporation. MPBPVP PC-based program ver. 1.25; Syracuse, NY, 1997.
134. Stein, S. E.; Brown, R. L. J. Chem. Inf. Comput. Sci. 1994, 34, 581-587.
135. CambridgeSoft Corporation. ChemPropPro PC-based program; Cambridge, MA, 1998.
136. Flory, P. J. J. Chem. Phys. 1949, 17, 223-240.
137. Eby, R. K. J. Appl. Phys. 1963, 34, 2442-2445.
138. Hay, J. N. J. Polym. Sci., Polym. Chem. Ed. 1976, 14, 2845-2852.
139. Buckley, C. P.; Kovacs, A. J. Colloid Polym. Sci. 1976, 254, 695-715.
140. Van Krevelen, D. W. Properties of Polymers: Their Estimation and Correlation with Chemical Structure, 2nd edn.; Elsevier: Amsterdam, 1976, pp 112-127.
141. Wunderlich, B.; Czornyj, G. Macromols. 1977, 10, 906-913.
142. Mandelkern, L.; Stack, G. M. Macromols. 1984, 17, 871-878.
143. Cantor, R. S.; Dill, K. A. Macromols. 1985, 18, 1875-1882.
144. Starkweather, H. W. Macromols. 1986, 19, 1131-1134.
145. Frushour, B. G. Polym. Bull. 1984, 11, 375-382.
146. Tanaka, N. Sen-i Gakkaishi 1986, 42, T606-T609.
147. Polikarpov, V. M.; Matukhina, E. V.; Polyakov, Yu. P.; Matveichev, P. M.; Ushakov, N. V.; Bespalova, N. B.; Razumovskaya, I. V.; Antipov, E. M. Vysokomol. Soedin., Ser. A 1991, 33, 1088-1092.
148. Carlier, V.; Devaux, J.; Legras, R.; McGrail, P. T. Macromols. 1992, 25, 6646-6650.
149. Tan, T. T. M.; Rode, B. M. J. Polym. Sci.: Part B: Polymer Phys. 1996, 34, 2139-2143.
150. Sumpter, B. G.; Noid, D. W. J. Thermal Anal. 1996, 46, 833-851.
151. Burkhardt, T. J.; Murata, M.; Vaz, R. J. Macromol. Symp. 1995, 89, 321-333.
152. Sharma, B. K. Indian J. Phys. 1979, 53B, 174-182.
153. Kutolin, S. A.; Vashukov, I. A.; Kotyukov, V. I. Izvest. Akad. Nauk SSSR, Neorg. Mater. 1978, 14, 215-218.
154. Kutolin, S. A.; Kotyukov, V. I. Izvest. Akad. Nauk SSSR, Neorg. Mater. 1979, 15, 96-99.
155. Kutolin, S. A.; Kotyukov, V. I.; Komarova, S. N.; Smirnova, E. G. Zhur. Fiz. Khim. 1980, 54, 35-39.
156. Kutolin, S. A.; Smirnova, E. G.; Komarova, S. N. Zhur. Fiz. Khim. 1982, 56, 2799-2802.
157. Keller, O. L.; Burnett, J. L.; Carlson, T. A.; Nestor, C. W. J. Phys. Chem. 1970, 74, 1127-1134.
158. Kazragis, A. Document deposited with VINITI (All-Union Institute of Scientific and Technical Information), VINITI No. 1223-78, 1978 (C.A. 91:112659).
159. Kazragis, A.; Bergman, G. A.; Raudeliuniene, A.; Liksiene, R. Document deposited with VINITI (All-Union Institute of Scientific and Technical Information), VINITI No. 1398-79, 1979 (C.A. 92:203820).
160. Kutolin, S. A.; Kotyukov, V. I.; Kotlevskaya, N. L. Zhur. Fiz. Khim. 1980, 54, 633-637.
161. Bonchev, D.; Kamenska, V. J. Phys. Chem. 1981, 85, 1177-1186.
162. Gomez, L.; Dobry, A.; Diep, H. T. Phys. Rev. B 1997, 55, 6265-6271.
163. Li, C.; Guo, J.; Qin, P.; Chen, R.; Chen, N. J. Phys. Chem. Solids 1996, 57, 1797-1802.
164. Reddy, R. R.; Kumar, M. R.; Rao, T. V. R.; Ahammed, Y. N. J. Phys. Chem. Solids 1994, 55, 523-524.
165. Bosi, L. G. Fis. 1987, 28, 265-268; Phys. Status Solidi A 1987, 101, K111-K114.
166. Kang, D. S.; Wang, X. Y.; Li, C. H.; Zhau, Q. B.; Liu, H. L.; Chen, N. Y. Acta Chim. Sinica 1997, 55, 463-466.
THE APPLICATION OF THE INTERMOLECULAR FORCE MODEL TO PEPTIDE AND PROTEIN QSAR
Marvin Charton
I. Introduction
   A. Intermolecular Forces
   B. The Intermolecular Force (IMF) Equation
   C. Side-Chain Effect Composition
   D. The IMF Equation for Peptide and Protein Bioactivity
II. The Bioactivity Mechanism
   A. Transport
   B. Receptor-Substrate Binding
   C. Chemical Reaction
III. Peptide Bioactivities
   A. Types of Structural Variation in Peptides
   B. Oxytocin Analogue Uterotonic Inhibitors of Oxytocin
   C. Peptide Renin Inhibitor QSAR
IV. Protein Bioactivities
   A. Limitation of the Model in Protein QSAR
   B. Types of Protein Bioactivity Data Sets
   C. Human Growth Hormone (hGH)
   D. Subtilisin BPN'
   E. Hirudin
   F. L. casei Thymidylate Synthase
   G. T. thermophilus Glutamyl-tRNA Synthase
   H. Rat Trypsin
   I. Human Growth Hormone II
V. The IMF Method as a Bioactivity Model
   A. Peptide and Protein Bioactivities
   B. The Hansch-Fujita Model
VI. Appendix: Statistics Reported for the Correlations
Abbreviations
References

Advances in Quantitative Structure Property Relationships Volume 2, pages 177-252. Copyright © 1999 by JAI Press Inc. All rights of reproduction in any form reserved. ISBN: 0-7623-0067-1
I. INTRODUCTION
A. Intermolecular Forces
Many phenomena depend on the difference in intermolecular forces between initial and final states. Partition, distribution, and solubility; phase changes such as melting point and boiling point; chromatographic properties such as retention times in gas chromatography, relative flow rates in paper and thin layer chromatography, and capacity factors in high performance liquid chromatography; and charge transfer and hydrogen bonding complex formation are examples, as are bioactivities. In the Hansch-Fujita method of modeling bioactivity the most important parameter is a measure of hydrophobicity-lipophilicity such as log $P$, where $P$ is the partition coefficient, or log $k'$, where $k'$ is the high-pressure liquid chromatography capacity factor. These quantities are composite parameters that depend on intermolecular force differences.^""^ Composite parameters represent two or more different structural effects; pure parameters represent a single effect. In modeling bioactivities and other properties, hydrophobicity or lipophilicity parameters can be replaced by parameters that represent intermolecular forces. This method has been successfully applied to the properties and bioactivities of amino acids, peptides, and proteins,"^'^ and to opiate receptor binding of 4'-substituted naloxone phenylhydrazones.^ Here we present a detailed description of the application of the method to some examples of peptide and protein bioactivities in order to show how to use it.
B. The Intermolecular Force (IMF) Equation
$Q_X$ is a measurable quantity of interest that varies with molecular structure; $\varepsilon$ is the intermolecular force energy; X is the variable structural feature; and i and f indicate the initial and final states. Then:
$$Q_X = \varepsilon_f - \varepsilon_i = \Delta\varepsilon \qquad (1)$$
The intermolecular forces and their parameterization are summarized in Table 1.
Intermolecular Force Parameterization
Parameterization of the intermolecular forces described in Table 1 results in the inter/intramolecular force (IMF) equation. In its most general form [3] it is,
$$Q_X = L\sigma_{lX} + D\sigma_{dX} + R\sigma_{eX} + M\mu_X + A\alpha_X + H_1 n_{HX} + H_2 n_{nX} + I i_X + B_{DX} n_{DX} + B_{AX} n_{AX} + S\psi_X + B^0 \qquad (2)$$
where:
• $\sigma_{lX}$ is the localized electrical effect parameter. It is identical to the $\sigma_I$ and $\sigma_F$ constants.^
• $\sigma_{dX}$ is the intrinsic delocalized electrical effect parameter.^
• $\sigma_{eX}$ is the electronic demand sensitivity electrical effect parameter.^
• $\alpha$ is a polarizability parameter.^"^ It is defined by the equation,
$$\alpha = \frac{MR_X - MR_H}{100} \qquad (3)$$
where $MR_X$ and $MR_H$ are the group molar refractivities of X and H, respectively. There are many other polarizability parameters that can be used; they all have the dimensions of volume and are highly collinear with each other.
• $n_H$ and $n_n$ are hydrogen bonding parameters;'* $n_H$ is equal to the number of OH or NH bonds in X, and $n_n$ is equal to the number of lone pairs on O or N atoms in X. This parameterization is deficient, as it accounts for the probability of hydrogen-bond formation but not for the intensity of the interaction. It does, however, frequently give reasonable results.
Table 1. Intermolecular Forces and the Quantities Upon which They Depend
Intermolecular Force                     Quantity
Molecule-molecule
  Hydrogen bonding (hb)                  $E_{hb}$
  Dipole-dipole (dd)                     Dipole moment
  Dipole-induced dipole (di)             Dipole moment, polarizability
  Induced dipole-induced dipole (ii)     Polarizability
  Charge transfer (ct)                   Ionization potential, electron affinity
Ion-molecule
  Ion-dipole (Id)                        Ionic charge, dipole moment
  Ion-induced dipole (Ii)                Ionic charge, polarizability
Note: Abbreviations for intermolecular forces are in parentheses.
• $i$ is the ionic charge parameter.^ It takes the value 1 when the substituent is ionized and 0 when it is not.
• $n_D$ and $n_A$ are charge transfer parameters;^"^ $n_D$ is 1 when X acts as an electron donor and 0 when it cannot; $n_A$ is 1 when X can function as an electron acceptor and 0 when it cannot.
• $\psi$ is a steric effect parameterization.^"^^
Steric Effect Parameterizations
There are several possible parameterizations of the steric effect.^"^^ Steric effects depend on the position in the side chain, and the parameterization must account for this. The simplest is monoparametric. An example of such a parameter is $\upsilon$, a composite steric parameter based on van der Waals radii ($r_V$) that emphasizes the steric effect at the first atom of the side chain. Thus:
$$S\psi = S\upsilon \qquad (4)$$
Monoparametric steric parameters have a fixed dependence on side-chain position; this is why they are composite. The side chain is numbered starting with the atom that is bonded to the rest of the amino acid residue. Accounting for steric effects anywhere in the side chain requires additional parameters. This is feasible only when a sufficiently large data set is available. There are four multiparametric models available to choose from:^"^^
1. The simple branching (SB) equation:
$$S\psi = \sum_{i=1}^{m} a_i n_i \qquad (5)$$
This model accounts for the steric effect at each atom of the side-chain skeleton (longest chain) by counting the number of branches (atoms other than H) bonded directly to it. With amino acids, peptides, and proteins it is generally unnecessary to go further than the third skeletal atom in the parameterization. The SB equation uses the pure parameters $n_1$, $n_2$, and $n_3$.^'^^ The model applies only to skeletal atoms that have or are assumed to have tetrahedral geometry. It assumes that the effect of all branching atoms attached to a skeletal atom is the same. Because nonequivalent conformations exist, this assumption is only a crude first approximation. Another problem associated with the SB equation is that a high degree of collinearity with $\alpha$ is generally found.
2. The extended branching (EB) equation:
This method distinguishes between the first, second, and third branches on a tetrahedral atom at the expense of many more parameters. Few peptide or protein data sets are large enough to permit its use.^'^^
3. A hybrid model which is a combination of the $\upsilon$ steric parameter and the simple branching equation:^^
$$S\psi = S\upsilon + \sum_{i=1}^{m} a_i n_i \qquad (7)$$
4. The segmental model:
$$S\psi = \sum_{i=1}^{m} S_i \upsilon_i \qquad (8)$$
where $\upsilon_i$ is the steric parameter of the smallest face of the i-th segment of the side chain. The i-th segment consists of the i-th atom of the longest chain and all the groups attached to it.^^
C. Side-Chain Effect Composition
For comparing structural effects in different data sets we make use of the percent contribution of each independent variable in the regression equation, $C_i$, defined as,
$$C_i = \frac{100\,|a_i x_i|}{\sum_j |a_j x_j|} \qquad (9)$$
where $a_i$ is the regression coefficient of the i-th independent variable and $x_i$ is its value for the reference residue. His is the reference side chain in these studies. It was chosen because it has a value other than zero for each parameter in the correlation equation. Comparisons of side-chain structural contributions therefore refer to those of the His side chain.
D. The IMF Equation for Peptide and Protein Bioactivity
A dependence on charge transfer interactions in modeling the properties and bioactivities of amino acids, peptides, and proteins is rarely found. Amino acid side chains are bonded to an sp3-hybridized carbon atom; therefore no terms in $\sigma_d$ or $\sigma_e$ are necessary. The amino acid moiety has a large dipole moment, making the term in $\mu$ unnecessary. The IMF equation then takes the form:
$$Q_X = L\sigma_{lX} + A\alpha_X + H_1 n_{HX} + H_2 n_{nX} + I i_X + S\psi_X + B^0 \qquad (10)$$
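As a rough illustration of how Eqs. 9 and 10 are used in practice, the following Python sketch evaluates the reduced IMF equation for one side chain and converts a set of regression coefficients into percent contributions. The coefficient values, the intercept, and the reference descriptor set are hypothetical placeholders chosen only to make the arithmetic visible; they are not results from this chapter. The Phe(4-Et) descriptors are those listed for that residue in Table 3, and the monoparametric steric term of Eq. 4 is assumed for $\psi$.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SideChain:
    """IMF descriptors for one side chain X (symbols follow Eq. 10)."""
    sigma_l: float  # localized electrical effect parameter
    alpha: float    # polarizability parameter
    n_H: int        # number of OH or NH bonds in X
    n_n: int        # number of lone pairs on O or N atoms in X
    i: int          # 1 if the side chain is ionized, else 0
    upsilon: float  # steric parameter (monoparametric choice, Eq. 4)


# HYPOTHETICAL regression coefficients (L, A, H1, H2, I, S) and intercept B0.
COEFFS = {"sigma_l": 2.0, "alpha": -1.5, "n_H": 0.3,
          "n_n": -0.1, "i": 0.8, "upsilon": -0.6}
B0 = 5.0

# Descriptor values as they appear in the Phe(4-Et) row of Table 3.
PHE_4_ET = SideChain(sigma_l=0.03, alpha=0.383, n_H=0, n_n=0, i=0, upsilon=0.70)

# HYPOTHETICAL reference side chain with a nonzero value for every parameter
# (the chapter uses His; its descriptor values are not reproduced here).
HYPOTHETICAL_REF = SideChain(sigma_l=0.1, alpha=0.2, n_H=1, n_n=1, i=1, upsilon=0.5)


def imf_q(x, coeffs=COEFFS, b0=B0):
    """Evaluate Eq. 10: intercept plus coefficient * descriptor for each term."""
    return b0 + sum(c * getattr(x, name) for name, c in coeffs.items())


def percent_contributions(ref, coeffs=COEFFS):
    """Evaluate Eq. 9 for every independent variable at the reference residue."""
    terms = {name: abs(c * getattr(ref, name)) for name, c in coeffs.items()}
    total = sum(terms.values())
    return {name: 100.0 * t / total for name, t in terms.items()}


if __name__ == "__main__":
    print(f"Q(Phe(4-Et)) = {imf_q(PHE_4_ET):.4f}")
    for name, c in percent_contributions(HYPOTHETICAL_REF).items():
        print(f"C[{name}] = {c:.1f}%")
```

Because the contributions are computed at a single reference residue, they describe the composition of the effect for that side chain only, which is why the choice of a reference with no zero-valued parameter matters.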
II. THE BIOACTIVITY MECHANISM
In order to justify the application of the IMF model to bioactivities, it is necessary to consider the mechanism of bioactivity. The mechanism given here is a modification of that proposed by McFarland."^'^^ The bioactivity is considered to be dependent on one or more of the following steps: transport, receptor interaction, and chemical reaction.
A. Transport
The bioactive substance (bas) enters the organism at some point. It then moves through an aqueous phase to a receptor (rcp) site with which it is to interact. This movement may involve diffusion through the medium or random binding to a biopolymer molecule such as plasma protein that carries it. During transport the bas is likely to cross one or more biomembranes. The crossing of a biomembrane begins with the transfer of the bas from the initial aqueous phase $\phi_1$ to the anterior membrane surface (ams). It then proceeds to the posterior membrane surface (pms) either by diffusion or by binding to a lipid-soluble membrane carrier molecule (mcm) which transports it. The bas is then transferred from the surface to a second aqueous phase $\phi_2$. Each step in this process is equivalent to a transfer from one phase to another and is therefore a function of the difference between intermolecular forces involving medium and bas in the initial and final phases.
B. Receptor-Substrate Binding
The interaction between receptor and substrate occurs in two stages: recognition and tight complex formation.
Recognition
The rcp must distinguish the substrate from all of the other chemical species present in the medium that surrounds it. The rcp consists of some number of functional groups attached to a molecular framework that is part of a biopolymer. These functional groups have a particular orientation in space. To be recognized, a substrate must have functional groups that are capable of interacting with those of the rcp and have the proper spatial arrangement to do so. Recognition results in the formation of a loose substrate-receptor complex (bas—rcp) bound by intermolecular forces. The interactions involved in recognition are directed. Examples of strongly directed interactions are hydrogen bonding and salt bridge formation. Recognition therefore depends on the difference between the intermolecular forces involving both the bas and the rcp with the aqueous phase (e-^ and e.^), and the intermolecular forces between substrate and receptor in the loose complex (e,^).
Tight Complex Formation
Conformational changes occur in the substrate and/or the receptor that maximize the intermolecular forces between the two. This results in an increase in binding energy that accompanies the formation of a tight complex, bas-rcp. The process is a function of the difference in intermolecular forces between the initial loose complex and the final tight complex, and of the difference in conformational energy between the initial and final conformations of bas—rcp and bas-rcp, respectively.
C. Chemical Reaction
The tight complex proceeds along a reaction coordinate to a transition state (bas—rcp)* that decomposes into a receptor-product complex (rcp-prd) by the formation and/or cleavage of covalent bonds. The rcp-prd complex then dissociates into solvated receptor and solvated product. The overall mechanism is summed up in Scheme 1. Each step in the sequences described above involves a difference in intermolecular forces between an initial and a final state. The IMF equation was designed to model such differences; it should therefore be capable of modeling bioactivities.
III. PEPTIDE BIOACTIVITIES
A. Types of Structural Variation in Peptides
Peptides can undergo substitution at one or more of several different sites."*'^ The types of substitution are:
1. One amino acid residue is replaced by another at a given position in the peptide. This is represented by Aax^i, where Aax is the residue with side chain X and i is its position in the peptide.
2. Substitution at the amino terminus of a linear peptide is represented by X^.
3. Substitution at the carboxyl terminus is represented by X^.
1. bas((|)i) ^ bas((j)2) ^ 2. bas((|)2) ^ 3. bas-rcp ^
bas-plp ^ bas-ams bas-pms ^ bas-mcm bas—rep ^ bas-rcp (bas-rcp)* ^ rcp-prd ^ rep + products
Scheme 1. Abbreviations: bas, bioactive substrate; φ, phase; plp, plasma protein; mcm, membrane carrier molecule; ams, anterior membrane surface; pms, posterior membrane surface; bas—rcp, loose substrate-receptor complex; bas-rcp, tight substrate-receptor complex; (bas=^rcp), transition state; rcp-prd, receptor-product complex.
4. Substitution at the nitrogen atom of a peptide bond is represented by X^{i,j}, where i and j are the positions of the residues attached to the atom undergoing substitution.
5. One or more amino acid residues in a peptide may be replaced by groups other than α-amino acids. This is represented by X^{i,j,...}, where i, j, ... designates the positions of the residues that are being replaced.
6. The H atom bonded to the α C atom of the i-th residue may be replaced by a substituent R. This is represented by R^i.
7. Chiral substitution, in which the normal configuration of the amino acid is replaced by its enantiomer, is designated C^i.
Consider the peptide 1a and its derivative 1b:
Ser-Ala-Thr-His-Asp-Arg-Phe-Ile-Val-Tyr   1a
tBuOCO2NHSer-NHCMe2(C=O)-thr-His-(Et)-Asp-Arg-AaxNHCH2CH=CHCH2CO-Tyr-CO2Ph   1b
The X^ substitution is the tBuOCO2 group, the X^ substituent is the OPh group, the X^{4,5} substituent is the Et group, the NHCH2CH=CHCH2CO group is X^{8,9}, the side chain of the amino acid in position 7 is variable, R substitution occurs at position 2, and C^ substitution at position 3.
B. Oxytocin Analogue Uterotonic Inhibitors of Oxytocin
$pA_2$ values^^'^^ ($A_2$ is the IC50 or ID50) for 155 structural analogues of oxytocin exhibiting an inhibitory effect on oxytocin in isolated rat uterus in the absence of magnesium were studied. Generally these substrates were nonapeptides substituted at all positions except 5; some of them had X^ substitution as well.
Free-Wilson Analysis
The data set was first subjected to a Free-Wilson analysis,^"^'^^ thus determining the side-chain effects at positions 1 and 2. The Free-Wilson equation is,
p
where a is the contribution of the side chains in position p to pA2 and a^ the contribution of the invariant part of the substrate. Then,
j
p
where $\Delta_i$ is the difference between $pA_2$ for the i-th substrate and the algebraic mean of the $pA_2$ values for the data set; a- is the contribution to the activity of the j-th
side-chain j is in the position p in the i-th substrate and 0 otherwise; and a^^ is a residual representing the deviation of the data point from the line. The sum of all the side-chain contributions j at the position p is normalized by the equation:
1
J
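The decomposition described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used for the oxytocin data set: it assumes a fully crossed (balanced) design, for which the least-squares side-chain contributions reduce to marginal means minus the grand mean and automatically sum to zero at each position. The function name `free_wilson_balanced` and the toy activities are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def free_wilson_balanced(data):
    """Estimate additive side-chain contributions for a balanced design.

    data maps a tuple of side-chain labels (one per variable position)
    to an activity such as pA2. Returns the grand mean and a dict
    {position: {side chain: contribution}}. The marginal-mean shortcut
    used here equals the least-squares solution only when every
    combination of side chains occurs equally often.
    """
    grand = mean(data.values())
    by_pos = defaultdict(lambda: defaultdict(list))
    for chains, activity in data.items():
        for pos, chain in enumerate(chains, start=1):
            by_pos[pos][chain].append(activity)
    contrib = {
        pos: {chain: mean(vals) - grand for chain, vals in groups.items()}
        for pos, groups in by_pos.items()
    }
    return grand, contrib


# HYPOTHETICAL, exactly additive toy activities (not data from this chapter):
# two side chains at position 1 (A, B) and three at position 2 (X, Y, Z).
toy = {
    ("A", "X"): 7.5, ("A", "Y"): 7.1, ("A", "Z"): 7.0,
    ("B", "X"): 7.1, ("B", "Y"): 6.7, ("B", "Z"): 6.6,
}
grand, contrib = free_wilson_balanced(toy)

if __name__ == "__main__":
    print(f"grand mean = {grand:.2f}")
    for pos, groups in contrib.items():
        for chain, value in groups.items():
            print(f"position {pos}, side chain {chain}: {value:+.2f}")
```

For an unbalanced data set such as a real analogue series, the same additive model would instead be fitted by multiple linear regression on indicator variables, with the zero-sum normalization imposed as a constraint.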
Use of the Free-Wilson method gave the side-chain contributions $pA_{2X^1}$ and $pA_{2X}$ reported in Table 2.^
Structural Dependence of the Side-Chain Contributions
Substitution at position 1 of the peptide involved three different sites, as the amino acid residue at this position has the form ZNHCRX(C=O), where X and Z are substituents and R may be H or Me. If a substituent is part of a disulfide bridge it is considered to be the X group. If only one substituent is present it is considered to be the X group. Note that not all of the substitutions at position 1 involve amino acids. None of the X groups had OH or NH bonds or, except for Mpa(O), lone pairs on O or N atoms, and no X group was likely to ionize. Thus, the X group parameterization was $\sigma_l$, $\alpha$, and $\upsilon$. The Z group required all of the IMF parameters. The R group was accounted for by the parameter $n_{Me}$, which took the value 1 when R was Me and 0 otherwise. All parameters used are given in Table 3. The correlation equation was:
$$pA_{2X} = L_X\sigma_{lX} + A_X\alpha_X + S_X\upsilon_X + L_Z\sigma_{lZ} + A_Z\alpha_Z + S_Z\upsilon_Z + H_1 n_{HZ} + \cdots$$
Table 2. $pA_{2X^1}$ and $pA_{2X}$ Values
$pA_{2X^1}$ values ($X^1$, $pA_{2X^1}$, $S_{pA_2}$): AcCys, -0.28, 0.13; AcPen^a, -1.98, 0.21; BaCys, -0.10, 0.22; Bta, -0.22, 0.16; CmCys, -0.38, 0.10; Cys, -0.14, 0.06; Dpe, 0.23, 0.07; GlyCys, -0.58, 0.22; Mep, 0.41, 0.08; MgCys, -0.16, 0.22; Mma, -0.75, 0.21; Mmp, -0.31, 0.08; Mpa(O)^a, -0.82, 0.26; MsCys, 0.09, 0.16; Pen, 0.18, 0.08; pen^a, -0.59, 0.21; PvCys, -0.17, 0.22; SarCys, -0.91, 0.22; TgCys, -1.86, 0.23; Mpp, 0.41, -.
$pA_{2X}$ values (Aax, $pA_{2X}$, $S_{pA_2}$): Dbt, -0.33, 0.16; Ile, -0.45, 0.23; Leu, -0.13, 0.17; leu, -1.52, 0.24; Phe, -0.11, 0.10; phe, -0.46, 0.18; Phe(4-Ab), -0.21, 0.22; Phe(4-Et), 0.30, 0.14; phe(4-Et), 0.88, 0.12; phe(F5), -0.74, 0.21; Phe(4-Me), 0.30, 0.15; Phe(4-Pa), -0.29, 0.22; trp, -0.01, 0.13; Tyr, -0.24, 0.06; tyr, -0.05, 0.25; Tyr(Bu), -0.69, 0.16; Tyr(Et), 0.17, 0.07; tyr(Et), 0.82, 0.14; Tyr(3-I), 0.02, 0.16; Tyr(Me), 0.28, 0.07; Tyr(3-Me), -0.22, 0.21; tyr(3-NO2), -0.73, 0.21.
Note: ^a Excluded from the correlation.
Table 3. Amino Acid Side-Chain Parameters for Groups in Positions 1 and 2 ^IX
^
%_
^
^
^z
'^HZ
"nZ
Jz_
^Me
Residue(l) AcCys
0.12
0.128
0.62
0.28
0.139
0.50
1
3
0
0
AcPen
0.09 0.12
0.221
1.24
0.139
0
0.208
1 1
0
0.62
0.50 0.50
3
0.128
0.28 0.35
-0.01 0.12
0.093 0.128
0.56 0.62
3 0 4
0 0
0 0
0.12
0.128
1
0 1
0
Cys Dpe
0.09
0
0
GlyCys
BaCys Bta CmCys
Mep
0.000
0.00
0
0.108 0.044
0.50
0.62
0.00 0.23 0.17
0.35
3 2
0.221
1.25
0.00
0.000
0.00
0
0
0.12
0.128
0.62
0.30
0.173
0.50
3
4
0
0
0.09
0.318
1.95
0.00
0.000
0.00
0
0
0
0
0
MgCys
0.12
0.128
0.62
0.32
0.334
0.50
3
0
0
0
Mma Mmp
0.27
0.082
0.60
-0.01
0.046
0.52
0
0
0
1
0.12
0.128
0.62
-O.01
0.046
0.52
0
0
0
1
Mpa
0.12
0.128
0.32
0.00
0.000
0.00
0
0
0
0
Mpa(O) MsCys Mpp Pen
0.21 0.12 0.11
0.126 0.128
0.77 0.62
0.00 0.42
0.000 0.162
0.00 0.80
0 1
0
0 0
0 0
0.339 0,221
1.50 1.24
0.00 0.17
0.000 0.044
0.00 0.35
0 2
5 0 1
0 1
0 0
0.128
0.62
0.28
0.279
0.50
1
0
0
0.128 0.128
0.62 0.62
0.30 0.32
0.219 0.533
0.50 0.50
2 5
0 0
0 0
PvCys SarCys TgCys Residue(2) Dbt lie Leu Phe Phe(4-Ab)
0.09 0.12 0.12 0.12
^
^
'ji_
"HX
^nX
0.06
0.456
0.7
1
2
1
-0.01 -0.01 0.03 0.04
0.186 0.186 0.290 0.503
1.02
0 0 0 1
0 0 0 3
0 0 0 0
^IX
0.98 0.70 0.70
Phe(4-Et)
0.03
0.383
0.70
0
0
0
Phe(F5)
0.12
0.285
0.70
0
0
0
Phe(4-Me)
0.03
0.336
0.70
0
0
0
Phe(4-Pa)
0.04
0.470
0.70
1
3
0
Trp
0.00
0.409
0.70
1
0
0
Tyr
0.03
0.298
0.70
1
2
0
Tyr(Bu)
0.03
0.489
0.70
0
2
0
Tyr(Et)
0.03 0.04
0.391 0.427 0.344
0.70
0 1
2
0
2
0
0 1 1
2 2
0 0 1
Tyr(3-I) Tyr(Me) Tyr(3-Me)
0.03 0.03
Tyr(3-N02)
0.06
0.344 0.360
0.70 0.70 0.70 0.70
6
3 4 10
$\qquad + H_2 n_{nZ} + I\,i_Z + B_{Me} n_{Me} + B^o \qquad (14)$
The best regression equation, obtained on the exclusion of the data points for AcPen, Mpa(O), and Phe, is:

$pA_{2,1} = 2.99(\pm 0.932)\sigma_{lX} + 1.42(\pm 0.667)\sigma_{lZ} - 3.42(\pm 1.10)\alpha_Z - 0.317(\pm 0.0967)n_{HZ} + 0.140(\pm 0.0609)n_{nZ} + 0.416(\pm 0.253)i_Z - 0.514(\pm 0.198) \qquad (15)$
100/?^ 86.08; F, 11.34; S^^^, 0.251; 5^, 0.477; n, 18. r: a^^^ oc^, 0.750; a^^' «//Z' 0.676; Ci^, n^^, 0.777; a^, n^^^' ^•'^^l* ^c^, n^^, 0.916; n^^' '^nZ' ^•^^'^- ^a;^* ^^-'^^ Q/Z' 3^-8» ^ocZ» 1^-^' ^/i//Z' 8-^5' ^nnZ' ^'^^^ ^i' ^l'^'
Statistics obtained for a regression equation are reported directly below it throughout this work. The r values given are those zeroth-order partial correlation coefficients that are significant at confidence levels equal to or greater than 90.0%. The C values show the composition of the substituent effect for the reference residue, His. phe was not included in the correlation because it differs in configuration from the other members of the set. The failure of the Mpa(O) replacement to fit the model may be due to the inability of the model to account for the hydrogen-bonding capacity of the sulfonyl group. The failure of the AcPen replacement to fit the model may likewise be due to the inability of the model to account for the hydrogen bonding of the carbonyl group. The results suggest that the Z group is involved in binding by ii (dispersion) interactions, and the X group by hydrogen bonding and van der Waals (vdW) (dd, di, ii) interactions. The R group has no apparent effect on the activity.

Substitution at position 2 involves only the replacement of one amino acid residue by another. Again, the parameters used are reported in Table 3. The correlation equation used was:

$pA_{2X} = L\sigma_{lX} + A\alpha_X + H_1 n_{HX} + H_2 n_{nX} + I\,i_X + S\upsilon_X + B^o \qquad (16)$
Substitution at this position is complicated, however, by the inclusion in this data set of both D- and L-amino acid residues. The configuration was first parameterized by an indicator variable that took the value 1 for the D configuration and 0 for the L configuration. The argument for this approach is that a difference in configuration should result only in a difference in the tightness of the tight complex, which might be expressed by a constant. As this approach was totally unsuccessful, the difference in binding between enantiomers is evidently not constant. The data set was therefore separated into a D subset and an L subset, and these subsets were separately correlated with Eq. 16. On exclusion of the data points for Phe and Tyr(Bu), the best regression equation obtained for the L set is:
P\L-X
= -8.40(±3.37)a^ - 0.420(±0.0624)n^^ - 2.89(±0.530)%+ 2.51(10.478)
(17)
where 100/?^ 92.40; AlOOi?^ 90.71; F, 32.44; 5^^,, 0.0889; 5°, 0.338; n, 12. r: o,, I, 0.896. Q p 21.6; C„^, 13.5; C^, 64.9. As a and \) are highly collinear (r = 0.800) the dependence on \) may indicate the presence of both steric effects and polarizability. Furthermore, there is little variation in the steric parameter within the data set. Thus the substituent is probably involved in vdW interactions and hydrogen bonding as well as exerting a steric effect. The best regression equation for the D enantiomers is, P\D-X
$= 9.54(\pm 2.24)\alpha_X - 0.964(\pm 0.473)i_X - 3.21(\pm 0.737) \qquad (18)$

$100R^2$, 79.74; $A100R^2$, 76.36; $F$, 9.839; $S_{est}$, 0.434; $S^o$, 0.569; $n$, 8. $C_\alpha$, 69.5; $C_i$, 30.5. It seems likely, though not certain, that ii and Ii interactions are involved.

C. Peptide Renin Inhibitor QSAR
Much has been published on the bioactivities of peptide renin inhibitors. They are of interest in the treatment of hypertension. The data sets studied are peptide analogues of human angiotensinogen, 2, and of the aspartyl proteinase inhibitor pepstatin, 3. The data are reported in Table 4. Sets 51 and 58-62 involve residues 8, 9, 10, and 11 of angiotensinogen. The residues Leu$^{10}$-Val$^{11}$ in these sets are replaced by a nonpeptide structural unit, an example of X substitution. Residue 8 may undergo replacement by either another residue or a nonpeptide fragment. Residue 9 may vary, and there may also be $X^N$ or $X^C$ substitution (see Table 4). Sets 53-57 involve residues 8 through 12 of angiotensinogen, with 10 and 11 replaced by statine and $X^C$ substitution at residue 12. Set 52 consists of derivatives of pepstatin in which residue 1 is Phe or Trp, residue 2 varies, and both $X^N$ and $X^C$ substitution occur.

H$_2$N-Asp$^1$-Arg$^2$-Val$^3$-Tyr$^4$-Ile$^5$-His$^6$-Pro$^7$-Phe$^8$-His$^9$-Leu$^{10}$-Val$^{11}$-Ile$^{12}$-His$^{13}$-PRN   (2)

Iva-Val$^1$-Val$^2$-Sta-Ala-OH   (3)

The interactions due to the side chain of an amino acid residue on the bioactivity of the peptide are given by Eq. 10. If the peptide is substituted at its amino ($X^N$) or
Table 4. Data Used in the Correlations of Peptide Renin Inhibitors
PPB51 IC50 (nM), Boc-Phe-Aax^-NHCH(cHx)CHOHCH2SOnAk, human renal renin, substrate—pure human angiotensinogen, maleate buffer (pH 6.0)^. Aax , n, Ak, IC50; His, 0, cHx, 4.0; His, 0, iBu, 6.5; His, 0, iPr, 4.0; His, 0, Me, 10; His, 2, cHx, 2.5; His, 2, iPr, 2.0; His, 2, Me, 4 0 ; Ala, 0, iPr, 9.9; Ala, 2, iPr, 70; Leu, 2, iPr, 4.0; Phe, 2, iPr, 30; Thr, 2, iPr, 8.0; Ser, 2, iPr, 4 0 ; Hse, 2, iPr, 20; (Bzl)Thr, 2, iPr, 6.0; (Bzl02C)0rn, 2, iPr, 60; (BzlOjQLys, 2, iPr, 100; (Ac)Lys, 2, iPr, 300. PPB52 - l o g IC50, X -Aax -Aax -Sta-Ala-Sta-C02R, enriched human plasma renin, substrate—endogenous angiotensinogen . x"^, Aax^, - l o g ICSQ; BOC, His, 7.57; Boc, Cpg, 7.43; Boc, Nva, 7.29; Boc, Val, 7.19; Boc, Phg, 7.03; Boc, Ser(Et), 6.95; Boc, Nie, 6.92; Boc, Chg, 6.82; Boc, Ser(Bzl), 6.76; Boc, Phe, 6.72; Boc, Thg, 6.60; Ser(Pym), 6.50; Boc, Gin, 6.50; Boc, Met, 6.50; Boc, Ser, 6.36; Boc, Cha, 6 . 3 1 ; Boc, Asn, 5.92; Boc, Tyr(Bzl), 5.85; Boc, Asn(Ph), 5.85; Boc, Met(02), 5.00; Boc, Trp, 6.92; Iva, Phe, 6.50; Iva, Nva, 7.38; Iva, Nie, 7.55^• Ac, Phe, 5.85; Ac, Nva, 6.59; Boc, Nie, 7.52S- Cbz, Phe, 6.00^• Cbz, Phe, 5.46; Cbz, Trp, 5.85; Cbz, Val, 6.75; Cbz, Val, 6 . 8 2 ^ Boc, Trp, 6.82"^; Boc, His, 6.85"^. 
PPB53-58 53, Boc-Phe-His-Sta-Leu-NHW; IC50 (nM), hog kidney renin; 54, IC50 (nM), human plasma renin; 55, Kj (nM), purified human kidney renin, substrate-angiotensinogen, radioimmunoassay; 56, Kj (nM), purified human kidney renin, substrate—Synthetic tetradecapeptide, radioimmunoassay; 57, Kj (nM), purified human kidney renin, substrate—Synthetic tetradecapeptide, fluorimetric assay^ W, IC5o(53), IC5o(54), Kj(55), Ki(56), Kj(57); CH2Ph, 35, 26, 70, 55, 18; CH2CH2Ph, 170, 164, 120, 350, 36; (-)CHPhCH2Ph, 23, 1326, 700, 68, 28; (+)CHPhCH2Ph, 6.9, 1 5 1 , 280, 38, 29; CH2C6H40Me-4, 6.7, 33, 12, 100, - ; CH2C6H4CI-4, 5.0, 8 1 , 280, 50, 290; (-)CHMePh, 22, 2 1 , 36, 20, 27; (+)CHMePh, 14, 49, 100, 6, 19; ()CHMe(1-CioH7), 14, 5 1 , 140, 13, 0.98; (+)CHMe(1-CioH7), 11, 484, 600, 230, 130; CHMeCH(OH)Ph, 480, 1 78, 220, 36, 67; CHPhCH(OH)Ph^, 90, 134, 110, 0.20, 0.12; CHPhCH(OH)Ph^ 24, 8 4 1 , 350, 47, 28; (CH2)5NCH2Ph, 280, 127, 97, 0.04, 0.064; 4 (10,11-dihydro-5H-dibenzo[a,d]cycloheptenyl^ 5.8, 569, 320, 520, 120. PPB58 IC50 (fiM), purified human renal renin, substrate—pure angiotensinogen, Boc-Phe-Aax^NHCH(CH2Ak)CHOH-CH2CH2W-Z'. Aax^ Ak, W, Z, IC50; Ala, iPr, C O , iPe, 2.4; Ala, iPr, C H O H , iPe, 3.8; Ala, iPr, S, iPe, 5.5; Ala, iPr, S, CH2CH2Ph, 4.2; Ala, iPr, S, iBu, 4 . 1 ; Ala, iPr, S, iPr, 4.8; Ala, iPr, SO, iPe, 5.2; Ala, iPr, SO2, iPe, 2.4; Ala, iPr, SO2, CH2CH2Ph, 1.8; Ala, iPr, SO2, iBu, 3.2; Ala, iPr, SO2, iPr, 1.6; His, cHx, SO2, iPr, 0.0076; His, cHx, SO2, Et, 0.10; Ala, cHx, SO2, iPr, 0.076; Ala, cHx, SO2, Et, 0.14; Leu, cHx, SO2, iPr, 0.014; Phe, cHx, SO2, iPr, 0.020. 
PPB59 IC50 (nM), purified human renal renin, substrate—pure angiotensinogen, Z (C=0)-Phe-Aax^NHCH(CH2Ak)CHOH-CH2(C=CH2)(C=0)-NHZ^ z'^, A a x ^ Ak, Z, IC50; tBuO, Ala, cHx, iPe, 10; tBuO, Ala, cHx, iBu, 10; tBuO, Ala, iPr, iPe, 200; tBuO, Ala, iPr, iBu, 400; tBuO, Ala, iPr, C H O H i B u , 6000; tBuO, Ala, cHx, CH2CHX, 50; tBuO, Ala, cHx, CH2CH2Ph, 150; tBuO, Ala, cHx, Me, 50; tBuO, Ala, cHx, CH2CH2NMe2, 400; tBuO, Ala, cHx, CH2CMe2NMe2; 25; tBuO, His, cHx, iPe, 1.5; tBuO, Leu, cHx, iPe,4; tBuO, His, cHx, iBu, 3; tBuO, His, cHx, CH2CMe2NMe2, 5; tBuO, Phe, cHx, CH2CMe2NMe2, 8.5; EtO, His, cHx, iPe, 3; EtO, Leu, cHx, iBu, 5; tBuCH2, His, cHx, Me, 2; Me, His, cHx, iPe, 4 ; EtO, Leu, cHx, CH2CMe2NMe2, {continued)
Table 4. (Continued)
PPB60 ICsoinM), Boc-Phe-His-NHCH(CH2cHx)CHOH-CH2CZ^Z^{C=0)-NHAk, purified human renal renin, substrate—pure angiotensinogen^' . Z ,Z , Ak, IC50; O H , Me, iPe, 5.5; Me, O H , iPe, 50; O H , CH2N3, iBu, 1 ; CH2N3, O H , iBu, 20; O H , iBu, iBu, 30; O H , CH2CI, iBu, 0.8; CH2CI, O H , iBu, 20; O H , CH2NH2, iBu, 15; CH2NH2, O H , iBu, 35. PPB61 IC50 (fiM), Boc-Phe-Aax^-NHCH(CH2iPr)CHOH-CH2WZ, purified human renal renin, substrate—purified angiotensinogen, maleate buffer (pH 6.0). Aax^, W Z , IC50; His, SPh, 0.96; Ala, SPh, 8; Ala, SCH2Ph, 10; Ala, SCH2CH2Ph, 4.5; Ala, SCCHjjjPh, 1; Ala, SiPr, 0.7; Ala, SiBu, 1.5; Ala, SiPe, 1.5; Ala, StBu, 3; Ala, ScHx, 0.8; Ala, ScPe, 1; Ala, OiPr, 7; Ala, CH2iPr, 2; Ala, OiBu, 7; Ala, CH2iBu, 3.5; Ala, SO2CHX, 2; His, OiPr, 1.5; His, CH2iPr, 1.5; His, OiBu, 0.65; His, CH2iBu, 0.60; His, SiPr, 0.081; His, S02iPr, 0.20; His, S02iBu, 0.35; His, S02iPe, 0.50; His, SO2CHX, 0.090; His, ScHx, 0.035. PPB62 IC50 (nM), XZCH(C=0)-Aax^-NHCH(CH2cHx)CHOH-CH2SOnAk, purified human renal renin, substrate—pure angiotensinogen, maleate buffer (pH 6.0)^. X, Z, Aax , n, Ak, IC50; BZIOCH2, t B u 0 2 C N H , His, 2, iPr, 75; BzlOCHMe, t B u O j C N H , His, 2, iPr, 5.5; BzlOCHMe, Et02CNH, His, 2, iPr, 20; 4-MeOC6H4CH2, t B u 0 2 C N H , His, 2, iPr, 3.0; PhO, H, His, 0, cHx, 430; Bzl, Bzl, His, 0, cHx, 20; Bzl, Bzl, His, 2, cHx, 40; Bzl, Bzl, Leu, 2, iPr, 25; Bzl, Bzl, His, 2, iPr, 70; BzlOCHMe, NH2, His, 2, iPr, 300; PhCH2, tBuCH2CONH, His, 2, iPr, 3.0; PhCH2, Et02CNH, His, 2, iPr, 5.0; BzlOCHMe, iPr02CNH, His, 2, iPr, 10; PhCH2, t B u 0 2 C N H , His, 2, iPr, 2.0; PhCH2, t B u 0 2 C N H , His, 2, cHx, 2.5; PhCH2, t B u 0 2 C N H , His, 0, cHx, 4.0; PhCH2, t B u 0 2 C N H , His, 0, iPr, 4.0. Notes: ^Ref. 1; ''Ref. 2; Aax^ is Phe and R is Me unless otherwise noted. ^R is H. ^Aax^ is Trp. ^Ref. 3. ^Erythro. ^Threo. ^Not included in the correlation. 'Ref. 4. ^Ref. 5. 
''The group in italics is behind the plane of the paper while that in boldface is in front of the plane of the paper. ^Ref. 6.
carboxy ($X^C$) terminus as well, additional terms are required in the IMF equation. It is also necessary to parameterize any structural variations that occur in the X units. The sets studied were correlated with an appropriate form of the IMF equation. The parameter values used for amino acid side chains are given in Table 5; those used to parameterize the three types of X substitution are given in Tables 6 and 7. For each data set the best regression equation obtained and the appropriate statistics are reported. The statistics reported are described fully in Appendix 1.

Structural Effects in Angiotensinogen Derivatives
The structure of the angiotensinogen derivatives studied is summarized in Table 8. In sets 51,58, and 59, Leu^° is replaced by the fragment NHCH(CH2 Ak)CH0H (where Ak = alkyl), the side chain of residue 9 is varied, and Val^^ is replaced by X^^K In set 51, X^^^ is CH2SO„Ak', and Ak is constant and equal to cyclohexyl. X^^^ is parameterized by HQ which is equal to the number of O atoms bonded to the
Table 5. Amino Acid Side-Chain Parameters for the IMF Equation

Aax         $\sigma_l$   $\alpha$   $n_H$   $n_n$   $i$   $\upsilon$
Ala         -0.01   0.046   0   0   0   0.52
Asn          0.06   0.134   2   3   0   0.76
Asn(Ph)      0.10   0.377   1   3   0   0.76
Cha         -0.01   0.303   0   0   0   0.97
Chg          0.00   0.257   0   0   0   0.87
Cpg         -0.01   0.214   0   0   0   0.71
Gln          0.05   0.180   2   3   0   0.68
His          0.08   0.230   1   1   1   0.70
Hse          0.06   0.108   1   2   0   0.77
Leu         -0.01   0.186   0   0   0   0.98
(Ac)Lys      0.01   0.323   1   3   0   0.68
(Cbz)Lys     0.01   0.568   1   5   0   0.68
Met          0.04   0.221   0   0   0   0.78
Met(O2)      0.11   0.217   0   4   0   1.01
Nle         -0.01   0.186   0   0   0   0.68
Nva         -0.01   0.139   0   0   0   0.68
(Cbz)Orn     0.04   0.522   1   5   0   0.68
Phe          0.03   0.290   0   0   0   0.70
Ser          0.11   0.062   1   2   0   0.53
(Bzl)Ser     0.11   0.352   0   2   0   0.62
(Et)Ser      0.11   0.155   0   2   0   0.61
(Pym)Ser     0.11   0.328   0   3   0   0.62
Thg          0.19   0.230   0   0   0   0.57
Thr          0.09   0.108   1   2   0   0.70
(Bzl)Thr     0.09   0.398   0   2   0   0.71
(Bzl)Tyr     0.03   0.588   0   2   0   0.70
(Me)Tyr      0.03   0.344   0   2   0   0.70
Trp          0.00   0.409   1   0   0   0.70
Val          0.01   0.140   0   0   0   0.76
sulfur atom, $\upsilon_{Ak'}$, which accounts for the steric effect of Ak', and $\alpha_{Ak'}$, which represents its polarizability. The correlation equation is:

$Q_X = L\sigma_{lX} + A\alpha_X + H_1 n_{HX} + H_2 n_{nX} + I\,i_X + S\upsilon_X + B_O n_O + A_{Ak'}\alpha_{Ak'} + S_{Ak'}\upsilon_{Ak'} + B^o \qquad (19)$

The best regression equation is:

$\log IC_{50} = -7.13(\pm 3.08)\sigma_{lX} + 0.766(\pm 0.261)n_{nX} - 0.636(\pm 0.236)i_X - 3.48(\pm 1.83)\alpha_{Ak'} - 1.87(\pm 0.944)\upsilon_{Ak'} + 3.04(\pm 0.724) \qquad (20)$
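Equation 20 can be applied directly once the parameters have been looked up. The sketch below assumes the reading of Eq. 20 in which the five variables are $\sigma_{lX}$, $n_{nX}$, $i_X$, $\alpha_{Ak'}$, and $\upsilon_{Ak'}$; the His side-chain values are those listed in Table 5 and the isopropyl values those listed in Table 6. The function name is illustrative and not part of the original work.

```python
def log_ic50_set51(sigma_l, n_n, i, alpha_ak, upsilon_ak):
    """Evaluate Eq. 20 (set 51) for one inhibitor; returns log IC50 (IC50 in nM)."""
    return (-7.13 * sigma_l
            + 0.766 * n_n
            - 0.636 * i
            - 3.48 * alpha_ak
            - 1.87 * upsilon_ak
            + 3.04)


# Aax9 = His (Table 5): sigma_l = 0.08, n_n = 1, i = 1.
# Ak' = iPr (Table 6): alpha = 0.140, upsilon = 0.76.
log_ic50 = log_ic50_set51(0.08, 1, 1, 0.140, 0.76)
ic50_nM = 10 ** log_ic50
print(f"log IC50 = {log_ic50:.3f}, IC50 = {ic50_nM:.1f} nM")
```

Because $n_O$ does not appear in Eq. 20, the His, iPr members of set 51 with n = 0 and n = 2 (observed IC50 of 4.0 and 2.0 nM in Table 4) share the same calculated value, which comes out close to 5 nM.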
Table 6. Other Parameter Values Used in the Correlations$^{a,b}$
Group'
tBu02CNH
0.28
0.306
1
3
0
0.50
Et02CNH
0.28
0.214
1
3
0
0.50
H
0
0
0
0
0
CH2Ph
0.03
0.290
0
0
0
0.70
NH2 tBuCH2CONH
0.17
0.044
1
1
0.35
0.28
0.332
2 1
3
0.50
iPrCONH
0.28
0.240
1
3
0 0
tBuO
0.28
0.206
0
2
0
0.50 1.22
COiPe
0.30
0.288
0
2
0
0.50
CHOHiPe
0.09
0.294
1
2
0
CH2\Pr
0.01
0.140
0
0
0
0.53 0.52
CH2iBu
-0.01 0.27
0.186
0
0
0
0.52
OiPr
0.160
0
2
0
0.32
OIBu
0.28
0.206
0
2
0
0.32
SiPe
0.26
0.314
0
0
0
0.60
SCH2CH2Ph
0.26 0.26
0.488 0.268
0 0
0
0
0.60
0
0
0.26
0.222
0
0
0
0.60 0.60
5tBu
0.26
0.268
0
0
0
0.60
ScPe ScHx SPh 5CH2Ph S(CH2)3Ph
0.26 0.32 0.31
0 0 0
0 0
0 0 0
0.60 0.60 0.60
0.26
0.292 0.339 0.333 0.376
0
0
0
0.60
0.468 0.314
0
0
0
0.60
SOiPe
0.26 0.54
2
S02iPe S02CH2CH2Ph
0.58 0.58
0.311 0.415
0 0 0
4 4
0 0 0
0.66 1.03 1.03
502iBu
0.265
0
4
0
1.03
S02iPr
0.58 0.57
0.219
0
4
0
1.03
S02Et SO2CHX
0.59 0.57
0.172
0
4
0
0.336
0
4
0
1.03 1.03
CI
0.47
0
0
0
0.55
NH2 N3
0.17
0.050 0.044
2
1
1
0.35
0.43
0.092
0
1
0
0.35
Me
-0.01
0.046
0
0
0
0.52
Et IPr
-0.01
0.093
0
0
0
0.56
0.01
0.140
0
0
0
0.76
iBu
-0.01
0.186
0
0
0
0.98
tBu
-0.01
0.186
0
0
0
IPe
-0.01
0.232
0
0
0
SiBu SiPr
0
0
1.24 0.68 {continued)
Table 6. (Continued)
Group^' cHx Ph
a
^/
0.257
Continued "H
"n
0 0 0 0 2
/
t)
0 0 0
0.87 0.57 0.97 0.70 0.71 1.56
CH2CH2Ph
0.00 0.12 -0.01 0.02
CHOHiBu
0.09
0.336 0.248
0 0 0 0 1
CH2CH2NMe2
0.03
0.237
0
1
0 0 1
CH2CMe2NMe2
0.03
0.331
0
1
1
CH2CHX
0.243 0.303
0.68
Notes: $^a$WZ groups are shown with W in italics. For these groups the $\upsilon$ value reported is for W alone. $^b$Nonstandard abbreviations: c, cyclo; Pe, pentyl; Hx, hexyl; Pn, phenylene; Nh, naphthyl.
100R\ 71.43; AlOO/?^ 62.64; F, 6.001; 5,^^, 0.391; 5°, 0.655; n, 18; r^ji a^, «^, 0.570; a^, i, 0.533; a, n^, 0.731; n^, n„, 0.490; a^^' ^^ ^-^^'^^«//»i* 0.495. C^, 15.2; C„^, 20.3; C,-, 16.8; C^, 34.8; C^^, 13.0. A plot of log IC5o^^,^ against log 1050^,^8 ^^ given in Figure 1. Before interpreting these results it is necessary to understand what is represented by the i parameter in this data set. The only set members with an ionic side chain are those for which Aax^ is His. On the basis of percent inhibitions that were reported for Orn and Lys in position 9 and IC5Q values that have been determined for their Cbz derivatives it seemed that the effect of the His side chain is not due to its charge. The / parameter in Eqs. 19 and 20 is actually functioning as an indicator
Table 7. Parameters for Sets 53-57 CHZ^Z^^
ZG,Z
^w
CH2Ph CH2CH2Ph
0.12
0.290
0.03
0.336
CHPhCH2Ph
0.15
0.580
CH2PnOMe-4
0.11
0.356
CH2PnCI-4
0.15
0.338
^"HZ
^^nZ
^'z
^1W
^2W
0
0 0
1
1
0
0 0
0
0
0
1 2
1 2
0
2
0
1
0
0
0
1
1 1
CHMePh
0.11
0.336
0
0
0
2
1
CHMe(l-Nh)
0.13
0.496
0
0
0
2
1
CHMeCHOHPh
0.09
0.392
1
2
0
2
2
CHPhCHOHPh
0.22 0.02
0.590 0.547
1
2
2
0
1
0 1
3 0.74
c[(CH2)5N]CH2Ph-1
1.5
Notes: $^a$Nonstandard abbreviations: c, cyclo; Pe, pentyl; Hx, hexyl; Pn, phenylene; Nh, naphthyl. $^b$These parameters apply to both stereoisomers.
^3W
2b
.b .b 2" 1
Table 8. Structures of Angiotensinogen Derivatives
Set
X^
RpP'
/\ax^^
AkiRpi^^f
Rpl''
)f
CH2SOnAI
: Sa, n2, 0.746; 2a, ns, 0.702; Za, m, 0.598; Sa, ^2, 0.643; Ea, m, 0.777; Zn//, Z/in, 0.792; In//, ^2, 0.828; Znn, m, 0.577; n2, «3, 0.861. $C_{\Sigma\sigma_l}$, 26.9; $C_{\Sigma n_H}$, 42.9; $C_{\Sigma n_n}$, 11.9; $C_{n_c}$, 7.98; $C_{n_3}$, 10.3,

while for $IC_{50}$ values obtained with human plasma renin (set 54) the best equation, which resulted on exclusion of the data point for (-)NHCHPhCH2Ph, is:

$\log IC_{50} = -2.71(\pm 1.52)\Sigma\sigma_{lZ} + 4.79(\pm 0.964)\Sigma\alpha_Z - 0.255(\pm 0.106)\Sigma n_{nZ} - 0.727(\pm 0.154)n_c - 0.775(\pm 0.193)n_1 + 1.35(\pm 0.273)n_2 - 2.06(\pm 0.462)n_3 + 2.80(\pm 0.402) \qquad (33)$
$100R^2$, 93.30; $A100R^2$, 86.60; $F$, 9.950; $S_{est}$, 0.191; $S^o$, 0.417; $n$, 13. r: Sa, m, 0.741; Za, n3, 0.723; 2a, m, 0.575; Za, AZ2, 0.626; Za, ns, 0.745; Zn//, Zrt„, 0.787; Zn//, AX2, 0.882; Z«//, «3,0.567; Zn^, ^3, 0.640; n2, ns, 0.882. $C_{\Sigma\sigma_l}$, 2.53; $C_{\Sigma\alpha}$, 19.4; $C_{\Sigma n_n}$, 5.28; $C_{n_c}$, 7.53; $C_{n_1}$, 16.0; $C_{n_2}$, 27.9; $C_{n_3}$, 21.3. Plots of log $IC_{50,calc}$ against log $IC_{50,obs}$ are given in Figures 7 and 8.

Sets 53 and 54 differ in their structural effect dependence. The former set is independent of $\alpha$, $n_1$, and $n_2$. It is largely a function of hydrogen bonding, as is shown by the sum of the $C_{n_H}$ and $C_{n_n}$ values. The latter set is dependent on polarizability and on all three branching steric parameters. The sum of the $C_\alpha$, $C_{n_1}$, $C_{n_2}$, and $C_{n_3}$ values amounts to well over 80% of the total structural effect. If this result is not an artifact resulting from the extensive collinearities among the variables and/or the small size of the data set, it may be due to the difference in the enzyme used, the difference in the substrate used in the assay, or both of these factors. Both sets showed a dependence on configuration of about the same magnitude.

Support for a difference in the behavior of the enzymes as the cause comes from the results obtained on combining sets 53 and 54 by the Omega method [4,22]. This technique can be used to combine data sets with related biocomponents into a single set. If the relationship between the biocomponents is not close enough, the method fails. The biocomponent parameter $\omega$ was defined from the log $IC_{50}$ values for the compound with W equal to 10,11-dihydro-5H-dibenzo[a,d]cycloheptenamido. This data point was not included in the correlations of sets 53 and 54 due to difficulty in its parameterization. The correlation equation is identical to Eq. 19 except for the addition of a term in $\omega$.

The best regression equation obtained for the combined set gives results that are very much poorer than those obtained independently for sets 53 and 54, showing that combination of the data sets is not justified.
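The way a biocomponent parameter enters a combined data set can be sketched as follows. This is only an illustration under stated assumptions: $\omega$ is taken here to be simply the log $IC_{50}$ of the reference compound in each set (5.8 nM in set 53 and 569 nM in set 54, from Table 4), the single descriptor column is a placeholder, and the helper name `combine_sets` is hypothetical. The original definition of $\omega$ may involve further scaling.

```python
import math

# IC50 (nM) of the reference compound in each set, taken from Table 4.
reference_ic50_nM = {53: 5.8, 54: 569.0}

# Assumed definition: omega is the log IC50 of the reference compound.
omega = {s: math.log10(v) for s, v in reference_ic50_nM.items()}


def combine_sets(sets):
    """Merge per-set rows into one table, appending omega as an extra column."""
    combined = []
    for set_id, rows in sets.items():
        for descriptors, log_ic50 in rows:
            combined.append((list(descriptors) + [omega[set_id]], log_ic50))
    return combined


# One member of each set (W = CH2Ph: IC50 35 nM in set 53, 26 nM in set 54).
# The single descriptor value 0.290 is a placeholder for the full parameter row.
sets = {
    53: [([0.290], math.log10(35.0))],
    54: [([0.290], math.log10(26.0))],
}
combined = combine_sets(sets)
for descriptors, y in combined:
    print(descriptors, round(y, 3))
```

Every row of the combined table carries the same structural descriptors as before plus one column that is constant within each original set, which is what allows a single regression to absorb a systematic offset between the two enzymes.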
Figure 7. Set 53: log $IC_{50,calc}$ vs. log $IC_{50,obs}$.

Figure 8. Set 54: log $IC_{50,calc}$ vs. log $IC_{50,obs}$.
The correlation equation for the three sets of $K_i$ values is the same as that used for sets 53 and 54. The best regression equations obtained are as follows. For set 55:

$\log K_i = 3.05(\pm 1.28)\Sigma\alpha_Z - 0.500(\pm 0.152)\Sigma n_{nZ} + 0.869(\pm 0.347)n_2 - 1.14(\pm 0.586)n_3 + 1.28(\pm 0.365) \qquad (34)$

$100R^2$, 67.32; $A100R^2$, 57.52; $F$, 4.636; $S_{est}$, 0.325; $S^o$, 0.713; $n$, 14. r: see set 53. $C_{\Sigma\alpha}$, 23.6; $C_{\Sigma n_n}$, 19.7; $C_{n_2}$, 34.3; $C_{n_3}$, 22.4.

For set 56:

$\log K_i = -12.8(\pm 3.24)\Sigma n_{HZ} + 12.1(\pm 3.15)n_2 - 12.4(\pm 3.25)n_3 + 2.06(\pm 0.664) \qquad (35)$

$100R^2$, 61.76; $A100R^2$, 54.81; $F$, 5.384; $S_{est}$, 0.774; $S^o$, 0.732; $n$, 14. r: see set 53. $C_{\Sigma n_H}$, 25.9; $C_{n_2}$, 49.0; $C_{n_3}$, 25.1.

For set 57:

$\log K_i = -1.87(\pm 0.659)\Sigma n_{nZ} + 3.29(\pm 1.31)n_2 - 3.75(\pm 1.59)n_3 + 1.99(\pm 0.807) \qquad (36)$
100/?^ 48.38; AlOO/?^ 38.06; F, 2.812; 5est, 0.912; 5^, 0.863; n, 13. nf "Lciz, m, 0.747; Ea/z, ns, 0.702; Zaz, ni, 0.569; Eaz, ni, 0.629; Saz, ns, 0.769;
Figure 9. Set 55: log $K_{i,calc}$ vs. log $K_{i,obs}$.
IriH, I^nn, 0.950; En//, «2, 0.824; Zn„, AZ2, 0.731; m, ns, 0.857. $C_{\Sigma n_n}$, 26.6; $C_{n_2}$, 46.8; $C_{n_3}$, 26.6. Plots of log $K_{i,calc}$ against log $K_{i,obs}$ are shown in Figures 9-12.

As sets 56 and 57 involve the same enzyme and substrate and differ only in the method of assay, we have combined them by means of the Zeta method into a single set (set 56, 57). The $K_i$ values for the set member for which W is the 10,11-dihydro-5H-dibenzo[a,d]cycloheptenamido group were used to define $\zeta$. The correlation equation used for this set is obtained from Eq. 18 by adding a term in $\zeta$. The best regression equation obtained is:
Figure 10. Set 56: log $K_{i,calc}$ vs. log $K_{i,obs}$.

Figure 11. Set 57: log $K_{i,calc}$ vs. log $K_{i,obs}$.
$\log K_i = -11.2(\pm 2.20)\Sigma n_{HZ} - 0.684(\pm 0.355)n_c - 0.712(\pm 0.400)n_1 + 10.9(\pm 2.13)n_2 - 11.2(\pm 2.19)n_3 + 3.64(\pm 0.858) \qquad (37)$
100R\ 63.24; A100^^ 56.56; F, 7.227; 5est, 0.760; 5°, 0.687; n, 27. ny. Zo/z, Ixxz, 0.453; 5kxz, Sn/zz, 0.498; ECT/Z, AH, 0.746; So(z, na, 0.702; Eaz, m, 0.584; Zaz, "2, 0.637; Zaz, ns, 0.773; I^nnz, ^nz, 0.863; En//z, «2, 0.826; In,a:, «2, 0.646; ni, m, 0.479; n2, «3,0.860. C„HZ, 24.2; C„c/, 1.48; C„„ 3.08; Q , 47.0; C„3, 24.3.
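The $F$ and $A100R^2$ values quoted under each regression can be cross-checked from $100R^2$, $n$, and the number of independent variables. The sketch below uses formulas inferred by matching the reported numbers, not taken from Appendix 1, which remains the authoritative definition: $F$ uses $n - N_V - 1$ residual degrees of freedom, while the adjusted $100R^2$ reproduces the reported values only with $n - N_V$ in the denominator. The function names are illustrative.

```python
def f_statistic(r2, n, n_vars):
    """F for a regression with n points and n_vars independent variables."""
    return (r2 / n_vars) / ((1.0 - r2) / (n - n_vars - 1))


def adjusted_100r2(r2, n, n_vars):
    """Adjusted 100R^2, with the (n - n_vars) denominator inferred from the text."""
    return 100.0 * (1.0 - (1.0 - r2) * (n - 1) / (n - n_vars))


# (R^2, n, number of variables, reported F, reported A100R^2)
reported = {
    "Eq. 18": (0.7974, 8, 2, 9.839, 76.36),
    "Eq. 37": (0.6324, 27, 5, 7.227, 56.56),
}

checks = {}
for label, (r2, n, k, f_rep, adj_rep) in reported.items():
    checks[label] = (f_statistic(r2, n, k), adjusted_100r2(r2, n, k))
    print(label, round(checks[label][0], 3), f_rep,
          round(checks[label][1], 2), adj_rep)
```

Agreement to within rounding for both equations suggests that these are the conventions used, and the same check offers a quick way to confirm the number of variables in an equation whose symbols are hard to read.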
Figure 12. Set 56+57: log $K_{i,calc}$ vs. log $K_{i,obs}$.
Although the results are poor, they are probably at least qualitatively reliable. Most of the structural effect is due to some combination of steric effects and polarizability. The rest is due to hydrogen bonding. There may also be a dependence on configuration.

Structural Effects in Pepstatin Derivatives
Values of -log IC5Q have been reported for a set of 34 pepstatin analogues varying in the amino acid residue in position 2 and to a lesser extent in the amino acid residue in position 1, in X^, and in X^. The effect of the side chain on the residue in position 2 is represented by the IMF equation. X^ substitution is modeled by the variables ^XN' ^nXN> ^^^ ^XN- ^ Substitution involving the replacement of OMe by OH is accounted for by the variable HQ^ which takes the value 1 when X^ is OH and 0 when it is not. The replacement of Phe by Trp in position 1 is represented by the parameter rij which takes the value 1 when the residue in position 1 is Trp and 0 when it is not. Thus the correlation equation is: Qx = ^^iX + A^X + ^l«™ + f^2\x + ^h + S^X
The best regression equation obtained is:

$-\log IC_{50} = -7.51(\pm 1.29)\sigma_{lX} - 1.87(\pm 0.552)\alpha_X - 0.314(\pm 0.107)n_{HX} + 0.898(\pm 0.254)i_X - 3.49(\pm 0.659)\upsilon_X + 1.20(\pm 0.238)\upsilon_{XN} + 8.64(\pm 0.524) \qquad (39)$
$100R^2$, 75.26; $A100R^2$, 70.84; $F$, 13.69; $S_{est}$, 0.341; $S^o$, 0.558; $n$, 34. r: $\alpha_X$, $n_{nX}$, 0.568; $n_{HX}$, $n_{nX}$, 0.442. $C_{\sigma_l X}$, 9.78; $C_{\alpha X}$, 7.00; $C_{n_H X}$, 5.11; $C_{iX}$, 14.6; $C_{\upsilon X}$, 39.7; $C_{\upsilon XN}$, 23.8. A plot of log $IC_{50,calc}$ against log $IC_{50,obs}$ is given in Figure 13.

Again, the only residue in the data set with a charged side chain is His. The $i$ parameter is therefore acting as an indicator variable for the presence of His. The results show that steric effects dominate the bioactivity in this set.

A subset of set 52 in which only the side chain of the residue in position 2 varies was also studied (set 52a). Boc and OMe are the $X^N$ and $X^C$ substituents, respectively, throughout the subset. The correlation equation is the IMF equation. The best regression equation, obtained with the inclusion of the data point for His, is:

$-\log IC_{50} = -3.36(\pm 1.58)\sigma_{lX} - 0.211(\pm 0.0590)n_{nX} + 0.997(\pm 0.316)i_X - 2.65(\pm 0.660)\upsilon_X + 8.07(\pm 0.503) \qquad (40)$
Figure 13. Set 52: log $IC_{50,calc}$ vs. log $IC_{50,obs}$.
$100R^2$, 79.65; $A100R^2$, 76.06; $F$, 15.66; $S_{est}$, 0.306; $S^o$, 0.517; $n$, 21. r: $\sigma_{lX}$, $n_{nX}$, 0.463; $n_{HX}$, $n_{nX}$, 0.452. $C_{\sigma_l X}$, 8.07; $C_{n_n X}$, 6.33; $C_{iX}$, 29.9; $C_{\upsilon X}$, 55.7. Figure 14 shows a plot of log $IC_{50,calc}$ against log $IC_{50,obs}$. The point for His was included in order to make comparisons with the results for the whole set. Again, steric effects are the predominant factor in determining bioactivity in the set. The only differences in behavior between the subset and the whole set with regard to the effect of the side chain on the residue in position 2 are the lack of dependence of the former on polarizability and its dependence on $n_n$ rather than $n_H$.
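The C values listed with Eq. 40 can be reproduced by a simple calculation, sketched below. It assumes that each C value is the percentage share of the absolute product of a regression coefficient and the corresponding parameter of the reference residue His (Table 5); this reading is inferred from the reported numbers rather than quoted from Appendix 1, and the function name is illustrative.

```python
def composition(coefficients, reference):
    """Percent contribution of each term for a reference side chain."""
    contributions = {name: abs(coefficients[name] * reference[name])
                     for name in coefficients}
    total = sum(contributions.values())
    return {name: 100.0 * value / total
            for name, value in contributions.items()}


# Coefficients of Eq. 40 (set 52a).
eq40 = {"sigma_l": -3.36, "n_n": -0.211, "i": 0.997, "upsilon": -2.65}

# His side-chain parameters from Table 5.
his = {"sigma_l": 0.08, "n_n": 1, "i": 1, "upsilon": 0.70}

c_values = composition(eq40, his)
for name, c in c_values.items():
    print(f"C_{name} = {c:.2f}")
```

The computed shares agree with the reported 8.07, 6.33, 29.9, and 55.7 to within rounding, which also illustrates why the steric term dominates: the product of its coefficient and the His $\upsilon$ value is by far the largest of the four.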
Figure 14. Set 52a: log $IC_{50,calc}$ vs. log $IC_{50,obs}$.
In the majority of the data sets studied, the number of data points is insufficient to permit the determination of reliable QSAR. The results obtained for these data sets are best regarded as suggestive. What justifies the decision to analyze the data? A major function of correlation analysis is obtaining useful information concerning the effect of structural variation on measurable properties from experimental results. If the quality or quantity of the data does not permit the determination of QSAR, the data can nevertheless be used to obtain semiquantitative structure-activity relationships (SQSAR). These are useful as indications of the direction for further investigation.

Design of Angiotensinogen Analogues
The only set which includes some meaningful degree of substitution at X^ shows no effects of it on bioactivity. For X^^ or Aax^ the activity of the inhibitor is increased (IC5Q is decreased) by incrementation in a^^ a^^' ^z» ^^^ % ' ^^ ^^ decreased by decrementation in a^^^ (set 62). Variation of the amino acid side chain Aax^ in position 9 was extensively studied in set 51 and to a very much lesser extent in sets 58 and 59. Activity is increased by incrementaton of G^^ (set 51), i)^ (sets 58, 59), % (set 58), and by the presence of His in this position. It is decreased by incrementation of AZ^ (set 51). Structural variation in X^^^ was limited to the substitution of cyclohexyl for isopropyl. Both sets 58 and 59 show that replacement of isopropyl by cyclohexyl increases activity. As this replacement results in incrementation of both D and a the reason for the increase in activity is unknown. It may be due to steric effects, to polarizability, or some combination of them. The study of the four substrates in which isopropyl is replaced by pentyl, isopentyl, 3-pentyl, and r-pentyl, all of which have the same value of a but range in u from 0.68 to 1.63, would answer this question. If all of these substrates exhibit the same activity the structural effect observed is due to polarizability. If there are large differences in activity the cause is steric effects. If there are small differences the structural effect probably involves both steric and polarizability contributions. For X^^^ incrementing a and \) for the alkyl group attached to sulfur in set 51 increases activity as does increasing the number of O atoms in the fragment W in set 58. In set 61 incrementation of D^ and decrementation of AZ^^ increases activity. In set 60 activity is increased by the incrementation of a^^ ^^^ decrementation of For X^ in set 59, activity is increased by incrementation of D^ and decrementation of o^2- ^^^ 58 shows no dependence on structural variation in X^. 
With regard to the results for structural variation in sets 54-57, activity is increased by incrementation of the hydrogen-bonding parameters $n_H$ and $n_n$ and the branching steric parameter $n_3$, and by decrementation of the branching steric parameter $n_2$.
Parameterization of Configuration
Modeling configuration with the variable $n_c$, which takes the value 1 for achiral groups and for the configuration with the larger value of Q and is 0 for the enantiomeric or epimeric configuration, has had some success. The underlying assumption in this parameterization is that the factors which maximize Q for one chiral configuration will be reflected in the achiral group, which can arrange itself spatially so as to bind as strongly as possible to the receptor site. The enantiomeric configuration will be so arranged in space as to bind less strongly. As soon as the $n_c$ parameter is introduced into the model, any achiral groups present in the set must be assigned values of either 0 or 1, and therefore they are automatically assigned to the same category as one of the two configurations of the chiral groups. Note that the rationale for this method is that the activity-determining step involves binding to a receptor. The validity of this parameterization requires further testing before it can be regarded as generally reliable.
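The assignment rule just described can be written out as a short sketch. The group names and Q values below are hypothetical placeholders chosen only to show the logic, and the function name is illustrative.

```python
def assign_configuration_indicator(q_by_group, stereo_pairs):
    """Assign the configuration indicator to every group in a data set.

    Achiral groups (those not named in any stereo pair) receive 1. Within
    each pair of enantiomers or epimers, the member with the larger Q
    receives 1 and the other receives 0.
    """
    indicator = {name: 1 for name in q_by_group}
    for first, second in stereo_pairs:
        if q_by_group[first] >= q_by_group[second]:
            indicator[first], indicator[second] = 1, 0
        else:
            indicator[first], indicator[second] = 0, 1
    return indicator


# Hypothetical Q values: one achiral group and one enantiomeric pair.
q_values = {"CH2Ph": 1.2, "(+)-CHMePh": 1.5, "(-)-CHMePh": 0.9}
n_c = assign_configuration_indicator(
    q_values, [("(+)-CHMePh", "(-)-CHMePh")])
print(n_c)
```

Note that the indicator is assigned from the observed Q values themselves, so it describes the data set rather than predicting which configuration will be more active.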
IV. PROTEIN BIOACTIVITIES

A. Limitation of the Model in Protein QSAR

It is important to review the disadvantages of the IMF model, particularly as applied to proteins. They include:

1. The protein data set is almost always restricted to the 19 structurally similar amino acid residues normally found in protein, termed the protein basis set (PBS). The IMF equation for proteins requires from six to nine independent variables, depending on the choice of steric effect parameterization. This results in two problems. (a) The data set has a low value, 1-2, of the ratio $N_{DF}/N_V$, where $N_{DF}$ is the number of degrees of freedom in the data set and $N_V$ is the number of independent variables. The possibility of a chance correlation is therefore fairly large. (b) There are collinearities among the independent variables which are inherent in the PBS. No method for avoiding them is available. This makes interpretation of the regression equation more difficult.

2. In general, residues substituted at the $\alpha$-amino N atom require two additional independent variables to account for their structural effect. The proline type of residue requires one. As its inclusion in the data set adds only one DF, the model cannot be applied to data sets which include it.

B. Types of Protein Bioactivity Data Sets
There are two major types of protein bioactivity data sets:

1. Sets in which substitution occurs at a single position in the protein. In this type of data set a residue occupying a given position in the wild-type (native) protein that is known to be involved in the bioactivity is replaced by a number of other residues, and the activities of these mutant proteins are determined. The activities can be correlated with an appropriate form of the IMF equation. An example of this type of data set is the determination by Alber and coworkers of the relative activities of phage T4 lysozymes substituted at position 86.

2. Sets in which residues known to be part of a receptor in a bioactive protein are substituted for and the activities of the resulting mutants are determined. In correlating this type of data set it is necessary to assume that only structural effects on the receptor site are of importance, and therefore that the effect of the side chains of nearby residues on that of the residue that has undergone substitution is negligible. This is a reasonable approximation if only interactions between the receptor and the bioactive species need be considered. The parameterization of the IMF equation in this case must represent the difference in effect between the side chain in the wild type and that in the mutant. An example of this type of substitution is the report of Fersht et al. on kinetically determined $\Delta\Delta G$ values for the binding of ATP and of Tyr to tyrosyl-tRNA synthetase. A variation on this type of study is that of Cunningham and Wells, which involves alanine substitution of each residue in the protein.

C. Human Growth Hormone (hGH)
The alanine scanning mutagenesis method has been applied to residues that are within the three regions of hGH said to be involved in receptor recognition. They are helix 1 (residues 2-19), loop (residues 54-74), and helix 4 (residues 167-191). Binding dissociation constants, $K_d$, were determined for each of the mutant human growth hormone-human growth hormone liver receptor site complexes substituted at residues which are part of the receptor. In two cases the alanine-substituted mutant could not be prepared and a different substitution product was used in its place. In this type of protein bioactivity study, only those residues which are either part of or interact with a receptor site need be considered. Residues which are not involved in binding act as a skeletal group to which the active residues are attached. It is assumed that their side chains take no part in the observed bioactivity. Cunningham and Wells have considered all residues for which the ratio of $K_{d,mutant}$ to $K_{d,wild\text{-}type}$ is greater than 4 to be possibly involved in binding. There are 13 such values (Table 9). Correlation of this data set requires a modified form of the IMF equation:
$$Q_X^{\Delta} = L\sigma_l^{\Delta} + A\alpha^{\Delta} + H_1 n_H^{\Delta} + H_2 n_n^{\Delta} + I i^{\Delta} + S\upsilon^{\Delta} + B_1 n_1^{\Delta} + B_2 n_2^{\Delta} + B_3 n_3^{\Delta} + B^{o} \qquad (41)$$
Table 9. $K_d$ (nM) for Ala Substitutions of Receptor Residues in Human Growth Hormone (hGH);^a Other Substitutions are Shown
Leu6, 0.95; Phe10, 2.0; Phe54, 1.5; Glu56, 1.4; Ile58, 5.6; Ser62, 0.95; Arg64, 7.11; Glu66, 0.71; Gln68, 1.8; Lys70, 0.82; Asp171, 2.4; Lys172, 4.6; Glu174, 0.075; Thr175Ser, 5.9; Phe176, 5.4; Arg178Asn, 2.9; Ile179, 0.92; Cys182, 1.9; Val185, 1.5.
Note: aRef. 6. Sets 12 and 13 include all substitutions with $K_d$ > 1.4 with the exception of Phe176. Set 14 includes Met14, Ser62, and Glu66 as well.
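The selection of the data set described in the note to Table 9 can be sketched as follows. This is only an illustration: the dictionary and function names are hypothetical, and the cutoff is applied as "greater than or equal to 1.4 nM" on the assumption that Glu56 (1.4 nM) belongs to the set, since that is what gives the 12 data points quoted with the regression equations.

```python
import math

# K_d values (nM) transcribed from Table 9; keys are the substituted residues.
KD_NM = {
    "Leu6": 0.95, "Phe10": 2.0, "Phe54": 1.5, "Glu56": 1.4, "Ile58": 5.6,
    "Ser62": 0.95, "Arg64": 7.11, "Glu66": 0.71, "Gln68": 1.8, "Lys70": 0.82,
    "Asp171": 2.4, "Lys172": 4.6, "Glu174": 0.075, "Thr175Ser": 5.9,
    "Phe176": 5.4, "Arg178Asn": 2.9, "Ile179": 0.92, "Cys182": 1.9,
    "Val185": 1.5,
}


def select_set(kd_nm, cutoff=1.4, excluded=("Phe176",)):
    """Return {residue: log10(K_d)} for residues at or above the cutoff.

    Assumption: the cutoff is inclusive so that Glu56 is retained.
    """
    return {
        residue: math.log10(kd)
        for residue, kd in kd_nm.items()
        if kd >= cutoff and residue not in excluded
    }


if __name__ == "__main__":
    selected = select_set(KD_NM)
    print(len(selected), "data points")
    for residue, log_kd in selected.items():
        print(f"{residue:10s} log K_d = {log_kd:.3f}")
```

The logarithms obtained this way fall between roughly 0.1 and 0.9, which is the range spanned by the axes of the hGH plots.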
where the superscript Δ indicates the difference between the parameter value of the final side chain and that of the initial side chain. Thus, when the amino acid residue $Aax^{i}$ is replaced by the residue $Aax^{f}$ the value of $v^{\Delta}$ is given by:
$$v^{\Delta} = v_f - v_i \qquad (42)$$
where $v$ is one of the independent variables in Eq. 3 while $v_i$ and $v_f$ are its values for the initial and final side chains, respectively. Values of $v^{\Delta}$ for use with Eq. 41 are given in Table 10. The best regression equations were obtained on the exclusion of the data point for the Phe176Ala mutant. They are:
$$\log K_d = -3.79(\pm 0.940)\sigma_l^{\Delta} + 0.138(\pm 0.0423)n^{\Delta} - 0.240(\pm 0.134)n^{\Delta} + 0.360(\pm 0.141)n^{\Delta} - 0.635(\pm 0.164)n^{\Delta} + 0.856(\pm 0.196) \qquad (43)$$
$100R^2$, 82.53; $A100R^2$, 72.55; F, 5.670; $S_{est}$, 0.146; $S^{o}$, 0.591; n, 12
or:
$$\log K_d = -2.50(\pm 0.987)\sigma_l^{\Delta} + 0.152(\pm 0.0518)n^{\Delta} - 0.330(\pm 0.132)n^{\Delta} + 0.586(\pm 0.0996) \qquad (44)$$
$100R^2$, 63.70; $A100R^2$, 55.63; F, 4.679; $S_{est}$, 0.182; $S^{o}$, 0.738; n, 12
For each of these equations the F test is significant at the 95.0% CL. Plots of log $K_{d,calc}$ against log $K_{d,obs}$ are given in Figures 15 and 16. $C_i$ values for Eqs. 43 and 44 are given in Table 11. Binding of mutant hormones to the hGH receptor is a function of hydrogen bonding, and possibly of van der Waals interactions and steric effects as well. Values of ΔQ, the difference between the observed and calculated values of log $K_d$, for the residues not represented by Eq. 44 are reported in Table 12. Three of the residues in Table 12, Met14, Ser62, and Glu66, have ΔQ values which are small
Table 10, Amino Acid Side-Chain Parameter Difference Values^
* ^
Aax Lys6
a*
^H
*
/
D
*
*
*
^1
"2
"3
2
0 1
0.00 0.04
0.140 0.244
0
0 0
0 0
0.46
0
0.18
1 1
Met14
0.05
0.175
0
0
0
0.26
1
1
Phe54
0.04
0.244
0
0
0
0.18
1
1
PhelO
Glu56
0.08
0.105
1
4
1
0.16
1
1
Ile58
0.00
0.140
0
0
0
0.50
2
0
Ser62
0.12
0.016
1
2
0
0.01
1
Asn63 Arg64
0.07
0.088 0.245
0 1
Glu66
0.08 0.06
0.105 0.134
3 3 4
0.24
0.05
2 4
1
0.16
1 1 1
0 1 1
2
0
0.16
1
0.173
2
0.059
1
1 1
0.16 0.24
1 1
1 1
Aspl 71
0.01 0.16
3 1 4
Lysl 72
0.01
0.173
2
1
1
0.16
1
0 1
Glu174
0.08
0.140
1
4
1
0.16
1
1
-0.02
0.046
0
0
0
0.17
1
0.04
0.244
0
0
0
0.18
1
-0.02
0.157
2
0
0 1
0
0
2
0
0
Gln68 Lys70
Thrl 75Ser Phel76 Argi 78Asn
1
0.16
Ile179 Cys182
0
0.140
0
0
1 -0.08 0 0.50
0.13
0
0
0
Vail 85
0.02
0.082 0.094
0
0
0
Note:
0.10 0.24
0
0
0
0
0
1
1
2
0
These values are for correlations with Eq. 10 and its variants. Correlation of the logarithms of the entire data set of $K_d$ values with Eq. 10 did not give significant results. Exclusion of the Phe176Ala mutant gave the best regression equations.
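The F statistics quoted with Eqs. 43 and 44 can be checked against the usual regression F ratio. The sketch below assumes that the residual degrees of freedom are $N_{DF} = n - N_V - 1$; this is inferred from the reported numbers rather than stated in the text, and the function names are hypothetical.

```python
def residual_df(n, n_v):
    """Residual degrees of freedom, assuming N_DF = n - N_V - 1."""
    return n - n_v - 1


def f_statistic(r2_percent, n, n_v):
    """F ratio from 100R^2, the number of data points n, and N_V variables."""
    r2 = r2_percent / 100.0
    return (r2 / n_v) / ((1.0 - r2) / residual_df(n, n_v))


def df_ratio(n, n_v):
    """Ratio N_DF / N_V, the quantity used to judge the risk of chance correlation."""
    return residual_df(n, n_v) / n_v


if __name__ == "__main__":
    # (label, 100R^2, n, N_V) taken from the statistics quoted with each equation
    for label, r2p, n, n_v in [("Eq. 43", 82.53, 12, 5), ("Eq. 44", 63.70, 12, 3)]:
        print(label, f"F = {f_statistic(r2p, n, n_v):.3f}",
              f"N_DF/N_V = {df_ratio(n, n_v):.2f}")
```

With five variables and 12 data points the ratio $N_{DF}/N_V$ is 1.2, which illustrates how little redundancy these protein data sets carry.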
enough to suggest that they can be combined with the members of set 9. Correlation of the combined set with Eq. 41 gives the regression equation:
$$\log K_d = -4.29(\pm 0.971)\sigma_l^{\Delta} + 2.04(\pm 0.963)\alpha^{\Delta} + 0.157(\pm 0.0416)n^{\Delta} - 0.329(\pm 0.145)n^{\Delta} + 0.377(\pm 0.148)n^{\Delta} - 0.939(\pm 0.187)n^{\Delta} + 0.787(\pm 0.216) \qquad (45)$$
$100R^2$, 85.96; $A100R^2$, 78.16; F, 8.161; $S_{est}$, 0.158; $S^{o}$, 0.513; n, 15
A plot of log $K_{d,calc}$ against log $K_{d,obs}$ is given in Figure 17. The major difference between Eqs. 44 and 45 is that the latter shows some dependence on polarizability
Figure 15. log $K_{d,calc}$ vs. log $K_{d,obs}$ (hGH).
whereas the former does not. It is important to recognize, however, that there is a strong collinearity between the polarizability parameter a^ and the steric parameter riy It is quite likely that the n^ term in Eqs. 44 and 45 represents polarizability at least in part. Correlation matrices for Eqs. 44 and 45 are set forth in Table 13. The coefficients of the other independent variables in Eq. 45 show no significant difference from those in Eq. 44. AQ values calculated from Eq. 45 for the remaining residues are also given in Table 12. As the calculated values for Lys70 and Phel74 are between three and four standard deviations away from the observed values, these residues may simply be outliers. These residues do not seem to be on the binding surface of hGH. Their effect is probably due to conformational changes which affect
Figure 16. log $K_{d,calc}$ vs. log $K_{d,obs}$ (hGH).
$$\ldots + Z_{P1}\zeta_{P1} + B^{o} \qquad (48)$$
Again, the coefficients of and statistics for the best regression equations obtained for log ($k_{cat}/K_M$) (set 3) and for log ($1/K_M$) (set 4) are reported in Table 16. The correlation matrix for correlations of sets 3 and 4 with Eq. 48 is given in Table 21; outliers are reported in Table 19. The difference between sets 3 and 4 and sets 1 and
Table 16. Results of Correlationsa Hi
SH 1
-1 7.3
5.38
0.192
-1 3.8
5.1 6
0.229 0.1 70
-1.26
1Q 1M 1K
1.34 0.81 9
-0.539
0.0849
0.352
0.091 0
-0.401
2E
-1 2.6
3.24
-0.406 0.985 0.202
0.0929 0.135 0.0677
4.06
0.933
0.241 -0.387 0.266
1.71
0.71 0
L
SL
2Q 2M 2K 1 U 5
5h2
SA
1E
Set
2 3 4 5 6 7
-3.45 -7.85 -3.10
1.05
A
H2
8 9 10
-5.31
Set
Sl
1E
5.85
1.97
1Q
1.44
0.600
1M 1K 2E
1.20 1.84
0.543 0.509
1.69
0.0624
0.396
0.1 82
-0.967 -0.21 5
0.110 0.0451
0.842
0.161
0.0758 0.0790 0.0934
-0.291 0.1 10 -0.367
0.051 9 0.0344 0.675
0.481
0.1 52
4.231
0.1 30
0.422
0.143
-0.350
0.1 06 0.452
0.193
0.275
0.0835
-0.268
0.0569 4.264
0.1 29
0.273
0.108
-0.31 6
0.0865
s3
ss3
1'
3.21
1.50
0.71 3
1.51
1.69
0.767
SSI
sz
ss2
5.01
1.20
SI
0.281
2.1 0 1.01
i
1.35
56
SZI 56
0.699
0.497
0.520
0.232
1.07 -0.848 0.581
0.311 0.326 0.254
-
L P1
%p156
(continued
Table 16. Continued Set
2Q 2M 2K
2
CD
51
SS1
1.11
0.451
1 2 3 4 5 6 7 8 9 10
1.09
0.558
Set
Hip,
52
ss2
53
553
1'
56
0.566 0.823
5z156
-0.469
0.228 0.237 0.1 76
0.727 0.429
0.327 0.201
0.331
SHl P1
-.'1
SHZPl
'P 1
SIP1
ZP1
'ZPl56
0.822 1.04 1.30
0.0628 0.0902 0.181
0.352
0.114
0.1 94
B"
SBO
1002
702
1E
-1.51
2.73
94.47
91.01
1Q 1M 1K
-1.50 -0.374 8.30
1.27 1.71 1.79
90.44 85.69 70.65
86.10 80.92 66.45
2E 24 2M
-0.1 92 1.23 -0.773
0.1 72 0.980 1.04
96.65 69.01 84.54
94.55 64.58 79.39
2K
4.79
0.775
77.36
72.14
1
-3.1 6 -1.80
1.83 0.944
77.81 74.25
76.25 72.41
2
3 4
5 co
5 6
0.230
7 8 9
0.403
10
-0.196
Set 1E
F
1Q 1M 1K 2E 24 2M 2K 3 4 5 6 7
0.1 05 0.1 52
0.0903 sest
19.93 15.77 13.18 10.43 33.61 9.648 12.03 10.25 18.40
0.394 0.230 0.320 0.340 0.230 0.263 0.267 0.1 97
7.703
0.496
42.31 27.68
8
19.45 19.47
9
82.55
10
37.22
0.774 0.623 0.403 0.638 0.378 0.469 0.332
-0.807 2.1 6
0.736 0.442 0.1 87 0.828
-0.565
0.0570
-1.16
0.1 68
5.53
-0.227 -0.665 -0.269 -0.481 -0.117
0.0354 0.0828 0.0454 0.0638 0.0476
-0.882 -0.972 -0.590 -1.51 -1.11
0.107 0.240 0.137 0.202 0.146
-2.87 5.39 4.09 5.64 4.43
9 0.333 0.403 0.470 0.620 0.259 0.637 0.489 0.566 0.620
n
14 17 17 17 14 17 17 17 32
0.782 0.484 0.530
30 63
0.51 5
33
0.552 0.359
32 24 24
0.427
62
Cot
0 0 0 0 0 0 0
0 0 25.4 0 6.59 0 26.7 nd nd
0.258 0.169 0.1 66 0.1 35
66.34
64.02
47.06 78.77 75.1 2
43.1 3 77.31 72.90
78.27
75.1 7
74.26 88.72 84.81
71.50 88.20 83.36
ca
CnH
Cnn
ci
C",
19.0 22.3 0
8.00 7.1 8 4.29 5.90 7.40 5.74 3.63 10.1 7.1 5
7.51 4.73 4.89 0 7.25 6.10 4.38 2.87 5.94
8.02 0 4.82
34.8 12.7 14.6 26.7
0
17.5 0 0 19.6 0 0 0 7.63 0 19.5 0 0
0
0
11.0
10.7
0 10.4
0
0
6.32 0
7.25 0 0 18.3 0
0 0 16.7 0 0 0
0
6.51 0
0 0
0
12.0 0
0
0
0 0
0
0
0
0
nd nd (continued)
Table 16. Continued Set
cu2
1E
0 0 0 0 37.7 0 0 0 0 0 0 0 nd nd nd
1Q 1M 1K 2E 2Q 2M 2K
N
N 0
3 4 5 6 7 8 9 10 Set
11
nd L
6.05
CU,
c(T56
CLPl
CnHPl
CnnPl
CIPl
0
22.8 25.0
nd nd nd nd nd nd nd nd
nd nd nd nd nd nd nd nd nd nd
nd nd nd nd nd nd nd nd nd nd
nd nd nd nd nd nd nd nd nd nd
22.6
46.4
6.11 25.3 16.9 24.2
23.5 37.0 37.0 75.8
8.20
78.1
28.1 0
71.4
0 0 0 0 0 0
67.4 23.9 88.2 68.0 67.4 44.5
0 0 0
32.2 0 49.6
nd nd nd nd SL
3.40
12 13
nd nd nd nd
36.2 56.5
nd nd nd nd nd nd
15
6.1 3
3.66
13.7
SA
Hl
SH 1
-1 8.5
5.63
0.999
0.220
0.220 -1 3.1
5.26
0.698 0.255 0.887 0.235
A
14 16
9.22 0 15.3 0 0
-1 4.9
5.90
H2
'nX166
nd nd nd nd nd nd nd nd nd nd nd nd nd nd 0 0
5h2
-0.863
0.1 76
0.0640
-0.264
0.0374
0.1 75
-0.553
0.0898
0.0500
-0.239
0.0344
0.234 0.0527
-0.761 -0.223
0.1 81 0.0371
1 0.27
5 0.148
Set
h,
k!
Sl
SSl
s 3
SS3
11
2.37
0.658
4.25
1.64
12 13 14
0.529 2.1 2
0.31 3 0.660
3.02
1.58
15 16
1.78
52
SS2
3.66
0.71 9
56
1'
%156
0.404
0.1 69
0.375
0.1 71
0.379
0.1 77
1.84
Hlpl
'H1 P1
0.373
0.0842
0.1 99 -0.1 62
0.0809 0.0631
0.390
0.0909
Set
H2P1
SIP1
"ii
snii
B"
SBO
1002
A1002
11
-0.660
0.0467
-1.18
0.1 26
1.38
0.180
5.39
12
-0.248
0.0326
-0.747
0.0964
1.01
0.1 31
2.55
0.1 69 0.724
88.79 79.74
86.72 77.64
13
-0.545
0.0434
-1.19
0.1 29
14 15
-0.1 61 -0.661
0.0631 0.0506
-0.750 -1.23
0.1 01 0.1 38
0.981 0.721
0.1 32 0.1 03
5.38 2.83
0.1 74 0.725
87.76 79.10
86.01 76.94
1.28
0.197
-0.245
0.0349
-0.759
0.1 03
0.91 0
0.142
5.40 2.77
0.1 80
16
87.21 76.96
84.86 74.78
Set
F
11 12
38.1 8 32.05
'HZPl
'P 1
sest
9
0.472 0.372
0.371 0.481
65 65
2.98 0
21 .o 0
0 0 3.53 0
20.5 0
n
13
43.81
0.484
14 15
30.82 32.74
0.378 0.488
0.380 0.488 0.396
65 65
16
28.95
0.379
0.511
59
59
Cd
ca
19.8 0
n'
H
6.1 5 4.21 5.96
0.752 Cnn
Ci
C"1
5.32 5.05 4.72
1.66 0 0
14.6 10.1 18.1
5.88
5.49
6.39 5.28
5.48
0 0 0
12.8 0
5.02
0
(continued)
Table 16. Continued ~
N N p3
Set
CU,
CnHPl
CnnPl
C,PI
11
0
26.1
0
2.30
4.07
7.28
12
0 25.8
42.3 0
0 1.70
4.75
14.3
13
0 0
4.65
10.2
14
0
0
47.3
3.73
3.71
17.3
15
0
26.4
0
2.81
4.76
16
0
0
46.7
0
5.51
CU3
CS156
~
8.87 17.0
un'
8.52 19.3 8.37 16.6 9.20 20.4
Note: aL, A, H, ... are the regression coefficients of the best regression equation; $S_L$, $S_A$, $S_H$, ... are their standard errors; $100R^2$ is the percent of the variation of the data accounted for by the regression equation; $A100R^2$ is the previous quantity adjusted for the number of independent variables; $S_{est}$ is the standard error of the estimate, a measure of the error to be expected in a value of the dependent variable that is estimated from the regression equation; $S^{o}$ is the previous quantity divided by the root mean square of the data; n is the number of data points in the set; $C_i$ represents the percent of the data accounted for by the i-th independent variable when a reference substituent is used; nd means not determined.
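The adjustment behind $A100R^2$ is not spelled out in the note. A form that reproduces the values quoted with Eqs. 43, 44, and 45 is $A100R^2 = 100[1 - (1 - R^2)(n - 1)/(n - N_V)]$; the sketch below uses it on that basis. The formula is an inference from the reported statistics, not a statement of the author's procedure, and the function name is hypothetical.

```python
def adjusted_100r2(r2_percent, n, n_v):
    """Adjusted 100R^2, assuming the factor (n - 1) / (n - N_V)."""
    r2 = r2_percent / 100.0
    return 100.0 * (1.0 - (1.0 - r2) * (n - 1) / (n - n_v))


if __name__ == "__main__":
    # (label, 100R^2, n, N_V, reported A100R^2)
    reported = [
        ("Eq. 43", 82.53, 12, 5, 72.55),
        ("Eq. 44", 63.70, 12, 3, 55.63),
        ("Eq. 45", 85.96, 15, 6, 78.16),
    ]
    for label, r2p, n, n_v, quoted in reported:
        value = adjusted_100r2(r2p, n, n_v)
        print(f"{label}: calculated {value:.2f}, reported {quoted:.2f}")
```

Note that the denominator here is $n - N_V$ rather than the $n - N_V - 1$ found in many textbook definitions of adjusted $R^2$.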
Table 17, Correlation Matrices for Sets 1E and 2E^ ^
1
«
^H
^n
'
^1
^2
^3
.011
.102
.863
.147
.481
.449
.094
1
.755
.253
.515
.821
.842
.960
.547
.444
.669
.675
.612
1
.274
.644
.620
.278
1
.452
.462
.480
1
.999
.882
1
.901 1
^156
.194 .180 .300 .279 .167 .155 .099 .092 .151 .140 .091 .084 .107 .099 .266 .247 1
Oj a "H
Hn
i ^1
1)2
Cl56
Note: ^Values in boldface are for set 2E only. The other values in this column are for set 1E.
Table 18. Correlation Matrices for Sets 1Q, 1M, 1K, 2Q, 2M, and 2K a
^/ 1
"H
/
^n
^1
^2
^3
.023
.120
.867
.311
.434
.517
.186
1
.769
.267
.489
.732
.821
.940
1
.478
.401
.559
.675
.626
1
.440
.575
.671
.383
1
.436
.530
.521
1
.834
.745
1
.910 1
^156
.090 .085 .192 .180 .036 .034 .000 .000 .051 .048 .168 .157 .042 .040 .168 .157 1
^1
a "H
"n
i
^2
^3
Cl56
Note: aValues in boldface are for sets 2Q, 2M, and 2K only. The other values in this column are for sets 1Q, 1M, and 1K.
Table 19. Outliers in Sets 1 Through 8
Set
Substrate
Enzyme 1
2
3
156 Gin
166 Lys
PI Glu
Gin Ser Glu Gin Ser
Lys Lys Lys
Lys Lys Glu
Lys Lys
Glu GluLys
Ser
Lys
Lys
Arg
Glu
Set
Enzyme
Substrate
4
156 Glu
166 Lys
PI Lys
5
Glu Glu Gin
Gin Met Lys
Met Met Lys
Gin
Lys
Glu
Glu
Lys
Glu
Gin Ser
Lys Lys
Glu Gly
Glu
Lys
Glu
6
Lys 8
Table 20, Correlation Matrices for Sets 1 and 2^
^/ 1 1
a .063 .081 1 1
"H
.147 .160 .762 .760 1 1
^
/
^/
.865 .865 .301 .315 .517 .527 1 1
.344 .369 .465 .455 .390 .383 .456 .474 1 1
.478 .491 .750 .750 .576 .574 .613 .621 .430 .427 1 1
^2
^3
.545 .562 .823 .822 .669 .667 .689 .700 .502 .498 .866 .866 1 1
.217 .236 .941 .941 .614 .610 .399 .413 .487 .477 .773 .773 .908 .908 1 1
Note: ^Values in boldface are for set 2, other values are for set 1.
^)56
.149 .211 .261 .176 .086 .022 .041 .118 .032 .073 .174 .056 .078 .021 .229 .141 1 1
Cp; .017 .057 .021 .059 .042 .009 .029 .021 .049 .131 .045 .078 .013 .034 .012 .061 .080 .104 1 1
^1
a "H
"n
1 ^1
^^2
Cl56
2 is that the residue in S156 is held constant in the former pair while it is allowed to vary in the latter. Although the results obtained for sets 3 and 4 are statistically significant they leave much to be desired. An alternative parameterization of the structural effects at P1 of the peptide substrate was therefore considered. The effects can of course be represented by the IMF equation. For the four residues studied, steric effects are essentially constant, thus no steric parameterization is necessary. Both their electrical effects and their polarizabilities are essentially constant. It follows then that we should be able to account for their effect by means of the $n_H$, $n_n$, and $i$ parameters. The entire sets of log ($k_{cat}/K_M$) and log ($1/K_M$) values were correlated with the equation:
(49)
Coefficients of and statistics for the best regression equations obtained for the log ($k_{cat}/K_M$) values (set 5) and for the log ($1/K_M$) values (set 6) are given in Table 16. Outliers are reported in Table 19. The correlation matrix for correlations of sets 5 and 6 with Eq. 49 is given in Table 22. This parameterization was also applied to log ($k_{cat}/K_M$)
Table 21, Correlation Matrices for Sets 3 and 4^ Oj
a
1 1
.101 .122 1 1
"H
.173 .171 .676 .751
^n
.789 .796 .273 .350 .577 .570 1 1
/'
^/
^2
^3
^Pi
.280 .325 .425 .490 .470 .505 .484 .542 1 1
.374 .393 .665 .651 .410 .420 .448 .487 .313 .343 1 1
.549 .581 .824 .810 .575 .595 .614 .673 .441 .488 .714 .710 1 1
.270 .305 .916 .905 .478 .535 .362 .444 .391 .460 .651 .649 .925 .927 1 1
.100 .130 .158 .139 .170 .108 .057 .104 .048 .065 .052 .045 .069 .081 .101 .120 1 1
^Values in boldface are for set 4 only. The other values in this column are for set 3.
^1
a "H
"n
1 ^1
1)2
Cl56
and log ($1/K_M$) values for enzymes with Xaa156 = Glu. The correlation equation used is:

The coefficients of and the statistics for the best regression equations for the log ($k_{cat}/K_M$) values (set 7) and for the log ($1/K_M$) values (set 8) are set forth in Table 16. The correlation matrix for correlations of sets 7 and 8 with Eq. 50 is given in Table 23. Outliers are again reported in Table 19. The results obtained for sets 5 and 6, and in particular for sets 7 and 8, are indeed an improvement over those for sets 1, 2, 3, and 4.

Substitution at S156

Finally, to determine whether anything may be learned about the effect of substitution at position 156 of the enzyme, we have correlated log ($k_{cat}/K_M$) and log ($1/K_M$) values for enzymes with Aax166 = Gly or Asn with the equation:
$$Q_X = A\alpha_X + H_1 n_{HX} + H_2 n_{nX} + I i_X + B_{X166} n_{X166} + H_{1P1} n_{HP1} + H_{2P1} n_{nP1} + I_{P1} i_{P1} + B^{o} \qquad (51)$$
in which the parameter $n_{X166}$ takes the value 1 when Xaa166 is Asn and 0 when it is Gly. In parameterizing the effect of substitution at position 156, we have noted that for the three residues studied $\upsilon_1$ is constant and $\upsilon_2$ and $\sigma_l$ nearly so. Steric effects occurring at atoms past the second atom of the side chain were assumed to be negligible. The coefficients of and the statistics for the best regression equations obtained for the log ($k_{cat}/K_M$) values (set 9) and the log ($1/K_M$) values (set 10) are reported in Table 16. The correlation matrix for correlations of sets 9 and 10 carried out with Eq. 51 is given in Table 24.

On examining the data points which were excluded from the correlations as outliers we note that, of the total of 17 such outliers in sets 1 through 8, Lys is in position 166 in 14 cases and Arg in one case. Thus 15 of the 17 outliers have ionic groups attached to two or more methylene groups in the side chain. No Asp substitution at this position occurred in any outlier. Seven of the 17 outliers had a Glu residue in position P1 of the substrate while six had Lys in this position; thus 13 of the 17 outliers had ionic groups attached to two or more methylene groups in this position. In position 156 of the enzyme one-third of the 17 outliers, about six, should be Glu; seven Glu residues were in this position. There does not seem to be any preference for ionic groups in this position. It seems likely that the model is incomplete, and that an additional parameter is required to account for the interaction of ionic side chains on Glu, Lys, and Arg residues in position 166 with ionic side chains on Glu and Lys in position P1. Since interactions between opposite
Table 22. Correlation Matrices for Sets 5 and 6a (J/
1
1
a
nti
"n
.046 .063 1 1
.136 .147 .763 .762 1 1
.866 .865 .287 .301 ,507 .517 1 1
I
,321 ,344 ,475 .465 ,396 .390 .438 .456 1 1
N N
u
u1
u2
.466 .478 ,750 .750 .578 .576 .605 .613 ,434 .430 1 1
.530 .545 .824
.823 ,671 .669 ,679 .689 .509 .SO2 .867 .866 1 1
u3
h56
nHPl
nnPl
'Pl
.199 .217 .942 .941 .617 .614 .385 .399 .495 .487 .773 .773 .908 -908 1
.1 74 .178 ,031 .142 .110 .003 .150 .097 ,122 .lo3 .113 .041 .180 .037 ,108 .110 1 1
.013
,010 .043 ,012 .044 .029
.015 .033 .015 .033 .007 .004
1
Note: aValues in boldface are for set 6, other values are for set 5.
.009 .015 .010 .014 .001 .012 .003 ,004 .022
,002 .014 .009
.006 .012 .010 .161 .076 1 1
.005 .019
.014 .039 .lo2 ,035 .062 ,008 .025 .006
.046 . I 56 .loo ,440 .457 1 1
.ooo .011 ,057 .078 .039 .048 ,008 .019 ,018 .035 ,116 .077 .322 .349 .271 ,222 1 1
01
a "H
"n 1
u1 u2 u3
c156 "HPl
""PI
'Pl
Table 23. Correlation Matrices for Sets 7 and 8^a
/
^/
a
''H
^n
1 1
.092 .118 1 1
.155 .173 .706 .704
.781 .779 .293 .311 .577 .591 1 1
.270 .319 .457 .443 .503 .498 .494 .530 1 1
^1
^HPI
^nPI
'Pl
.372 .390 .656 .654 .405 .402 .452 .462 .319 .313 1 1
.015 .005 .016 .009 .015 .010 .012 .006 .013 .025 .010 .014 1 1
.068 .025 .073 .041 .068 .048 .054 .029 .058 .117 .044 .063 .452 .482 1 1
.052 .019 .056 .031 .052 .037 .042 .022 .045 .088 .034 .048 .346 .365 .228 .189 1 1
Note: ^Values in boldface are for set 8 only. The other values are for set 7.
charges will be attractive and those between like charges will be repulsive, we have defined the ionic interaction parameter, $n_{ii}$, as taking the value 1 when the interaction is between unlike charges, -1 when it is between like charges, and 0 when no interaction is present. We have correlated sets 11 and 12 with the equation:
Table 24, Correlation Matrices for Sets 9 and 10 a 1
"H
.690 1
/'
"n
.724 .000 1
.282 .500 .866 1
'^xiee .000 .000 .000 .000 1
'^HPl
.000 .000 .000 .000 .000 1
'^nPl
.000 .000 .000 .000 .000 .381 1
'p^
.000 .000 .000 .000 .000 .302 .316 1
a "H "n
i "X166
"HPI "nPI '156
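The assignment of the ionic interaction parameter $n_{ii}$ can be sketched as a small lookup. The function and dictionary names are hypothetical. The charge signs are the usual ones for these side chains near neutral pH, and Asp at position 166 is treated as non-interacting, which is the assumption made for sets 11 and 12.

```python
# Side chains treated as ionic at each position, with the sign of their charge.
# Asp is deliberately absent from position 166 (assumed not to interact).
CHARGE_AT_166 = {"Glu": -1, "Lys": +1, "Arg": +1}
CHARGE_AT_P1 = {"Glu": -1, "Lys": +1}


def n_ii(residue_166, residue_p1):
    """Return 1 for unlike charges, -1 for like charges, 0 for no interaction."""
    q_enzyme = CHARGE_AT_166.get(residue_166, 0)
    q_substrate = CHARGE_AT_P1.get(residue_p1, 0)
    if q_enzyme == 0 or q_substrate == 0:
        return 0
    return 1 if q_enzyme != q_substrate else -1


if __name__ == "__main__":
    for pair in [("Lys", "Glu"), ("Glu", "Glu"), ("Arg", "Lys"),
                 ("Asp", "Lys"), ("Gly", "Met")]:
        print(pair, n_ii(*pair))
```

Adding "Asp": -1 to the position 166 table gives the alternative assignment in which Asp166 is allowed to interact with ionic side chains in P1.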
The coefficients of and statistics for the best regression equations obtained for sets 11 and 12 are given in Table 16. The correlation matrix for correlations of sets 11 and 13 with Eq. 52 is given in Table 25 and that for sets 12 and 14 in Table 26. Plots of log ($k_{cat}/K_M$)$_{calc}$ against log ($k_{cat}/K_M$)$_{obs}$ and of log ($1/K_M$)$_{calc}$ against log ($1/K_M$)$_{obs}$ are shown in Figures 18 and 19, respectively. The results obtained are a dramatic improvement over our earlier attempts, particularly in view of the fact that no data points were excluded from these sets. Though these results are excellent they do not prove that the interionic interaction requires a side chain with the structure $(CH_2)_nI$ where I is an ionizable group and n is greater than two. In order to provide further evidence on the validity of the conclusion that the Asp side chain in position 166 is not involved in interionic interactions with side chains in P1, the correlations with Eq. 16 were again carried out after assigning $n_{ii}$ values of 1 to Asp166-Lys(P1) combinations and -1 to Asp166-Glu(P1) combinations. Again, the coefficients of and the statistics for the best regression equations obtained for log ($k_{cat}/K_M$) (set 13) and for log ($1/K_M$) (set 14) are given in Table 16. The correlation matrix for sets 11 and 13 differs from that for sets 12 and 14 only for the zeroth-order partial correlation coefficients of other variables with $\zeta_{156}$ and $n_{ii}$. These matrices are reported in Tables 25 and 26. Although the difference is small, the results for sets 11 and 12 are indeed better than those for sets 13 and 14. Taken together with the fact that Asp166 does not appear as an outlier in any correlation, it seems that this residue probably does not interact significantly with ionic side chains in P1 of the substrate.

Validity of the Model

In order to provide a further test of the model we have excluded six data points from sets 11 and 12, giving sets 15 and 16, respectively. The data points excluded were chosen to provide a wide range of side chain structure at positions S156, S166, and P1. The results of the correlations for the best regression equations are given in Table 16. The coefficients of the regression equations for sets 15 and 16 are in very good agreement with those for sets 11 and 12. The major differences are that the borderline dependence on $i$ observed in set 11 has disappeared in set 15, while the borderline dependence on $\upsilon_1$ in set 12 has disappeared in set 16. We believe that in both cases this is due to the smaller number of degrees of freedom in sets 15 and 16. These results strongly support the validity of the model. It is also of interest to determine how well the model can predict new values of log ($k_{cat}/K_M$) and log ($1/K_M$). Calculated values of these quantities obtained from sets 15 and 16, together with the differences Δ between the observed and calculated values, are reported in Table 27. The results for the data points not included in the correlations are given in boldface. The results show that it is possible to make
Table 25. Correlation Matrices for Sets 11 and 1 3a Q/
1
a .015 1
"H
"n
I
u1
u2
.115 .766 1
,867 .262 .489 1
.278 .493 .408 .407 1
.444 ,751 ,581 .590 .441 1
,503 .825 .675 .660 .515 .868 1
h,
W 0
Note: 'Values in the column headed
'3
c156
"HPl
"nPl
'Pl
"ii
.166 ,944 .623 .360 ,511 .774 ,908 1
.164 .038 .114 .145 .129 ,108 .184 .114 1
,005
,021 .024 .036 ,026 .024 .028 ,015 .017 .156 .415 1
.016 .018 .028 .020 .019 .022 .011 .013 .122 .323 .274 1
.014 .008 .012 .053 .045 .020 .024 .034 .009 .159 .295 .043 1
.005 .008
,006 .006 .006 ,034 .004 .162 1
ni are for set 13, those in the column headed nii are for set 11; all other values are for both sets.
*
"ii
,072 ,005 ,020 .Of35 .072 ,032 ,036 ,025 .014 ,021 .089 .070 1
01
a nH "n I
u1 u2
u3 c156 nHPl %PI iP1 "ii
Table 26. Correlation Matrices for Sets 12 and 14a 61
1
N
w,
a
"H
"n
.015 1
.115 .766 1
.867 .262 .489 1
I
.278 .493 .408 .407 1
U1
.444 .751 .581 .590 .441 1
U2
'3
.SO3 .825 .675 .660 ,515 .868 1
,166 .944 .623 ,360 .511 .774 .908 1
6156
.153 .117 .008 .083 .116 .032 .046 .089 1
"HPl
.005 .005 .008 .006 .006 .006 .034 .004 .068 1
"nP7
.021 .024 .036 .026 ,024 .028 .015 .017 .116 .415 1
.016 .018 .028 .020 ,019 .022 .011 .013 .091 .324 ,274 1
Note: Walues in the column headed n: are for set 14, those in the column headed n,,are for set 12; all other values are for both sets
*
"ii
"ii
,014 ,008 .012 .053 ,045 ,020 .024 .034 .033 .159 .295 .043
.072 .005 .020 .085 .072 .032 ,036 .025 .053 .021 ,089 ,070
1
1
'P7
01
a nH
nn
i U,
Ua ~j
nHp1 n,,pl
ipl "ii
Figure 18. log ($k_{cat}/K_M$)$_{calc}$ vs. log ($k_{cat}/K_M$)$_{obs}$ (set 11).
reasonable predictions of log ($k_{cat}/K_M$) and log ($1/K_M$) from the regression equations for sets 11 and 12.

The Effect of Substitution at Position 166

The predominant structural effects on log ($k_{cat}/K_M$) values resulting from substitution at position 166, based on the results for set 11, are due to polarizability and to steric effects resulting from the first and third segments of the side chain. These effects account for 21.0 and 40.7%, respectively, of the overall structural effect. Hydrogen bonding accounts for another 11.5%. There is a borderline dependence on $\sigma_l$ and on $i$. The results obtained for substitution at this position contribute most
Figure 19. log ($1/K_M$)$_{calc}$ vs. log ($1/K_M$)$_{obs}$ (set 12).
Table 27. Values of $Q_{calc}$ and Δ^a
S156 Glu
" II II
Gin Ser Glu
" " Gin Ser Gin Ser Glu
"
S166
PI
log^KtA
Asp Glu Asn Gin Asp
Lys
3.68 4.34 4.25 3.97 3.68 3.68 4.19 4.47 4.28 4.28 4.28 4.25 4.25 2.96 3.70 3.70 3.70 4.79 4.17 5.36 5.27 4.79 4.79 5.30 5.58 5.39 5.39 5.39 5.36 5.36 5.36 5.98 6.09 6.09 3.60 2.97 4.17 3.89 3.60 3.49 4.11 4.38 4.19 4.19
II
Met Ala Gly
" M
Asn
" Arg Lys
Gin
"
Ser Glu
II
II II II
Gin Ser Glu II II
Gin Ser Gin Ser Glu
" Gin Ser Glu II II II
Gin Ser Glu
Asp Glu Asn Gin Asp
Met
II
Met Ala Gly
" II
Asn
" Arg Lys II
'• Asp Glu Asn Gin Asp
"
'• "
Met Ala Gly
Gin
"
Gin
A 0.53 0.14 0.00 0.13 0.73 0.56 0.51 0.43 0.32 1.25 0.91 0.50 0.57 0.23 0.53 0.47 0.03 0.98 0.31 0.34 0.27 0.24 0.12 0.34 0.07 0.24 0.09 0.38 0.59 0.36 0.04 0.17 0.12 0.07 0.58 0.09 0.32 0.47 0.20 0.08 0.22 0.04 0.24 0.52
logK^ 2.64 3.55 3.09 3.09 2.74 2.90 3.29 3.24 3.29 3.40 3.56 3.20 3.36 2.65 2.63 2.74 2.89 3.64 3.64 4.10 4.75 3.75 3.90 4.30 4.30 4.30 4.30 4.30 4.20 4.36 4.57 4.97 4.65 4.81 2.90 2.90 3.36 3.36 3.01 3.56 3.56 3.56 3.56 3.67
A 0.54 0.14 0.02 0.06 0.48 0.17 0.60 0.48 0.16 1.00 0.86 0.46 0.56 0.41 0.30 0.01 0.05 0.71 0.36 0.13 0.23 0.23 0.22 0.53 0.16 0.26 0.02 0.43 0.66 0.28 0.35 0.52 0.03 0.09 0.34 0.05 0.22 0.28 0.07 0.47 0.37 0.01 0.13 0.50 {continued)
Table 27. Continued
S156 Ser Gin Ser Glu
" Gin Ser Glu II
Gin Ser Glu
" Gin Ser Gin Ser Glu II
Gin Ser Note:
SI66
P1
" Asn
" Arg Lys II II
Asn Gin Asp
" Met Gly
" " Asn
" Arg Lys
" "
Glu
logk^jK^ 4.19 4.17 4.17 4.16 4.90 4.90 4.90 1.90 1.63 1.21 1.33 1.85 1.93 1.93 1.93 1.90 1.90 3.19 3.92 3.77 3.92
A 0.19 0.34 0.40 0.10 0.20 0.26 0.06 0.28 0.43 0.09 0.10 0.65 0.39 0.86 0.66 0.14 0.01 0.28 0.17 1.05 0.29
log K^ 3.82 3.47 3.62 3.83 3.81 3.92 4.07 2.36 2.36 2.45 2.16 2.56 2.56 2.67 2.82 2.47 2.62 3.74 3.72 4.26 3.98
A 0.03 0.29 0.20 0.33 0.07 0.24 0.13 0.14 0.24 0.66 0.03 0.26 0.27 0.31 0.10 0.25 0.16 0.44 0.53 0.40 0.42
aValues in boldface are for data points which were excluded from sets 15 and 16.
to the overall effect. The results obtained for the log ($1/K_M$) values in set 12 show their largest dependence on hydrogen bonding, with substitution at position 166 and at P1 having about the same magnitude. Sets in which substitution at P1 is constant (sets 2E, 2Q, 2M, 2K) show a dependence on hydrogen bonding, as does set 6; this accounts for about 9.3% of the overall effect. There is a borderline dependence on steric effects at the first side chain segment.

The Effect of Substitution at Position 156

There is certainly a dependence on substitution at position 156 for log ($1/K_M$); there may be a dependence for log ($k_{cat}/K_M$) as well. Sets 9 and 10 suggest, however, that there is no dependence on either polarizability, hydrogen bonding, or ionic side chains at this position. This may be due to an error in the assumption that steric effects at the second and third segments of the side chain are negligible, an error in the assumption of a constant electrical effect, or both. As the study of structural effects at position 156 involves only three residues no conclusion can be reached.
The Effect of Substitution at P1

Structural effects resulting from substitution at P1 in the substrate have an important effect on both log ($k_{cat}/K_M$) and log ($1/K_M$). Sets 5 through 12 show that $n$ is significant while $i$ is the major variable, accounting for well over 50% of the structural effect. In the case of log ($k_{cat}/K_M$) there may also be a significant dependence on $n$ as well. Due to the small number of residues studied these results must be considered at best semiquantitative.

Salt Bridge Formation

What is most striking about these results is the important contribution of ion-ion interactions between Lys, Arg, and Glu side chains in position 166 and in P1. Asp side chains in this position and Glu side chains in position 156 both seem to have little or no effect.

E. Hirudin

Values of the inhibition constant $K_i$ for the inhibition of thrombin by substituted recombinant hirudins (r-hir) in which Val1 and/or Val2 were replaced by other residues were determined by Wallace and coworkers (ref. 9) and are reported in Table 28. They have been correlated with the equation:
$$Q = L\Sigma\sigma_l^{\Delta} + A\Sigma\alpha^{\Delta} + H_1\Sigma n_H^{\Delta} + H_2\Sigma n_n^{\Delta} + I\Sigma i^{\Delta} + S\Sigma\upsilon^{\Delta} + B^{o} \qquad (53)$$
where the superscript Δ indicates that the value of the independent variable is the difference between the value for the side chain X and the value for the side chain of Val, the residue in that position in the wild-type. Thus:
$$v^{\Delta} = v_{X^f} - v_{X^i} \qquad (42)$$
where $v$ is an independent variable, $X^f$ designates the side chain of the residue Aax in the substituted protein, and $X^i$ that of the side chain in the wild-type or unsubstituted protein. The sum of the variables for the residues at positions 1 and 2 was used as the parameter. Had the substitution at positions 1 and 2 been parameterized separately the number of data points would have been insufficient to permit any
Table 28. $K_i$ Values for the Inhibition of Thrombin by Hirudin Modified at the N-Terminal Positions^a
Set PRB21. Xaa1, Xaa2, $K_i$: Val, Val (wt), 0.231; Ile, Ile, 0.099; Phe, Phe, 0.238; Leu, Leu, 9.91; Ser, Ser, 175; Lys, Lys, 152; Gly, Gly, 694; Glu, Glu, 57000; Leu, Val, 0.235; Val, Leu, 10.3; Glu, Val, 295; Val, Glu, 248
Note: aData from ref. 9.
Table 29. Parameter Values for Recombinant Hirudins

Xaa1,Xaa2    Σσ_l^Δ    Σα^Δ     Σn_H^Δ   Σn_n^Δ   Σi^Δ    Συ^Δ
Val,Val^a     0         0        0        0        0        0
Ile,Ile      -0.04      0.092    0        0        0        0.52
Phe,Phe       0.04      0.300    0        0        0       -0.12
Leu,Leu      -0.04      0.092    0        0        0        0.44
Ser,Ser       0.20     -0.156    2        4        0       -0.46
Lys,Lys      -0.02      0.158    4        2        2       -0.16
Gly,Gly      -0.02     -0.280    0        0        0       -1.52
Glu,Glu       0.12      0.022    2        8        2       -0.16
Leu,Val      -0.02      0.046    0        0        0        0.22
Val,Leu      -0.02      0.046    0        0        0        0.22
Glu,Val       0.06      0.011    1        4        1       -0.08
Val,Glu       0.06      0.011    1        4        1       -0.08

Note: aRecombinant hirudin.
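Because the parameters used with Eq. 53 are sums over positions 1 and 2 of differences relative to Val, the entry for a double substitution should equal the sum of the corresponding single-position differences. The sketch below illustrates this for Leu, using the Leu,Val row as the single-position difference. The names are hypothetical, and the column order (σ_l, α, n_H, n_n, i, υ) is taken from the order of the terms in Eq. 53.

```python
# Single-position differences relative to Val, in the order
# (sigma_l, alpha, n_H, n_n, i, upsilon). Val is the wild-type residue,
# so its differences are zero; the Leu values come from the Leu,Val row.
DELTA_VS_VAL = {
    "Val": (0.0, 0.0, 0, 0, 0, 0.0),
    "Leu": (-0.02, 0.046, 0, 0, 0, 0.22),
}


def summed_parameters(xaa1, xaa2):
    """Sum the differences for positions 1 and 2, as required by Eq. 53."""
    first, second = DELTA_VS_VAL[xaa1], DELTA_VS_VAL[xaa2]
    return tuple(round(a + b, 3) for a, b in zip(first, second))


if __name__ == "__main__":
    for pair in [("Val", "Val"), ("Leu", "Val"), ("Val", "Leu"), ("Leu", "Leu")]:
        print(pair, summed_parameters(*pair))
```

The Leu,Leu result reproduces the tabulated row for that mutant, and Leu,Val and Val,Leu are indistinguishable in this parameterization even though their observed $K_i$ values differ.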
analysis. The parameter values used in the correlations are reported in Table 29. The best regression equation obtained is:
$$\log K_i = 0.520(\pm 0.100)\Sigma n_n^{\Delta} - 1.38(\pm 0.500)\Sigma\upsilon^{\Delta} + 0.249(\pm 0.310) \qquad (54)$$
$100R^2$, 81.14; Adj. $100R^2$, 79.25; F, 19.36; $S_{est}$, 0.866; $S^{o}$, 0.501; n, 12. $C_{n_n}$, 86.2; $C_{\upsilon}$, 13.8. The correlation matrix for Eq. 53 is reported in Table 30. Figure 20 shows a plot of log $K_{i,calc}$ against log $K_{i,obs}$.
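As an illustration of how Eq. 54 is applied, the sketch below evaluates it for the twelve hirudins and lists the result next to the logarithm of the observed $K_i$. The names are hypothetical. One value is an assumption: the steric sum for the Glu,Val and Val,Glu mutants is taken as -0.08, half the Glu,Glu value, as the additivity of the summed parameters requires.

```python
import math

# (Xaa1, Xaa2): (observed K_i, sum of n_n differences, sum of upsilon differences)
# K_i values are those of Table 28; parameter sums follow Table 29, except that
# -0.08 (half of the Glu,Glu value) is assumed for the single Glu mutants.
HIRUDINS = {
    ("Val", "Val"): (0.231, 0, 0.00),
    ("Ile", "Ile"): (0.099, 0, 0.52),
    ("Phe", "Phe"): (0.238, 0, -0.12),
    ("Leu", "Leu"): (9.91, 0, 0.44),
    ("Ser", "Ser"): (175.0, 4, -0.46),
    ("Lys", "Lys"): (152.0, 2, -0.16),
    ("Gly", "Gly"): (694.0, 0, -1.52),
    ("Glu", "Glu"): (57000.0, 8, -0.16),
    ("Leu", "Val"): (0.235, 0, 0.22),
    ("Val", "Leu"): (10.3, 0, 0.22),
    ("Glu", "Val"): (295.0, 4, -0.08),
    ("Val", "Glu"): (248.0, 4, -0.08),
}


def log_ki_calc(sum_n_n, sum_upsilon):
    """Evaluate Eq. 54 with the coefficients quoted in the text."""
    return 0.520 * sum_n_n - 1.38 * sum_upsilon + 0.249


if __name__ == "__main__":
    for (xaa1, xaa2), (ki, s_nn, s_ups) in HIRUDINS.items():
        observed = math.log10(ki)
        calculated = log_ki_calc(s_nn, s_ups)
        print(f"{xaa1},{xaa2}: obs {observed:6.2f}  calc {calculated:6.2f}  "
              f"diff {observed - calculated:6.2f}")
```

Since the nonpolar residues all have a hydrogen bonding sum of zero, their calculated values differ only through the steric term, which is why pairs such as Leu,Val and Val,Leu cannot be separated by this equation.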
The structural effect of substitution at positions 1 and 2 of hirudin is almost entirely due to the hydrogen bonding parameter $n_n$, though steric effects make a significant contribution. It must be noted however that there is significant collinearity between $\Sigma n_n$ and both $\Sigma i$ and $\Sigma\sigma_l$. We therefore cannot exclude the possibility that there are significant contributions from dipole-dipole interactions and ion-dipole interactions as well as hydrogen bonding interactions in which the hirudin residue supplies the lone pair. As there is collinearity between $\Sigma\alpha$ and $\Sigma\upsilon$ we cannot exclude the possibility of a small contribution from polarizability. It cannot be large because the steric effect term accounts for only about a seventh of the overall substituent effect. This result is in accord with the conclusion of Wallace et al. that replacement of the two N-terminal amino acids in r-hir by polar amino acids resulted in an increase in the inhibition constant.

Table 30. Correlation Matrix for Equation 53

          Σσ_l     Σα       Σn_H     Σn_n     Σi       Συ
Σσ_l      1        0.254    0.395    0.758    0.237    0.232
Σα                 1        0.043    0.165    0.157    0.677
Σn_H                        1        0.593    0.809    0.144
Σn_n                                 1        0.739    0.126
Σi                                            1        0.045
Συ                                                     1

Figure 20. log $K_{i,calc}$ vs. log $K_{i,obs}$ (hirudin).

F. L. casei Thymidylate Synthase
Climie et al. (ref. 32) have reported $k_{cat}$ values for the conversion of deoxyuridine monophosphate (dUMP) to thymidylate monophosphate (TMP) with 5,10-methylenetetrahydrofolate (CH2H4folate) as the reagent, catalyzed by mutants of L. casei thymidylate synthase in which Val316, the C-terminal residue, is substituted. Also reported were $K_m$ values for the interaction of dUMP and CH2H4folate with the mutants. These values are given in Table 31. They were correlated with the IMF equation in the form:
$Q_X = L\sigma_{lX} + A\alpha_X + H_1 n_{HX} + H_2 n_{nX} + I i_X + S\upsilon_X + B_1 n_{1X} + B_2 n_{2X} + B_3 n_{3X} + B^0$ (55)
which uses the composite parameterization of the steric effect. The best regression equation obtained for $k_{cat}$ is:
$\log k_{cat} = -1.85(\pm 0.848)\alpha_X - 0.287(\pm 0.0635) n_{HX} + 2.96(\pm 0.396)\upsilon_X - 0.461(\pm 0.160) n_{2X} - 1.46(\pm 0.234)$ (56)

Figure 21. $\log k_{cat,calc}$ vs. $\log k_{cat,obs}$, set LCTS1.
$100R^2$, 88.16; Adj. $100R^2$, 85.62; F, 24.20; $S_{est}$, 0.264; $S^0$, 0.405; n, 18. $r_{ij}$: α, υ, 0.486; α, $n_2$, 0.622; υ, $n_2$, 0.605. $C_{\alpha}$, 13.1; $C_{n_H}$, 8.85; $C_{\upsilon}$, 63.8; $C_{n_2}$, 14.2. A plot of $\log k_{cat,calc}$ against $\log k_{cat,obs}$ is given in Figure 21. The steric effect of the side chain in position 316 seems to be the major factor in determining the activity of a mutant. This may involve the ease of formation of the final ternary complex. The dependence on polarizability is in accord with binding involving ii (dispersion)
Figure 22. $\log K_{m,calc}$ vs. $\log K_{m,obs}$, L. casei thymidylate synthase, folate.
Figure 23. $\log K_{m,calc}$ vs. $\log K_{m,obs}$, L. casei thymidylate synthase, dUMP.
interactions between the mutant side chain and the β and γ carbon atoms of the Thr residue with which it is in contact. Correlation of $K_m$ values for CH2H4folate with Eq. 55 gave as the best regression equation:
$\log K_{m,\mathrm{CH_2H_4folate}} = 0.155(\pm 0.0463) n_{nX} - 0.516(\pm 0.119) n_{1X} + 2.47(\pm 0.147)$ (57)
$100R^2$, 65.41; Adj. $100R^2$, 63.25; F, 14.18; $S_{est}$, 0.263; $S^0$, 0.644; n, 18. $C_{n_n}$, 23.2; $C_{n_1}$, 76.8. Plots of $\log K_{m,calc}$ against $\log K_{m,obs}$ are shown in Figures 22 and 23. Although the fit is poor, the F test shows that the results are significant at the 99.9% confidence level. Again, the effect of the mutant side chain is largely steric, with some contribution from hydrogen bonding. There is no dependence on polarizability, however. Correlation of $K_m$ values for dUMP with Eq. 55 gave as the best regression:
$\log K_{m,\mathrm{dUMP}} = -1.56(\pm 0.562)\alpha_X - 0.233(\pm 0.0666) i_X - 1.14(\pm 0.249)\upsilon_X + 0.403(\pm 0.0735) n_{1X} + 0.260(\pm 0.0739) n_{2X} + 0.222(\pm 0.0992) n_{3X} + 0.694(\pm 0.0902)$ (58)
$100R^2$, 81.22; Adj. $100R^2$, 73.39; F, 7.927; $S_{est}$, 0.0951; $S^0$, 0.554; n, 18. $r_{ij}$: α, υ, 0.486; α, $n_2$, 0.622; α, $n_3$, 0.801; υ, $n_1$, 0.669; υ, $n_2$, 0.605; $n_2$, $n_3$, 0.487. $C_{\alpha}$, 15.7; $C_i$, 10.2; $C_{\upsilon}$, 35.2; $C_{n_1}$, 17.7; $C_{n_2}$, 11.4; $C_{n_3}$, 9.76.
Table 31. Values of $k_{cat}$ and $K_m$ for L. casei Thymidylate Synthase
Aax, $k_{cat}$ (s⁻¹), $K_m$(CH2H4folate) (μM), $K_m$(dUMP) (μM): Val (wt), 5.5, 14, 2.9; Ile, 3.8, 35, 2.2; Leu, 1.3, 84, 1.7; Phe, 1.3, 65, 2.2; Thr, 1.2, 140, 3.5; Cys, 1.1, 77, 1.6; Ala, 0.81, 370, 1.2; Met, 0.65, 120, 2.5; His, 0.55, 50, 1.6; Ser, 0.54, 180, 1.7; Asn, 0.39, 170, 1.4; Gln, 0.32, 280, 3.1; Tyr, 0.29, 170, 2.4; Glu, 0.15, 830, 2.5; Lys, 0.12, 85, 1.2; Trp, 0.050, 300, 1.5; Arg, 0.020, 130, 1.5; Gly, 0.030, 380, 5.6
The effect of the mutant side chain is once more primarily steric, with an important contribution from polarizability. In view of the small range of the side chain effect the fit of the model is surprisingly good.

G. T. thermophilus Glutamyl-tRNA Synthase
Nurek and coworkers (ref. 33) have reported $K_m$ values for the interaction of T. thermophilus glutamyl-tRNA synthase with $\mathrm{tRNA^{Glu}}$, Glu, and ATP (sets tRNA, G, and ATP, respectively). Also reported were $k_{cat}$ values. The data are presented in Table 32. They were correlated with the equation:
$Q_X = L\sigma_l^{\Delta} + A\alpha^{\Delta} + H_1 n_H^{\Delta} + H_2 n_n^{\Delta} + I i^{\Delta} + S_1\upsilon_1^{\Delta} + S_2\upsilon_2^{\Delta} + S_3\upsilon_3^{\Delta} + B^0$ (59)
which uses the segmental steric effect parameterization. Zeroth order partial correlation coefficients are given with the other statistics beneath the regression equations. The best regression equations for the $K_m$ values are, for tRNA:
$\log K_m = -1.04(\pm 0.211) i^{\Delta} - 1.16(\pm 0.496)\upsilon_2^{\Delta} + 0.614(\pm 0.177)$ (60)
100R\ 73.89; AlOO/?^ 71.52; F, 14.15; 5est, 0.378; 5^, 0.583; n, 13; C, 61.2; Cu2, 38.8; ra'.a, 0.704; ra\n". 0.910; r^^xy^, 0.540; ray, 0.523; ra^, 0.500; rn",i, 0.803; r„v» 0.665; ruV» 0.569 ForG: log K^ = - 0.495(10.154)4 - 0.157(±0.0597)AZ^ + 1.10(±0.292)/^ + 1.50(±0.566)D^- 1.95(±0.507)\)^ + 1.839(10.131)
(61)
100/?^ 80.85; AlOO/?^ 71.27; F, 5.910; 5est,0.212; 5^, 0.596; n, 13; Cn", 13.3; Cn% 4.22; C/, 29.6; Cy,\ 23.0; C^\ 29.8 » r.. values for Eq. 61 are the same as those of Eq. 60. And for ATP on the exclusion of the data point R358Q:
Figure 24. $\log K_{m,calc}$ vs. $\log K_{m,obs}$, tRNA, T. thermophilus glutamyl-tRNA synthase.
log K^ = -2.42(±0.448)a^ + 0.157(10.0598)/^ - 0.462(10.145)1)^ + 1.467(±0.0691)
(62)
lOOR^, 82.09; AlOO/?^ 78.11; F, 12.22; 5est, 0.103; 5^, 0.518; n, 12; Ca, 57.0; C/, 16.1; C^,^ 26.9.; ra»,a, 0.703; ra\n^ 0.910; rc\^,^ 0.537; ra,n^ 0.521; r a ^ , 0.520; rn",i, 0.788; rnW, 0.668; rt^V^ 0.552 Plots of log K^ ^^j^ against log K^ ^^^ are given in Figures 24-26. Steric effects and ionic interactions are present in all three data sets.
Figure 25. $\log K_{m,calc}$ vs. $\log K_{m,obs}$, Glu, T. thermophilus glutamyl-tRNA synthase.
Figure 26. $\log K_{m,calc}$ vs. $\log K_{m,obs}$, ATP, T. thermophilus glutamyl-tRNA synthase.
Correlation of the $k_{cat}$ values with Eq. 59 gave the best results on the exclusion of S276A and S299A. The regression equation is:
$\log k_{cat} = 2.00(\pm 0.533)\alpha^{\Delta} - 1.80(\pm 0.604)\upsilon_2^{\Delta} + 2.23(\pm 0.597)\upsilon_3^{\Delta} + 0.426(\pm 0.111)$ (63)
100/?^ 89.27; AlOO/?^ 86.59; F, 19.41; ^est, 0.284; 5^, 0.411; n, 11; Ca, 16.7; Cu2, 37.2; Cx)3, 46.1; ra\a, 0.735; ra\/, 0.695; ra,/, 0.553; r„v» 0.524; rn"j, 0.782; r„v, 0.595; rnV» 0.639; rx,v, 0.739. Unlike the correlations with K^ there is no dependence on ionic interactions; like the K^ correlations there is a dependence on steric effects. A plot of log ^cat,caic against log /:cat,obs ^^ 8^^^^ ^^ Figure 27. H. Rat Trypsin
Corey and Craik (ref. 34) have reported $K_m$ and $k_{cat}$ values for the hydrolysis of Z-GlyProArg-(7-amino-4-methylcoumarin) by rat trypsins substituted at positions 57, 102, and 195 at pH 8.0 and pH 10.1. Their data are reported in Table 33. It was assumed that at pH 10.1 the ionization of His was suppressed. Thus, the values of $i$ for His are 1 at pH 8.0 and 0 at pH 10.1. As the substitutions at positions 102 and 195 were invariably D102N and S195A, they are represented by the indicator variables $n_{102}$ and $n_{195}$, which take the value 1 when substitution has occurred and 0 when it has not. The correlation equation used has the form:
$Q_X = L\sigma_l^{\Delta} + A\alpha^{\Delta} + H_1 n_H^{\Delta} + H_2 n_n^{\Delta} + I i^{\Delta} + S\upsilon^{\Delta} + B_{102} n_{102} + B_{195} n_{195} + B^0$ (64)
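The indicator-variable coding described above can be sketched as follows. This is a minimal illustration, assuming mutant labels written as in Table 33 (for example "H57A/D102N/S195A", or "wt" for the wild-type); the function names are hypothetical and only the two pH values stated in the text are handled.

```python
def indicator_variables(label):
    """Return (n102, n195) for a rat trypsin mutant label.

    Each indicator is 1 when the corresponding substitution (D102N or S195A)
    is present in the label and 0 when it is not.
    """
    substitutions = label.split("/")
    n102 = 1 if "D102N" in substitutions else 0
    n195 = 1 if "S195A" in substitutions else 0
    return n102, n195


def his_ionic_parameter(ph):
    """Value of i assumed for the His side chain at the two pH values studied."""
    values = {8.0: 1, 10.1: 0}  # ionization of His taken as suppressed at pH 10.1
    return values[ph]


codes = {label: indicator_variables(label)
         for label in ("wt", "H57A", "D102N", "H57A/D102N", "H57A/D102N/S195A", "S195A")}
```

Coding the invariant substitutions as 0/1 columns lets the position 57 side chain carry the full set of IMF parameters while positions 102 and 195 each consume only one coefficient.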
Figure 27. $\log k_{cat,calc}$ vs. $\log k_{cat,obs}$, T. thermophilus glutamyl-tRNA synthase.
Values of r., significant at the 90% confidence level or greater, are given below the regression equations. The best regression obtained for ^^.^^ at pH 8.0 is: log k^^^ = -5.31(±2.18)af - 0.618(±0.121)4 + 1.94(±0.365)/^ - 0.738(±0.221)ni95 - 0.0998(±0.126)
(65)
100/?^ 84.73; AlOO/?^ 79.64; F, 11.10; 5est, 0.281; 5°, 0.498; n, 13; C^\ 12.1; Cn", 16.3; C/, 51.4; Cn'^\ 19.6; rcj\n^ 0.827; rc\i, 0.679; ra,n, 0.706; ra,/, 0.568; rn'',/, 0.713; r„v, 0.723. andatpH 10.1 is: log k^^^ = 9.71(±4.44)af + 27.0(±6.22)a^ - 1.35(±0.404)«^ • 7.16(±2.44)\)^ + 2.714(±0.558)
(66)
Table 32. Values of $K_m$ and $k_{cat}$ for T. thermophilus Glutamyl-tRNA Synthase
$X^i$pos$X^f$, $K_m$($\mathrm{tRNA^{Glu}}$) (μM), $K_m$(Glu) (μM), $K_m$(ATP) (μM), $k_{cat}$ (s⁻¹): wt, 2.73, 12.0, 23.0, 2.39; D160A, 172.4, 81.5, 41.7, 0.659; S276A, 24.7, 12.9, 46.1, 0.945; E282A, 422.4, 166, 72.3, 1.06; S299A, 2.70, 12.7, 58.1, 0.00727; L300S, 6.10, 28.6, 77.5, 1.36; W312Y, 21.0, 8.00, 65.4, 1.87; W312C, 3.43, 131, 132, 0.0312; R317Q, 40.7, 83.8, 36.2, 3.13; R349Q, 59.1, 53.3, 27.5, 1.28; R350Q, 21.5, 32.1, 53.1, 0.957; R358Q, 27.5, 103, 112, 3.03; R426Q, 55.0, 45.2, 39.8, 2.76
Note: wt, wild-type.
Figure 28. $\log k_{cat,calc}$ vs. $\log k_{cat,obs}$, pH 8, rat trypsin.
100/?^71.74; AlOO/?^ 62.33; F, 5.078; 5est, 0.857; 5^, 0.678; n, 13; C^y^, 6.08; Ca, 34.6; Cn", 9.42; Cx), 49.9; rcy\n% 0.830; ra',/, 0.759; ra,m 0.689; ra,/, 0.545; r „ v , 0.518; r,//,/, 0.695; rnn,/, 0.764; rn%•^^ 0.529 Plots of log /:
1^ against log Z:^^ obs ^^^ shown in Figures 28 and 29. It is clear that the results at pH 8.0 are very different from those at pH 10.1. There is no dependence on either polarizability or steric effects at pH 8.0; at pH 10.1 they represent more than 80% of the overall structural effect. Correlation of the K^ values with Eq. 64 gave at pH 8.0 the regression equation:
Figure 29. $\log k_{cat,calc}$ vs. $\log k_{cat,obs}$, pH 10, rat trypsin.
l o g / ^ ^ ^ = 0.132(±0.0503)Az^ + 0.529(±0.114)nio2+ 1-331(±0.0691) (67)
lOOi?^71.74; AlOO/?^ 62.33; F, 5.078; Sesu 0.857; S°, 0.678; n, 13; Cc', 6.08; Ca, 34.6; Cn", 9.42; C^, 49.9; rcj\n% 0.827; ra,n, 0.706; r^",/, 0.695; rn",u 0.794 At pH 10.1: log K^^ = -3.80(±1.31)af - 5.05(±1.69)a^ + 0.450(10.117)4 + 1.67(±0.710)\)^ + 0.619(±0.149)A1IO2 + 0.929(±0.177)
(68)
100/?^ 78.58; AlOO/?^ 69.06; F, 5.870; Sesu 0.265; 5°, 0.612; n, 14; Ca\ 8.53; Ca, 23.2; Cn", 11.2; C^), 41.6; Cn»^ 15.4; ra\n", 0.797; ra,n, 0.686; rn% 0.675; r;,",/, 0.787 Exclusion of the data point for D102N gives much improved results: log K^^
= -2.92(±1.01)af - 3.18(±1.41)a^ + 0.353(±0.117)AZ^
+ 1.13(±0.0555)\)^ + 0.699(±0.113)^102 + 1.097(±0.143)
(69)
Figure 30. $\log K_{m,calc}$ vs. $\log K_{m,obs}$, pH 8.0, rat trypsin.
Figure 31. $\log K_{m,calc}$ vs. $\log K_{m,obs}$, pH 10.1, rat trypsin.
100/?^ 88.08; A100i?^ 82.13; F, 10.35; 5est, 0.194; 5°, 0.470; n, 13; Ca\ 8.69; Ca, 19.3; Cn", 11.7; C^, 37.2; Cn^^ 23.1. nj values are the same as those for Eq. 67 Plots of log K^.
against log A:„
obs ^ ^ shown in Figures 30—32. As the coefficients of Eq. 69 are not significantly different from those of Eq. 68 but the fit is much improved, the data point D102N is an outlier. Though K^ at both pH 8.0 and pH 10.1 is a function of n^^ and /1JQ2 at the higher pH it is highly dependent on polarizability and steric effects.
Figure 32. $\log K_{m,calc}$ vs. $\log K_{m,obs}$, pH 10.1, rat trypsin.
Table 33. Values of $k_{cat}$ and $K_m$ for Rat Trypsin
$X^i$pos$X^f$, $k_{cat}$ (min⁻¹) (pH 8.0), $K_m$ (μM) (pH 8.0), $k_{cat}$ (min⁻¹) (pH 10.1), $K_m$ (μM) (pH 10.1): wt, 3200, 15, 2700, 19; H57A, 0.054, 17, 0.11, 20; H57L, 0.075, 20, 0.16, 21; H57D, 0.78, 13, 0.71, 17; H57E, 0.69, 21, 0.63, 25; H57K, 0.83, 41, 5.2, 48; H57R, 0.017, 67, 0.65, 160; D102N, 1.3, 4.2, 140, 13; H57A/D102N, 0.17, 87, 7.5, 130; H57D/D102N, 0.18, 62, 0.48, 130; H57K/D102N, 0.41, 18, 6.2, 130; H57L/D102N, 0.13, 41, 4.9, 230; H57A/D102N/S195A, 0.038, 89, 0.041, 170; S195A, 0.079, 41, 0.057, 45.
I. Human Growth Hormone II

Cunningham and coworkers (ref. 35) determined $EC_{50}$ values for the dimerization of a labeled human growth hormone (hGH) mutant, S257C-AF, by other hGH mutants. S257C-AF was prepared by reacting the thiol group of the Cys at position 257 (the terminal position) with 5-iodoacetamidofluorescein. The data set is reported in Table 34. Also reported are values of the ratio $EC_{50}(\mathrm{mut})/EC_{50}(\mathrm{wt})$ ($r_{mut/wt}$), which gives a comparison of mutant activity to that of the wild-type. The ratio is used to identify residues that are involved in the dimerization and are therefore part of the receptor site. A value of $r_{mut/wt}$ greater than or equal to 2 is considered to indicate a receptor site residue. The $EC_{50}$ values for mutants bearing such residues were correlated with the equation:
$Q_X = L\sigma_l^{\Delta} + A\alpha^{\Delta} + H_1 n_H^{\Delta} + H_2 n_n^{\Delta} + I i^{\Delta} + S_1\upsilon_1^{\Delta} + S_2\upsilon_2^{\Delta} + S_3\upsilon_3^{\Delta} + B^0$ (70)

Table 34. Values of $EC_{50}$ for Human Growth Hormone
$X^i$pos$X^f$, $EC_{50}$, $r_{mut/wt}$: F1A, 2.9, 5; I4A, 30, 55; I6A, 1.4, 3; R8A, 1.8, 3; R19A, 0.92, 2; Y111A, 1.0, 2; K115A, 0.84, 2; D116A, 3.1, 6; E118A, 0.96, 2; E119A, 1.1, 2.

The best regression equation is:
$\log EC_{50,X} = 5.57(\pm 0.985)\upsilon_1^{\Delta} + 0.0850(\pm 0.0779)$ (71)
$100R^2$, 80.01; F, 32.01; $S_{est}$, 0.222; $S^0$, 0.500; n, 10. A plot of $EC_{50,calc}$ against $EC_{50,obs}$ is given in Figure 33. Dimerization seems to be dependent on the difference in steric effect of the first segments of the initial and final side chains in the mutant.

Figure 33. $\log EC_{50,calc}$ vs. $\log EC_{50,obs}$, human growth hormone.
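The selection rule used to assemble this data set (a ratio of at least 2 marks a receptor site residue) can be written as a short filter. The sketch below is only illustrative: the values are transcribed from Table 34, the function name is hypothetical, and the mutant with a ratio of 1.2 in the usage check is invented purely to show a rejection.

```python
# (EC50, r_mut/wt) pairs transcribed from Table 34.
TABLE_34 = {
    "F1A": (2.9, 5), "I4A": (30, 55), "I6A": (1.4, 3), "R8A": (1.8, 3),
    "R19A": (0.92, 2), "Y111A": (1.0, 2), "K115A": (0.84, 2),
    "D116A": (3.1, 6), "E118A": (0.96, 2), "E119A": (1.1, 2),
}


def is_receptor_site(ratio, threshold=2.0):
    """A ratio EC50(mut)/EC50(wt) at or above the threshold marks a receptor site residue."""
    return ratio >= threshold


# Mutants that pass the criterion and therefore enter the correlation.
selected = [name for name, (_, ratio) in TABLE_34.items() if is_receptor_site(ratio)]
```

All ten tabulated mutants pass, which matches the n of 10 reported for Eq. 71.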
V. THE IMF METHOD AS A BIOACTIVITY MODEL

A. Peptide and Protein Bioactivities

The peptide bioactivity models described in this work include all types of peptide substitution except that at the N atom of the peptide bond. The protein bioactivity models described include those involving substitution at one or two positions and those involving the substitution, at positions that are part of the receptor site, of one or more different residues. The models of peptide and protein bioactivities presented here, combined with those reported previously for amino acids, peptides, and proteins, provide support both for the specific application of the IMF method to amino acids, peptides, and proteins and for its general application to all bioactivities.

B. The Hansch-Fujita Model

It has been shown that if all the necessary pure parameters are included in the composite parameters and if enough of them are used, then a model constructed from composite parameters is completely equivalent to one which uses pure parameters in representing the data. The only advantage in using pure parameters is the ease of interpretation of the results. In its use of lipophilicity parameters such as log P, log k', or π, the Hansch-Fujita (HF) model uses composite parameters. The HF model often requires, in addition to transport parameters, the use of electrical effect, steric effect, and polarizability parameters and occasionally dipole moment parameters as well. These parameters are needed because the composition of a particular transport parameter may not be the same as that of a particular
type of bioactivity. This is not surprising. The probability that all biomembranes and all receptor sites will require the same pure parameter composition is extremely small. This conclusion is supported by the review of Seydel and coworkers (ref. 38). The addition of electrical, steric, and polarizability terms adjusts the parameter composition to that of the bioactivity being studied. To illustrate the point let us consider a typical HF correlation equation:
$\log ba_X = T\tau_X + \rho\sigma_X + A\,MR_X + S\upsilon_X + B^0$ (72)
where ba is the bioactivity; σ is a composite electrical effect parameter of the Hammett type; τ is a transport parameter such as log P, π, or log k'; MR is the group molar refractivity; υ is a steric parameter; and T, ρ, A, S, and $B^0$ are coefficients. As was noted above, σ is given by the expression:
$\sigma_X = l\sigma_{lX} + d\sigma_{dX} + r\sigma_{eX} + h$ (73)
Equation 3 gives:
$MR_X = 100(\alpha_X + 0.0103) = 100\alpha_X + 1.03$ (74)
Equation 8 gives:
$\upsilon_X = S_1^{*}\upsilon_{1X} + S_2^{*}\upsilon_{2X} + S_3^{*}\upsilon_{3X} + S_0^{*}$ (75)
τ is given by the equation:
$\tau_X = L\sigma_{lX} + D\sigma_{dX} + R\sigma_{eX} + A\alpha_X + H_1 n_{HX} + H_2 n_{nX} + I i_X + M\mu_X + S_1\upsilon_{1X} + S_2\upsilon_{2X} + S_3\upsilon_{3X} + B^0$ (76)
Substituting Eqs. 73 through 76 into Eq. 72 results in: log ba^ = (L + p,)a„ + (D + p^)a^ + (/? + f))a^^ + (A + \mA*)a^ + i/jn^x + ^a^nx + ^h + Mjix + (iSi + 55i)\)ix + (^2 + SS*^M2x + (^3 + SS\)\)j,^ +fi°+ p/i„+1.03A' + 55;
(77)
which may be rewritten as:
$\log ba_X = L'\sigma_{lX} + D'\sigma_{dX} + R'\sigma_{eX} + A'\alpha_X + H_1 n_{HX} + H_2 n_{nX} + I i_X + M\mu_X + S_1'\upsilon_{1X} + S_2'\upsilon_{2X} + S_3'\upsilon_{3X} + B'^0$ (78)
This is a form of the IMF equation. Then, based on the success of the HF model, bioactivity is a function of the difference in intermolecular forces between initial and final states. This does not mean that transport parameters should not continue to be used in modeling bioactivities. It simply provides an explanation of the manner in which they work. It is vital to recognize that any combination of pure and/or composite parameters which has the correct composition will serve to quantitatively describe a phenomenon. It is not necessary to use transport parameters. Bioactivities can be correlated directly either with the IMF equation or with any convenient combination of pure and composite parameters.
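The equivalence of composite and pure parameters can be illustrated with the one substitution whose numerical form is stated explicitly, MR = 100(α + 0.0103). The sketch below shows that a term written in MR gives the same prediction as a term written in α once the coefficient and intercept are adjusted. The coefficient, intercept, and α values used are hypothetical and chosen only for illustration.

```python
def mr_from_alpha(alpha):
    """Group molar refractivity from the polarizability parameter (Eq. 74)."""
    return 100.0 * (alpha + 0.0103)


def predict_with_mr(mr, a_coeff, intercept):
    """A single-term model written in the composite parameter MR."""
    return a_coeff * mr + intercept


def predict_with_alpha(alpha, a_coeff, intercept):
    """The same model rewritten in alpha.

    Substituting Eq. 74 gives A*MR = (100*A)*alpha + 1.03*A, so the slope is
    scaled by 100 and the constant 1.03*A is absorbed into the intercept.
    """
    return (100.0 * a_coeff) * alpha + (intercept + 1.03 * a_coeff)


# Hypothetical values for illustration only.
A_COEFF, INTERCEPT, ALPHA = 0.02, 1.5, 0.30
same = abs(predict_with_mr(mr_from_alpha(ALPHA), A_COEFF, INTERCEPT)
           - predict_with_alpha(ALPHA, A_COEFF, INTERCEPT)) < 1e-9
```

The same bookkeeping, applied term by term to σ, υ, and τ, is what turns the HF equation into a relation in pure parameters only.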
VI. APPENDIX: STATISTICS REPORTED FOR THE CORRELATIONS

1. $100R^2$, the percent of the variance of the data accounted for by the regression equation.
2. A$100R^2$, the $100R^2$ value corrected for the number of independent variables in the correlation equation. The difference between $100R^2$ and A$100R^2$ serves as a measure of the quality of the model. The smaller the difference the better the model.
3. F, the value of the F test, which is a measure of the goodness of fit of the model.
4. $S_{est}$, the standard error of the estimate.
5. $S^0$, the standard error of the estimate divided by the root mean square of the data. This is also useful as a measure of the goodness of fit of the model.
6. n, the number of data points in the set.
7. $r_{ij}$, the zeroth order partial correlation coefficients. They serve as a test for collinearity among the parameters. Values of $r_{ij}$ are given only for those pairs of parameters which exhibit extensive collinearity.
Statistics 1 through 5 may be used as a measure of the goodness of fit of the model for a given data set. All of these except $S_{est}$ may be used in comparing the goodness of fit of one data set with that of another.
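Two of these statistics can be recomputed from the reported $100R^2$, n, and the number of independent variables. The sketch below is offered with a caveat: the adjustment formula is not stated in the text and is an assumption, chosen because it reproduces the adjusted values quoted for Eqs. 54, 56, and 57. The F expression is the usual regression F ratio, and the function names are hypothetical.

```python
def adjusted_100r2(r2_percent, n, n_vars):
    """100R^2 corrected for the number of independent variables.

    Assumed form: 100 * [1 - (1 - R^2) * (n - 1) / (n - n_vars)].
    """
    r2 = r2_percent / 100.0
    return 100.0 * (1.0 - (1.0 - r2) * (n - 1) / (n - n_vars))


def f_statistic(r2_percent, n, n_vars):
    """Regression F ratio with n_vars and n - n_vars - 1 degrees of freedom."""
    r2 = r2_percent / 100.0
    return (r2 / n_vars) / ((1.0 - r2) / (n - n_vars - 1))


# Reported statistics: (100R^2, n, number of variables, A100R^2, F).
reported = {
    "Eq. 54": (81.14, 12, 2, 79.25, 19.36),
    "Eq. 56": (88.16, 18, 4, 85.62, 24.20),
    "Eq. 57": (65.41, 18, 2, 63.25, 14.18),
}

recomputed = {
    name: (adjusted_100r2(r2, n, k), f_statistic(r2, n, k))
    for name, (r2, n, k, _, _) in reported.items()
}
```

Agreement to within rounding for these three equations suggests that the tabulated statistics are internally consistent, which is a useful check when a printed value is in doubt.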
ABBREVIATIONS

hb, hydrogen bonding
dd, dipole-dipole
di, dipole-induced dipole
ii, induced dipole-induced dipole
ct, charge transfer
Id, ion-dipole
Ii, ion-induced dipole
vdW, van der Waals
IMF, intermolecular force
Ab, bromoacetamino
Cm, carbamoyl
Mg, maleoylglycine
(O), sulfoxide
Pv, pivaloyl
Cha, cyclohexylalanine
Cpg, cyclopentylglycine
Cbz, benzyloxycarbonyl
Boc, t-butoxycarbonyl
Nle, norleucine
Orn, ornithine
Thg, 2-thienylglycine
Pe, pentyl
Pn, phenylene
c, cyclo
Ak, alkyl
Aax, amino acid with side chain X
Bta, butanoic acid
Dbt, 3,5-dibromotyrosine
Dpe, deaminopenicillamine
Mep, β-mercapto-β,β-diethylpropionic acid
Mma, α-mercapto-α,α-dimethylacetic acid
Mmp, β-mercapto-β,β-dimethylpropionic acid
Mpa, β-mercaptopropionic acid
Pen, penicillamine
Ba, bromoacetyl
Phe(F5), pentafluorophenylglycine
Ms, mesyl
Pa, propionylamino
Tg, triglycyl
Chg, cyclohexylglycine
Hse, homoserine
Bzl, benzyl
(O2), sulfone
Nva, norvaline
Pym, pyridylmethyl
Sta, 3-hydroxy-4-amino-6-methylheptanoic acid
Hx, hexyl
Nh, naphthyl
X^, replacement
REFERENCES

1. Charton, M. In Rational Approaches to the Synthesis of Pesticides; Magee, P. S.; Menn, J. J.; Koan, G. K., Eds.; American Chemical Society: Washington, DC, 1984, pp 247-278.
2. Charton, M. In Trends in Medicinal Chemistry '88; van der Goot, H.; Domany, G.; Pallos, L.; Timmerman, H., Eds.; Elsevier: Amsterdam, 1989, pp 89-108.
3. Charton, M. Classical and 3-D QSAR in Agrochemistry and Toxicology; American Chemical Society: Washington, DC, 1995, pp 75-95.
4. Charton, M. Prog. Phys. Org. Chem. 1990, 18, 163-284.
5. Charton, M. In Lipophilicity in Drug Action and Toxicity; Pliska, V.; Testa, B.; van der Waterbeemd, J., Eds.; VCH: Weinheim, 1996, pp 387-400.
6. Charton, M.; Ciszewska, G. R.; Ginos, J.; Standifer, K. M.; Brooks, A. I.; Brown, G. P.; Ryan-Moro, J. R.; Pasternak, G. W. Quant. Struct. Act. Rel. 1998, 17, 109-121.
7. Charton, M. Prog. Phys. Org. Chem. 1987, 16, 287-315.
8. Charton, M. In Design of Biopharmaceutical Properties Through Prodrugs and Analogs; Roche, E. B., Ed.; American Pharmaceutical Society: Washington, DC, 1977, pp 228-280.
9. Charton, M. Topics Current Chem. 1983, 114, 57-91.
10. Charton, M. Stud. Org. Chem. 1992, 42, 629-687.
11. McFarland, J. W. Prog. Drug Res. 1971, 15, 173.
12. Pliska, V.; Charton, M. Proc. 11th Am. Peptide Symp. 1990, pp 290-292.
13. Pliska, V.; Charton, M. J. Receptor Res. 1991, 11, 59-78.
14. Free, S. M.; Wilson, J. W. J. Med. Chem. 1964, 7, 395-399.
15. Pliska, V.; Heininger, Int. J. Peptide Protein Res. 1988, 31, 520-536.
16. Dellaria, J. R.; Maki, R. G.; Bopp, B. A.; Cohen, J.; Kleinert, H. D.; Luly, J. R.; Merits, I.; Plattner, J. J.; Stein, H. H. J. Med. Chem. 1987, 30, 2137-2144.
17. Nisato, D.; Wagnon, J.; Callet, G.; Mettefeu, D.; Assens, J.-L.; Plouzane, C.; Tonnerre, B.; Pliska, V.; Fauchere, J.-L. J. Med. Chem. 1987, 30, 2287-2291.
18. Bock, M. G.; DiPardo, R. M.; Evans, B. E.; Rittie, K. E.; Boger, J.; Poe, M.; LaMont, B. I.; Lynch, R. J.; Ulm, E. H.; Vlasuk, G. P.; Greenlee, W. J.; Veber, D. F. J. Med. Chem. 1987, 30, 1853-1857.
19. Bolis, G.; Fung, A. K. L.; Greer, J.; Kleinert, H. D.; Marcotte, P. A.; Perun, T. J.; Plattner, J. J.; Stein, H. H. J. Med. Chem. 1987, 30, 1729-1737.
20. Kempf, D. J.; de Lara, E.; Stein, H. H.; Cohen, J.; Plattner, J. J. J. Med. Chem. 1987, 30, 1978-1983.
21. Luly, J. R.; Yi, N.; Soderquist, J.; Stein, H. H.; Cohen, J.; Perun, T. J.; Plattner, J. J. J. Med. Chem. 1987, 30, 1609-1616.
22. Hui, K. Y.; Carlson, W. D.; Bernatowicz, M. S.; Haber, E. J. Med. Chem. 1987, 30, 1287-1295.
23. Charton, M. Prog. Phys. Org. Chem. 1981, 13, 119-251.
24. Charton, M. Environ. Health Perspec. 1985, 61, 229-238.
25. Alber, T.; Bell, J. A.; Dao-Pin, S.; Nicholson, H.; Wozniak, J. A.; Cook, S.; Matthews, B. W. Science 1988, 239, 631-635.
26. Charton, M. Coll. Czech. Chem. Commun. 1990, 55, 273-281.
27. Fersht, A. R.; Shi, J.-P.; Knill-Jones, J.; Lowe, D. M.; Wilkinson, A. J.; Blow, D. M.; Brick, P.; Carter, P.; Waye, M. M. Y.; Winter, G. Nature 1985, 314, 235-238.
28. Charton, M. Int. J. Peptide Protein Res. 1986, 28, 201-206.
29. Cunningham, B. C.; Wells, J. A. Science 1989, 244, 1081-1085.
30. Wells, J. A.; Powers, D. B.; Bott, R. R.; Graycar, T. P.; Estell, D. A. Proc. Natl. Acad. Sci. 1987, 84, 1219-1223.
31. Wallace, A.; Dennis, S.; Hofsteenge, J.; Stone, S. R. Biochemistry 1989, 28, 10079-10084.
32. Climie, S. C.; Carreras, C. W.; Santi, D. V. Biochemistry 1992, 31, 6032-6038.
33. Nurek, O.; Vassylyev, D. G.; Katayanagi, K.; Shimuzu, T.; Sekine, S.; Kigawa, T.; Miyazawa, T.; Yokoyama, S.; Morikawa, T. Science 1995, 267, 1958-1965.
34. Corey, D. R.; Craik, C. S. J. Am. Chem. Soc. 1992, 114, 1784-1790.
35. Cunningham, B. C.; Ultsch, M.; de Vos, A. M.; Mulkerrin, M. G.; Clauser, K. R.; Wells, J. A. Science 1991, 254, 821-825.
36. Charton, M.; Greenberg, A.; Stevenson, T. A. J. Org. Chem. 1985, 50, 2643-2646.
37. Lien, E. J.; Guo, Z.-R.; Li, R.-L.; Su, C.-T. J. Pharm. Sci. 1982, 71, 641-655.
38. Seydel, J. K.; Coats, E. A.; Cordes, H. P.; Wiese, M. Arch. Pharm. (Weinheim) 1994, 327, 601-610.
INDEX
Ab initio, see Quantum Abraham hydrogen bond, 15, donor parameter, 146 Acid-base reactions, 36 Acidity, 53 Activation energy 3, Activity-determining, 208 Adsorption, 2, 5, 7, 9-10-19 constant, 19, 20 Aliphatic, 10,21,24,72, 148 Alkanes,190 Alkyl group, 105, 113, 194, 196, 197, 207 Amino acids, 21, 23, 178, 187, 198, 237 Aromatic hydrocarbons, 139, Bader analysis 46 Basicity, 50, 58, 76, Benko method, 138, 167 Bondi, see Volume Binding, 13, 15-16, 18,22-3,28, constant, 9 energy, 2,10-11, 24, 183 Bioactivity, 178, 181, 196, 205, 208 Biocomponent, 201 Biomembrane, 181 Biparametric model, 38 Biphenyl, 147
Branching equation extended(EB), 180 simple(SB), 180 Bronsted acid, 54 Bulk parameter, 12, 16, 24, 145 Buttressing effect, 51 Calorimetry, 5, 165 Cations, 5, 80, Charge-transfer, 178, 194 Chemisorption, 18 Chromatographic properties adsorption, 9, 17, 21, 24 , 26, 28, capacity factors, 178 relative flow rate, 178 retention times,2,15-16, 178 Clausius-Clapeyron equation, 127 Complex formation, 181 Conformation, 183 Connectivity, 19 Correlation, 81 analysis, 39 coefficient,43,59, 143, 187 equation, 195, 202, 216, 226 Covalent bond, 183 Critical temperature(K), 135 Crystal lattice, 129, 134 253
distortion, 168 thickness, 162 Cyclohexyl, 190, 198 Debye, see Intermolecular forces Dehydrohalogenation, 100, Delocalization, 61 Differential thermal analysiss (DTA), 133 Dispersion forces, 131, 187, 238 Dissociation constants, 52, 209 Disulfide, 185 Elasticity coefficient, 135 Electrical effect parameters, a, 178, 248-249 Electronegativity, 22 difference, 169 Electron-atom ratio, 169 Electron donor, 8, 63, 69, 71, 96, 105, 112,117, Electron-withdrawing, 58, 86, 88, 9495,101,112,117, Electronic demand, 37, 179, 188 effect, 5, 37 Electrophilic(PH),4 Electrostatic, 45, 51, Eliminations, 83, 92, 94-95, 101, 105, 119, Enantiomers, 188, 197 Enthalpy, change, 45, 48, 54, 56 fusion, 151, 154, 156, 159 melting, 129 sublimation, 148 Entropy boiling, 130 expansion, 130 fusion, 130, 148, 159 melting, 129, 153 rotation, 130
Enzyme, 201 Equilibrium constant, 2, 18, 29, 44 Eutectic effect, 129 Extended branching equation (EB), 180 Face-centered cubic, 128 Factor analysis, 145 Fatty acids, 135 Fermi energies, 168 Field effect 44, 54, 67, 73, See Substituent constant Force-field calculation, 65 Fisher statistic, 7 Free energy , 3, 5, 52, adsrption, 7 binding, 8 melting, 129 Free-Wilson analysis, 184 Freezing point, 128 Freundlich see adsorption constant Fusion, 129 Gibbs energy, 42, 49, 57, 63, 66, 72, Halogenated hydrocarbons, 141, 150, 153 Hammett equation, 38 , 94, 97, 109, 115, see Substituent constant Hancock parameter, 83, 87, Hansch hydrophobic substituent parameter, 146 Hansch-Fujita model(HF), 248 Hydration, 5, 80, Hydrocarbons, 143, 148, 152 Hydrogen bonding, 2,3, 14, 16, 21,23, 24,30,39,148,151,194,205, 234 complex formation, 178 intermolecular, 153, 179 intramolecular, 131, 154 Hydrophobicity, 178
Hygroscopicity, 131 hyperconjugation, 85, Indicator variable, 5, 16, 19, 21, 24, 187,205 Inductive effect, 3, 22, 37, 58, Inhibition, 235 Intermolecular forces, 31, 129 ,178 charge transfer, 179 dipole-dipole,2,3, 179, 237 dipole-induced dipole, 2,3,179 hydrogen bonding, 179, 187, 236, 237 induced dipole-induced dipole, 2, 17, 19, 179, 238 ion-dipole,48, 179, 237 ion-induced dipole,48, 179 Intermolecular force equation (IMF), 149, 178,181 Ionic bonding, 131, 235 Ionic charge, 180 Ionization, 194, 229 Isomers (see Enantiomers), 200 Keesom see Intermolecular forces Kofler hot bench, 132 Lanthanide rare earths, 168 Lewis acid, 14, Lindemann equation, 168 Lipid-soluble, 181 Lipophilic(PL), 4, 22, 24 LFER, Linear free energy relationship, 72,107, 167 Linear relationship, 36,46,59,68,75, 83, 100, London see Intermolecular forces Lone pair electrons, 179, 185 Log P, 3,21,28, 178 Lysozomes, 209
Melting point, 127, 131, 178 estimation methods, 139,144, 155 160-1, 168, 170 homologous series, 134, 136 inorganic compounds, 167, 170 isomers, 133 paraffins, 137 polymers, 136, 162 Metallocenes, 167 Metastable, 128 Molar refraction, 3 surface, 166 Mole-fraction solubility, 150 Molecular connectivity, 19, 141, 148, 167 eccentricity, 146 field analysis, 167 structure, 178 symmetry, 146 volume, 148 Molecular orbital theory, 170 NMR see Spectroscopy Neural network, 167 Non-hydrogen bonding,, 148, 152, 155, 156 OUgomers, melting point, 162 Omega method, 201 Partial correlation coefficients, 187 Partition coefficient, 128, 151 Peptides, 178,181, 183, 188, 248 Polarizability, a ,38, 47, 85, 179, 188, 191,201,205,234,238,240, 246 Polarization, 3, 98, Polymers, 2,4, 21, inorganic, 5 melting point, 161-171 Principal component analysis, 45
Proteins, 178, 208-10, 235, 239, 247, Proton transfer, 56, 75, Protonation, 23, 52, 56, 58, Prud'homme'srule, 138, 167 relationship, 146 Quantum calculations, 43, 50, chemical properties, 166 QSAR, quantitative structure-activity QSPR, 2 relationship, 146 semiquantative,(SQSAR), 207 p value, 71, 85, 88,91,98,102,112, 14, Reactivity, 36 Receptor binding, 178, 181 Regression, 167, coefficient, 181 equation, 181, 187, 191, 202, 211, 216, 240, 243,245, 248, Resonance demand, 39, 59, 79, effect, 4, 37, 38, 54, 67, 73, 85, 101, interaction, 60, 91 Retardation, 2 Rhombohedral, 128 Rotation, 147 Salt bridge, 235 Segmental model, 181 Semiconductors, 170 Sidechain, 180,181,185, 193 Silica, 5-6 Simple branching equation, (SB), 180 Solubility, 10,129 Solvent effect, 56, 75, 78, 112, Spectrophotometry, UV, 113, Spectroscopy, 5, 54, 56, Steric effects,3, 4, 13, 27, 51, 65, 180, 205, 240, 246 Steric parameter, % 180, 201, 212,
composite steric parameter, D, 26, 85180,181 strain energy, 66, 92, Structural parameters, 145, 190, 225 Structure, 2, 3, 36, 56, Substituent, 9, 12, constant, a , 4, 6, 26, 37, 41,48, 60, 95,106,110, effects, 36, 57, 69, 88, 99, 102, 110, parameter, 40, 93, 146 Substrate, 181, 184,203,213 Surface, 3 Swain-Lupton parameters, 146 Symmetry rotational, 130, 142 tetrahedral, 180 Thermal conductivity, 138 Thiele apparatus, 132 Topological descriptors, 150 indices, 141, 150, Transfer, 3, 8 Transitition state, 79, 83, 92, 103, 104, 106,113,114,183 Transport, 181,248,250 UPPER, unified physical property estimation relationships, 159 Upsilon, see Steric parameter, composite Uracil, 129 Valence electron density, 169 van der Waals 2, 180 Vibrational forces, 129 molecular descriptors, 144, 150 Viscosity, 151 Volume, 18, 19 Bondi, 3 geometric, 148 Wiener index (W), 141,164
WHIM, weighted holisitc invariant vibrational frequency, 135 Yukawa-Tsuno equation, 38, 62, 81, Zeta method, 203