PROTEIN
A Comprehensive Treatise Volume 2 •
1999
Editor:
GEOFFREY ALLEN London, England
VOLUME 2 • 1999
JAI PRESS INC.
Stamford, Connecticut
Copyright © 1999 by JAI PRESS INC.
100 Prospect Street
Stamford, Connecticut 06904

All rights reserved. No part of this publication may be reproduced, stored on a retrieval system, or transmitted in any way, or by any means, electronic, mechanical, photocopying, recording, filming, or otherwise without prior permission in writing from the publisher.

ISBN: 1-55938-672-X

Manufactured in the United States of America
CONTENTS

List of Contributors
Preface
    Geoffrey Allen
Chapter 1  Protein Crystallography
    Anirhuddha Achari and David K. Stammers
Chapter 2  The Chemistry of Protein Functional Groups
    Gary E. Means, Hao Zhang, and Min Le
Chapter 3  Electrostatic Effects in Proteins: Experimental and Computational Approaches
    Norma M. Allewell, Himanshu Oberoi, Meena Hariharan, and Vince J. LiCata
Chapter 4  The Binding of Ions to Proteins
    Jenny P. Glusker
Chapter 5  Protein Folding
    Franz X. Schmid
Chapter 6  Thermodynamics of Protein Folding and Stability
    Alan Cooper
Chapter 7  Protein Hydrodynamics
    Stephen E. Harding
Index
LIST OF CONTRIBUTORS

Anirhuddha Achari
    Glaxo Wellcome Medicines Research Centre, Stevenage, Herts, England
Norma M. Allewell
    Department of Biochemistry, University of Minnesota, St. Paul, Minnesota
Alan Cooper
    Chemistry Department, Glasgow University, Glasgow, Scotland
Jenny P. Glusker
    Institute for Cancer Research, The Fox Chase Cancer Center, Philadelphia, Pennsylvania
Stephen E. Harding
    School of Biology, University of Nottingham, Sutton Bonington, England
Meena Hariharan
    Department of Biochemistry, University of Minnesota, St. Paul, Minnesota
Gary E. Means
    Department of Biochemistry, The Ohio State University, Columbus, Ohio
Min Le
    Department of Biochemistry, The Ohio State University, Columbus, Ohio
Vince J. LiCata
    Department of Biochemistry, University of Minnesota, St. Paul, Minnesota
Himanshu Oberoi
    Department of Biochemistry, University of Minnesota, St. Paul, Minnesota
Franz X. Schmid
    Biochemisches Laboratorium, Universität Bayreuth, Bayreuth, Germany
David K. Stammers
    Laboratory of Molecular Biophysics, University of Oxford, Oxford, England
Hao Zhang
    Department of Biochemistry, The Ohio State University, Columbus, Ohio
PREFACE

In Volume 1 of this series, the structures of protein molecules were described, together with computational methods linking sequence data to folded structure and function. The determination of protein structure by nuclear magnetic resonance spectroscopy was also presented. The current volume begins by continuing the theme of protein structure with an outline of methods of crystallographic structure determination. Subsequent chapters describe various structure-related properties of proteins. The chemistry of protein functional groups, with emphasis on reagents used to chemically modify proteins, is covered in Chapter 2. Complementary chapters on electrostatic effects in proteins and on the binding of ions to proteins follow. The topic of protein folding is described in two chapters, one on pathways of folding and the other on the thermodynamics of protein folding and stability, both areas of significant recent advances in understanding. Finally, the hydrodynamic properties of proteins, reflecting primarily their molecular size and shape, are covered in Chapter 7.

I thank the authors for their contributions, which should be valuable to those new to the field of protein science as well as to those already expert in various aspects of the field.

Geoffrey Allen
Editor
Chapter 1
Protein Crystallography ANIRHUDDHA ACHARI and DAVID K. STAMMERS
Abstract
Introduction
Protein Crystallization
    The Crystallization Process
    Factors Affecting Crystal Growth
    Crystallization Methods
Diffraction of X-rays
Crystallographic Data Collection
    Synchrotron Sources
    Detection of Diffracted X-rays
    Data Reduction and Processing
Methods of Phase Determination
    Molecular Replacement
    Multiple Isomorphous Replacement
    Anomalous Dispersion
Map Improvements
    Density Modification
    Maximum Entropy Techniques
Structure Refinement
Final Model and Validity of the Structure
Protein: A Comprehensive Treatise Volume 2, pages 1-22 Copyright © 1999 by JAI Press Inc. All rights of reproduction in any form reserved. ISBN: 1-55938-672-X
ABSTRACT
Almost all three-dimensional protein structures known were determined by means of X-ray crystallography. Protein crystals of high quality are required, and various methods are available for obtaining these, including such procedures as partial proteolysis, addition of antibody Fab fragments, and protein engineering. A brief outline of the X-ray diffraction process, including sources and detection of X-rays, is presented. A major hurdle to identifying the structure from the measured intensities is the determination of the associated phases. Multiple isomorphous replacement using heavy atoms remains an important technique. An array of computer software is available for refining structures, including molecular dynamics methods.
INTRODUCTION
Knowledge of the three-dimensional structure of proteins is now routinely used to understand the functional properties of such macromolecules. It is considered an essential framework on which to bring together and rationalize diverse biochemical and genetic data. In addition, knowledge of the binding sites of ligands on macromolecules can be used in the design of novel inhibitors with potential for development as drugs.

Such advances in our knowledge have largely been the result of technological developments. First, the techniques of recombinant DNA and heterologous expression of proteins have made available, in sufficient quantities, a vast array of proteins that were previously impractical to purify from natural sources of cells or tissues. Second, technological developments in the production of X-rays and in highly sensitive area detectors have been of key importance. These, coupled with computer hardware and software capable of structure refinement and with computer graphics for model building, have been important factors in the rapid structure determination of an increasing number of complex proteins and macromolecular assemblies. There is currently an exponential growth in the reporting of structures in the Brookhaven Protein Databank.

The vast majority of the three-dimensional protein structures determined to date have been solved by X-ray crystallography. This remains the most general method for three-dimensional structure determination of proteins, as it is applicable to proteins of molecular weights greater than 700 kDa. By contrast, NMR methods, in spite of the significant developments in multidimensional methodology and high-field spectrometers, are currently limited to de novo structure determination of proteins with an upper molecular weight limit of 25 kDa.

One prerequisite for X-ray crystallographic structure determination of proteins is the growth of suitable crystals.
Crystals must be of suitable size and internal order to enable the recording of high-resolution X-ray data. This is in some cases a nontrivial problem and can require particular attention to the purity of the protein preparation and the setting up of numerous crystallization trials. The crystallization stage can still represent the rate-limiting step in the structural analysis for some proteins. We describe in this chapter some of the methods used in the crystallographic determination of protein structures by X-ray diffraction, including the obtaining of suitable protein crystals for the analysis.
PROTEIN CRYSTALLIZATION
In contrast to many of the stages in structural analysis of a protein such as X-ray data collection, calculation of electron density maps, and refinement of the protein model, the crystallization of proteins is the least well understood part of the whole structure determination process. Thus developments in this field are largely the result of empirical knowledge. Such experimentally derived methodology has expanded greatly from the early years of the subject. The relative lack of understanding of protein crystallization is the result of the complex nature of proteins that, as large polyelectrolytes of low symmetry, have properties that vary with a wide range of factors such as pH, temperature, and ionic strength among others. A selection of review articles and practical guides to protein crystallization are available and should be consulted for more details including experimental protocols (McPherson, 1982, 1990; Carter, 1990; Ducruix and Giege, 1992). In this chapter, we give a brief overview of the crystallization process together with an update on some of the current developments in this field.

The Crystallization Process
In common with the crystallization of small molecules, the crystallization of proteins is achieved by producing a supersaturated solution. This is a metastable state that is thermodynamically unstable and reaches equilibrium by forming either precipitate or crystals. Crystallization is characterized by three stages: a nucleation stage, followed by a growth phase and finally the cessation of growth. Spontaneous nucleation consists of the formation of a stable aggregate that then provides surfaces suitable for the growth of a crystal. Crystal growth is halted when the protein concentration is lowered as a result of the crystallization process, when the lattice is deformed, or when impurities block the growing crystal faces (Weber, 1991). The crystallization process is entropically unfavorable. This is a result of the loss of translational and rotational degrees of freedom of the molecules as they are packed into a crystal lattice. In the case of proteins, surface loops are also constrained within the crystal. To counterbalance this there has to be a favorable gain in enthalpy to give an overall free energy change that can drive the crystallization process. This enthalpic gain is derived by addition of an agent such as salt, which competes with the water that solvates the exposed amino acid side chains on the protein surface. This results in a desolvation effect that leads to favorable
interactions with neighboring molecules and hence crystallization (Weber, 1991). Essentially there are three types of agent that can compete with bulk solvent and thereby induce proteins to crystallize: salts such as ammonium sulphate or sodium phosphate; organic solvents such as ethanol, methylpentanediol (MPD), or isopropanol; and long-chain polymers such as polyethylene glycol (PEG). In addition to their use as single agents, these precipitants have also been used in various combinations. The available scientific literature on conditions for protein crystallization has been collated in the Biological Macromolecule Crystallization Database [BMCD] (Gilliland and Bickham, 1990). Analysis of this database indicates that ammonium sulphate is the most commonly used precipitant for crystallization of proteins, followed by PEG 6000 and MPD.

Factors Affecting Crystal Growth
There are many factors that are known to affect the growth of protein crystals. McPherson (1990) has listed 26 factors that he considers important in his experience. Some distinction can be made between extrinsic factors on the one hand and variants within the protein itself on the other. Some factors in these two classes are listed in Table 1 and discussed below.

Table 1. Some Factors Important in Crystallizing Proteins

Extrinsic Factors                        Intrinsic Factors
pH, Temperature                          Purity
Ionic strength                           Different species
Protein concentration                    Protease digestion
Precipitant type and concentration       Truncated forms
Crystallization method                   Single point mutations
Metal ions, Detergents                   Aggregation state
Ligands, Fabs                            Glycosylation state
Seeding
Microgravity

Extrinsic Factors

The variation of pH can affect the ionization of certain amino acid side chains and hence their interactions with neighboring molecules and in turn their crystallization properties. Variation in pH of less than 0.5 of a pH unit can affect crystal growth. Temperature can affect both the solubility of a protein and its stability. Generally protein crystals are obtained either at 4 °C or close to room temperature (~22 °C). Ionic strength is a key factor in promoting crystallization from high salt
conditions. Some proteins have a solubility minimum at low ionic strength and hence can be crystallized by dialyzing against low salt. The crystallization method itself can determine whether crystals appear. The methods used can be classified into three common types: batch, vapor diffusion, and equilibrium dialysis. These are described in more detail below. Added metal ions that are not necessarily of functional or structural importance in the native form of the protein can have important roles in bridging molecules in the crystal lattice. The use of nonionic detergents, particularly n-octyl glucoside, was first developed as a method for the crystallization of membrane proteins but has since been shown to be of benefit in the case of soluble proteins (McPherson, 1990). It is thought that the detergent might reduce nonspecific hydrophobic interactions. A similar effect might result from the addition of a few percent of organic solvents such as DMF or ethanol (Miller et al., 1989). It has long been observed that the presence of a protein ligand such as a substrate or inhibitor can have dramatic effects on crystallization. This can be the result of a conformational change in the protein or a general tightening up of the structure to give a more rigid molecule, as seen in the generally greater resistance to proteolysis of ligand-bound forms compared to their apoprotein equivalents. The addition of a Fab fragment of an antibody has been found useful in the crystallization of proteins such as neuraminidase (Laver, 1990). This could be the result of stabilizing an outside loop, or simply of the additional surface of the Fab providing an extra region on which crystal contacts can form. The use of seeding methods can be crucial in obtaining large crystals suitable for a structural analysis.
Two basic methods are employed. In macroscopic seeding, small crystals are washed prior to introduction into a new supersaturated protein solution. In microscopic seeding, a crystal is crushed, the solution diluted, and a small amount introduced into a supersaturated protein solution. A variant on this is the "streak" seeding method using a cat's whisker (Stura and Wilson, 1990). Methods for the crystallization of proteins under conditions of microgravity in space have been developed in the last 10 years, and for two proteins improvements in diffraction or crystals with better morphology have been observed (DeLucas et al., 1989).

Variants within the Protein
In the category of intrinsic variants, the importance of high purity protein cannot be overstressed. Early work on proteins prior to the availability of recombinant DNA methods emphasized the importance of preparing a protein of interest from different species as the variations in surface residues present can be of key importance in producing usable crystals. Limited protease digestion can be used to clip off "floppy" regions of the protein giving rise to a more rigid core domain. Once the points of proteolytic cleavage have been identified, then recombinant
DNA methods can be applied to express large quantities of these domains for functional characterization, crystallization trials, and structure determination. An exciting development in the field is the use of protein engineering to modify a protein so as to improve its crystallization properties. This is illustrated in the elegant work on HIV integrase (Dyda et al., 1994). This protein had previously resisted all attempts at crystallization, largely as a result of its tendency to aggregate. Putative surface regions of the protein that might give rise to hydrophobic interactions were identified. Site-directed mutants that contained either alanine or lysine replacing hydrophobic residues were constructed. Screening of a variety of single-point mutants revealed that the mutant with tyrosine 185 changed to lysine showed an increase in the solubility of the expressed protein. This mutated form of the catalytic core domain of HIV integrase was found to crystallize easily and in turn led to the rapid determination of the structure (Dyda et al., 1994). The presence of posttranslational modifications of proteins expressed in eukaryotic systems can be a source of heterogeneity giving rise to poorly ordered crystals. This can be overcome in some cases by inhibiting the modification completely within an expression system, as for example in inhibiting the glycosylation of CD2 (Davis et al., 1993), thereby giving a more homogeneous preparation. The aggregation state of the protein preparation can be of crucial importance in determining whether a protein can be crystallized. Aggregation can be the result of the method of purification or some intrinsic property of the protein. The method of dynamic light scattering has been successfully applied to screen protein preparations for aggregation or polydispersity. A good correlation has been observed between monodisperse protein preparations and their ability to crystallize (Zaluaf and D'Arcy, 1992).

Crystallization Methods
The earliest method used for crystallizing proteins was the batch method. This has the disadvantage of being relatively expensive in terms of material. Various methods of equilibrium dialysis have been developed (Zeppezauer, 1971), but by far the most common method now used is vapor diffusion (McPherson, 1982). With this, usually equal volumes of protein solution (generally in the range of 5-30 mg/ml) and precipitant are mixed. In "hanging drop" vapor diffusion, the protein/precipitant mixture is placed on a siliconized cover slip that is inverted and sealed with vacuum grease over a reservoir of precipitant solution. Vapor equilibration occurs between the droplet and reservoir, giving rise to supersaturation and, ideally, crystal growth. An alternative is the use of "sitting drops", in which the protein is placed on a bridge with the reservoir below.

For the crystallization of certain proteins, it might be necessary to survey many thousands of crystallization conditions. To reduce the amount of manual labor involved and improve precision, a number of automated systems have been developed. These range from modified pipetting stations (Cox and Weber, 1987) to fully automated robots including video camera monitoring of crystal growth and associated database record keeping (Jones et al., 1987). In deciding on the best use of often limited quantities of material, methods for statistical analysis to optimally sample the multidimensional space of the crystallization variables have been devised (Carter and Carter, 1979). An alternative approach is to produce standard sets of conditions, usually about 50, based on the most commonly used crystallization conditions (Jancarik and Kim, 1991). This sparse-matrix approach has proved extremely successful in crystallizing a wide variety of proteins. As the reagents are commercially available, it is the simplest first step in attempting to crystallize a protein. Various other sets of standard conditions are available and many laboratories create their own, based on experience with particular proteins.
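The way vapor diffusion concentrates the drop can be illustrated with a short calculation. The sketch below is an idealized model, not a protocol: it assumes that only water moves through the vapor phase, that the reservoir is large enough for its precipitant concentration to stay constant, and that the protein stock contains no precipitant. The function and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DropState:
    volume_ul: float        # drop volume in microlitres
    protein_mg_ml: float    # protein concentration in the drop
    precipitant: float      # precipitant concentration (same units as the reservoir)


def mix_drop(protein_stock_mg_ml, reservoir_precipitant,
             protein_ul=2.0, precipitant_ul=2.0):
    """Drop composition immediately after mixing protein and reservoir solution."""
    volume = protein_ul + precipitant_ul
    return DropState(
        volume_ul=volume,
        protein_mg_ml=protein_stock_mg_ml * protein_ul / volume,
        precipitant=reservoir_precipitant * precipitant_ul / volume,
    )


def equilibrate(drop, reservoir_precipitant):
    """Idealized end point: the drop loses water until its precipitant
    concentration matches the reservoir (assumption: solutes are nonvolatile)."""
    shrink = drop.precipitant / reservoir_precipitant  # final volume / initial volume
    return DropState(
        volume_ul=drop.volume_ul * shrink,
        protein_mg_ml=drop.protein_mg_ml / shrink,
        precipitant=reservoir_precipitant,
    )


start = mix_drop(protein_stock_mg_ml=10.0, reservoir_precipitant=2.0)
end = equilibrate(start, reservoir_precipitant=2.0)
print(start)
print(end)
```

Under these assumptions, mixing equal volumes halves both concentrations, and equilibration then brings the drop back to roughly half its mixed volume, so the protein is slowly concentrated in the presence of a rising precipitant concentration.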
DIFFRACTION OF X-RAYS
A crystal can be considered as a diffraction grating made up of a regularly repeating arrangement of molecules or atoms; the repeating unit is known as the unit cell. A unit cell is defined by three axes denoted a, b, and c and three interaxial angles: α between b and c, β between a and c, and γ between a and b. Planes of atoms in a crystal are assigned indices known as Miller indices, which are the reciprocals of the intercepts of that plane on a, b, and c, the axes of the unit cell. Thus a plane parallel to a and b will have indices 001, and a plane with indices 235 has intercepts 1/2, 1/3, and 1/5 on the a, b, and c axes respectively. Seven crystal classes are defined:

1. Triclinic       no restrictions on a, b, or c and α, β, or γ
2. Monoclinic      β ≠ 90°, α = γ = 90°; no restriction for a, b, c
3. Orthorhombic    α = β = γ = 90°; no restriction for a, b, c
4. Tetragonal      α = β = γ = 90°; a = b, c any dimension
5. Hexagonal       α = β = 90°, γ = 120°; a = b, c any dimension
6. Trigonal        α = β = 90°, γ = 120°; a = b, c any dimension
                   or α = β = γ ≠ 90°; a = b = c
7. Cubic           α = β = γ = 90°; a = b = c
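The relationship between intercepts and Miller indices described above can be sketched in a few lines. This is a minimal illustration; the function name and the convention of passing `None` for an axis the plane never crosses are assumptions made here. Intercepts are given as fractions of the cell edges, and the reciprocals are only cleared of denominators, not divided by a common factor.

```python
from fractions import Fraction
from math import gcd


def miller_indices(intercepts):
    """Convert fractional intercepts on a, b, c into Miller indices (h, k, l).

    An intercept of None means the plane is parallel to that axis, which
    gives an index of 0 (the reciprocal of an infinite intercept).
    """
    reciprocals = [Fraction(0) if t is None else 1 / Fraction(t)
                   for t in intercepts]
    # Multiply through by the least common multiple of the denominators
    # so that all three indices are integers.
    scale = 1
    for r in reciprocals:
        scale = scale * r.denominator // gcd(scale, r.denominator)
    return tuple(int(r * scale) for r in reciprocals)


print(miller_indices([Fraction(1, 2), Fraction(1, 3), Fraction(1, 5)]))
print(miller_indices([None, None, 1]))
```

The two calls reproduce the examples in the text: intercepts 1/2, 1/3, 1/5 give indices 235, and a plane parallel to a and b gives 001.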
Rotational symmetry in a crystal can be twofold, threefold, fourfold, or sixfold, or a combination of these, depending on the crystal class. (An n-fold rotational symmetry means that a pair of objects in a unit cell are related by a rotation of 360/n degrees.) Screw axes are combinations of a rotation followed by a specified translation along that axis; for example, a twofold screw along b means a 180-degree rotation around b, followed by a b/2 translation along b. Rotational symmetry defines the point group of a crystal, and the combination of rotational and translational symmetry assigns it to its space group. An asymmetric unit of a unit cell is related to the other parts of the lattice by rotation and translation.
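The twofold screw axis along b described above can be written as an operation on fractional coordinates. The sketch below assumes that the axis passes through the origin of the cell and that positions are expressed as fractions of the cell edges; the function name is hypothetical.

```python
from fractions import Fraction


def twofold_screw_along_b(position):
    """Apply a twofold screw axis along b to fractional coordinates (x, y, z).

    A 180-degree rotation about b sends (x, y, z) to (-x, y, -z); the screw
    then adds a translation of b/2. Results are wrapped back into the cell.
    """
    x, y, z = (Fraction(v) for v in position)
    return (-x % 1, (y + Fraction(1, 2)) % 1, -z % 1)


p = (Fraction(1, 10), Fraction(1, 5), Fraction(3, 10))
q = twofold_screw_along_b(p)
print(q)
print(twofold_screw_along_b(q) == p)
```

Applying the operation twice amounts to a full translation of b, so within the unit cell it returns the starting position, as the final line checks.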
Crystals act as a diffraction grating to an incident X-ray beam, and the relationship between the spacing d of planes of atoms, the wavelength λ, and the angle θ at which the emergent ray is observed is given by the equation known as Bragg's law:

$$2d\sin\theta = n\lambda \qquad (1)$$

$$(d/n)\sin\theta = \lambda/2 \qquad (2)$$

where n is the order of diffraction.
One can consider a reflection at an angle θ either as first order from planes of spacing d/n, or as nth order from planes of spacing d. It is more convenient and practical to deal with only one order of reflections from planes of different spacing. The smallest spacing that will give a first order reflection (n = 1) is d = λ/2, the limit of resolution one can get from single crystal diffraction experiments with an X-ray source of wavelength λ.
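As a small numerical illustration of Bragg's law and the resolution limit, the sketch below uses a wavelength of 1.54 Å (the copper Kα value quoted in this chapter); the function names are hypothetical and the plane spacings are arbitrary example values.

```python
import math


def bragg_angle_deg(d, wavelength, n=1):
    """Bragg angle theta (degrees) from 2 d sin(theta) = n * wavelength."""
    s = n * wavelength / (2.0 * d)
    if s > 1.0:
        raise ValueError("no diffraction: n * wavelength exceeds 2d")
    return math.degrees(math.asin(s))


def resolution_limit(wavelength):
    """Smallest spacing giving a first-order reflection: d = wavelength / 2."""
    return wavelength / 2.0


wavelength = 1.54  # angstroms
print(resolution_limit(wavelength))
# A second-order reflection from planes of spacing d appears at the same
# angle as a first-order reflection from planes of spacing d/2.
print(bragg_angle_deg(3.08, wavelength, n=2))
print(bragg_angle_deg(1.54, wavelength, n=1))
```

The last two calls return the same angle, which is the equivalence stated in the paragraph above.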
CRYSTALLOGRAPHIC DATA COLLECTION
Prior to the recent development of methods for cryocooling protein crystals in liquid nitrogen (Rodgers, 1994), a common feature of data collection strategies was a protein crystal mounted in a sealed glass or quartz capillary with a drop of mother liquor at one end. This maintains the crystal in the hydrated environment in which it was grown. For flash freezing, the crystal is equilibrated with a cryoprotectant solution of glycerol or polyethylene glycol, captured in a fiber loop, and then rapidly frozen in a nitrogen gas cold stream (such a device can be purchased from Oxford Cryo Systems or made in-house). The crystal is then mounted on an X-ray camera, optically aligned, and centered. A beam of X-rays, either monochromated or reflected from mirrors, is then shone on the crystal and the diffracted X-rays are collected by a detector. X-rays are generated in laboratories by the impact of electrons on a target (usually copper or molybdenum). The target emits X-rays as a result of an electron returning to the K shell from the L shell. Copper Kα radiation (λ = 1.54 Å, E = 8 keV) is the usual choice for protein crystals in laboratories. In the rotating anode mode of X-ray generation, the copper anode is rotated and water cooled, which allows for higher loading and a stronger X-ray beam compared to fixed anodes.

Synchrotron Sources
In the late 1970s, the availability of X-rays from a synchrotron at Daresbury, U.K., opened up a new dimension for macromolecular crystallographic experiments. In storage rings or synchrotrons, electrons or positrons move at relativistic velocities in a closed loop. Particles confined by magnets and moving at velocities close to the velocity of light emit X-rays as they are accelerated or decelerated; these X-rays are
captured by beam lines tangential to the storage rings. The wavelength of the emitted X-ray beam is given by λ = 0.559 R/E³, where R is the radius of the storage ring and E the energy of the particles. Three advantages of X-rays from synchrotrons are (a) extremely high intensity, (b) a cleaner beam of low divergence, and (c) tunability of the wavelength. With extreme care in collimation and focusing of the intense beam, data of high quality can be obtained with a short exposure time and a higher signal-to-noise ratio. Tunable and highly monochromatic X-rays allow the experimenter to collect data at or near the absorption edge of the metal of a metalloprotein or a heavy atom derivative, and so to collect high-quality anomalous data (Harada et al., 1986) (see the section on methods of phase determination). Another experimental technique made possible by the intense X-rays from a synchrotron is data collection with X-rays containing a broad range of wavelengths, known as white radiation (the Laue method). This method allows the complete data to be collected in a few milliseconds and can be used to take time-resolved crystallographic snapshots of enzyme catalysis (Hajdu et al., 1987; Helliwell et al., 1989; Shrive et al., 1990).

Detection of Diffracted X-rays
Film methods were the first used for the efficient detection of diffracted X-rays from macromolecular crystals. Later, single counter diffractometers were used, but these can generally make only one measurement at a time, whereas a film can record hundreds of diffraction data simultaneously. Given that film has an advantage over single counter as an area detector, this method reemerged in the 1970s as the method of choice to collect accurate medium to high resolution data. This was dependent on the development of screenless precession (Xuong and Freer, 1971) and screenless oscillation photography (Arndt et al., 1973) as well as improved software. A crystal is mounted on a horizontal spindle of an oscillation camera and one of the principal axes of the crystal aligned along the axis of the spindle. The crystal is then rotated through an angle, governed by the size of the unit cell (0.25-1° for viruses/large proteins, 1-2.5° for small proteins) about the spindle while being exposed to X-rays, and the diffracted beams are recorded on flat, curved, or V-shaped film cassettes. The cassette contains a pack of three to six films so that the strongest reflections are attenuated and recorded within the linear range of response of the X-ray films. After collecting for a preset time, the computer moves the crystal to a new position and a fresh film cassette records the data from the current position of the crystal. Position-sensitive photon detection used in high-energy nuclear physics was the next generation of detection for crystallographic use (Charpak et al., 1968). A position sensitive photon detector is a chamber consisting of a horizontal and a vertical plane of wires and filled with xenon gas. A photon arriving through the gas-tight window of the detector ionizes xenon at a particular location and is
recorded by the X-Y grid of wires. Image-intensifier or fast detectors employ another method, in which the diffracted X-rays excite visible-wavelength fluorescence from a phosphor-covered fiber-optic screen; the emitted light is then converted to electrons and the image is read out by a television scanning system. These detectors can be used as multidetector arrays to collect larger volumes of data faster. Another advantage of flat detectors is that they can be placed close to the crystal, or further away to resolve the closely spaced spots from a large unit cell (Cork et al., 1974). Image plates, which are europium-doped phosphors, have replaced films for data collection in laboratories and at synchrotron sources (Hendrickson and Ward, 1987). Charge-coupled devices (CCDs) are being developed for use as detectors at synchrotrons. Their particular advantage over image plates is a shorter read-out time. They will also soon be available in the laboratory.

Data Reduction and Processing
In a crystallographic experiment, the strategy of data collection is to collect a large volume of data. All Miller indices (h k l) and their symmetry-related mates have their intensities measured more than once. Multiple observations of a reflection h k l allow better scaling of symmetry-related mates and a measure of systematic error in the data such as absorption. Any data-processing software such as Xengen (Howard et al., 1985, 1987), DENZO (Otwinosky and Gewirth, 1993), or XDS (Kabsch, 1988) goes through several steps:

1. Get the unit cell, symmetry, and misorientation angles of the crystal.
2. Index spots or predict their positions from the parameters found in (1) and integrate the observed intensities.
3. Reduce the data to a unique set.
4. Scale and reject outliers.
5. Finally, produce a data set containing scaled h k l and intensities (I) or amplitudes (F).
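The merging of repeated observations in steps 3 to 5 can be sketched as follows. This is a toy illustration, not the algorithm of any of the programs named above. It assumes that each observation has already been indexed, scaled, and mapped to its unique (h, k, l), and it reports an agreement statistic of the form Σ|I − ⟨I⟩| / ΣI over multiply measured reflections, which is introduced here as one common way to quantify how well repeated measurements agree. All names are hypothetical.

```python
from collections import defaultdict


def merge_reflections(observations):
    """Average repeated intensity measurements of each unique reflection.

    observations: iterable of ((h, k, l), intensity) pairs, assumed to be
    already scaled and reduced to unique indices.
    Returns (merged, r_merge), where merged maps (h, k, l) to the mean
    intensity and r_merge measures the spread of repeated observations.
    """
    groups = defaultdict(list)
    for hkl, intensity in observations:
        groups[hkl].append(intensity)

    merged = {hkl: sum(vals) / len(vals) for hkl, vals in groups.items()}

    # Only reflections measured more than once say anything about agreement.
    numerator = 0.0
    denominator = 0.0
    for hkl, vals in groups.items():
        if len(vals) > 1:
            numerator += sum(abs(i - merged[hkl]) for i in vals)
            denominator += sum(vals)
    r_merge = numerator / denominator if denominator else float("nan")
    return merged, r_merge


obs = [((1, 0, 0), 100.0), ((1, 0, 0), 110.0), ((0, 1, 0), 50.0)]
merged, r_merge = merge_reflections(obs)
print(merged)
print(r_merge)
```

Outlier rejection (step 4) is omitted from this sketch; in practice it would remove observations that deviate strongly from the group mean before the final averaging.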
METHODS OF PHASE DETERMINATION
The intensity of each reflection h k l is the quantity one measures in an X-ray diffraction experiment. Intensity I is proportional to the square of the structure factor F, which is a complex number consisting of an amplitude |F| and a phase α:

$$\mathbf{F}_{hkl} = |F_{hkl}|\, e^{i\alpha_{hkl}} \qquad (3)$$

hence

$$|F_{hkl}|^2 = \left[|F_{hkl}|\, e^{i\alpha_{hkl}}\right]\left[|F_{hkl}|\, e^{-i\alpha_{hkl}}\right]$$

so that the measured X-ray intensity I has no phase information (see Stout et al., 1968).
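The loss of phase information can be checked numerically with complex numbers. In this minimal sketch the amplitude and phase values are arbitrary examples and the function names are hypothetical; the proportionality constant between intensity and |F|² is taken as 1.

```python
import cmath
import math


def structure_factor(amplitude, phase_deg):
    """Complex structure factor |F| * exp(i * alpha) from amplitude and phase."""
    return cmath.rect(amplitude, math.radians(phase_deg))


def intensity(f):
    """Quantity proportional to the measured intensity: F times its conjugate."""
    return (f * f.conjugate()).real


f1 = structure_factor(10.0, 30.0)
f2 = structure_factor(10.0, 200.0)
print(f1, f2)                        # two different complex numbers
print(intensity(f1), intensity(f2))  # the same intensity for both
```

Two structure factors with the same amplitude but different phases give the same intensity, which is why the phase cannot be recovered from a single intensity measurement.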
A regular repeating function such as the electron density ρ of a crystal can be represented by a Fourier series:

$$\rho(x,y,z) = \frac{1}{V}\sum_{h}\sum_{k}\sum_{l} \mathbf{F}_{hkl}\, e^{-2\pi i(hx+ky+lz)} \qquad (4)$$

where the triple summation is over all h k l values and V is the unit cell volume. One has the problem of determining the phase angle associated with experimental intensities to generate an electron density map. This is the so-called phase problem in protein crystallography.

Molecular Replacement
Patterson (1935) showed that a Fourier map calculated with the "phaseless" $|F|^2$ as coefficients, known as a Patterson map or Patterson function, has peaks corresponding to all interatomic vectors. The idea of molecular replacement is to rotationally orient a known molecule and then translate it to the correct position in a crystallographic unit cell. If an unknown protein structure has been crystallized, native data have been collected, and a set of atomic coordinates is available from a closely related structure, then the known model can be used to solve the structure of the unknown protein. This is accomplished by first orienting the model molecule with the rotation function:

$$R(C) = \int P_1(\mathbf{x})\, P_m(C\mathbf{x})\, dV \qquad (5)$$

where $P_1$ is the Patterson function of the unknown crystal and $P_m$ is the Patterson function of the known molecule. C is usually represented by three Eulerian angles. Peaks in the rotation function R(C) represent possible correct orientations of the known structure in the unknown unit cell. Interatomic vectors between atoms within a molecule (self-vectors) depend only on the structure and lie close to the origin of the Patterson map. Once the orientation of the model molecule in the unknown cell has been established, the next step is to translate the molecule to the correct position in the cell. The translation function depends on intermolecular vectors (cross-vectors) between molecules related by the space group symmetry of the unknown cell and will reach a maximum when the correctly oriented molecule, stepped through the cell of the unknown, reaches the correct position (Rossmann and Blow, 1962; Fitzgerald, 1988). Alternatively, one can do an R-factor search

$$R = \frac{\sum \big|\, |F_o| - |F_c| \,\big|}{\sum |F_o|} \qquad (6)$$

where $|F_o|$ is the observed structure amplitude and $|F_c|$ is the structure amplitude calculated with the model at a particular position in the cell. A minimum value of R indicates the correct location of the molecule.
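The R-factor search can be illustrated with a toy calculation. The sketch below is not taken from any crystallographic package: it assumes a one-dimensional cell with an inversion centre at the origin, unit point atoms, and a hypothetical three-atom model (`MODEL`) whose true offset (`TRUE_SHIFT`) is chosen arbitrarily. The already-oriented model is stepped through the cell and R is evaluated at each trial position.

```python
import cmath

MODEL = [0.00, 0.07, 0.19]   # hypothetical model, fractional coordinates
TRUE_SHIFT = 23              # "unknown" position, in hundredths of the cell edge
H_MAX = 8                    # reflections h = 1..H_MAX are used


def amplitudes(shift_steps):
    """|F(h)| for the model placed at shift_steps/100 together with its inversion mate."""
    t = shift_steps / 100
    atoms = [x + t for x in MODEL] + [-(x + t) for x in MODEL]
    return [abs(sum(cmath.exp(2j * cmath.pi * h * x) for x in atoms))
            for h in range(1, H_MAX + 1)]


def r_factor(f_obs, f_calc):
    """R = sum | |Fo| - |Fc| | / sum |Fo| over all reflections."""
    return sum(abs(fo - fc) for fo, fc in zip(f_obs, f_calc)) / sum(f_obs)


F_OBS = amplitudes(TRUE_SHIFT)   # stands in for the measured amplitudes

# Step the model through half the cell; in this toy symmetry the positions
# t and t + 1/2 give identical amplitudes, so half the cell is sufficient.
scores = {k: r_factor(F_OBS, amplitudes(k)) for k in range(50)}
best = min(scores, key=scores.get)
print(f"lowest R = {scores[best]:.3f} at shift {best / 100:.2f}")
```

Without the inversion mate the amplitudes would not depend on the shift at all, which is why the translation search relies on vectors between symmetry-related molecules.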
Multiple Isomorphous Replacement
Perutz and co-workers used multiple isomorphous replacement to solve the phase problem for hemoglobin more than four decades ago. In this method, a heavy atom (e.g., uranium, mercury, gold) or a cluster of heavy atoms is diffused into the protein crystal. If the heavy-atom derivatized crystal remains isomorphous, the measured intensities obtained from the derivative will differ slightly from those of the parent. If $F_{PH}$, $F_P$, and $F_H$ are the structure factors of the derivative, the native protein, and the heavy atom respectively, then these are related by the vector sum:

$$F_{PH} = F_P + F_H \qquad (7)$$
We measure the amplitudes $|F_P|$ and $|F_{PH}|$, allowing an estimate of the amplitude $|F_H|$. The difference Patterson function is the Fourier transform using the squares of the difference amplitudes $(|F_{PH}| - |F_P|)$ as coefficients and shows peaks at the ends of the vectors connecting heavy atoms. The Patterson search technique is often used to locate the heavy atom(s) within the unit cell. Once the location of the heavy atom is known, its phase $\alpha_H$ is known (Figure 1). In an ideal, error-free problem, the phase triangle will close and one can estimate the value of the protein phase $\alpha_P$, although two values are equally consistent with a single derivative. This ambiguity can be resolved with a second derivative, as demonstrated by the Harker (1956) diagram (Figure 2). A circle of radius $|F_P|$ is drawn for the parent structure factor, whose magnitude is known but not its direction; the heavy-atom structure factor vector $-F_H$ is then drawn from the center of this circle, and a second circle of radius $|F_{PH}|$ (the derivative structure amplitude) is drawn with its origin at the end of $-F_H$. The two circles intersect at two places corresponding to the two possible phase angles of $F_P$. The magnitudes of $F_P$ and $F_{PH}$ are available from measured X-ray data, the magnitude of $F_H$ is obtained from the difference of $|F_{PH}|$ and $|F_P|$, and its phase is calculated from the heavy atom positions.

The parameters of the heavy atom(s) (coordinates, occupancy, and temperature factor, either isotropic or anisotropic) are refined by programs that minimize the difference between calculated and observed structure factors. The phase ambiguity that remains with a single derivative can be resolved by the use of a second derivative or anomalous dispersion. With the data from the second derivative ($F_{PH2}$), the correct phase $\alpha_P$ is located where the three circles intersect. Experimental data are not error-free, and as a consequence phase triangles do not close and the circles in Figure 1 do not intersect to give an unambiguous phase for $F_P$. Blow and Crick (1959) introduced the idea of casting the phase as a probability distribution of the form:

$$P_{iso}(\alpha) = \exp\!\left(-\frac{|F_{PH} - F_{PH}(\mathrm{calc})|^2}{2E^2}\right) \qquad (8)$$
where E represents the cumulative error and $|F_{PH} - F_{PH}(\mathrm{calc})|$ is the lack of closure, a measure of how poorly the phase triangle closes. The errors arise primarily from lack of isomorphism, inaccuracies in intensity measurements, and scaling of native and derivative data sets. The operational philosophy of MIR is to continue to collect heavy atom data from more than two derivatives until an interpretable electron density map can be calculated. Covalent modifications of free sulfhydryl groups of cysteine residues by mercurial compounds are often "sure-shot" derivatives. With the availability of recombinant DNA techniques, site-specific cysteine mutants can be introduced; otherwise trial and error seems to remain the method of choice.

Figure 1. Estimation of protein phases in the single isomorphous replacement (SIR) case. (A) Only the magnitude of $F_P$ is known. The loci of all possible values of $F_P$ form a circle of radius $|F_P|$. The information from a single heavy atom derivative can be used to reduce the number of possible phase values to two. (B) Both the magnitude and phase of $F_H$ are known. The possible values of $F_{PH}$ correspond to a circle of radius $|F_{PH}|$. If this circle is centered at $-F_H$, then since $F_P = F_{PH} - F_H$, the points of intersection of the two circles give the two possible values of $\alpha_P$.

Figure 2. Estimation of protein phases with two heavy atom derivatives; multiple isomorphous replacement (MIR). The method of isomorphous replacement requires information from at least two heavy atom derivatives to unambiguously assign the phase of the parent structure factor. The case of two heavy atom derivatives is diagrammed above. The point of intersection of all three circles indicates the parent phase.

Anomalous Dispersion
The phenomenon of anomalous dispersion or scattering occurs when the frequency of an incoming X-ray is close to the absorption frequency of a heavy atom; the X-ray will undergo a phase shift and become attenuated. The expression for the scattering factor f of an atom is then:

$$f = f_0 + f' + i\,\Delta f'' \qquad (9)$$
The correction term $f'$ is a negative real number representing the attenuation and $i\,\Delta f''$ represents the phase shift (Figure 3). Friedel's law ($|F_{hkl}| = |F_{\bar{h}\bar{k}\bar{l}}|$) breaks down in the presence of significant anomalous scattering. Although the anomalous effect is often a small difference between two large numbers representing a reflection $F_{hkl}$ and its Friedel mate $F_{\bar{h}\bar{k}\bar{l}}$, when measured accurately it can serve as a good second derivative to resolve the phase ambiguity. With the tunable frequency of X-rays from a synchrotron source, there is now increasing interest in the use of anomalous dispersion to solve structures of metalloproteins and to exploit signals from the sulfurs of methionines or cysteines replaced by selenium.

Figure 3. Effect of anomalous dispersion on the structure factor; pictorial representation of Equation 9. The structure factors F(h) and F(-h) can be expressed as the sums of the vectors representing the normal scatterers, F(±h), and the normal, dispersion, and absorption components of the anomalous scatterers, F(±h), F'(±h), and F''(±h), respectively. If only one type of anomalous scatterer is present, then the phase of F''(±h) leads that of F'(±h) by π/2.
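The breakdown of Friedel's law can be checked numerically. The following is a minimal sketch for a one-dimensional toy cell; the atom positions and the scattering-factor values (`F0`, `F_PRIME`, `F_DOUBLE_PRIME`) are illustrative assumptions, not tabulated values for any real element. It compares |F(h)| with |F(-h)| when the heavy atom scatters normally and when the correction terms of Equation 9 are included.

```python
import cmath


def structure_factor(h, atoms):
    """F(h) = sum_j f_j * exp(2*pi*i*h*x_j) for a one-dimensional toy cell."""
    return sum(f * cmath.exp(2j * cmath.pi * h * x) for f, x in atoms)


# (scattering factor, fractional coordinate); all numbers are illustrative only.
LIGHT_ATOMS = [(6.0, 0.10), (7.0, 0.35), (8.0, 0.62)]
F0, F_PRIME, F_DOUBLE_PRIME = 34.0, -1.0, 3.0   # heavy-atom terms of Equation 9

normal = LIGHT_ATOMS + [(F0, 0.80)]
anomalous = LIGHT_ATOMS + [(F0 + F_PRIME + 1j * F_DOUBLE_PRIME, 0.80)]


def friedel_difference(h, atoms):
    """|F(h)| - |F(-h)|; zero when Friedel's law holds."""
    return abs(structure_factor(h, atoms)) - abs(structure_factor(-h, atoms))


for h in (1, 2, 3):
    print(h,
          f"{friedel_difference(h, normal):+.6f}",
          f"{friedel_difference(h, anomalous):+.3f}")
```

With purely real scattering factors F(-h) is the complex conjugate of F(h), so the first column of differences vanishes; the imaginary term breaks that symmetry.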
MAP IMPROVEMENTS

Density Modification

An initial MIR map, though of sufficient quality to show secondary structure elements such as β-strands and α-helices, is often not good enough for an accurate chain tracing of the protein. One of the best tested methods of density modification available to crystallographers is solvent flattening, which relies on the fact that in the unit cell of a protein crystal there is bulk solvent outside the envelope defined by the protein molecule, and the electron density of this solvent should have a constant low value. Hence, once the envelope is defined from the contiguity of the protein electron density and the solvent content is known from either the measured or the estimated density of the crystal, one can obtain more accurate phases by running cycles of solvent flattening and combining the solvent-modified phases with the MIR phases. The larger the solvent content, the more accurate the phases that can be obtained. If there is more than one copy of the macromolecule in the asymmetric unit, noncrystallographic symmetry can be used for averaging to yield better phases, leading to a cleaner electron density map (Wang, 1985). Software such as SQUASH (Cowtan and Main, 1993) uses Sayre's equation and density modification along with noncrystallographic symmetry (if present) to improve phases and/or extend phases to higher resolution data. SOLOMON (Abrahams and Leslie, 1996) exploits "solvent flipping" along with solvent flattening for phase improvement.

Maximum Entropy Techniques
Following Shannon (1948), a unique and consistent measure of the amount of "ignorance" (uncertainty, entropy) in a discrete probability distribution containing the electron density $\rho$ is given by:

$$S = -\sum \rho \log \rho$$

which is immediately seen to correspond to the Boltzmann expression for entropy that arises in statistical mechanics. The basic idea of this formalism is to maximize $S$; hence the name maximum entropy methods. Several groups are working to develop ab initio phasing of macromolecules from diffraction data. Prince and his co-workers (1988) have shown that maximum entropy is a powerful technique for phase improvement and extension when the molecular envelope is available. A structure of a DNA oligomer and several small molecules were solved by a method that applies a maximum entropy formalism to the cross-entropy with phases (Harrison, 1989; Miller et al., 1988). Carter and co-workers are developing techniques with maximum entropy, phase permutation, and likelihood scoring for ab initio phasing and phase improvement.
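As a small numerical illustration of the entropy measure, the sketch below evaluates S for three hypothetical eight-point "density" profiles after normalizing each to unit sum (the normalization and the example profiles are assumptions made for the illustration). The featureless profile has the largest entropy and the single spike the smallest, which is the sense in which a maximum entropy map is the least committal one consistent with the data.

```python
import math


def entropy(density):
    """S = -sum p*log(p), with p the density normalized to unit sum (0*log 0 taken as 0)."""
    total = sum(density)
    return -sum((d / total) * math.log(d / total) for d in density if d > 0)


flat = [1.0] * 8                                   # featureless profile
lumpy = [4.0, 2.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]   # some structure
spike = [8.0] + [0.0] * 7                          # everything in one grid point

for name, profile in (("flat", flat), ("lumpy", lumpy), ("spike", spike)):
    print(f"{name:5s} S = {entropy(profile):.3f}")
```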
STRUCTURE REFINEMENT

In small molecule crystallography, where X-ray amplitudes are available to atomic or near-atomic resolution, full matrix least-squares refinement is done to improve the quality of the structure. A residual R

$$R = \frac{\sum_{hkl} |F_{hkl}(\mathrm{obs}) - F_{hkl}(\mathrm{calc})|}{\sum_{hkl} F_{hkl}(\mathrm{obs})} \qquad (10)$$

is minimized with respect to the coordinates and thermal parameters of the atoms:

$$\rho(\mathbf{r}) = \rho_0 \exp\!\left(-|B_i \cdot (\mathbf{r} - \mathbf{r}_i)|^2\right) \qquad (11)$$
where atom i is located at $\mathbf{r}_i$ (i runs from 1 to n over the atoms of the structure) and $B_i$ is a symmetric tensor representing the thermal motion of atom i as an ellipsoid. The refinement is carried out in reciprocal space by calculating the structure factor:

$$F_{hkl}(\mathrm{calc}) = \int_V \rho(\mathbf{r}) \exp(2\pi i\, \mathbf{h}\cdot\mathbf{r})\, d^3r \qquad (12)$$
where the integration is done over the entire volume V of the unit cell of the crystal. Elements of the normal matrix constructed from this equation are the derivatives of the structure factors with respect to $\mathbf{r}_i$ and the thermal tensor $B_i$. The number of parameters for anisotropic refinement of an individual atom is 3 + 6 = 9; this number is 4 (3 + 1) for isotropic temperature factor refinement. For small molecules diffracting to atomic resolution, the ratio of observable data to refinable parameters is large enough to give stable least-squares refinement (Hendrickson, 1985). Most macromolecules do not diffract to atomic resolution, so full matrix least-squares refinement usually cannot be used for them. Diamond (1971) suggested a real space technique of structure refinement by minimizing:

$$\int \left(\rho(\mathbf{r}) - \rho_m(\mathbf{r})\right)^2 d^3r \qquad (13)$$
where $\rho(\mathbf{r})$ and $\rho_m(\mathbf{r})$ are the electron densities obtained from the Fourier transform of the observed structure factors and from the model, respectively. In the late 1970s, Hendrickson (Hendrickson and Konnert, 1980) used restrained conjugate gradient techniques and included the knowledge of stereochemistry (i.e., ideal values of bond lengths, bond angles, planarity, chiral volumes, etc.) in the refinement as additional observations with appropriate weights, so that the ratio of observations to refinable parameters was increased. Weights for the stereochemistry are assigned from information such as the standard deviations of bond lengths from refined X-ray crystal structures of amino
acids and peptides. At the start of a refinement cycle with a crude model, medium resolution data (4 Å or 3 Å) are often used to allow large shifts of up to 1 Å in the model. As the refinement progresses, higher resolution data are added in bins of resolution. Once a round of refinement converges, as judged by negligible or no shift of parameters in two consecutive cycles of refinement, one calculates a difference map or a 2Fo-Fc map to rebuild the model manually with the help of graphics programs such as FRODO (Jones, 1978, 1986) or O (Jones et al., 1991). The process of refinement and model building, in which atoms are moved to fit the electron density better, extra atoms are added where the density requires them, solvent molecules are added, and incorrectly placed atoms are deleted, is iterated until the difference map is featureless. Until recently this was the method of choice for successful refinement of protein structures. The conventional conjugate gradient least-squares programs of Hendrickson and others approach the minimum by gradient descent. If the starting model is not close to the final model, gradient descent cannot cross the barriers associated with peptide flips, large movements of the main chain, and so forth. The recent advance in macromolecular refinement came through the introduction of molecular dynamics (Brunger et al., 1987) and simulated annealing to explore large areas of conformational space starting from an initial model. Molecular dynamics is a technique in which energy is pumped into a macromolecular system by increasing its "temperature", and the coordinates and velocities of the atoms are allowed to vary according to Newton's laws of motion.
The initial set of crystallographic coordinates $\mathbf{r}_i$, obtained either from an initial chain trace of a multiple isomorphous replacement map or from a molecular replacement solution, are assigned initial velocities $\mathbf{v}_i(t=0) = d\mathbf{r}_i/dt$, where the directions of $\mathbf{v}_i$ are random and the magnitudes of $\mathbf{v}_i$ are given by:

$$v_i^2(t=0) \propto T \qquad (14)$$

where T is a nominal temperature. If atom i with mass $m_i$ is acted upon by forces $\mathbf{F}_{ki}$, then the structure will change according to

$$\mathbf{a}_i(0) = \frac{d^2\mathbf{r}_i}{dt^2} = \frac{\sum_k \mathbf{F}_{ki}}{m_i} \qquad (15)$$
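If the sampling interval in time is small enough (of the order of femtoseconds), then one can approximate the velocity and displacement at time $\Delta t$ as

$$\mathbf{v}_i(\Delta t) = \mathbf{v}_i(0) + \Delta t\, \mathbf{a}_i(0) \qquad (16)$$

$$\mathbf{r}_i(\Delta t) = \mathbf{r}_i(0) + \Delta t\, \mathbf{v}_i(0) \qquad (17)$$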
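The update rule of Equations 15 to 17 can be written out directly. The sketch below is an illustration only: the function name `euler_step`, the list-of-lists data layout, and the unit-free numbers in the usage lines are assumptions, and production molecular dynamics codes generally use more stable integrators than this first-order step.

```python
def euler_step(positions, velocities, forces, masses, dt):
    """Advance every atom by one time step.

    a_i(0)  = F_i / m_i              (Equation 15, F_i being the summed force on atom i)
    v_i(dt) = v_i(0) + dt * a_i(0)   (Equation 16)
    r_i(dt) = r_i(0) + dt * v_i(0)   (Equation 17)
    """
    new_positions, new_velocities = [], []
    for r, v, f, m in zip(positions, velocities, forces, masses):
        a = [f_k / m for f_k in f]
        new_velocities.append([v_k + dt * a_k for v_k, a_k in zip(v, a)])
        new_positions.append([r_k + dt * v_k for r_k, v_k in zip(r, v)])
    return new_positions, new_velocities


# One atom, arbitrary units: moving along x, pushed along y.
r1, v1 = euler_step(positions=[[0.0, 0.0, 0.0]],
                    velocities=[[1.0, 0.0, 0.0]],
                    forces=[[0.0, 2.0, 0.0]],
                    masses=[2.0],
                    dt=0.5)
print(r1, v1)
```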
In crystallographic refinement incorporating molecular dynamics, the forces $\mathbf{F}_{ki}$ (i.e., the force of type k on atom i) include pseudo-forces that restrain the calculated structure factors to be similar in magnitude to the observed structure factors. The potential (pseudo-potential) energy function to be minimized is:

$$E = E_{chem} + \frac{\sum_{hkl} \left(F_{hkl}(\mathrm{obs}) - F_{hkl}(\mathrm{calc})\right)^2}{\sigma_x} \qquad (18)$$

where $E_{chem}$ includes terms from bond lengths, bond angles, electrostatic forces, van der Waals forces, and so on, and $\sigma_x$ is a weight factor. In a standard crystallographic refinement, the system is heated to a typical temperature of T = 4000 K and then slow-cooled against a heat bath for about 25 to 50 steps of 0.5 femtoseconds with a reduction of T by 25 units until T = 300 K. This process explores a large area of conformational space and is capable of making large adjustments to atomic positions. XPLOR (Brunger, 1988) and GROMOS (Gros et al., 1990) are two powerful packages for macromolecular structure refinement and reduce the time spent by the user on manual model building. The other more recent development in refinement is the automatic refinement program by Lamzin and Wilson (1993). For a well-diffracting crystal (2.0 Å or better), this program can, starting from a poor initial model, add atoms, delete wrongly placed atoms, or even add solvent molecules. The input of the crystallographer, however, remains critical even with the use of "automatic" or "semi-automatic" refinement programs, in order to check and judge the accuracy and validity of the refined models, that is, that they make chemical sense and fit with known biochemical and biological facts.
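The length of such a slow-cooling run follows from simple arithmetic. The sketch below assumes one particular reading of the protocol, namely that the 25 to 50 dynamics steps of 0.5 fs are run at each temperature stage before T is lowered by 25 K; the function name `cooling_schedule` and its defaults are introduced only for this illustration and are not taken from any refinement package.

```python
def cooling_schedule(t_start=4000.0, t_end=300.0, t_drop=25.0):
    """Temperatures visited when T is lowered by t_drop from t_start down to t_end."""
    temps = []
    t = t_start
    while t >= t_end:
        temps.append(t)
        t -= t_drop
    return temps


TIMESTEP_FS = 0.5   # femtoseconds per dynamics step

temps = cooling_schedule()
for steps_per_stage in (25, 50):
    total_steps = steps_per_stage * len(temps)
    total_ps = total_steps * TIMESTEP_FS / 1000.0
    print(f"{steps_per_stage} steps per stage: {len(temps)} stages, "
          f"{total_steps} steps, about {total_ps:.1f} ps of dynamics")
```

Under this reading the whole annealing run covers only a few picoseconds of simulated time, short compared with conventional molecular dynamics simulations.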
FINAL MODEL AND VALIDITY OF THE STRUCTURE

During cycles of model building and refinement, electron density maps are generated from calculated phases, which introduces a degree of model bias into the map. To reduce the bias, $2|F_o| - |F_c|$ or, in general, $m|F_o| - n|F_c|$ ($m = n + 1$) are used as amplitudes to generate Fourier maps. If experimental sources of phases, such as MIR phases, are available, one can combine experimental and calculated phases with appropriate weights using software such as SIGMAA (Read, 1986) or COMBINE (Z. Otwinowski, private communication) to reduce the bias. Another way of checking the validity and reducing bias is to calculate a series of omit maps, in which a fragment or fragments in turn are omitted from the phase calculation and an electron density map is then calculated. The omit map will reveal the fragment without any bias coming from its presence during phase calculation. XPLOR has the facility to calculate an annealed omit map, in which a fragment is omitted, the rest of the molecule undergoes a short period of molecular dynamics, and the atoms are then allowed to refine. An electron density map then reveals the omitted fragment. The molecular dynamics run wipes out any residual "memory" of bias from the original phase set. Improved detection technology, for example charge-coupled devices (CCDs), in tandem with stronger X-ray sources, clever software, and faster computing, makes the future of macromolecular crystallography and refinement ever more exciting.
REFERENCES

Abrahams, J.P. and Leslie, A.G. (1996). Methods used in the structure determination of bovine mitochondrial F1 ATPase. Acta Crystallogr., Sect. D, 52, 30-42.
Arndt, U.W., Champness, J.N., Phizackerley, R.P., and Wonacott, A.J. (1973). Single-crystal oscillation camera for large unit cells. J. Appl. Crystallogr. 6, 457-463.
Blow, D.M. and Crick, F.H.C. (1959). The treatment of errors in the isomorphous-replacement method. Acta Crystallogr. 12, 794-802.
Brunger, A.T. (1988). Crystallographic refinement by simulated annealing. Application to a 2.8 Å resolution structure of aspartate aminotransferase. J. Mol. Biol. 203, 803-816.
Brunger, A.T., Kuriyan, J., and Karplus, M. (1987). Crystallographic R-factor refinement by molecular dynamics. Science 235, 458-460.
Carter, C.W., Jr. (Ed.) (1990). Protein and nucleic acid crystallization. Methods 1, 1-127.
Carter, C.W., Jr. and Carter, C.W. (1979). Protein crystallization using incomplete factorial experiments. J. Biol. Chem. 254, 12219-12223.
Charpak, G., Boucher, R., Bressani, T., Favier, J., and Zupancic, C. (1968). Some read-out systems for proportional multiwire chambers. Nucl. Instrum. and Methods 62, 262.
Cork, C., Fehr, D., Hamlin, R., Vernon, W., Xuong, Ng.-H., and Perez-Mendez, V. (1974). Multiwire proportional chamber as an area detector for protein crystallography. J. Appl. Crystallogr. 7, 319-323.
Cowtan, K.D. and Main, P. (1993). Improvement of macromolecular electron-density maps by the simultaneous application of real and reciprocal space constraints. Acta Crystallogr., Sect. D, 49, 148-157.
Cox, M.J. and Weber, P.C. (1987). Experiments with automated protein crystallization. J. Appl. Crystallogr. 20, 366-373.
Davis, S., Puklavec, M.J., Ashford, D.A., Harlos, K., Jones, E.Y., Stuart, D.I., and Williams, A.F. (1993). Expression of soluble recombinant glycoproteins with predefined glycosylation: Application to the crystallization of the T-cell glycoprotein CD2. Protein Engineering 6, 229-232.
DeLucas, L.J., Smith, C.D., Smith, H.W., Vijay-Kumar, S., Senadhi, S.E., Ealick, S.E., Carter, D.C., Snyder, R.S., Weber, P.C., Salemme, F.R., Taylor, G., Stammers, D.K., Powell, K., Darby, G., and Bugg, C. (1989). Protein crystal growth in microgravity. Science 246, 651-654.
Diamond, R. (1971). Real-space refinement procedure for proteins. Acta Crystallogr., Sect. A, 27, 436-452.
Ducruix, A. and Giege, R. (1992). Crystallization of Nucleic Acids and Proteins: A Practical Approach. Oxford University Press, Oxford, England.
Dyda, F., Hickman, A.B., Jenkins, T.M., Engelman, A., Craigie, R., and Davies, D.R. (1994). Crystal structure of the catalytic domain of HIV-1 integrase: Similarity to other polynucleotidyl transferases. Science 266, 1981-1986.
Fitzgerald, P.M.D. (1988). Merlot, an integrated package of computer programs for the determination of crystal structures by molecular replacement. J. Appl. Crystallogr. 21, 273-278.
Gilliland, G.L. and Bickham, D.M. (1990). The biological macromolecular crystallization database: A tool for developing crystallization strategies. Methods 1, 6-11.
Gros, P., van Gunsteren, W.F., and Hol, W.G. (1990). Inclusion of thermal motion in crystallographic structures by restrained molecular dynamics. Science 249, 1149-1152.
Hajdu, J., Acharya, K.R., Stuart, D.I., McLaughlin, P.J., Barford, D., Oikonomakos, N.G., Klein, H., and Johnson, L.N. (1987). Catalysis in the crystal: Synchrotron radiation studies with glycogen phosphorylase b. EMBO J. 6, 539-546.
Harada, S., Yasui, M., Masanori, Y., Murakawa, K., Kasai, N., and Satow, Y. (1986). Crystal-structure analysis of cytochrome-c' by the multiwavelength anomalous diffraction method using synchrotron radiation. J. Appl. Crystallogr. 19, 448-452.
Harker, D. (1956). The determination of the phases of the structure factors of noncentrosymmetric crystals by the method of double isomorphous replacement. Acta Crystallogr. 9, 1-9.
Harrison, R.W. (1989). Minimization of cross entropy: A tool for solving crystal structures. Acta Crystallogr., Sect. A, 45, 4-10.
Helliwell, J.R., Habash, J., Cruickshank, D.W.J., Harding, M.M., Greenhough, T.J., Campbell, J.W., Clifton, I.J., Elder, M., Machin, P.A., Papiz, M.Z., and Zurek, S. (1989). The recording and analysis of synchrotron X-radiation Laue diffraction photographs. J. Appl. Crystallogr. 22, 483-497.
Hendrickson, W.A. (1985). Stereochemically restrained refinement of macromolecular structures. Methods Enzymol. 115, 252-270.
Hendrickson, W.A. and Konnert, J.H. (1980). Incorporation of stereochemical information into crystallographic refinement. In: Computing in Crystallography (Diamond, R., Ramaseshan, S., and Venkatesan, K., Eds.), pp. 13.01-13.25. Indian Acad. Sci., Bangalore, India.
Hendrickson, W.A. and Ward, K.B. (1987). Imaging Plate Detectors for Synchrotron Radiation. Howard Hughes Medical Institute Scientific Conference Center, Coconut Grove, FL.
Howard, A.J., Nielsen, C., and Xuong, Ng.-H. (1985). Software for a diffractometer with multiwire area detector. Methods Enzymol. 114, 452-472.
Howard, A.J., Gilliland, G.L., Finzel, B.C., Poulos, T.L., Ohlendorf, D.H., and Salemme, F.R. (1987). The use of an imaging proportional counter in macromolecular crystallography. J. Appl. Crystallogr. 20, 383-387.
Kabsch, W. (1988). Evaluation of single-crystal X-ray diffraction data from a position-sensitive detector. J. Appl. Crystallogr. 21, 916-924.
Jancarik, J. and Kim, S.H. (1991). Sparse-matrix sampling: A screening method for crystallization of proteins. J. Appl. Crystallogr. 24, 409-411.
Jones, N.D., Decter, J.B., Swartzenderber, J.K., and Landis, R.L. (1987). Amer. Crystallogr. Assoc. Meet. March 15-20, Austin, Texas, H-4. (Abstr.)
Jones, T.A. (1978). A graphics model building and refinement system for macromolecules. J. Appl. Crystallogr. 11, 268-272.
Jones, T.A. (1986). Interactive computer graphics: FRODO. Methods Enzymol. 115, 157-171.
Jones, T.A., Zou, J.-Y., Cowan, S.W., and Kjeldgaard, M. (1991). Improved methods for building protein models in electron-density maps and the locations of errors in these models. Acta Crystallogr., Sect. A, 47, 110-119.
Lamzin, V.S. and Wilson, K.S. (1993). Automated refinement of protein models. Acta Crystallogr., Sect. D, 49, 129-147.
Laver, W.G. (1990). Crystallization of antibody-protein complexes. Methods 1, 70-74.
McPherson, A. (1982). The Preparation and Analysis of Protein Crystals. John Wiley and Sons, New York.
McPherson, A. (1990). Current approaches to macromolecular crystallization. Eur. J. Biochem. 189, 1-23.
Miller, M., Harrison, R., Wlodawer, A., Appella, E., and Sussman, J.L. (1988). Crystal structure of 15-mer DNA duplex containing unpaired bases. Nature 334, 85-86.
Miller, M., Jaskólski, M., Rao, J.K.M., Leis, J., and Wlodawer, A. (1989). Crystal structure of a retroviral protease proves relationship to aspartic protease family. Nature 337, 576-579.
Otwinowski, Z. and Gewirth, D. (1993). Denzo Manual. Yale University, New Haven, CT.
Patterson, A.L. (1935). A direct method for the determination of the components of interatomic distances in crystals. Z. Krist. 90, 517-542.
Prince, E., Sjolin, L., and Alenljung, R. (1988). Phase extension by combined entropy maximization and solvent flattening. Acta Crystallogr., Sect. A, 44, 216-222.
Read, R.J. (1986). Improved Fourier coefficients for maps using phases from partial structures with errors. Acta Crystallogr., Sect. A, 42, 140-149.
Richards, F.M. (1985). Optical matching of physical models and electron density maps: Early developments. Methods Enzymol. 115, 145-154.
Rodgers, D.W. (1994). Cryocrystallography. Structure 2, 1135-1140.
Rossmann, M.G. and Blow, D.M. (1962). The detection of subunits within the crystallographic asymmetric unit. Acta Crystallogr. 15, 24-31.
Shannon, C.E. (1948). A mathematical theory of communication. Bell Syst. Tech. J. 27, 379-423, 623-656.
Shrive, A.K., Clifton, I.J., Hajdu, J., and Greenhough, T.J. (1990). Laue film integration and deconvolution of spatially overlapping reflections. J. Appl. Crystallogr. 23, 169-174.
Stout, G.H. and Jensen, L.H. (1968). X-ray Structure Determination. The Macmillan Company, London, England.
Stura, E.A. and Wilson, I.A. (1990). Analytical and production seeding techniques. Methods 1, 38-49.
Wang, B.C. (1985). Resolution of phase ambiguity in macromolecular crystallography. Methods Enzymol. 115, 90-112.
Weber, P.C. (1991). Physical principles of protein crystallization. Adv. Protein Chem. 41, 1-36.
Xuong, Ng.-H. and Freer, S.T. (1971). Reflection intensity measurement by screenless precession photography. Acta Crystallogr., Sect. B, 27, 2380-2387.
Zulauf, M. and D'Arcy, A. (1992). Light scattering of proteins as a criterion for crystallization. J. Crystal Growth 122, 102-106.
Zeppezauer, M. (1971). Formation of large crystals. Methods Enzymol. 22, 253-266.
Chapter 2

The Chemistry of Protein Functional Groups

GARY E. MEANS, HAO ZHANG, and MIN LE
Abstract
Introduction
Modification of Amino Groups (α-NH2 and Lysine)
    Reductive Methylation
    Amidination
    Maleic Anhydride
    Trinitrobenzenesulfonate
    Selective Modifications of α- or ε-Amino Groups
Modification of Imidazole Groups (Histidine)
    Diethyl Pyrocarbonate
Modification of Guanidino Groups (Arginine)
    Butanedione
    Phenylglyoxal
Modification of Carboxyl Groups (α-COOH, Aspartate, and Glutamate)
    Water-soluble Carbodiimides and Glycine Ethyl Ester
Modification of Carboxamide Groups (Asparagine and Glutamine)
    Deamidation
Modification of Sulfhydryl Groups (Cysteine)
    N-Ethylmaleimide
    Methyl Methanethiosulfonate
    Dithio(2-nitrobenzoate)
    Dipyridyl Disulfide
    Selective Reactions with Vicinal Sulfhydryl Groups
Modification of Disulfide Bonds (Cystine)
    Reduction by Dithiothreitol and Other Thiols
Modification of Thioether Groups (Methionine)
    Hydrogen Peroxide
    Chloramine T
Modification of Indole Groups (Tryptophan)
    N-Bromosuccinimide
Modification of Phenolic Groups (Tyrosine)
    Iodination
    Tetranitromethane
    N-Acetylimidazole

Protein: A Comprehensive Treatise, Volume 2, pages 23-59. Copyright © 1999 by JAI Press Inc. All rights of reproduction in any form reserved. ISBN: 1-55938-672-X
ABSTRACT

The physical, chemical, and biological properties of proteins are determined to some extent by the properties of their constituent amino acid side chains or functional groups. Reagents and procedures are described for selective chemical modification of the major types of functional groups. Those reagents and procedures can be used to alter the properties of individual proteins and to identify functional groups required for the catalytic activities of enzymes and those responsible for the other properties of biologically important proteins.
INTRODUCTION

Amino acid side chains containing oxygen, nitrogen, or sulfur atoms are required for the catalytic activities of enzymes and for the biological properties of most other proteins. They are called functional groups due to their roles as acids, bases, nucleophiles, electrophiles, electrostatic charges, hydrogen bond donors, acceptors, and so forth in their catalytic mechanisms and/or other functions. The particular side chains required for the biological activity of a protein can often be determined from the effect(s) of their modification on the activity. Side chains composed entirely of carbon and hydrogen atoms are just as surely necessary but are not usually called functional groups and are not usually subject to chemical modification. Some properties of those side chains and the so-called functional groups are presented in Tables 1 and 2. α-Amino and α-carboxyl groups are similar in most respects to side chain amino and carboxyl groups and will be discussed in the same sections. Less common functional groups, like those of γ-carboxyglutamate, phosphoserine, phosphotyrosine, and O- and N-linked glycosyl groups resulting from various posttranslational modifications, will not be addressed.
Table 1. Physical Parameters of Amino Acid Residues

Residue   Van der Waals      Surface area (Å²) (b)      Hydrophobicity (c)
          volume (Å³) (a)    Total      Side chain      (kcal/mol)
Ala        67                113         67             -0.5
Arg       148                241        196              3.0
Asn        96                158        113              0.2
Asp        91                151        106              2.5
1/2 Cys    86                140        104             -1.0
Gln       114                189        144              0.2
Glu       109                183        138              2.5
Gly        48                 83          -              0
His       118                194        151             -0.5
Ile       124                182        140             -1.8
Leu       124                180        137             -1.8
Lys       135                211        167              3.0
Met       124                204        160             -1.3
Phe       135                218        175             -2.5
Pro        90                143        105             -1.4
Ser        73                122         80              0.3
Thr        93                146        102             -0.4
Trp       163                259        217             -3.4
Tyr       141                229        187             -2.3
Val       105                160        117             -1.5

Notes: (a) Creighton, 1993. (b) Miller et al., 1987. (c) Levitt, 1976.
Proteins are subject to chemical modification for many different purposes. They are sometimes modified, for example, to increase or decrease their solubility, to promote or discourage subunit dissociation, to alter their susceptibility to proteolysis, to stabilize or protect unstable structures during sequence determinations, and for other kinds of structural studies. Modification procedures are sometimes used to introduce isotopic, fluorescent, and other kinds of spectroscopic labels, to effect the attachment or conjugation of one protein to another protein, to an insoluble support, or to some other substance, and to determine spatial relationships between side chain groups. Chemical modification procedures are also sometimes used to determine the number of certain amino acid residues and are frequently the simplest and most direct way to identify particular amino acid residues required for biological activity.
Table 2. The Properties of Ionizable Groups of Side Chains in Proteins

Residue   pKa (a)                     ΔH (kcal/mol) (b)
          Nominal     Range
Arg       12.0        -               12.4
Asp        4.0        3.9-4.0          1.2
Cys        8.7        9.0-9.5          8.6
Glu        4.4        4.3-4.5          1.2
His        6.5        6.0-7.0          4.0
Lys       10.5        10.4-11.1       11.0
Ser       14.2        -                -
Thr       15.0        -                -
Tyr       10.1        10.0-10.3        6.0

Notes: (a) Dixon and Webb, 1964; Creighton, 1993; Kyte, 1995. (b) Dixon and Webb, 1964; Fasman, 1974.
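The nominal pKa values of Table 2 translate directly into the fraction of each group present in its deprotonated form at a given pH. The following sketch applies the Henderson-Hasselbalch relation to a few of those values; the function name and the choice of pH 7 are assumptions made for the illustration, and pKa values of individual groups in a folded protein can differ from the nominal ones.

```python
# Nominal side-chain pKa values copied from Table 2.
NOMINAL_PKA = {"Asp": 4.0, "Glu": 4.4, "His": 6.5, "Cys": 8.7,
               "Tyr": 10.1, "Lys": 10.5, "Arg": 12.0}


def fraction_deprotonated(pka, ph):
    """Henderson-Hasselbalch: [A-] / ([HA] + [A-]) = 1 / (1 + 10**(pKa - pH))."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


PH = 7.0
for residue, pka in NOMINAL_PKA.items():
    print(f"{residue}: {100 * fraction_deprotonated(pka, PH):8.3f} % deprotonated at pH {PH}")
```

At pH 7, for example, roughly three quarters of a histidine side chain with the nominal pKa of 6.5 is deprotonated, whereas a lysine side chain with pKa 10.5 is almost entirely protonated.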
Table 3 lists some of the most widely used and most effective reagents/procedures for identifying functional groups required for the biological activities of proteins and indicates the extent to which they can be expected to react with the different protein functional groups. Although more reagents might have been included, the list is purposely short so as to emphasize those reagents thought to be the most useful. Those included were selected particularly for their specificity for a single functional group under conditions compatible with the biological activity of most proteins. Those effecting the least change in size or charge of the side chains and, thereby, usually having the least effect on protein structure, those that are easy to detect, follow, or determine, those giving chemically stable derivatives that can be isolated and characterized, or those from which the original side chain can subsequently be regenerated were also considerations. These and other desirable attributes will be discussed in regard to each of the reagents described below. More extensive discussions of protein modification reagents and procedures are available elsewhere (Hirs, 1967; Means and Feeney, 1971a; Hirs and Timasheff, 1972, 1977, 1983; Glazer and Delange, 1975; Lundblad and Noyes, 1984; Eyzaguirre, 1987; Imoto and Yamada, 1989; Wong, 1991; Lundblad, 1991, 1995). A similar list including more than a hundred reagents has been published elsewhere (Means and Feeney, 1993). Although most of the reagents listed in Table 3 affect more than just one functional group, differences in reactivity are frequently large and selective modification can usually be achieved by using limited amounts of reagent. N-Ethylmaleimide, for example, reacts readily with a wide range of nucleophiles but about 1000 times faster with sulfhydryl groups than with other common functional groups (Brewer and Riehm, 1967) and, in limited amounts, is usually very specific for them.
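The effect of a large rate difference on selectivity can be estimated with a short calculation. The sketch below assumes that both groups react with the reagent in simple second-order processes and are exposed to the same reagent concentration over the same time, so that the unmodified fractions are related by ln(1 - f_slow) / ln(1 - f_fast) = k_slow / k_fast. The function name and the 90% figure are illustrative assumptions; only the roughly 1000-fold rate ratio comes from the text.

```python
def fraction_modified_slow(fraction_fast, rate_ratio):
    """Fraction of the slow-reacting group modified once `fraction_fast` of the
    fast-reacting group has reacted; rate_ratio = k_fast / k_slow."""
    return 1.0 - (1.0 - fraction_fast) ** (1.0 / rate_ratio)


# Sulfhydryl groups reacting about 1000 times faster than another nucleophile.
side_reaction = fraction_modified_slow(fraction_fast=0.90, rate_ratio=1000.0)
print(f"{100 * side_reaction:.2f} % of the slower group is modified")
```

Under these assumptions only about a quarter of one percent of the slower group has reacted by the time 90% of the sulfhydryl groups are modified, which illustrates why limiting the amount of reagent gives useful specificity.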
Table 3. The Specificity of Reagents and Procedures for Chemical Modification of Protein Functional Groups (notes a, b)
Columns (functional groups affected): Amino (note c), Guanidino, Imidazole, Carboxyl, Thiol, Thioether, Disulfide, Phenol, Indole
Rows (reagents and procedures, grouped by target):
AMINO GROUPS: Citraconic anhydride; Maleic anhydride; Methyl acetimidate; Reductive methylation; Trinitrobenzenesulfonic acid
IMIDAZOLE GROUPS: Diethyl pyrocarbonate
GUANIDINO GROUPS: Butanedione (note h); Phenylglyoxal
CARBOXYL GROUPS: Water-soluble carbodiimide + glycine ethyl ester
DISULFIDE BONDS: Dithiothreitol
THIOETHER GROUPS: Hydrogen peroxide; Chloramine T
INDOLE GROUPS: N-Bromosuccinimide; N-Chlorosuccinimide
PHENOL GROUPS: N-Acetyl imidazole; Iodine; Tetranitromethane
[Table body: the individual reactivity ratings (+++, ++, +, ±, -) for each reagent and column are not legible in this copy.]
Notes:
a The reagents included were selected from a longer list of reagents (Means and Feeney, 1993) on the basis of their widespread, general usefulness in determining functional groups required for biological activity. The indicated specificities are those expected under the conditions usually employed, as described in the text.
b The indicated reaction specificities are as follows: +++ highly reactive, extensive reaction under typical conditions; ++ significant reaction should be expected; + some reaction is possible but is not usually extensive; - no reaction expected under typical conditions; ± a significant reaction usually takes place but the resulting derivatives are unstable and usually break down to regenerate the original side chain group.
c Reagents that affect ε-amino groups usually also affect α-amino groups.
d Can be converted back to the unmodified side chain under relatively mild conditions.
e May be hazardous if handled improperly; check literature before using.
f Reaction requires two closely spaced thiol groups.
g In the absence of an added nucleophile, may give rise to cross links with nearby carboxyl moieties.
h At pH 7 to 9 in the presence of borate.
i Usually affects only α-amino groups.
The Chemistry of Protein Functional Groups
Sulfhydryl groups are generally one of the most reactive functional groups in proteins and are therefore usually one of the easiest to modify. Due to their generally high reactivity, however, they also sometimes interfere with the modification of other functional groups. N-Bromosuccinimide, for example, is widely used to modify tryptophan residues of proteins but usually not those with sulfhydryl groups, which almost always react even faster. The use of other oxidizing and electrophilic reagents is also usually limited, for similar reasons, to proteins without sulfhydryl groups. The stabilities of the products of a modification procedure are sometimes just as important as their rate of formation. Under the conditions usually employed to acylate amino groups (i.e., pH ~8.5 to 10), for example, sulfhydryl, imidazole, and phenolic groups are also normally acylated to some extent, but the resulting products are usually unstable and either hydrolyze spontaneously to regenerate the original side chains or can be readily deacylated by a subsequent treatment with hydroxylamine. The specificity of acylating agents for amino groups under such conditions is thus due both to the high reactivity of amino groups and to the stability of the products. At lower pH (i.e., ~6 to 7), where most amino groups are strongly protonated and therefore unreactive, imidazole moieties of histidine residues are still largely unprotonated and are usually the only residues susceptible to acylation. Under such conditions, diethyl pyrocarbonate, which gives particularly stable acyl derivatives, is then relatively specific for histidine residues. By the use of an appropriate acylating agent and an appropriate pH, acylation can thus be quite specific for either amino or imidazole moieties of proteins. Reagents for selectively modifying the major protein functional groups are described below.
MODIFICATION OF AMINO GROUPS (α-NH2 AND LYSINE)
A large number of reagents are used to chemically modify amino groups in proteins. Because amino groups are usually relatively abundant and quite reactive and the products are usually stable, they are frequently employed to introduce various kinds of labels and probes, for crosslinking, and to conjugate proteins to each other, to insoluble supports, and to other substances. Most of the reagents react to some extent with both α- and ε-amino groups. Table 3 indicates the extent to which they are also likely to affect other functional groups.
Reductive Methylation
Reductive methylation (Equation 1) is very specific for amino groups in proteins. Reactions are usually conducted from about pH 6 to 9 depending mainly on the reductant, and while methylation slightly increases the size of the amino groups, it has little effect on their pKa values or the overall charge of most proteins (Means and Feeney, 1968, 1995; Means, 1977; Jentoft and Dearborn, 1979).

\[
\mathrm{P{-}NH_3^{+}}
\xrightarrow[\text{NaBH$_4$ (pH 8--9) or NaBH$_3$CN (pH 6--7)}]{\mathrm{CH_2O}}
\mathrm{P{-}NH_2^{+}CH_3}
\xrightarrow[\text{NaBH$_4$ (pH 8--9) or NaBH$_3$CN (pH 6--7)}]{\mathrm{CH_2O}}
\mathrm{P{-}NH^{+}(CH_3)_2}
\tag{1}
\]

Because it has
so few effects, reductive methylation is sometimes used to introduce isotopic labels into proteins. 14C- and 3H-labels can be introduced using 14C- or 3H-labeled formaldehyde or 3H-labeled sodium borohydride, and although specific activities are usually lower than might be obtained by radioiodination, physical and biological properties are less likely to be affected and the radiological half-lives are much longer (Rice and Means, 1971; Ascoli and Puett, 1974; Jentoft and Dearborn, 1979; Tack et al., 1980; Means and Feeney, 1995). 13C- and 2H-labels can also be introduced with appropriately labeled precursors and may then be characterized by 13C- or 2H-NMR (Jentoft et al., 1979; Jentoft and Dearborn, 1979, 1983; Zhang and Vogel, 1993). Depending on the conditions, purposes, and other circumstances, sodium borohydride, sodium cyanoborohydride, dimethylamine borane, or pyridine borane can all be used as reducing agents. The last three are usually employed in large excess and the incorporation of formaldehyde is usually very efficient but the reactions are relatively slow. With sodium borohydride, some formaldehyde is also converted into methanol and its incorporation into proteins is therefore usually less efficient. Reactions are usually complete in only a minute or two, however, and the reducing agent is utilized more efficiently, which is particularly important when 3H-labeled borohydride is used for radiolabeling (Tack et al., 1980). With all four reducing agents, monomethylamino groups are formed initially and rapidly converted into dimethylamino groups, which usually predominate except at very low levels of modification. The extent of reaction is usually controlled by the amount of formaldehyde employed and/or by the reaction time. Modification approaching 100% of the amino groups is not unusual and, in many cases, appears to have few or no obvious effects on physical or biological properties.
Under the reaction conditions usually employed, no significant side reactions have been described. The particular advantages, disadvantages, and special circumstances involved with each of the reducing agents have been described (Means and Feeney, 1995). Other carbonyl compounds can be used similarly to introduce a wide variety of substituents into proteins. Pyridoxal phosphate, for example, can be incorporated and, at relatively low concentrations, is sometimes used as an affinity label to modify amino groups in or near phosphate and other anion binding sites (Anderson et al., 1966; Rippa et al., 1967; Means and Feeney, 1971b; Dudkin et al., 1975). The UV-visible and fluorescence spectra of the resulting pyridoxamine phosphate moieties are sensitive to their environment and may sometimes be used to characterize the attachment sites or, after fragmentation, to identify sequences originating from those sites.
Amidination
Methyl and ethyl acetimidate are commercially available and may be used to selectively modify amino groups under very mild conditions (Equation 2).

\[
\mathrm{P{-}NH_3^{+}} + \mathrm{CH_3C({=}NH_2^{+})OCH_3}
\xrightarrow{\text{pH} \sim 8}
\mathrm{P{-}NH{-}C({=}NH_2^{+})CH_3} + \mathrm{CH_3OH} + \mathrm{H^{+}}
\tag{2}
\]

The
resulting acetamidine moieties are relatively small, retain a cationic charge, and usually have few effects on protein structure (Hunter and Ludwig, 1962; Ludwig and Hunter, 1967; Browne and Kent, 1975; Makoff and Malcolm, 1981; Inman et al., 1983). Imido esters are also used to introduce many different kinds of substituents into proteins (Jue et al., 1978; Plapp, 1970; Riley and Perham, 1973). Bifunctional imido esters like dimethyl suberimidate, for example, are particularly useful as crosslinking agents (Davies and Stark, 1970).
Maleic Anhydride
Maleic anhydride, citraconic (i.e., methylmaleic) anhydride, and several related dicarboxylic acid anhydrides are widely used to reversibly modify amino groups in proteins (Equation 3).

\[
\mathrm{P{-}NH_3^{+}} + \mathrm{C_4H_2O_3}\ \text{(maleic anhydride)}
\xrightarrow{\text{pH 8--9}}
\mathrm{P{-}NH{-}CO{-}CH{=}CH{-}COO^{-}} + 2\,\mathrm{H^{+}}
\tag{3}
\]

The reaction is usually done under slightly alkaline conditions (pH ~8-9) and usually affects only amino groups (Butler et al., 1967, 1969; Atassi and Habeeb, 1972; Shetty and Kinsella, 1980; Aviram et al., 1981). Under those conditions, the introduced substituents are negatively charged and stable. The increased negative charge frequently increases the proteins' solubility, may cause the dissociation of subunits, and sometimes has other effects on protein structure (Shetty and Kinsella, 1980; Aviram et al., 1981). At low pH (i.e., ~3.5 or lower), the maleamide moieties are not charged and, more importantly, undergo relatively slow deacylation to regenerate the original amino groups according to Equation 4. This ability to deacylate the modified amino
groups, to essentially reverse the modification, is very useful for a number of purposes. Recovery of an enzyme's catalytic activity or other properties that were lost during the modification, for example, can be a strong indicator of the specificity of the reaction. Like most procedures for their modification, maleylation or citraconylation can be used to protect lysine residues from digestion by trypsin. Deacylation of the purified peptides obtained after such a digestion, followed by a second treatment with trypsin, should then effect cleavage at the lysine residues to give the same peptides that would have been obtained without any intervening modification procedure, and also provides very important information on the order of those peptides in the overall amino acid sequence.
Trinitrobenzenesulfonate
2,4,6-Trinitrobenzenesulfonate reacts with the amino groups of proteins under mild, slightly alkaline conditions, and the products have a strong, easy-to-detect, UV-visible absorption (Equation 5) (Okuyama and Satake, 1960; Goldfarb, 1970, 1974; Fields, 1971, 1972).

\[
\mathrm{P{-}NH_3^{+}} + \mathrm{(NO_2)_3C_6H_2{-}SO_3^{-}}
\xrightarrow{\text{pH 8--9}}
\mathrm{P{-}NH{-}C_6H_2(NO_2)_3} + \mathrm{SO_3^{2-}} + 2\,\mathrm{H^{+}}
\tag{5}
\]

Because they are large and hydrophobic, the introduced substituents are likely to affect protein structure. Because they are easy to detect, their incorporation can usually be followed at low levels and, in some cases, correlated with other changes, for example, in biological activity (Coffee et al., 1971; Hartman et al., 1985). Because the reaction is quite specific for amino groups and easy to follow, it is often used to determine amino groups in proteins and to monitor changes in the number of amino groups resulting from other modifications, procedures, manipulations, and so forth (Fields, 1972).
Selective Modifications of α- or ε-Amino Groups
Because they are usually more abundant than α-amino groups, ε-amino groups are the principal targets of most amino group modification procedures. Due to differences in the basicity and nucleophilic reactivity of α- and ε-amino groups, selective reaction with one or the other is possible, although not usually easy to achieve. With ε-amino groups, which are the more basic and the stronger nucleophiles, reactions are strongly favored by high pH. At lower pH values, where they are largely protonated and α-amino groups are only partially protonated, however, the latter are frequently the more reactive. Selectivity for α- over ε-amino groups should be maximal at pH values well below the pKa values of the α-amino group(s),
but reaction rates can be very slow at unnecessarily low values. Sometimes other factors are also important. Reactions of proteins with nitrous acid, for example, require a low pH and affect mainly α-amino groups, but due to another important ionization (i.e., HONO + H+ ⇌ H2O-NO+), are usually optimal around pH 3.5. Due to their greater abundance, some ε-amino groups are also usually affected (Shields et al., 1959; Wagner et al., 1969; Kurosky and Hofmann, 1972). Other procedures for the selective modification of α-amino groups also usually employ a low pH. Wetzel et al. (1990), for example, have described a procedure to selectively acylate the amino-termini of peptides with iodoacetic anhydride at pH 6. The resulting iodoacetyl derivatives are themselves potent alkylating agents and suitable for conjugation to thiol moieties of other peptides or proteins, on solid supports, and to a variety of other substances. Dixon and coworkers (1972) have described a procedure for the selective transamination of amino-termini. The reaction takes place under mild, slightly acidic conditions (e.g., pH ~5), involves an aldehyde (usually glyoxylate), a heavy metal cation [usually copper (II) or nickel (II)], relatively high concentrations of acetate ion or another weak base, and converts amino-terminal amino acids of peptides and proteins into corresponding α-ketoacyl moieties. The high reactivity of periodate ion with 2-amino alcohols (~1000 times faster than its reaction with 1,2-diols) can be used to effect the specific oxidation of amino-terminal serine and threonine residues of proteins and peptides (Dixon and Fields, 1972). In the absence of sulfhydryl groups, which also react rapidly with periodate, the reaction is usually very specific for those two amino-termini and proceeds quite rapidly at approximately neutral pH.
The aldehyde moieties that result are again quite reactive and can be conjugated to various fluorescent labels, biotin, cytotoxic drugs, and so forth (Geoghegan and Stroh, 1992).
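The pH argument used above for selecting between α- and ε-amino groups can be made concrete with the Henderson-Hasselbalch relation. The sketch below is illustrative only: the pKa values (8.0 for an α-amino group, 10.5 for an ε-amino group) are assumed typical values rather than figures taken from this chapter, the function names are hypothetical, and the difference in intrinsic nucleophilicity of the two free bases is ignored, so only the protonation contribution is shown.

```python
def fraction_unprotonated(pH, pKa):
    """Fraction of an amino group present as the reactive free base."""
    return 1.0 / (1.0 + 10.0 ** (pKa - pH))


def alpha_over_epsilon(pH, pKa_alpha=8.0, pKa_epsilon=10.5):
    """Ratio of free-base fractions, alpha-amino : epsilon-amino.

    The default pKa values are assumptions chosen for illustration.
    """
    return fraction_unprotonated(pH, pKa_alpha) / fraction_unprotonated(pH, pKa_epsilon)


if __name__ == "__main__":
    for pH in (5.0, 6.0, 7.0, 8.0, 9.0, 10.0):
        free_alpha = fraction_unprotonated(pH, 8.0)
        ratio = alpha_over_epsilon(pH)
        print(f"pH {pH:4.1f}: alpha free base = {free_alpha:.4f}, "
              f"alpha/epsilon ratio = {ratio:6.1f}")
```

With these assumed values the ratio levels off near 10^2.5 (about 316) once the pH is well below both pKa values, while the free-base fraction of the α-amino group keeps falling roughly tenfold per pH unit. That is the trade-off between selectivity and reaction rate described in the text.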
MODIFICATION OF IMIDAZOLE GROUPS (HISTIDINE)
Diethyl Pyrocarbonate
Diethyl pyrocarbonate, also sometimes called ethoxyformic anhydride, reacts readily with most of the nucleophilic functional groups in proteins at high pH but is relatively specific for histidine residues at low pH (i.e., below pH ~7) (Equation 6) (Melchior and Fahrney, 1970; Miles, 1977). The acylated imidazole moieties have absorption maxima at about 230 to 242 nm (ε = 3.0 to 3.6 × 10³ M⁻¹ cm⁻¹) that can be used to follow the reaction or to determine the number of acylimidazole moieties introduced. Subsequent treatment of the modified proteins with hydroxylamine, usually at a pH of about 7, can be used to deacylate the introduced ethoxyformyl histidine residues and frequently restores some of the biological activity lost during the modification.
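Counting the acylimidazole moieties from the absorbance change amounts to a Beer-Lambert calculation. The following is a minimal sketch, assuming a 1-cm path length, a difference absorbance measured against an unmodified control, and an extinction coefficient taken from the range quoted above; the function name and the example measurement are hypothetical.

```python
def modified_residues_per_protein(delta_absorbance, epsilon_M_cm,
                                  protein_conc_M, path_cm=1.0):
    """Moles of chromophore per mole of protein from a difference absorbance."""
    if epsilon_M_cm <= 0 or protein_conc_M <= 0 or path_cm <= 0:
        raise ValueError("epsilon, concentration, and path length must be positive")
    # Beer-Lambert: A = epsilon * c * l, so c = A / (epsilon * l)
    chromophore_conc_M = delta_absorbance / (epsilon_M_cm * path_cm)
    return chromophore_conc_M / protein_conc_M


if __name__ == "__main__":
    # Hypothetical measurement: delta A = 0.128 near 240 nm, 20 uM protein.
    # Both ends of the quoted epsilon range are used to bracket the answer.
    low = modified_residues_per_protein(0.128, 3.6e3, 20e-6)
    high = modified_residues_per_protein(0.128, 3.0e3, 20e-6)
    print(f"{low:.2f} to {high:.2f} ethoxyformyl histidines per protein")
```

Because the quoted extinction coefficient spans a range, reporting a bracket rather than a single value is probably the more honest choice unless the coefficient has been calibrated for the protein in question.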
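\[
\begin{aligned}
2\,\mathrm{ArSSAr} + 2\,\mathrm{OH^{-}} &\longrightarrow 2\,[\mathrm{ArSOH}] + 2\,\mathrm{ArS^{-}} && (14)\\
2\,[\mathrm{ArSOH}] &\longrightarrow \mathrm{ArSO_2^{-}} + \mathrm{ArS^{-}} + 2\,\mathrm{H^{+}} && (15)\\
2\,\mathrm{ArSSAr} + 2\,\mathrm{OH^{-}} &\longrightarrow \mathrm{ArSO_2^{-}} + 3\,\mathrm{ArS^{-}} + 2\,\mathrm{H^{+}} && (16)
\end{aligned}
\]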
2-nitrobenzoate dianion, which can be mistaken for slow-reacting sulfhydryl groups and interfere with their detection when they are present (Riddles et al., 1979). Metal ion-catalyzed reoxidation of 5-thio-2-nitrobenzoate by oxygen (Equation 17) is also prominent at high pH and usually interferes with sulfhydryl group determinations under those conditions (e.g., above pH ~8.5).

\[
2\,\mathrm{ArSH} + \tfrac{1}{2}\,\mathrm{O_2} \longrightarrow \mathrm{ArSSAr} + \mathrm{H_2O}
\tag{17}
\]

However, because metal ions are involved in the latter, reoxidation can usually be suppressed by the presence of EDTA or by excluding oxygen (Riddles et al., 1979).
The 5-thio-2-nitrobenzoate substituent introduced upon reaction with 5,5'-dithio(2-nitrobenzoate) is relatively large, hydrophobic, and anionic, and it often affects a protein's structure. It has an absorption spectrum similar to that of the reagent, but less intense (i.e., λmax = 323 nm, ε = 2,500 M⁻¹ cm⁻¹), that can usually be used to determine the number of substituents introduced (Colman, 1969; Riddles et al., 1979). β-Mercaptoethanol and other simple thiols react readily with the incorporated substituents and, in large excess, can be used to regenerate the original protein sulfhydryl groups (Equation 18) and sometimes the original biological activity (Kastenschmidt et al., 1968; Ploux et al., 1995).

\[
\mathrm{P{-}SSAr} + \mathrm{RSH}\ \text{(excess)} \longrightarrow \mathrm{P{-}SH} + \mathrm{RSSR} + \mathrm{ArS^{-}}
\tag{18}
\]

In proteins with closely spaced, so-called vicinal, sulfhydryl groups, the same reaction sometimes results in the formation of intramolecular disulfide bonds (Equation 19). In such cases, only one equivalent of 5,5'-dithio(2-nitrobenzoate) is consumed but two equivalents of 5-thio-2-nitrobenzoate dianion are released.

\[
\mathrm{P(SH)_2} + \mathrm{ArSSAr} \longrightarrow \mathrm{P(S{-}S)} + 2\,\mathrm{ArS^{-}} + 2\,\mathrm{H^{+}}
\tag{19}
\]

None is incorporated into the protein (Wassarman and Major, 1969; Lewis et al., 1993). The stoichiometry, based on the amount of 5-thio-2-nitrobenzoate released, is exactly the same as if two sulfhydryl groups had reacted independently with 5,5'-dithio(2-nitrobenzoate), but no chromophore absorbing at about 323 nm is incorporated into the protein. Cyanide ion can also be used to remove 5-thio-2-nitrobenzoate moieties and convert the original thiol groups into thiocyanate derivatives (Equation 20).

\[
\mathrm{P{-}SSAr} + \mathrm{CN^{-}} \longrightarrow \mathrm{P{-}SCN} + \mathrm{ArS^{-}}
\tag{20}
\]

Some proteins are reactivated or partially reactivated by such treatment due, presumably, to the reduced size of the substituent and/or its lack of a charge (Huynh and Snell, 1986; Cunningham et al., 1990). 2-Nitro-5-thiocyanobenzoate, a commercially available analog of 5,5'-dithio(2-nitrobenzoate), reacts directly with sulfhydryl groups to give the same thiocyanate derivatives and one equivalent of 5-thio-2-nitrobenzoate dianion (Kindman and Jencks, 1981; Altamirano et al., 1992). The latter's formation is not stoichiometric, however, as significant amounts of a 5-thio-2-nitrobenzoate adduct are also usually obtained, depending largely on the accessibility and environment of the particular sulfhydryl group(s) (Equations 21a & 21b).
\[
\mathrm{P{-}SH} + \mathrm{O_2N{-}C_6H_3(COO^{-}){-}SCN} \longrightarrow \mathrm{P{-}SCN} + \mathrm{O_2N{-}C_6H_3(COO^{-}){-}S^{-}}
\tag{21a}
\]
also sometimes affects tryptophan and methionine residues (Koshland et al., 1963) and is particularly important as the most widely used method to radiolabel proteins.
Iodination
Two isotopes of iodine, 125I and 131I, are commonly used to radioiodinate proteins. In addition to its familiarity and convenience, the principal advantage of radioiodination appears to be the very high specific activities that can be obtained (i.e., ~2,125 Ci/mmol for each incorporated 125I and ~6,500 Ci/mmol for each incorporated 131I) (Wilbur, 1992). The tendency of iodine to react with several different kinds of functional groups, particularly its very fast reaction with sulfhydryl groups, is among its disadvantages. Because both monoiodotyrosine and diiodotyrosine residues are sometimes obtained and are both larger and appreciably more acidic than tyrosine, their formation also sometimes affects protein structure (Covelli and Wolff, 1966). The relatively short half-lives of 125I and 131I (~60 and 8.1 days, respectively) are also often a disadvantage, necessitating the use of decay corrections and precluding some relatively long-term applications and long-term storage of radioiodinated proteins. Elemental iodine is relatively insoluble in water but soluble in aqueous sodium or potassium iodide to give red-brown solutions of triiodide ion (i.e., I2 + I⁻ ⇌ I3⁻) that are convenient for the iodination of proteins. Under slightly alkaline conditions, reactions with the small amount of iodine in equilibrium with triiodide ion are usually fast and may sometimes be followed by monitoring decreases in triiodide concentration at ~355 nm (Cunningham and Nuenke, 1961). The products, mono- and diiodotyrosine, have increased absorbances in the near UV and their approximate amounts can usually be determined, after removal of triiodide, from their absorptions according to the procedure of Edelhoch (1962). When radioactive iodine is used, procedures designed to achieve more efficient incorporation and the highest possible specific activity are usually preferred.
Most of those procedures involve the formation of ICl from radioactive iodide ion and an electrophilic chlorine donor, like N-chlorosuccinimide or chloramine T (Greenwood et al., 1963), that serves as a source of Cl+ (i.e., *I⁻ + Cl⁺ → *ICl). The ICl formed then reacts with proteins to effect their iodination without any isotope dilution (Greenwood et al., 1963; Lawrence and Loskutoff, 1986). Because chloramine T and related N-chloro compounds are potent oxidants and have their own effects on proteins (i.e., see earlier sections on Thioether and Indole Groups), they are usually employed in very low amounts. Several insoluble N-chloro compounds, 1,3,4,6-tetrachlorodiphenylglycoluril (Fraker and Speck, 1978) and derivatized polystyrene bead products containing chloramine T-like moieties (Markwell, 1982), are also used similarly to effect iodination and, due to their lesser interactions with soluble proteins, usually have fewer direct effects on them. Stopping reactions and separating modified proteins from the insoluble reagents by decantation is also very convenient.
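The decay corrections mentioned above for radioiodinated proteins follow from first-order decay, A(t) = A0 × 2^(-t/t½). A minimal sketch, using the approximate half-lives quoted in the text (about 60 days for 125I and 8.1 days for 131I); the function and dictionary names are hypothetical.

```python
# Approximate half-lives in days, as quoted in the text
HALF_LIFE_DAYS = {"125I": 60.0, "131I": 8.1}


def fraction_remaining(isotope, elapsed_days):
    """Fraction of the original radioactivity left after elapsed_days."""
    if elapsed_days < 0:
        raise ValueError("elapsed_days must be non-negative")
    return 0.5 ** (elapsed_days / HALF_LIFE_DAYS[isotope])


def decay_corrected_activity(measured_activity, isotope, elapsed_days):
    """Back-correct a measured activity to the reference date (time zero)."""
    return measured_activity / fraction_remaining(isotope, elapsed_days)


if __name__ == "__main__":
    for isotope in HALF_LIFE_DAYS:
        left = fraction_remaining(isotope, 30.0)
        print(f"{isotope}: {left:.1%} of the label remains after 30 days")
```

Running the loop for a one-month storage period shows why the shorter-lived isotope is the more restrictive of the two for long-term work.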
Iodination of proteins with lactoperoxidase, hydrogen peroxide, and iodide ion proceeds under mild conditions and, like other iodination procedures, results in the formation of both mono- and diiodotyrosine as well as mono- and diiodohistidine residues. In contrast to reactions with I2 and ICl, however, those with lactoperoxidase involve the formation of a Michaelis-Menten complex between lactoperoxidase and the three reactants, the protein being modified, I⁻, and H2O2; the reactions with lactoperoxidase are therefore limited to accessible or surface tyrosine and histidine residues (Morrison and Bayse, 1970; Huber et al., 1989). Because many tyrosine and histidine residues subject to iodination by other methods are not accessible to lactoperoxidase, smaller amounts of iodine are usually incorporated. Because those affected are at the surface, however, there is also usually less effect on a protein's structure. Because only accessible or surface residues are involved, reactions with lactoperoxidase are also sometimes used as part of a scheme to identify exposed tyrosine and histidine residues of proteins or to identify the surface components of various macromolecular assemblies (Wower et al., 1983; Illy et al., 1991). As with the previously mentioned iodination methods, immobilized forms of lactoperoxidase are available and appear to offer some important advantages as compared to the soluble enzyme, for example, again making it easy to stop reactions and to separate modified proteins from lactoperoxidase (David, 1972).
Tetranitromethane
Tetranitromethane is one of the most widely used reagents to modify tyrosine residues in proteins. The reaction again proceeds optimally under alkaline conditions, converts tyrosine into 3-nitrotyrosine residues (Equation 29), and usually

\[
\mathrm{P{-}C_6H_4{-}OH} + \mathrm{C(NO_2)_4} \longrightarrow \mathrm{P{-}C_6H_3(NO_2){-}OH} + \mathrm{C(NO_2)_3^{-}} + \mathrm{H^{+}}
\tag{29}
\]
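\[
\begin{array}{ccc}
\mathrm{I(aq)} & \xrightarrow{\ \Delta G^{I}_{aq \to E}\ } & \mathrm{I(E)} \\[4pt]
\Big\downarrow {\scriptstyle \Delta G^{aq}_{I \to I'}} & & \Big\downarrow {\scriptstyle \Delta G^{E}_{I \to I'}} \\[4pt]
\mathrm{I'(aq)} & \xrightarrow{\ \Delta G^{I'}_{aq \to E}\ } & \mathrm{I'(E)}
\end{array}
\]

Scheme 2.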
where I and I' are the two inhibitors and where aq and E designate the inhibitor in solution and on the enzyme surface, respectively, the difference in the free energy of binding is given by $\Delta G^{I'}_{aq \to E} - \Delta G^{I}_{aq \to E}$. Since the total free energy change around the cycle must be zero, this is equal to $\Delta G^{E}_{I \to I'} - \Delta G^{aq}_{I \to I'}$. In words, the difference in free energies of binding the two inhibitors is equal to the difference in free energies of converting I to I' in solution and bound to the enzyme. A similar approach is used to calculate the difference in the free energy of binding of each inhibitor to two forms of the enzyme that differ in the state of protonation of a single residue. Here the thermodynamic cycle is:
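\[
\begin{array}{ccc}
\mathrm{E + H^{+} + I} & \xrightarrow{\ K^{I}_{aq \to E}\ } & \mathrm{EI + H^{+}} \\[4pt]
\Big\downarrow {\scriptstyle K^{E}_{E \to EH^{+}}} & & \Big\downarrow {\scriptstyle K^{EI}_{E \to EH^{+}}} \\[4pt]
\mathrm{EH^{+} + I} & \xrightarrow{\ K^{I}_{aq \to EH^{+}}\ } & \mathrm{EH^{+}I}
\end{array}
\]

Scheme 3.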
The quantity of interest is $\Delta G^{I}_{aq \to EH^{+}} - \Delta G^{I}_{aq \to E}$, which is equal to $\Delta G^{EI}_{E \to EH^{+}} - \Delta G^{E}_{E \to EH^{+}}$, the difference in the free energies of protonating the liganded and unliganded protein. The calculations predict correctly the relative affinities of the two inhibitors for the enzyme, but overestimate the difference in free energy of binding by about 4 kcal·mol⁻¹. They also correctly predict the magnitude and pH dependence of the limited set of experimentally determined binding constants for each inhibitor. They provide considerable insight into the many factors that give rise to the difference in free energy of binding and they identify with a reasonable degree of certainty the groups that function as the general acid and base in the catalytic mechanism. The discussion of the effect of bound solvent on pKa values is a particularly interesting feature of the analysis.
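The cycle-closure argument used for both thermodynamic cycles can be checked with a few lines of arithmetic. The sketch below is illustrative only: the free-energy values are made-up numbers in kcal/mol, not results from the calculations discussed here, and the function name is hypothetical. It shows that the difference in binding free energies equals the difference between the bound-state and solution-state transformation free energies.

```python
def relative_binding_free_energy(dG_transform_bound, dG_transform_solution):
    """ddG_bind = dG(I -> I', on enzyme) - dG(I -> I', in solution)."""
    return dG_transform_bound - dG_transform_solution


if __name__ == "__main__":
    # Hypothetical leg values (kcal/mol) for a closed cycle
    dG_bind_I = -8.0        # I(aq)  -> I(E)
    dG_bind_I_prime = -9.5  # I'(aq) -> I'(E)
    dG_solution = 3.0       # I(aq)  -> I'(aq)
    # The fourth leg, I(E) -> I'(E), is fixed by requiring that both
    # routes from I(aq) to I'(E) give the same total free energy change.
    dG_bound = dG_solution + dG_bind_I_prime - dG_bind_I

    direct = dG_bind_I_prime - dG_bind_I
    via_cycle = relative_binding_free_energy(dG_bound, dG_solution)
    print(f"direct: {direct:+.2f} kcal/mol, via cycle: {via_cycle:+.2f} kcal/mol")
```

The practical point of the cycle is that the two vertical legs, although unphysical, may be easier to compute than the binding processes themselves; the same bookkeeping applies when the vertical legs are protonation steps.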
Kinetics of Ligand Binding
Inspection of potential surfaces has often given rise to the suggestion that they could accelerate rates of ligand binding. This hypothesis can be examined by modeling the effects of the potential surface on the Brownian dynamics of the ligand, and calculating its probability of reacting versus diffusing away when ligand and protein are separated by a specified distance (Allison et al., 1985).
Electrostatic Effects in Proteins
In acetylcholinesterase, a funnel of negative potential that extends outward from the active site can be envisioned to be involved in catalysis and the first calculations were consistent with this possibility (cf., Antosiewicz et al., 1994). However, a set of mutations that eliminated negative charges had only modest effects on rates of hydrolysis (Shafferman et al., 1994). Subsequent papers by Antosiewicz and colleagues (1995, 1996) demonstrate just how complex the design and interpretation of mutagenesis experiments can be. They show that a number of mutants including those generated by Shafferman and colleagues (1994) have only small effects on calculated encounter rates. However, they show that electrostatic steering is nevertheless important because both varying the ionic strength and eliminating the charge on the substrate affect both measured rates of hydrolysis and calculated encounter rates. A future challenge will be to find mutations that produce larger changes in calculated and experimentally observed rates. Similar attempts to quantitatively understand the role of electrostatics in protein-protein association rates are in progress (Janin, 1997; Schreiber and Fersht, 1996).
Enzyme Mechanisms
Very few applications of electrostatic modeling to enzyme mechanisms have been reported as yet, perhaps because of skepticism about the applicability of classical continuum models to catalysis. Warwicker and colleagues (1994) examined the effects of alcohols on phospholipase A2 experimentally and through modeling to test the utility of two refinements in continuum electrostatic models proposed in a previous paper (Warwicker, 1994). The first is a double layer of solvent in which the volume traced out on the protein by a single solvent molecule (with a 1.4 Å radius) is assigned a dielectric of 30, as proposed in the smeared dipole model of Onsager (1936), while the volume beyond this inner solvent layer is assigned a dielectric of 80, as proposed by Kirkwood (1939). The second refinement is the use of a saturating dielectric in high electric fields (for example, near charged side chains) that is adjusted throughout the calculation. The reduction in activity produced by alcohols was shown to result primarily from a destabilization of the transition state rather than from changes in pKa values of groups known to be involved in catalysis. The effects of mutants were also predicted correctly. This example demonstrates that continuum methods can be useful in predicting trends, at least in some cases. The role of electrostatic effects in the catalytic and regulatory mechanisms of E. coli aspartate transcarbamylase has been analyzed by Oberoi and colleagues (1996) using the finite difference method with multigridding. This is the largest system to which this approach has been applied to date. A number of interactions over distances too large for direct ion pair formation were identified and the possibility that these long-range interactions are involved in the allosteric mechanism was proposed.
Redox Potentials
The thioredoxin family of proteins has disulphide bonds that can be reversibly oxidized and that enable the proteins to function in redox reactions in cells. The various members of this family have different redox potentials and considerable effort has gone into elucidating the structural basis of these differences. Differences in redox potential are directly proportional to differences in the stabilities of the oxidized and reduced forms of these proteins. Since the thiol groups in the reduced protein are ionizable, interactions between their ionized forms and other groups in the protein could contribute to the difference in stability in the oxidized and reduced forms. Langsetmo and colleagues (1991a and b) established the linkage between thioredoxin stability and the titration of specific amino acids in the protein:

\[
\Delta\Delta G = 2.303\,RT\,\Delta \mathrm{p}K_a
\]

where ΔΔG is the contribution of the titration of an ionizable group to the stability of the protein and ΔpKa is the change in pKa of that ionizable group produced by the unfolding of the protein. Both an Asp residue near one of the thiol groups and a Lys with which it can interact may be involved in regulating the redox potential of thioredoxin, as suggested by the elevated pKa of the Asp residue and the fact that its pKa shifts when thioredoxin is oxidized or reduced (Langsetmo et al., 1991a and b). However, comparison of finite difference calculations on E. coli thioredoxin and DsbA, a homolog containing a thioredoxin domain and a 76-residue insert that is primarily α-helical, indicates other factors may also be important in DsbA (Gane et al., 1995). These include other ionizable residues, some of which are present only in DsbA, interactions with backbone dipoles, and the presence of a low dielectric region near the active site in DsbA.
Interactions with Lipids
Potential surface calculations have been used recently in two systems to provide insights into the mechanism of lipid-protein interactions. In the first study, Lakey and colleagues (1994) carried out calculations on several colicins. Colicins are bacterial toxins that kill enterobacteria in a process that requires binding to a receptor on the outer membrane, translocation to the inner membrane, and insertion in the inner membrane. These authors found that, despite large differences in isoelectric point, the long-range potential surfaces of naturally occurring colicins were similar, with an extensive positive region and a negative dome that probably orients the colicin with respect to the negatively charged membrane. Parallel experimental studies have shown that eliminating several negatively charged residues affects in vitro activity. In a similar study, Scott and colleagues (1994) compared the potential surfaces of several phospholipases A2. All of the potential surfaces had distinct molecular sidedness. The results were consistent with delocalized molecular electrostatics playing a role in orienting and holding phospholipases A2 at water-lipid interfaces; however, mutational results also implicate hydrophobic interactions. The conclusion that electrostatics is unlikely to be the only factor probably applies to many other systems.
Hormone-Receptor Interactions
Demchuk and colleagues (1994) applied the potential surface approach to investigate the role of electrostatic effects in the binding of four-helix bundle growth factors to their receptors. The potential surfaces of hormones that bind to identical receptor subunits have twofold rotational symmetry, despite differences in sequence, while the potential surfaces of hormones that bind to hetero-oligomeric receptors lack symmetry.

Future Prospects
There has been enormous progress in our ability to model electrostatic effects in proteins in recent years. Grid methods have substantially increased the amount of molecular detail that can be incorporated into models, while methods that allow free energies and kinetic constants to be calculated have increased the range of questions that can be addressed. As a result, there is currently intense interest in carrying out parallel experimental and theoretical studies in many systems. However, increases in accuracy have lagged far behind increases in the complexity of the models used in theoretical calculations. Further improvements in the finite difference approach are possible, as discussed by Warwicker (1994) and others. The finite element method improves the accuracy with which the molecule is mapped on the grid, although at considerable computational expense (You and Harvey, 1993). The greatest shortcomings of current models are the neglect of molecular fluctuations, ion binding, and the details of solvent structure. Incorporating molecular fluctuations is largely a matter of having sufficient computer memory and time available, since molecular dynamics methods are well developed (cf., McCarrick and Kollman, 1994). Incorporating ion binding, pH effects, and more realistic solvent models is also challenging because the experimental information that is available is limited; however, these problems constitute much of the most active current research (Garcia-Moreno, 1994; Warwicker, 1994; Coitino et al., 1995; Sharp et al., 1995; Dimitrov and Crichton, 1997; Alexov and Gunner, 1997; Zhou and Vijayakumar, 1997). Effective models should represent the principal components of the system, namely the solvent, the ions, and the solute (the macromolecule), in sufficient detail. They should also include the dynamic nature of the system, including the motion of all three components.
Standard molecular dynamics simulations and free energy simulation techniques alone (reviewed in Beveridge and DiCapua, 1989) are time-consuming and treat the system in a fixed ionization state, while continuum approaches lack detailed representation and do not represent the dynamic nature of the system. Hence, approaches that combine molecular dynamics simulations with continuum calculations have recently been developed (cf., Gilson, 1995). A recently developed method for sampling potential energy surfaces (Tidor, 1993), used instead of a full free energy simulation, helps reduce the computational cost considerably. Free energy simulations that take into account pH dependence by calculating the protonation state using continuum calculations have also been recently described (MacKerell et al., 1995). Hybrid models that combine molecular dynamics and continuum methods are relatively rapid and have the additional advantage of allowing ionic strength and pH to be included. They can be used to describe the complete energetics of a macromolecular complex as a sum of terms that include changes in the conformation of the system, hydrophobic energy based on changes in solvent accessibility, a continuum electrostatic term, and a covalent term describing the bonded geometry. While electrostatic effects are frequently the subject of experimental studies, very few studies that directly combine experimental and theoretical approaches have been carried out. Examples include Bashford and colleagues (1993) and Warwicker et al. (1994). More frequently, predictions from theory are compared with a limited set of experimental data obtained in a different laboratory for a different purpose. The separation of theory and experiment severely limits the synergy required to produce rapid, productive improvements in the theory. Development of close collaborations between theoreticians and experimentalists with a good understanding of the issues involved in both approaches would be very productive. Despite the popularity and successes of electrostatic modeling, it is important to keep in mind that electrostatics is only one of several factors in protein folding, stability, and function.
Electrostatics should not be overemphasized in interpreting experimental results, and approaches that allow other factors to be investigated need development.
REFERENCES

Ackers, G.K. and Halvorson, H.R. (1974). The linkage between oxygenation and subunit dissociation in human hemoglobin. Proc. Natl. Acad. Sci. U.S.A. 71, 4312-4316. Alber, T. (1989). Mutational effects on protein stability. Ann. Rev. Biochem. 58, 765-798. Alexiev, U., Marti, T., Heyn, M.P., Khorana, H.G., and Scherrer, P. (1994). Surface charge of bacteriorhodopsin detected with covalently bound pH indicators at selected extracellular and cytoplasmic sites. Biochemistry 33, 298-306. Alexov, E.G. and Gunner, M.R. (1997). Incorporating protein conformational flexibility into the calculation of pH-dependent protein properties. Biophys. J. 72, 2075-2093. Allewell, N.M. and Oberoi, H. (1991). Electrostatic effects in protein folding and function. Meth. Enzymol. 202, 3-19. Allison, S.A., Ganti, G., and McCammon, J.A. (1985). Simulation of the diffusion-controlled reaction between superoxide and superoxide dismutase. 1. Simple models. Biopolymers 24, 1323-1336. Antosiewicz, J., McCammon, J.A., and Gilson, M.K. (1994). Prediction of pH-dependent properties of proteins. J. Mol. Biol. 238, 415-436.
Antosiewicz, J., McCammon, J.A., Wlodek, ST., and Gilson, M.K. (1995). Simulation of charge-mutant acetylcholinesterases. Biochemistry, 34, 4211-5219. Antosiewicz, J., Wlodek, S.T., and McCammon, J. A. (1996). Acetylcholinesterase: Role of the enzyme's charge distribution in steering charged ligands toward the active site. Biopolymers 39, 85-94. Bakir, U., Coutinho, P.M., Sullivan, P.A., Ford, C, and Reilly, P.J. (1993). Cassette mutagenesis of Aspergillus awamori glucoamylase near its general acid residue to probe its catalytic and pH properties. Protein Engineering, 6, 939-946. Barker, P.D., Mauk, M.R., and Mauk, A.G. (1991). Proton titration curves of yeast iso-1-ferricytochrome c. Electrostatic and conformational effects of point mutations. Biochemistry, 30, 2377-2383. Bashford, D., Case, D.A., Dalvit, C, Tennant, L. and Wright, P.E. (1993). Electrostatic calculations of side-chain pK(a) values in myoglobin and comparison with NMR data for histidines. Biochemistry 32, 8045-8056. Bashford, D. and Gerwert, K. (1992). Electrostatic calculations of the pKa values of ionizable groups in bacteriorhodopsin. J. Mol. Biol. 224, 473-486. Bashford, D. and Karplus, M. (1990). pKa's of ionizable groups in proteins: Atomic detail from a continuum electrostatic model. Biochemistry 29, 10219-10225. Beroza, P., Fredkin, D.R., Okamura, M.Y., and Feher, G. (1991). Protonation of interacting residues in a protein by a Monte Carlo method: Application to lysozyme and photosynthetic reaction center of Rhodobacter sphaewides. Proc. Natl. Acad. Sci. U.S.A. 88, 5804-5808. Beveridge, D.L. and DiCapua, F.M. (1989). Free energy via molecular simulation: Applications to chemical and biochemical systems. Ann. Rev. Biophys. Biophys. Chem. 18, 431-492. Brocklehurst, K. (1994). A sound basis for pH-dependent kinetic studies on enzymes. Protein Engineering 7, 291-299. Cantor, C.R. and Schimmel, PR. (1980). Biophysical Chemistry., pp 847-929. W.H. Freeman and Co., San Francisco. Cleland, W.W. 
(1977). Determining the chemical mechanisms of enzyme-catalyzed reactions by kinetic studies. Adv. in Enzymol. Relat. Areas Mol. Biol. 45, 273-387. Coitino, E.L., Tomasi, J., and Cammi, R. (1995). On the evaluation of the solvent polarization apparent charges in the polarization continuum model: A new formulation. J. Comp. Chem. 16, 20-30. Cummins, PL. and Gready, J.E. (1993). Computer-aided drug design: A free energy perturbation study on the binding of methyl-substituted pterins and N5-deazapterins to dihydrofolate reductase. J. Comp. Aided Mol. Design 7, 535-555. Davis, M.E. and McCammon, J. A. (1990). Electrostatics in biomolecular structure and dynamics. Chem. Rev. 90, 509-521. Delepierre, M., Dobson, CM., Karplus, M., Poulsen, F.M., States, D.J. and Wedin, R.E. (1987). Electrostatic effects and hydrogen exchange behaviour in proteins. The pH-dependence of exchange rates in lysozyme. Appendix: States, D.J. and Karplus, M., A model for electrostatic effects in proteins. J. Mol. Biol. 197, 111-130. Demchuk, E., Mueller, T., Oschkinat, H., Sebald, W., and Wade, R.C. (1994). Receptor binding properties of four-helix-bundle growth factors deduced from electrostatic analysis. Protein Science 3, 920-935. Dill, K.A. and Stigter, D. (1995). Modeling protein stability as heteropolymer collapse. Adv. Prot. Chem. 46,59-104. Dimitrov, R.A. and Crichton, R.R. (1997). Self-consistent field approach to protein structure and stability. I: pH dependence of electrostatic contribution. Proteins. Struct. Funct. Genetics 27, 576-596. Ellis, K.J. and Morrison, J.F. (1982). Buffers of constant ionic strength for studying pH-dependent processes. Meth. Enzymol. 87, 405-426. Ewing, T.J. A. and Lybrand, T.P. (1994). A comparison of perturbation methods and Poisson-Boltzmann electrostatics calculations for estimation of relative solvation free energies. J. Phys. Chem. 98, 1748-1752.
Forman-Kay, J.D., Clore, G.M.,and Gronenborn, A.M. (1992). Relationship between electrostatics and redox function in human thioredoxin: Characterization of pH titration shifts using two-dimensional homo- and heteronuclear NMR. Biochemistry, 31, 3442-3452. Gane, P.J., Freedman, R.B., and Warwicker, J. (1995). A molecular model for the redox potential difference between thioredoxin and DsbA, based on electrostatic calculations. J. Mol. Biol. 249, 376-387. Garcia-Moreno, B. (1994). Estimating binding constants for site-specific interactions between monovalent ions and proteins. Meth. Enzymol. 240, 645-667. Garfin, D.E. (1990a). One-dimensional gel electrophoresis. Meth. Enzymol. 182, 425-441. Garfin, D.E. (1990b). Isoelectric focusing. Meth.Enzymol. 182, 459-477. Gao, J., Mammen, M., and Whitesides, G.M. (1996). Evaluating electrostatic contributions to binding with the use of protein charge ladders. Science 272, 535-537. Gilson, M.K. (1995). Molecular-dynamics simulation with a continuum electrostatic model of the solvent. J. Comp. Chem. 16, 1081-1095. Gilson, M.K. and Honig, B. (1987). Destabilization of an oc-helix-bindle protein by helix dipoles. Proc. Natl. Acad. Sci. U.S.A. 86, 1524-1528. Harvey, S.C. (1989). Treatment of electrostatic effects in macromolecular modeling. Proteins: Struct. Funct. Genet. 5, 78-92. Hoi, W.G.J., van Duijuen, P.T., and Berendsen, H.J.C. (1978). The a-helix dipole and the properties of proteins. Nature 273, 443-446. Hoist, M. and Saied, F. (1993). Multigrid solution of the Poisson-Boltzmann equation. J. Comp. Chem. 14, 105-113. Hoist. M., Kozack, R.E., Saied, F, and Subramaniam, S. (1994a). Protein electrostatics: Rapid multigrid-based Newton algorithm for solution of the full nonlinear Poisson-Boltzmann equation. J. Biomolec. Struct. Dynamics 11, 1437-1445. Hoist. M., Kozack, R.E., Saied, F, and Subramaniam, S. (1994b). 
Treatment of electrostatic effects in proteins: Multigrid-based Newton iterative method for solution of the full nonlinear PoissonBoltzmann equation. Proteins: Struct. Funct. Genet. 18, 231-245. Honig, B. and Nicholls, A. (1995). Classical electrostatics in biology and chemistry. Science 268, 1144-1149. Honig, B. and Yang, A.S. (1995). Free energy balance in protein folding. Adv. Proiein Chem. 46, 27-58. Janin, J. (1997). The kinetics of protein-protein recognition. Proteins, Struct. Funct. Genet. 28, 153-161. Jayaram, B., Sharp, K.A., and Honig. B.H. (1989). The electrostatic potential of B-DNA. Biopolymers 28, 975-993. Jorgensen, W.L. (1989). Free energy calculations: a breakthrough for modelling organic chemistry in solution. Accts. Chem. Res. 22, 184-189. Karshikov, A. (1995). A simple algorithm for the calculation of multiple site titration curves. Protein Eng. 8, 243-248. Karshikov, A., Duerring, M., and Huber, R. (1991). Role of electrostatic interaction in the stability of the hexamer of constitutive phycocyanin from Fremyella diplosiphon. Protein Eng. 4, 681-690. Kirkwood, J.G. (1939). Theory of solutions of molecules containing widely separated charges with special applications to zwitterions. J. Chem. Phys. 2, 351-361. Klapper, I., Hagstrom, R., Fine, R., Sharp, K.A., and Honig, B. (1986). Focusing of electric fields in the active site of Cu-Zn superoxide dismutase: Effects of ionic strength and amino-acid modification. Proteins: Struct. Funct. Genet. 1, 47-59. Kollman, P. (1993). Free energy calculations: Applications to chemical and biochemical phenomena. Chem. Rev. 93, 2395-2417. Lakey, J.H., Parker, M.W., Gonzales-Manas, J.M., Duche, D., Vriend, G., Baty, D., and Pattus, F. (1994). The role of electrostatic charge in the membrane insertion of colicin A. Calculation and mutation. Eur. J. Biochem. 220, 155-163.
Langsetmo, K., Fuchs, J. A., and Woodward, C. (1991a). The conserved, buried aspartic acid in oxidized Escherichia coli thioredoxin has a pKa of 7.5. Its titration produces a related shift on global stability. Biochemistry 30, 7603-7609. Langsetmo, K., Fuchs, J.A., Woodward, C, and Sharp, K.A. (1991b). Linkage of thioredoxin stability to titration of ionizable groups with perturbed pKa. Biochemistry 30, 7609-7614. Leach, A.R. (1994). Ligand docking to proteins with discrete side-chain flexibility. J. Mol. Biol. 235, 345-356. Lee, B. and Richards, F.M. (1971). The interpretation of protein structures: Estimation of static accessibility. J. Mol. Biol. 55, 379-400. Leger, D. and Hervd, G. (1988). Allostery and pKa changes in aspartate transcarbamoylase from Escherichia coli. Analysis of the pH dependence in the isolated catalytic subunits. Biochemistry 27, 4293-4298. Linderstrom-Lang, K. (1924). On the ionization of proteins. Comptes Rendus des Travaux du Laboratoire Carlsberg 15 (7), 1-29. Loewenthal, R., Sancho, J., Reinikainen, T, and Fersht, A.R. (1993). Long-range surface charge-charge interactions in proteins. J. Mol. Biol. 232, 574-583. Matthew, J.B., Gurd, F.R.N., Garcia-Moreno, B., Flanagan, M.A., March, K.L., and Shire, S.J. (1985). pH-dependent processes in proteins. CRC Crit. Rev. Biochem. 18, 91-197. MacKerell, A.D., Jr., Sommer, M.S., and Karplus, M. (1995). pH dependence of binding reactions from free energy simulations and macroscopic continuum electrostatic calculations: Application to 2'GMP/3'GMP binding to ribonuclease Tt and implications for catalysis. J. Mol. Biol. 247, 774-807. McCarrick, M.A. and Kollman, R (1994). Use of molecular dynamics and free energy perturbation calculations in anti-human immunodeficiency virus drug design. Meth. Enzymol. 241, 370-384. McQuarrie, D.A. (1976). Statistical Mechanics. Harper and Row, New York. Meeker, A.K., Garcia-Moreno, B., and Shortle, D. (1996). 
Contributions of the ionizable amino acids to the stability of Staphylococcal nuclease. Biochemistry 35, 6443-6449. Misra, V.K., Sharp, K.A., Friedman, R.A., and Honig, B. (1994). Salt effects on ligand-DNA binding. Minor groove binding antibiotics. J. Mol. Biol. 238, 245-263. Monette, M. and Lafleur, M. (1995). Modulation of melittin-induced lysis by surface charge density of membranes. Biophys. J. 68, 187-195. Nakamura, H. (1996). Roles of electrostatic interactions in proteins. Quart. Rev. Biophys. 29, 1-90. Nicholls, A., Sharp, K.A., and Honig, B. (1991). Protein folding and association: Insights from the interfacial and thermodynamic properties of hydrocarbons. Proteins: Struct. Funct. Genet. 11, 281-296. Oberoi, H. and Allewell, N.M. (1993). Multigrid solution of the nonlinear Poisson-Boltzmann equation and calculation of titration curves. Biophys. J. 65, 48-55. Oberoi, H., Trikha, J., Yuan, C., and Allewell, N.M. (1996). Identification and analysis of long-range electrostatic effects in proteins by computer modeling: Aspartate transcarbamylase. Proteins: Struct. Funct. Genet. 25, 300-314. Oliveberg, M. and Fersht, A.R. (1996). Formation of electrostatic interactions on the protein-folding pathway. Biochemistry 38, 2726-2737. Onsager, L. (1936). Electric moments of molecules in liquids. J. Am. Chem. Soc. 58, 1486-1493. Perakyla, M. and Pakkanen, T.A. (1995). Model assembly study of the ligand binding by p-hydroxybenzoate hydroxylase: Correlation between the calculated binding energies and the experimental dissociation constants. Proteins: Struct. Funct. Genet. 21, 22-29. Reddy, M.R., Viswanadhan, V.N., and Weinstein, J.N. (1991). Relative differences in the binding free energies of human immunodeficiency virus 1 protease inhibitors: A thermodynamic cycle-perturbation approach. Proc. Natl. Acad. Sci. U.S.A. 88, 10287-10291. Rogers, N.K. (1986). The modeling of electrostatic interactions in the function of globular proteins. Prog. Biophys. Mol. Biol. 48, 37-66.
Rossomando, E.R (1990). Ion exchange chromatography. Methods Enzymol. 182, 309-317. Schreiber, G. and Fersht, A.R. (1996). Rapid, electrostatically assisted association of proteins. Nature Struct. Biol. 3,427-431. Scott, D.L., Mandel, A.M., Sigler, P.B., and Honig, B. (1994). The electrostatic basis for the interfacial binding of secretory phospholipases A2. Biophys. J. 67, 493-504. Shafferman, A., Ordentlich, A., Barak, D., Kronman, C, Ber, R., Bino, T., Ariel, N., Osman, R., and Velan, B. (1994). Electrostatic attraction by surface charge does not contribute to the catalytic efficiency of acetylcholinesterase. EMBO J. 13, 3448-3455. Sharp, K.A. and Honig, B. (1990). Electrostatic interactions in macromolecules: Theory and applications. Ann. Rev. Biophys. Biophys. Chem. 19, 301-332. Sharp, K.A., Friedman, R.A., Misra, V, Hecht, J., and Honig, B. (1995). Salt effects on polyelectrolyteligand binding: Comparison of Poisson-Boltzmann and limiting law/counterion binding models. Biopolymers 36, 245-262. Stigter, D., Alonso, D.O.V., and Dill, K.A. (1991). Protein stability: Electrostatics and compact denatured states. Proc. Natl. Acad. Sci. U.S.A. 88, 4176-4180. Stoddard, B.L. and Koshland, D.E., Jr. (1993). Molecular recognition analyzed by docking simulations: The aspartate receptor and isocitrate dehydrogenase from Escherichia coli. Proc. Natl. Acad. Sci. U.S.A. 90, 1146-1153. Tan, Y.J., Oliveberg, M., Davis, B., and Fersht, A.R. (1995). Perturbed pKa values in the denatured states of proteins. J. Mol. Biol. 254, 980-992. Tanford, C. and Kirkwood, J.G. (1957). Theory of protein titration curves. I. General equations for impenetrable spheres. J. Am. Chem. Soc. 79, 5333-5339. Tanford, C. and Roxby, R. (1972). Interpretation of protein titration curves. Application to lysozyme. Biochemistry 11,2192-2198. 
Taylor, M.A.J., Baker, K.C., Connerton, I.F., Cummings, N.J., Harris, G.W., Henderson, I.M.J., Jones, S.T.,Pickersgill,R.W., Sumner, I.G.,Warwicker, J., and Goodenough, P.W. (1994). An unequivocal example of cysteine proteinase activity affected by multiple electrostatic interactions. Protein Eng. 7, 1267-1276. Tidor, B. (1993). Simulated annealing on free energy surfaces by a combined molecular dynamics and Monte Carlo approach. J. Phys. Chem. 97, 1069-1073. Thomas, F.G., Russell, A.J., and Fersht, A.R. (1985). Tailoring the pH dependence of enzyme catalysis by protein engineering. Nature 318, 375-376. Tomasi, J. and Persico, M. (1994). Molecular interactions in solution; an overview of methods based on continuous distribution of the solvent. Chem. Rev. 94, 2027-2094. Turnbull, J.L., Waldrop, G.L., and Schachman, H.K. (1992). Ionization of amino acid residues involved in the catalytic mechanism of aspartate transcarbamylase. Biochemistry 31, 6562-6569. Warshel, A. and Russell, S.T. (1984). Calculation of electrostatic interactions in biological systems and in solution. Q. Rev. Biophys. 17, 283-422. Warwicker, J. (1994). Improved continuum electrostatic modelling in proteins, with comparison to experiment. J. Mol. Biol. 236, 887-903. Warwicker, J. and Watson, H.C. (1982). Calculation of electrostatic potential in the active site cleft due to a-helix dipoles. J. Mol. Biol. 155, 53-62. Warwicker, J., Mueller-Harvey, I., Sumner, I., and Bhat, K.M. (1994). The activity of porcine pancreatic phospholipase A2 in 20% alcohol/aqueous solvent, by experiment and electrostatics calculations. J. Mol. Biol. 236, 904-917. Weber, G. (1975). Energetics of ligand binding to proteins. Adv. Protein Chem. 29, 1-83. Wyman, J., Jr. (1964). Linked functions and reciprocal effects in hemoglobin: A second look. Adv. Protein Chem. 19, 223-286. Wyman, J. and Gill, S.J. (1990). Binding and Linkage: Functional Chemistry of Biological Macromolecules. University Science Books, Mill Valley, CA. 
Yang, A.-S. and Honig, B. (1993). On the pH dependence of protein stability. J. Mol. Biol. 231, 459-474.
You, T.J. and Harvey, S.C. (1993). A finite element approach to the electrostatics of macromolecules with arbitrary geometries. J. Comp. Chem. 14, 484-501. Yuan, C., LiCata, V., and Allewell, N. (1996). Effects of assembly and mutations outside the active site on the functional pH dependence of E. coli aspartate transcarbamylase. J. Biol. Chem. 271, 1285-1294. Zhou, H.X. (1993). Boundary element solution of macromolecular electrostatics: Interaction energy between two proteins. Biophys. J. 65, 955-963. Zhou, H.X. and Vijayakumar, M. (1997). Modeling of protein conformational fluctuations in pKa predictions. J. Mol. Biol. 267, 1002-1011.
Chapter 4
The Binding of Ions to Proteins

JENNY P. GLUSKER
Abstract
Introduction
Metal Ion Binding to Protein Functional Groups
Examples of Cation Binding in Proteins
Ion Migration in Proteins Containing More Than One Metal
Anion Binding to Protein Functional Groups
Methods of Prediction of Ion-Binding Sites
Acknowledgments
ABSTRACT

The sites on proteins that ions select for binding depend on the charge, cavity size, and chemistry of the space available. Positively charged ions such as metal ions bind to the carboxylate, imidazole, and sulfhydryl groups on the side chains of proteins. The optimal location of metal ions with respect to these functional groups can be found from crystal structures of proteins and of small molecules. Metal ions can be distinguished in terms of their polarizabilities: the less polarizable cations such as Mg²⁺ bind to oxygen ligands, whereas the more polarizable cations such as Cu⁺ prefer sulfur as a ligand. Most transition metal ions have properties intermediate between these two. Examples of several studies of metal binding in X-ray crystal structure determinations are presented together with some information on methods currently in use to identify sites of ion binding when the three-dimensional structure of the protein is known.

Protein: A Comprehensive Treatise, Volume 2, pages 99-152. Copyright © 1999 by JAI Press Inc. All rights of reproduction in any form reserved. ISBN: 1-55938-672-X
INTRODUCTION

The binding of ions to proteins is generally an electrostatic effect. The geometries of the interactions involved are governed by the need for local balance of charge in any area of a protein, as Linus Pauling (1929) noted for the packing of small molecules or ions in crystals. The binding of ionic ligands to proteins is subject to two conditions: the available binding cavity in the protein must be of the appropriate size to enclose the ion, and the charge on the inner surface of this cavity must be approximately balanced by the charge of the ion that is to be bound. It will be necessary for the ion to displace other binding groups; some of these, such as water molecules, are readily displaced, while others are not, so that a local conformational change in the protein (to reorient amino-acid side chains) may be necessary. The main thrust of this article is to consider those geometric and electronic factors that result in the binding of a specific ion (and sometimes of related ions) to sites in proteins to the exclusion of other ions. Results of three-dimensional structure determinations of proteins and protein-ligand structures by X-ray diffraction methods form the basis of the descriptions in this article. Cations and anions will be considered separately. Studies of ion binding are now so extensive that only a few selected examples illustrating binding types can be given here. Often ions will bind in sites on an enzyme that have been specifically engineered by nature to attract the ions that are required for the successful catalysis of a biochemical reaction. For example, metal ions are often part of the active site of an enzyme, and many enzyme substrates are anions, so that investigations of the binding of ions to an enzyme are often relevant to an understanding of the mechanism of action of that enzyme. These ions are replaceable by other ions of similar size and charge, but the result of such changes may be an inactive enzyme. There are three important types of interactions between proteins and ligands:

1. electrostatic interactions, which have little, if any, orientational preference,
2. hydrogen bonding, which is generally highly directional (Umeyama and Morokuma, 1977; Taylor and Kennard, 1984; Jeffrey, 1987), and
3. weaker interactions such as C-H···O interactions, which are mainly found in the more hydrophobic areas of proteins and can serve to help align ligand molecules containing functional groups (Sutor, 1963; Gould et al., 1985; Burley and Petsko, 1988).
Of these three types of interactions, the electrostatic and hydrogen-bonding components will mainly be considered here.

Figure 1. Amino-acid side chains that bind ions. Directions of binding (hydrogen bonding and, in some cases, metal-ion binding) are indicated by open arrows. [Diagram not reproduced; panels show Asp/Glu, His, Lys, Arg, Ser/Thr, Asn/Gln, and Tyr.]

Table 1. Amino-Acid Side Chains That Bind Ions
(a) Negatively-charged amino acids: Aspartate (Asp), Glutamate (Glu)
(b) Positively-charged amino acids: Histidine (His), Lysine (Lys), Arginine (Arg)
(c) Hydrogen bond acceptors: Main-chain carbonyl (C=O), Asparagine (Asn), Glutamine (Gln), Aspartate (Asp), Glutamate (Glu)
(d) Hydrogen bond donors: Main-chain amino (-NH-), Asparagine (Asn), Glutamine (Gln), Arginine (Arg), Lysine (Lys), Tyrosine (Tyr), Serine (Ser), Threonine (Thr), Cysteine (Cys), Aspartic acid (Asp), Glutamic acid (Glu), Histidine (His)

Several of the amino-acid side chains in proteins are ionized at neutral pH and therefore, under physiological conditions, attract ions of the opposite charge. Some of these possibilities are diagrammed in Figure 1 and listed in Table 1. The carboxyl groups on aspartic and glutamic acid residues are the main attractants of positively
charged cations such as metal ions, but there are other functional groups that also bind metal or hydrogen ions. Examples are provided by histidine, methionine, asparagine, cysteine, and glutamine side chains, main-chain carbonyl (C=O) groups, and the hydroxyl groups on tyrosine, serine, and threonine. Negatively charged groups are more often held in place by means of hydrogen bonds to positively charged side chains, such as those of arginine, histidine, and lysine, and to main-chain amino (-NH-) groups. Arginine side chains play a particularly important role in aligning anions such as carboxylate and phosphate groups and holding them in a rigid manner (Borders et al., 1994; Shimoni and Glusker, 1995). Positively charged ions that can bind to proteins include hydrogen ions, metal ions, and other cations such as ammonium and substituted ammonium ions. Of these, hydrogen ions are difficult to locate; the X-ray diffraction method does not generally reveal them in the electron-density maps of proteins because the resolution is not high enough. They can be found by neutron diffraction studies and by NMR measurements. They are also identified in hydrogen-bonding motifs, where presumably the hydrogen ion can be transferred back and forth along certain hydrogen bonds, as has been suggested for α-chymotrypsin (Blevins and Tulinsky, 1985; Tsukada and Blow, 1985). Most of the information on ion binding has come from X-ray diffraction and NMR studies. This chapter will concentrate on results from X-ray diffraction studies. In the early stages of protein structure determination by X-ray diffraction, it is usual to soak solutions of heavy-atom-containing compounds into protein crystals. These heavy-atom compounds enter the crystal via the water channels, which make up, in the average case, about half of the volume of the crystal. The heavy-atom-containing compounds can then attach to side chains and some main-chain atoms in the protein.
X-ray diffraction data from two or more heavy-atom derivatives of the protein are used to determine relative phases of each X-ray diffraction beam so that an electron-density map can be calculated. For example, in the crystal structure determination of D-xylose isomerase, when heavy-atom derivatives were made, it was found that the uranyl-containing groups were located in the metal-binding site of the enzyme while platinum- and mercury-containing groups were located on the exterior of the enzyme (H.L. Carrell et al., 1984). In high-resolution electron-density maps of proteins (about 1.7 Å resolution or higher), it is often possible to distinguish high peaks with other peaks around them at about 2.0-2.4 Å. Arrangements of atoms with these separations imply that a metal ion has been bound in the protein. By contrast, covalent bond distances are of the order of 1.2-1.5 Å, hydrogen bond distances are 2.7-3.1 Å, and van der Waals interactions are generally 3.4-3.7 Å. Therefore these different types of bonding can usually be distinguished by the interatomic distances found, although distances from ligand atoms to very large cations such as potassium ions approximate the distances found in hydrogen bonds. When perusing reports of crystal structure determinations, one must be careful to note the resolution of the structure determination (Glusker and Trueblood, 1985). If this is in the region of 1.6-1.9 Å, the resolution is good for a protein structure determination, and most of the side chains as well as the main chain will be located, together with many water molecules bound to the protein. Most of the protein structures discussed here are at a fairly high resolution. The resolution of a ring structure is diagrammed in Figure 2.

Figure 2. Resolution of a three-ring structure at (a) 2.5 Å resolution, (b) 1.5 Å resolution, and (c) 0.8 Å resolution. Protein structures are generally determined at the resolution shown in (a) and (b). Small-molecule structures are determined at the resolution shown in (c) or better.

When protein side-chain groups are fitted to an electron-density map, the model of an entire side chain is fitted as well as possible, possibly with some rotation of bonds at the end of a long side chain. Then the model is refined. Unless the resolution of the structure is very high, there are generally not enough data for a refinement of each atomic position and temperature factor. Therefore, the refinement is applied to the side chain as an entity rather than to individual atoms. As a result, minor structural variations cannot be identified at the lower resolution of protein structure determination. This must always be kept in mind when interpreting results of protein structure determination (Bernstein et al., 1977). The identities of atoms in macromolecular structure determinations are sometimes in question. The X-ray scattering of an atom is proportional to its atomic number; this is not true for neutron scattering, which is the reason that hydrogen atoms (with the lowest atomic number of any atoms) can be located more readily by neutron diffraction. Experimental problems and the greatly increased number of atoms that must be located in a neutron-diffraction experiment make this method of protein structure determination more difficult than X-ray diffraction studies and it is therefore rarely used. The electron-density map obtained from an X-ray diffraction study gives a measure of the electron count per Å³ at each point on a chosen grid of selected spacings in three dimensions. Therefore, if electron-density values are summed over the grid points (n in all) that cover the region occupied by an ion, an electron count can be made by multiplying the mean of the n density values by the volume (in Å³) that the grid points cover.
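The two diagnostics described above, characteristic contact distances and an integrated electron count, can be illustrated with a minimal Python sketch. The distance ranges are the approximate values quoted in the text; the function names, the "unassigned" label for distances falling between ranges, the assumption of a regular grid with equal volume per point, and the sample numbers are all hypothetical choices made for illustration, not part of any published procedure.

```python
# Approximate contact-distance ranges in angstroms, as quoted in the text.
DISTANCE_RANGES = [
    ("covalent bond", 1.2, 1.5),
    ("metal-ligand contact", 2.0, 2.4),
    ("hydrogen bond", 2.7, 3.1),
    ("van der Waals contact", 3.4, 3.7),
]


def classify_contact(distance):
    """Return the interaction type whose quoted range contains the distance.

    Distances that fall between the quoted ranges are reported as
    "unassigned" (an assumption of this sketch). Large cations such as K+
    can give distances in the hydrogen-bond range, so the label is only
    a first guess.
    """
    for label, low, high in DISTANCE_RANGES:
        if low <= distance <= high:
            return label
    return "unassigned"


def electron_count(densities, voxel_volume):
    """Estimate the number of electrons in a peak of an electron-density map.

    densities    -- density values (electrons per cubic angstrom) at the grid
                    points covering the peak
    voxel_volume -- volume (cubic angstroms) represented by each grid point,
                    assumed equal for all points
    The result equals the mean density times the total volume covered.
    """
    return sum(densities) * voxel_volume


# Hypothetical peak: six neighbors at about 2.1 angstroms suggest a metal site.
neighbor_distances = [2.05, 2.10, 2.08, 2.15, 2.12, 2.09]
labels = [classify_contact(d) for d in neighbor_distances]

# Hypothetical density samples on a grid with 0.5 cubic angstroms per point.
peak_electrons = electron_count([2.0] * 10, 0.5)
```

In practice the count obtained this way depends on how the integration region is chosen and on the resolution and scaling of the map, so it is best compared with counts for well-defined atoms in the same map.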
The identity of a metal ion in a macromolecular crystal structure determination can be obtained in this way from a high-resolution electron-density map. The height of a peak in an electron-density map is related to the atomic number of the atom it represents, but this height is also affected by the temperature factor (more correctly called the displacement parameter) of the atom, as diagrammed in Figure 3 (Glusker and Trueblood, 1985). The atomic coordinates define the position of each atom in three dimensions in the repeat unit (the unit cell). The displacement parameter defines the extent to which atomic positions vary from unit cell to unit cell throughout the millions of such unit cells in a crystal. The frequency of X-rays is much higher than that of atomic and molecular vibrations, so the X-ray diffraction experiment encounters an instantaneous snapshot of the displaced atoms in a molecule. A measure of the displacement parameter of an atom can serve to amend the crystal structure model for an incorrect atomic number for that atom. Thus, if the temperature factor of one atom is found, on protein structure refinement, to be very low compared with those of the surrounding atoms, the atomic number of that atom in the model that is being refined is probably too low and should be increased. If the temperature factor is very high, the atomic number of that atom in the model is probably too high, and the refinement is attempting to reduce the contribution of this atom to each structure factor.
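The interplay between atomic number and temperature factor can be sketched numerically. The sketch makes two assumptions that are not part of the text above: the atom is treated as a point scatterer smeared by an isotropic Gaussian, and the temperature factor B is related to the mean-square displacement by the standard relation B = 8π²⟨u²⟩. Real maps are also broadened by the finite width of the atom and by the resolution limit, so only the trend is meaningful.

```python
import math

def mean_square_displacement(b_factor):
    """Mean-square displacement (square angstroms) from B = 8 * pi^2 * <u^2>."""
    return b_factor / (8.0 * math.pi ** 2)

def peak_height(n_electrons, b_factor):
    """Peak density (electrons per cubic angstrom) of a point atom smeared
    by an isotropic Gaussian whose variance per axis is <u^2>."""
    u2 = mean_square_displacement(b_factor)
    return n_electrons / (2.0 * math.pi * u2) ** 1.5

# Same electron count, different displacement parameters:
sharp = peak_height(10, 10.0)     # smaller displacement, higher peak
smeared = peak_height(10, 20.0)   # larger displacement, lower peak

# Twice the electrons with a suitably larger B gives the same peak height,
# which is why a wrongly assigned atom shows up as an unusual B on refinement.
heavier_but_smeared = peak_height(20, 10.0 * 2.0 ** (2.0 / 3.0))

print(sharp, smeared, heavier_but_smeared)
```

The integrated electron count is the same for the first two cases; only the peak height changes, as in Figure 3.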
Ion Binding to Proteins
Figure 3. Atomic displacement parameters (temperature factors). Profiles of atoms in an electron-density map. The vertical axis represents electron density in electrons per Å³. (a) An atom with a small displacement parameter. (b) An atom with a larger displacement parameter. In both (a) and (b) the three-dimensional volumes under the plots are similar, but the electron density is more spread out in (b), and the peak height is lower.
The shapes of scattering-factor curves for different types of atoms vary slightly; because metal ions are positively charged, they have a sharper contour in an electron-density map than do single-atom anions. As a result, the peak height is greater for a cation than for an anion with the same numerical charge of opposite sign. This means that magnesium ions (atomic number 12, charge +2, effectively 10 electrons) give higher peaks (with less width) than do fluoride ions (atomic number 9, charge −1, effectively also 10 electrons), provided the map resolution is high enough.
METAL ION BINDING TO PROTEIN FUNCTIONAL GROUPS

The best protein binding sites for ions are those that have been engineered by nature to attract biochemically relevant ions. Proteins, including enzymes, often have specific sites on them for metal ion binding, and they attach these cations in a variety of ways and for a variety of reasons. Metal ions are better for catalysis than are hydrogen ions because they generally have a higher charge and can be present in reasonable concentrations at neutral pH. If a metal ion takes part in the catalytic mechanism of an enzyme, this cation may serve to bring specific functional groups together in the relative orientation that is most appropriate for reaction, it may take part in oxidation-reduction reactions, or it may provide electrostatic shielding from negative charges so that a negatively charged substrate can approach the active site. Alternatively, the metal ion may help stabilize the active site so that the catalyzed reaction can be highly stereospecific. Nature has chosen several interesting mechanisms for selecting a specific cation to bind at a given site on a protein, rather than simply accepting any cation of suitable charge and size. The chemistry of the metal ion, as well as its size and charge, is taken into account. Other ions may also bind if they mimic these ions in shape and size, but their chemistry may differ and the result may be an inactive enzyme. For example, sulfate ions bind well at sites meant for phosphate groups (which have a similar size); metal
Table 2. Relative Concentrations of Cations (mM)

Medium                  Sodium   Magnesium   Calcium   Potassium
Fluids in cells           11        2.5       10⁻⁴        92
Fluids outside cells     160        2          2          10
Sea water                450       52         10          10
ions bind in areas that expect other metal ions of similar size and charge. How well these foreign ions bind depends on how well the lining of the cavity suits their individual chemistries. In the body, the metal cations available in high concentrations for binding to proteins are few. Ion concentrations in the body are high only for Na+, K+, Mg2+, and Ca2+, as shown in Table 2. These concentrations (except in the case of potassium ions) generally lie between those for sea water and pure water. Magnesium ions have about the same concentration within the cell as in its surrounding extracellular fluids. Potassium ions are present in higher concentrations in the cell, while sodium ions are essentially excluded from it by a membrane-bound ionic pump specifically designed for the purpose. Calcium ions, because they form insoluble salts rather readily and therefore might cause problems within the cell, are found mainly in the extracellular fluids and in bone. These four metal ions do not have unshared valence electrons; they bind by purely electrostatic interactions. In addition, because they are not readily deformed (polarized) by an electric field from a neighboring atom, they are called "hard" and tend to bind to hard ligands, particularly oxygen (Table 3) (Ahrland et al., 1958; Pearson, 1963). Very few enzymes utilize sodium or potassium in their catalytic mechanisms because of concentration problems, that is, the potassium ion concentration is very
Table 3. Hard and Soft Metal Ions

(a) Characteristics of the metal ions

Hard:        H+, Li+, Na+, K+, Be2+, Mg2+, Ca2+, Sr2+, Mn2+, Al3+, Cr3+, Co3+, Fe3+
Borderline:  Fe2+, Co2+, Ni2+, Cu2+, Zn2+, Pb2+
Soft:        Cu+, Ag+, Au+, Tl+, Cd2+, Hg2+, Pd2+, Pt2+

(b) Stabilities of complexes

With hard cations        With soft cations
F > Cl > Br > I          F < Cl < Br < I
O >> S > Se > Te         O << S ~ Se ~ Te
N >> P > As > Sb         N << P > As > Sb
high in the cell and the sodium ion concentration is low, controlled by the ion pump. Additional control of the concentrations of these ions by enzyme-mediated agents would be difficult (Glusker, 1991,1994). Therefore many proteins in the cell utilize the remaining ions present in high concentrations—calcium and magnesium ions—in a variety of ways. Certain proteins have engineered cavities within them that can bind these metal ions specifically. There are, however, many different types of biochemical reactions that need to be catalyzed for the maintenance of life. These reactions will proceed more readily with "softer" metal ions, that is, those like the transition metal ions that are more readily deformable than the alkali metal and alkaline earth cations. These softer metal ions are only present in trace amounts in the cell, but can be selected out by an engineering of the appropriate binding site within the protein, as will be described. There are two types of enzymes that bind metal ions—the metalloenzymes, which tightly bind transition metal ions, and the metal-activated enzymes, which loosely bind alkali metal and alkaline earth metal ions. Of the less common elements used by enzymes, probably the most important are divalent zinc, copper, and iron, which are bound to many enzyme systems and take part in their catalytic mechanisms. We wish to know which ions will bind to a given protein, how they are bound, and, when they bind, what they do within the protein under physiological conditions. Metal ion-binding sites on proteins are selective if they provide a cavity with a required diameter that will just accommodate the required metal ion and will also contain enough negative charge to neutralize the charge on the metal ion. They also should provide binding groups with the appropriate deformability (hard or soft). 
We will first consider the relationships of metal ions to their binding groups; carboxylates, imidazoles, and sulfhydryl groups are the most common metal-binding groups in proteins. Of these, the oxygen atoms of carboxylate groups can be considered hard, the sulfur atoms of sulfhydryl groups soft, and the nitrogen atoms of histidine groups somewhat softer than oxygen atoms (see Table 3). The stabilities of complexes of the borderline ions with a given ligand are expressed in the Irving-Williams series (Irving and Williams, 1953). In this series, shown below, the ionic radius decreases from left to right, while the ionization potential increases.

Ba2+ < Sr2+ < Ca2+ < Mg2+ < Mn2+ < Fe2+ < Co2+ < Ni2+ < Cu2+ > Zn2+

The main protein side-chain groups that bind metal ions are the carboxyl groups of aspartic and glutamic acids. The relative positions of ions with respect to carboxyl groups in metal ion-carboxylate interactions have been investigated in our laboratory (C. J. Carrell et al., 1988). There are two lone pairs of electrons on an oxygen atom of an ionized carboxylate group. We asked which lone pair is preferred for metal cation binding: the one that is syn or the one that is anti to the other C-O bond (Gandour, 1981) (see Figure 4). The C-COO− carboxylate group is planar. Where do the metal ions bind with respect to this plane? In order to investigate these geometrical questions, we examined small-molecule crystal structures in the Cambridge Structural Database (Allen et al., 1979). These crystal structures are
Figure 22. The inferred transition state of the action of alkaline phosphatase, based on the data in Figure 21. (a) The active and (b) the inactive forms.
Figure 23. The binding of an NADPH cofactor pyrophosphate group to liver alcohol dehydrogenase. (a) Diagram and (b) crystal structure (Al-Karadaghi et al., 1994).
Figure 24. (a) Acetate binding to methane monooxygenase, compared with (b) the binding of an active-site aspartate in hemerythrin.
Many monovalent anions are inhibitors of the carbonic anhydrase reaction; they either displace the zinc-bound water molecule or bind near the zinc and increase its coordination number. Examples of inhibitory anions are iodide (I−) and aurocyanide [Au(CN)2−]. The crystal structures of human carbonic anhydrase complexed, in different crystals, with these two inhibitors have been determined (Kumar et al., 1994). In this enzyme, the iodide ion takes the fourth coordination position on the zinc ion that is normally occupied by the H2O/OH− ligand. Otherwise the
distorted tetrahedron around the zinc ion is not perturbed, but the product is inactive. The complex has a Zn2+···I− distance of 2.7 Å, as shown in Figure 25 (Kumar et al., 1994). The Au(CN)2− group does not bind in the same way. Instead of displacing the H2O/OH− ligand, it forms a hydrogen bond to the H2O/OH− group. The N≡C–Au group is bent by 13°. The hydrogen bond that it forms to the metal-bound hydroxyl group will prevent the latter from forming a hydrogen bond to the hydroxyl group of Thr199. The hydrogen atoms on the hydroxyl group then point towards the substrate (carbon dioxide) binding site, thereby interfering with substrate binding
Figure 25. (a) Diagram of binding of aurocyanide, water, and iodide to carbonic anhydrase. Some metal ion-ligand distances are listed. (b) Crystal structure (Kumar et al., 1994).
Figure 26. Binding of (a) water/hydroxide, and (b) acetate to carbonic anhydrase. Also shown in (c) is the binding of acetate to a mutant enzyme.
Figure 27. Binding of (a) γ-thioGTP and (b) aluminum fluoride to the Giα1 protein. (c) Crystal structure of aluminum fluoride binding (Coleman et al., 1994).
and inhibiting the enzyme reaction. This reorientation of the hydrogen atom of the hydroxyl group is what the hydroxyl-Thr199-Glu106 hydrogen-bond network is believed to prevent in the normally functioning enzyme. This hydrogen-bond network also facilitates nucleophilic attack of the oriented lone pair on carbon dioxide, the substrate. In acetate complexes of human carbonic anhydrase II and its E106Q mutant, it is found that acetate is bound (Hakansson et al., 1994b). The zinc ion is bound to three histidine groups, one water molecule, and one oxygen atom of the carboxyl group of acetate. Enzyme activity requires that atoms that are not hydrogen-bond donors be excluded from the zinc-bound water position, and this is achieved by virtue of the Glu106-Thr199 hydrogen-bond network. In the wild-type structure, acetate binds in what is called the carboxylate site. The hydrogen-bond acceptor Thr199 hydroxyl group prevents the carboxylate oxygen atom from entering the zinc-water position. In E106Q, the hydrogen-bond network is reversed and the hydroxyl group on
Figure 28. Binding of (a) citrate, (b) isocitrate, and (c) fluorocitrate to metal ions. The two prochiral –CH2–COO− groups of citrate are labelled A and B to show that the binding of fluorocitrate is in the opposite direction to that of isocitrate in the enzyme aconitase.
Thr199 can act as a hydrogen-bond donor or acceptor with respect to the zinc ion. Therefore the acetate can bind with a carboxylate oxygen atom near the zinc-water position. This is illustrated in Figure 26. Aluminum fluoride, in the form of AlF4−, can bind to the site of Gα-GDP occupied by the γ-phosphate in the Gα-GTP complex (Sondek et al., 1994). When it does so, it activates the enzyme. The structure of transducin α-GDP (at 1.7 Å resolution) activated by aluminum fluoride shows four fluoride ions in the equatorial plane of an octahedron around the aluminum ion and two apical oxygen atoms from the β-phosphate group of GDP and from water, respectively. The GDP–O–(AlF4−–H2O) complex is held firmly in place by the two terminal nitrogen atoms of an arginine residue (Arg174), which form hydrogen bonds to one fluoride ion and the oxygen atom linked to the GDP. Thr177 and Gln200 help stabilize the water molecule bound to the AlF4−. As shown in Figure 27, a calcium ion also binds to two of the fluoride ions indicating
other interactions in this region of the enzyme-bound GDP molecule. This crystal structure led to a suggested mechanism of action of the enzyme (Sondek et al., 1994). A word on fluorine and fluorides seems appropriate here. Confusion exists about the role of this, the most electronegative element, which often appears fairly inert when covalently bound to carbon. However, when there are activating groups nearby in the molecule, such as the carboxylate group of trifluoroacetate, the anion shows its own idiosyncratic mode of binding. This is particularly true for fluorocitrate, which has been suggested not to bind to the enzyme aconitase in the same way that citrate does (H. L. Carrell et al., 1970). The tendency for fluorine in the C-F bond to bind to the coordination sphere of metal ions has been shown by X-ray diffraction studies (Murray-Rust et al., 1983). This may explain the highly poisonous character of one isomer of fluorocitrate, one that has the fluorine atom on the arm of the molecule that, in citrate, would not be acted on by the enzyme. Thus, addition of a fluorine atom to citrate has made it bind the "wrong way" round, as shown in Figure 28 (H. L. Carrell et al., 1970).
METHODS OF PREDICTION OF ION-BINDING SITES

The overall picture that emerges is that ions bind readily to an appropriately sized cavity lined with several groups carrying the opposite charge. Metal ions bind well to negatively charged groups such as carboxylate groups, and it has been shown that an inviting cavity is produced in D-xylose isomerase in order to bind metal ions. Similarly, anions bind in cavities that have plenty of groups (preferably positively charged) that bind them, generally by hydrogen bonding (i.e., binding to hydrogen ions). In conclusion, some methods currently in use for identifying such sites in proteins will be described. Several computer programs have been written in order to allow for the prediction of ion-binding sites in proteins. One of the best known is the program GRID (Goodford, 1985; Boobbyer et al., 1989; Wade and Goodford, 1993; Wade et al., 1993), which has been successfully used to determine optimum sites on proteins for the binding of specified ions or functional groups (chosen by the investigator). An empirical energy function is used to calculate the interaction energy between a chemical probe, such as a water molecule, and the target molecule (a protein). The result is the identification of binding sites for the selected chemical probe on the surface of the target molecule. Therefore the program can be used to analyze how cations or anions would bind to a protein. The energy function used involves Lennard-Jones, electrostatic, and hydrogen-bonding terms that have been derived from experimental data from crystal structure determinations. Energies are displayed as energy contours around the target molecule by use of the program FRODO (Jones, 1978). The charge distribution around metal ions is positive and approximately symmetrical. It gathers around it atoms, ions, or groups that are negatively charged, that is,
electron-pair donors (Lewis bases). These electron-pair donors, in proteins, are oxygen, nitrogen, and sulfur atoms. In addition to binding metal ions, these groups also readily bind water molecules and hence may be described as hydrophilic. In the side chains of proteins, however, these electron-donor atoms are covalently bound to carbon atoms that are hydrophobic. With this in mind, the environments of metal ions in proteins have been critically evaluated by Eisenberg and coworkers (Yamashita et al., 1990). The common feature of metal-binding sites in proteins was identified by them as an area with a shell of hydrophilic groups (containing oxygen, nitrogen, or sulfur atoms) that is embedded in a larger shell of hydrophobic groups (containing carbon atoms). They described these sites as ones of high hydrophobicity contrast, that is, a rapid change from hydrophilic to hydrophobic as a function of the distance of the atom (up to 7 Å) from the metal-binding site. In addition, the hydrophobic outer sphere provides an interior region of low dielectric constant that may serve to enhance electrostatic interactions within it. A program has been written by them to search for such areas of hydrophobic contrast and to consider them as potential metal-ion binding sites. It is presumed that the hydrophobic sphere around the hydrophilic sphere can restrict the flexibility of the metal-binding site (Serpersu et al., 1986). By Coulomb's law, the force between two charges, $q_1$ and $q_2$, along the line between their centers depends on the dielectric constant $D$ and the square of the distance $r$ between the charges:

$$F = \frac{q_1 q_2}{k D r^2} \qquad (1)$$

where $k$ is a constant. When a charged particle ($q$) interacts with a dipole of moment $\mu$,

$$F = \frac{q \mu \cos\theta}{k D r^2} \qquad (2)$$

where $\theta$ is the angle between the direction of the dipole moment and the line joining the point charge to the center of the charge displacement of the dipole. Thus charge-charge interactions depend on the distance between them, while charge-dipole interactions depend on both the distance and the relative orientation of the two. The electrostatic potential on the surface of a protein can be calculated by classical electrostatic theory, and it will give an indication of charged areas on the molecule; these would readily attract charges of the opposite sign (Gilson and Honig, 1987). The interior of a protein may be considered as a homogeneous dielectric medium that can be polarized by electric charges. The dielectric constant of proteins is low (2 to 3) because the reorientation of dipolar groups is restricted. Since water has a high dielectric constant (approximately 80), the protein-water interface is the boundary of two dielectric media. These principles can be used to derive electrostatic potentials within the protein from the calculated partial charges and the known atomic coordinates. The protein is divided into cubes, 1 Å on an edge, with a dielectric constant assigned to each cube (Sternberg et al., 1987). The charge
position and the shape of the molecule are taken into account. The result can be used to estimate pKa values and shifts. These ideas have been extended to give precise locations of calcium-binding sites in proteins (Nayal and Di Cera, 1994). They used the principle, mentioned earlier, put forward by Pauling (1929) and developed further by Brown and coworkers (Brown and Wu, 1976; Brown, 1978, 1988) that when a divalent metal ion (charge +2) binds to a protein, the total charge lining the cavity must be −2. This negatively charged lining is provided by oxygen, nitrogen, and sulfur atoms, each with partial or complete negative charges between 0 and −1.0. The coordination number of the cation (the number of atoms arranged around it) will depend on the relative sizes of the cation and anion. The larger the cation and the smaller the anion, the larger the number of groups that can bind around the cation. This can be analyzed in terms of the cation-to-anion radius ratio. Values of radius ratios and the most likely coordination numbers of metal ions are given in Table 6. Thus if the radius ratio lies between 0.16 and 0.24 the cation will bind four anions. Each site around the protein (calculated on a grid with a selected spacing) is checked to see if it can provide an environment with a total negative charge of −1.4 to −2.0 so that a calcium ion can comfortably bind with the expected coordination number (6 to 9). The higher the charge or partial negative charge on an anion surrounding a positively charged atom or group, the closer it will be presumed to come to the cation. As a result, some measure of the partial charge on each oxygen, nitrogen, or sulfur atom in the first coordination sphere can be obtained from the relative distances between each of these atoms and the central metal ion. The charges found by a formula of the type

$$v = \left(\frac{R}{R_1}\right)^{-N} \qquad (3)$$
can be fitted to the experimental data to give values for the bond valence $v$ between two atoms, one a metal ion and the other an atom in its first coordination sphere at an experimentally determined cation-oxygen distance $R$. For example, values of $R_1$ and $N$ for various ions are as follows: 1.909 and 5.4, respectively, for calcium ions; 1.622 and 4.290, respectively, for both sodium and magnesium ions; and 2.276 and 9.1 for potassium ions (Brown and Wu, 1976). The general equation for cations is

$$v = s\left(\frac{R}{R_1}\right)^{-(0.6\,\mathrm{CN} + 2.2)} \qquad (4)$$
where $s$ is the average bond valence, $R_1$ is the average bond length, and CN is a typical coordination number. For example, if a magnesium salt has Mg2+–O bond distances of 2.07, 2.13, 2.08, 2.14, 2.12, and 2.03 Å, then by equation (3) each metal-oxygen interaction has a bond valence that can be calculated as −0.351, −0.311, −0.344, −0.303, −0.317, and −0.382, respectively. These add up to −2.009, which will balance the charge of +2 on the metal ion. In the simplest case, when a magnesium ion with a charge of +2 gathers six oxygen atoms around it, the charge on each would be expected to be −1/3 (−0.333) and all metal-oxygen distances would be equal (Figure 29). In the example given above, the distances are unequal
Table 6. Radius Ratios, Ionic Radii, and Average Coordination Numbers

(a) Cation-anion radius ratios

Coordination number   Radius ratio   Coordination polyhedron
        3                0.155       Triangle
        4                0.244       Tetrahedron
        6                0.414       Octahedron
        8                0.645       Square antiprism
        8                0.732       Cube
       12                1.0         Cube-octahedron

(b) Ionic radii and average coordination numbers (Brown, 1988)

Cation            Cation radius, Å   Average coordination number
beryllium              0.31                  4.0
aluminum               0.50                  5.3
cobalt(III)            0.54                  5.9
chromium(III)          0.58                  6.0
lithium                0.60                  5.3
manganese(III)         0.62                  5.8
iron(III)              0.62                  5.7
magnesium              0.65                  6.0
nickel(II)             0.66                  5.9
copper(II)             0.69                  5.1
cobalt(II)             0.70                  5.7
zinc(II)               0.71                  5.0
tin(IV)                0.71                  5.9
copper(I)              0.72                  2.2
iron(II)               0.74                  5.9
manganese(II)          0.80                  6.0
lead(IV)               0.84                  5.7
palladium(II)          0.86                  4.4
cadmium(II)            0.91                  6.1
sodium                 0.95                  6.7
mercury(II)            0.98                  5.5
calcium                0.99                  7.3
silver(I)              1.10                  5.1
strontium              1.13                  8.6
potassium              1.33                  9.0
barium                 1.35                 10.2
rubidium               1.48                  9.8
cesium                 1.69                 10.4
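The magnesium example given in the text before Table 6 can be checked with a short calculation. The sketch below assumes that equation (3) has the power-law form v = (R/R1)^(−N), with the parameters quoted from Brown and Wu (1976) for magnesium (R1 = 1.622, N = 4.290). The function and variable names are illustrative only, and the valences are computed as magnitudes, whereas the text writes them as negative charges on the oxygen atoms.

```python
def bond_valence(distance, r1, n):
    """Bond valence from a metal-oxygen distance (angstroms), assuming the
    power-law form v = (R / R1) ** (-N)."""
    return (distance / r1) ** (-n)

# Parameters quoted in the text for magnesium (Brown and Wu, 1976).
MG_R1 = 1.622
MG_N = 4.290

# Mg-O distances (angstroms) from the worked example in the text.
distances = [2.07, 2.13, 2.08, 2.14, 2.12, 2.03]

valences = [bond_valence(d, MG_R1, MG_N) for d in distances]
total = sum(valences)

for d, v in zip(distances, valences):
    print(f"R = {d:.2f}  v = {v:.3f}")
print(f"sum of bond valences = {total:.3f}")
```

The sum comes out close to 2, which is the sense in which the six oxygen atoms balance the +2 charge on the magnesium ion. Shorter contacts carry a larger share of the valence than longer ones.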
[Figure 29 (diagram not recoverable): each oxygen atom is labeled −0.333 in the panel where the metal-oxygen distances are equal.]