Foundations of Structural Biology
LEONARD J. BANASZAK
Department of Biochemistry, Molecular Biology and Biophysics
University of Minnesota
Minneapolis, Minnesota
San Diego  San Francisco  New York  Boston  London  Sydney  Tokyo
This book is printed on acid-free paper.
Copyright © 2000 by ACADEMIC PRESS
All Rights Reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher. Requests for permission to make copies of any part of the work should be mailed to: Permissions Department, Harcourt Inc., 6277 Sea Harbor Drive, Orlando, Florida, 32887-6777.
Academic Press a division of Harcourt Brace & Company 525 B Street, Suite 1900, San Diego, California 92101-4495, USA http://www.apnet.com
Academic Press 24-28 Oval Road, London NW1 7DX, UK http://www.hbuk.co.uk/ap/
Library of Congress Catalog Card Number: 99-63001
International Standard Book Number: 0-12-07700-2
PRINTED IN THE UNITED STATES OF AMERICA
99 00 01 02 03 04 EB 9 8 7 6 5 4 3 2 1
Contents

Preface

CHAPTER 1
Introduction to Protein Structure
PROBLEMS
REFERENCES

CHAPTER 2
Solid-State and Solution Methods for Determining Biological Macromolecular Structure
DIFFRACTION METHODS
X-RAY CRYSTALLOGRAPHY
CRYSTALLIZATION
X-RAY DATA COLLECTION
RESOLUTION
STRUCTURE FACTORS, PHASES, AND REFINEMENT
MATHEMATICAL RELATIONSHIPS
  Obtaining the Amplitude
  Calculating the Electron Density
  The Structure Factor
  Multiple Heavy Atom Derivatives
DETERMINING THE POSITIONS OF HEAVY ATOMS
  Calculating Patterson Maps
  Locating Heavy Atom Sites from Patterson Maps
MOLECULAR REPLACEMENT: CALCULATING PHASES USING A HOMOLOGOUS STRUCTURE
TEMPERATURE FACTORS
REFINEMENT
ELECTRON DENSITY MAPS AND DIFFERENCE ELECTRON DENSITY MAPS
SUMMARY: DETERMINATION OF THE CRYSTAL STRUCTURE OF PROTEINS
NUCLEAR MAGNETIC RESONANCE METHODS FOR DETERMINING SOLUTION STRUCTURE
PROBLEMS
REFERENCES

CHAPTER 3
Crystallographic Coordinates and Stereodrawings
INTRODUCTION
HEADER RECORDS OF PDB FILES
COORDINATES
ATOM RECORDS IN A PDB FILE
ATOM LABELS
STEREODRAWINGS
STEREOGLASSES
DIVERGENT EYES
CROSSED EYES
THE AMINO ACIDS
VIEWING PROTEIN MODELS WITH A COMPUTER
THE FOUR-LETTER IDENTIFICATION CODE
DOWNLOADING COORDINATES WITH FTP
WORLD WIDE WEB PAGE FOR PDB
DISPLAYING PDB COORDINATES
PREKIN AND MAGE
  Text, Captions, and Colors
  Views
  Adding Lines and/or Labels
  Local Rotations
  Making Measurements with the MAGE Display
  Hardcopy
SUMMARY
PROBLEMS
  Learning about PDB Files
  Practicing Stereovision
  Using Computer Graphics
REFERENCES
APPENDIX 1: KINEMAGE COLOR PALETTE (DCR)
APPENDIX 2: MAGE KEYWORDS AND PARAMETERS (DCR)

CHAPTER 4
Properties of Biomacromolecules in the Crystalline State
INTRODUCTION
PROTEIN CRYSTALS
PHYSICAL PROPERTIES
CHEMISTRY OF CRYSTALLINE PROTEINS
CHEMICAL REACTIVITY
ENZYMATIC AND BIOLOGICAL ACTIVITY
CRYSTAL VERSUS SOLUTION NMR STUDIES
CRYSTALLOGRAPHIC TEMPERATURE FACTORS
STRUCTURAL HETEROGENEITY IN PROTEIN CRYSTALS
SUMMARY
PROBLEMS
  Practicing Stereovision
  Learning about PDB Files
  Using Computer Graphics
REFERENCES

CHAPTER 5
Quaternary Structure of Proteins
INTRODUCTION
ASSOCIATION OF PROTEIN SUBUNITS
HELICAL OR CONTINUOUS PROTEIN POLYMERS
THE QUATERNARY STRUCTURE OF CLOSED AGGREGATES: OLIGOMERIC ENZYMES
BIOLOGICAL IMPLICATIONS OF QUATERNARY STRUCTURE
SURFACE ACCESSIBILITY
GENERATING COORDINATES FOR OTHER SUBUNITS
SUMMARY
PROBLEMS
  Practicing Stereovision
  Using Computer Graphics
REFERENCES

CHAPTER 6
Secondary Structure of Proteins
INTRODUCTION
THE CHEMICAL NATURE OF A POLYPEPTIDE CHAIN
DEFINITIONS OF SECONDARY STRUCTURE
β STRUCTURE
TURNS
THE COLLAGEN TRIPLE HELIX
PREDICTION OF SECONDARY STRUCTURE
SUMMARY
PROBLEMS
  Practicing Stereovision
  Using Computer Graphics
REFERENCES

CHAPTER 7
Domains and Supersecondary Structure
INTRODUCTION
DOMAINS
  Supersecondary Structure
SUMMARY
PROBLEMS
  Practicing Stereovision
  Using Computer Graphics
REFERENCES

CHAPTER 8
Conformational States in Crystal and Nuclear Magnetic Resonance Structures
INTRODUCTION
COMPARISON OF TWO CONFORMATIONAL STATES
THE OXYGENATION OF HEMOGLOBIN: TWO CRYSTAL CONFORMATIONS
CONFORMATIONAL STATES IN OTHER CRYSTALLOGRAPHIC ANALYSES
SUMMARY
PROBLEMS
  Practicing Stereovision
  Using Computer Graphics
REFERENCES
APPENDIX

CHAPTER 9
Hydrogen Bonds and Water Molecules in Crystalline Proteins
INTRODUCTION
HYDROGEN-BONDING POSITIONS IN PROTEINS
NEUTRON DIFFRACTION
WATER MOLECULES OBSERVED IN CRYSTALLINE PROTEINS
THE DISTRIBUTION OF PROTEIN-BOUND WATER
WATER NETWORKS IN CRYSTALLINE PROTEINS
SUMMARY
PROBLEMS
  Practicing Stereovision
  Using Computer Graphics
REFERENCES

CHAPTER 10
Protein and Nucleic Acid Complexes
INTRODUCTION
STRUCTURAL DATA DESCRIBING DNA
INTERACTION BETWEEN DNA AND SITE-SPECIFIC PROTEINS
DNA-BINDING MOTIFS
HELIX–TURN–HELIX
PHAGE 434 REPRESSOR–DNA COMPLEX
A LEUCINE ZIPPER
SUMMARY
PROBLEMS
  Practicing Stereovision
  Using Computer Graphics
REFERENCES

CHAPTER 11
Metal Ions Bound to Proteins
INTRODUCTION
COORDINATION OF METALS IN PROTEINS
FUNCTIONAL REASONS FOR METAL ION BINDING
A METAL ION PROVIDING A CONFORMATIONAL FUNCTION
METALLOPROTEINS FOR TRANSPORT AND STORAGE
METALLOPROTEINS AS REDOX INTERMEDIATES
METALLOPROTEINS THAT BIND DIOXYGEN
METALLOPROTEINS THAT SERVE A CATALYTIC FUNCTION
SUMMARY
PROBLEMS
  Practicing Stereovision
  Using Computer Graphics
REFERENCES

CHAPTER 12
Lipid–Protein Interactions
INTRODUCTION
ELECTRON MICROSCOPY, ELECTRON DIFFRACTION, AND MEMBRANE PROTEIN STRUCTURE
MEMBRANE PROTEIN STRUCTURE BY ELECTRON MICROSCOPY AND X-RAY METHODS
LIPID-METABOLIZING ENZYMES
LIPID TRANSPORT AND STORAGE PROTEINS
SUMMARY
PROBLEMS
  Practicing Stereovision
  Using Computer Graphics
REFERENCES

APPENDIX 1
Extra Reading in Structural Biology

APPENDIX 2
Macromolecular Structure Information Resources

Index
Preface
The revolution that has taken place in the biological and medical sciences during the last half of the twentieth century surpasses developments in other areas of the physical sciences by a significant margin. A notable part of this quantum leap is due to the advances in structural biology. This field includes methods for the accumulation of data that describe the conformations of biological macromolecules at close to atomic resolution. Furthermore, in the last decade, the rate at which it has been possible to determine three-dimensional structures has increased significantly. As a result, a serious gap is beginning to form between the hardcore data from X-ray diffraction and NMR studies and the biological scientist’s ability to recognize and use the conformational information.
To link structural and biological properties, the scientist must be able to visualize the conformations and understand the fidelity and flexibility of the macromolecule. However, the structure and chemistry of these biological molecules are complex. There are only a few ways to make visualization feasible. The biological scientist must be able to view the molecule in three dimensions. This can be done with stereoviewing or by viewing the molecule in motion on the console of a computer. In the author’s opinion, the optimal procedure combines stereoviewing with the motional capability of computer graphics.
The text that follows focuses on encouraging students of biology, chemistry, or physics to make use of the macromolecular structural database using visualization tools commonly available. These tools involve the eyes, atomic coordinates, and an edited drawing included either in the text or on the computer screen.
ACKNOWLEDGMENTS
My thanks to the many wonderful students who have used and criticized this material in the past. For you, it will be an especially exciting new millennium in which using three-dimensional computer graphics becomes an important part of your repertoire of biological research tools. Most of all, I gratefully acknowledge the hidden contributions of my wife, Joyce. A journalist and writer, she constantly reminds me of the value of the written word and how to produce it correctly.
CHAPTER 1
Introduction to Protein Structure
In the 1950s, the biochemical world had essentially no information about the molecular structure of globular proteins and enzymes. Even protein characterization procedures, which today can be done in a matter of minutes, were then very complex. For example, a now simple task such as determining the molecular weight of a protein was a major undertaking. Most often it necessitated the use of an analytical ultracentrifuge and other physical chemical measurements. Months of work could be involved. In the same way, very little was known about amino acid sequences, and for that matter, the number of proteins purified to homogeneity was limited. Methods for determining amino acid sequences were crude until the first amino acid analyzers became available, and even then there was great difficulty in obtaining chemical sequences; years of work were required. On the structural side, there was essentially no information on protein conformation.
It is fortunate for the biological world that this all changed dramatically and reasonably rapidly. Back in the mid-1930s, under the wing of the physicist Sir Lawrence Bragg, Jr., Max Perutz had started his studies on the structure of hemoglobin (Hb; Perutz et al., 1960). But to calculate a crystal structure it is necessary to have information that cannot be obtained directly from the intensities of the scattered X rays. In addition to the measured intensity, each X-ray reflection has a so-called phase angle that is of critical importance to solving the crystal structure. Factors relating to the measurement and interpretation of X-ray diffraction data from protein crystals are described in Chapter 2. The inability to measure the phases directly is called the phase problem. Many years of work were required to devise a way to solve the phase problem. The very patient and persevering Perutz tried to solve this problem in several different ways, many of which were too cumbersome to be useful.
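The phase problem described above can be illustrated with a toy calculation. The sketch below is not taken from the text: it assumes a hypothetical one-dimensional "crystal" with two point atoms at invented fractional positions, and the function names are illustrative only. It computes the structure factors, then compares a Fourier synthesis that uses the true phases with one that keeps only the amplitudes (the quantity recoverable from measured intensities) and sets every phase to zero.

```python
import cmath
import math

def structure_factor(h, atoms):
    """F(h) = sum over atoms of f * exp(2*pi*i*h*x) for a 1-D unit cell."""
    return sum(f * cmath.exp(2j * math.pi * h * x) for x, f in atoms)

def fourier_synthesis(x, factors):
    """Real part of sum over h of F(h) * exp(-2*pi*i*h*x): the density at x."""
    total = sum(F * cmath.exp(-2j * math.pi * h * x) for h, F in factors.items())
    return total.real

# Hypothetical atoms: (fractional position, scattering weight).
atoms = [(0.20, 6.0), (0.55, 8.0)]
indices = range(-10, 11)

# Complex structure factors carry both an amplitude and a phase.
true_factors = {h: structure_factor(h, atoms) for h in indices}

# A detector records intensities |F|^2 only, so the phases are lost.
# Here the amplitudes are kept and every phase is set to zero.
amplitudes_only = {h: abs(F) for h, F in true_factors.items()}

grid = [i / 100 for i in range(100)]
peak_with_phases = max(grid, key=lambda x: fourier_synthesis(x, true_factors))
peak_without_phases = max(grid, key=lambda x: fourier_synthesis(x, amplitudes_only))

print("highest density with true phases at x =", peak_with_phases)
print("highest density with phases discarded at x =", peak_without_phases)
```

With the true phases the strongest peak falls on the heavier of the two atoms; with the phases discarded the strongest peak moves to the origin, so the atomic positions can no longer be read off directly. That is the information a phasing method has to supply.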
Eventually structural biology was born with Perutz’s discovery of a procedure for phasing: the method of multiple isomorphous replacement—the MIR method, for short. In the MIR method, phase information is obtained by observing the effect of adding heavy atoms to well-defined positions in the protein crystal. The word isomorphous refers to the property that the native crystal and the crystals containing the heavy atoms have the same unit cell dimensions. Stated more simply, the native and heavy atom derivative crystals should differ only by the presence of bound heavy atoms. The size of the repeating unit in the crystal cannot vary, nor can the orientation of the protein. Perutz worked side by side with John Kendrew, who had taken up the study of another heme-containing protein called myoglobin (Mb). In the late 1950s, both Perutz and Kendrew were successful! The phase problem was solved for the first time by using isomorphous heavy atom derivatives. With this new method and X-ray diffraction data,
the crystal structures of both hemoglobin and myoglobin were completed to resolutions of 5.5 and 2.8 Å, respectively (the term resolution is defined in Chapter 2). The structure of hemoglobin provided a beautiful picture of the molecular organization of an oligomeric protein, a subject that is discussed in more detail in Chapter 5, on the quaternary structure of proteins. The hemoglobin model was first made from balsa wood, and a picture of the wood model used for study is shown in Fig. 1.1. Balsa wood models were to remain on the scene for at least another decade, much to the chagrin of the unacquainted biochemist. Max Perutz said about his famous molecular model: “Could the search for the ultimate truth really have revealed such a hideous and visceral looking object?” (Perutz, 1963).
Fig. 1.1 Low-resolution model of hemoglobin. The model is a balsa wood representation of the electron density of Hb, an α2β2 tetramer. It is historically of major significance, because it was the first three-dimensional structure of a protein to be determined.
The more detailed molecular model of myoglobin provided immediate insight into many aspects of protein folding and stabilization. John Kendrew, in describing his Mb structure, said “The molecule is compact . . . no channels through the molecule. . . . Almost all the polar groups are on the surface. . . . The interior of the molecule is made up of non-polar residues, almost everywhere close-packed. . . . Bound water molecules are attached to all polar groups.” Historically, a fascinating part of the early myoglobin studies was the methods used to interpret the electron density map. To build a molecular model, electron density
was plotted in a three-dimensional grid (forest) of steel rods. Brass stick models were then built into this electron density “map.” The first Kendrew–Watson models were on a scale of 5 cm per angstrom! At this scale, the relatively small protein molecule, myoglobin, required a reasonably large room for the brass crystallographic model. The principles laid down in these early studies of protein structure by Perutz and Kendrew remain the foundations of nearly all we know about protein structure. Perutz and Kendrew received Nobel Prizes in 1962 for this pioneering effort.
These early structures and those to follow were received with great interest but always with more than a little skepticism. Doubts about their usefulness stemmed from the fact that these conformations were obtained from the protein in the crystalline state. Why should they be the same in solution? In a living cell? In the Perutz–Kendrew era, these appeared to be legitimate questions. In fact, in the late 1960s, some investigators tried to prove that myoglobin in solution must be different from the structure Kendrew and coworkers described in the crystalline state. The solution-versus-crystal structure question is dealt with further in Chapter 4. Suffice it to say that to date there is practically no evidence to indicate that the two physical states produce any major differences in protein conformation. This positive statement must, however, be qualified. It should be an accepted principle that the crystal and solution structures of proteins and, for that matter, other biological macromolecules are essentially the same. Are they the same in every minute detail? Of course not! Are protein structures static, as is implied by the stereoimages so often viewed in the literature? Of course not! There are many reasonably well-documented cases in which small conformational differences between a protein in solution and the same protein packed in a crystal have been experimentally demonstrated.
Note, however, the key adjective, “small.” Up to now, no evidence exists that any protein maintains a completely different fold in aqueous solution versus aqueous crystals. A detailed comparison between crystal and solution structures became more feasible after the first nuclear magnetic resonance (NMR) structure of a protein was determined. This historical moment occurred in the mid-1980s, when Williamson et al. (1985) determined the solution structure of a protease inhibitor. A schematic representation of the first solution structure is shown in Fig. 1.2. The protein has 57 amino acids and is held in a fairly rigid conformation by three disulfide bonds. It contains a segment of α helix and antiparallel β structure. Now, approximately 15 years later, several hundred solution structures have been determined. Although limited somewhat by the size of the protein, NMR methods are being used with increasing success not only to determine solution structure but also to offer insight into the dynamic properties of proteins.
The chapters that follow summarize some of the principles of structure that can be derived from study of the protein crystallographic and NMR literature. The material is meant for beginning graduate students in biochemistry, molecular and cell biology, and chemistry. Unfortunately, proteins are complex molecules and it is this very complexity that imparts their biological function. The link between structure and function is often fairly obvious. But when one studies the structure of a protein molecule, it is easy to be overwhelmed by the complexity. To deal with this, crystallographers and NMR scientists have had to publish their results in the form of stereoimages of the molecules. Schematic drawings or colored descriptions are useful, no doubt; but they rarely can be used to illustrate the three-dimensional nature of the atomic constellations that are often necessary to produce the desired biochemical effect.
Fig. 1.2 The first solution structure of a protein. The ribbon diagram represents the backbone structure of protease inhibitor IIA from bull seminal plasma (Williamson et al., 1985). The structure was determined by ¹H nuclear magnetic resonance and distance geometry methods. The schematic was drawn from coordinates obtained from the Protein Data Bank (PDB), with accession number 2bus. As is described in Chapter 2, NMR methods result in an ensemble of structures. The schematic represents the average of the ensemble that has been subsequently energy minimized.
The material that follows therefore contains many stereodrawings. Regrettably, the use of these diagrams requires some practice on the part of the reader, who must learn to use this information with the same facility used to read graphs, charts, sequences, or gels. A chapter is included to help the reader do this, but it cannot be emphasized enough that the real contents of this study guide are the stereoimages that accompany the text. The drawings are the data! The data are the vital issue. It is easy to suggest what the model may mean. It is always up to the student to look at the data and decide whether they accept the interpretation of the structure in terms of its all-important biological function.
In searching for the basis of biological function from detailed molecular models, there is one additional warning that may be appropriate. Through the years, certain proteins or biological phenomena have, more than others, lured the interest of the uninitiated. The molecular structure of a repressor protein and its cognate DNA fragment can appear far more enticing than the structure of a simple enzyme such as lysozyme, a structure completed many years ago. It isn’t! Proteins operate by binding to other molecules. The binding is vital to their function. The principles involved in molecular recognition are always the same. The principles that govern the final folding of a protein are always the same. The factors that lead to conformational changes, to chemical catalysis, to inhibition, to immunological phenomena, to transport, and to other properties are always based on the same thermodynamic principles. The hydrophobic effect and the same rules of covalent and noncovalent bonding govern all of these factors. The “all proteins are created equal” principle should not be forgotten when looking at the examples that follow.
Finally, the reader should always remember that the material that follows focuses mainly on crystallographic and multidimensional NMR studies. The accompanying text largely ignores the host of experiments that are an equally important part of learning about the structure and function of proteins. On the biophysical side, there are numerous, vital protein studies based on the use of spin labels and hydrodynamic and thermodynamic measurements. This text does not cover the elegant studies using molecular dynamics and statistical mechanics.
On the biological side are thousands of important but ignored topics: amino acid sequence determination, enzyme catalysis and mechanism, posttranslational modification, allosterism and cooperativity, folding, chemical modification, etc. And last of all, the development of cloning techniques and site-directed mutagenesis has added a powerful tool to our study of proteins. The strength of these methods is greatest when used with accompanying structural data obtained from crystallographic or NMR analyses.
The material that follows is aimed at guiding the reader toward certain principles, empirical though they may be, of protein structure derived from studies of biomolecular structure. Other biophysical studies of structure and function are largely ignored, but only for the sake of brevity.
PROBLEMS
1. Start developing your computer skills. Using the World Wide Web, go to the Protein Data Bank (PDB) and determine how many insulin structures are known. How many were determined by X-ray crystallography? How many by NMR? How many mutant forms are known? World Wide Web: http://www.rcsb.org/pdb
2. Write out the complete amino acid sequence for human insulin, using the single-letter code. Learn the single-letter code if you don’t already know it. Now, using one of the amino acid sequence databases, complete the table of sequence alignments by adding two other mammalian insulins. What is the significance of such a homology table? Compare the insulin amino acid sequence from humans with that of the muscovy duck. Try getting these data from the Swiss Bioinformatics Center. World Wide Web: http://expasy.hcuge.ch
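For readers working through Problem 2 on a computer, the single-letter code can be kept as a small lookup table. The sketch below is only an aid, not part of the original problem set: the helper names `to_single_letter` and `percent_identity` are hypothetical, the five-residue peptides are arbitrary illustrations rather than answers to the problem, and the identity calculation assumes the two sequences have already been aligned to equal length.

```python
# Standard three-letter to single-letter codes for the 20 common amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

def to_single_letter(residues):
    """Convert a list of three-letter residue names to a one-letter string."""
    return "".join(THREE_TO_ONE[name.capitalize()] for name in residues)

def percent_identity(seq_a, seq_b):
    """Percentage of identical positions in two pre-aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)

# Arbitrary illustrative peptides (not a solution to the problem).
peptide = to_single_letter(["Gly", "Ile", "Val", "Glu", "Gln"])
variant = to_single_letter(["GLY", "ILE", "VAL", "GLU", "GLU"])
print(peptide, variant, percent_identity(peptide, variant))
```

A table of such pairwise identities across species is one simple way to summarize the kind of homology table the problem asks about.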
REFERENCES
Perutz, M. F. (1963). “X-Ray Analysis of Haemoglobin.” Les Prix Nobel, Stockholm.
Perutz, M. F., Rossmann, M. G., Cullis, A. F., Muirhead, H., Will, G., and North, A. C. T. (1960). Structure of hemoglobin: A three dimensional Fourier synthesis at 5.5 Å resolution, obtained by X-ray analysis. Nature (London) 185, 416–422.
Williamson, M., Havel, T., and Wüthrich, K. (1985). Solution conformation of proteinase inhibitor IIA from bull seminal plasma by ¹H nuclear magnetic resonance and distance geometry. J. Mol. Biol. 182, 295–315.
CHAPTER 2
Solid-State and Solution Methods for Determining Biological Macromolecular Structure
Both crystallographic and nuclear magnetic resonance (NMR) methods for determining macromolecular structure are complicated. Each involves multiple steps in both data collection and analyses. However, aside from the stage involving structure refinement, they have practically nothing in common. With X-ray methods, it is possible to observe directly the position of every ordered protein atom. There are no size limitations on the method and everything from small proteins to whole viruses has been studied. NMR methods are limited somewhat by the size of the protein. However, the researcher using NMR has the important advantage of being able to observe some of the motional properties of the macromolecule in solution.
DIFFRACTION METHODS
Crystallography involves a form of diffraction that can be observed with several different forms of radiation. Our major concern in this section is that the reader become acquainted with the principles and results of diffraction methods. Such methods are generally applicable to studies using X rays, electrons, or neutrons. Under each of these categories, the topics could be subdivided into diffraction from crystals, from fibers or other two-dimensional arrays, and solution scattering. The branch of diffraction involving X rays and crystals is the most important because such studies most often lead to the three-dimensional (3-D) structure of proteins. Before going on to summarize X-ray crystallography, a few salient facts about the less used methods are worth noting. Neutron crystallography has been used to identify the precise location of hydrogen atoms in crystalline proteins. Because of accompanying technical difficulties and the fact that the investigator must have access to a nuclear reactor or a spallation source, its application is limited. To date, electron diffraction has had limited application to crystals of macromolecules except through electron imaging of crystalline or paracrystalline materials. With negatively stained electron micrographs and the resulting two-dimensional (2-D) images, low-resolution structures have been obtained from very thin crystals. Since the late 1980s, the fidelity of imaged macromolecules has been improved dramatically through the use of cryoelectron microscopy. In these experiments, the biological specimen is
rapidly cooled to liquid nitrogen temperatures. Such rapid cooling leads to a form of vitreous ice that itself does not alter the specimen. The low temperatures also reduce electron damage to the sample being viewed. Because each electron micrograph is two-dimensional, multiple images are required to arrive at a 3-D model. Principles of diffraction analyses nearly identical to those employed in crystallography are used to combine a series of 2-D images into a 3-D model. The digital processing of electron micrographs has many advantages, including the opportunity to correct for astigmatism and focus. For anyone interested in further reading on the processing of electron micrographs, suggested sources are two review articles focusing on three-dimensional reconstruction and image processing (Amos et al., 1982; Aebi et al., 1984). As already mentioned, crystals are not the only solid-state form of proteins and nucleic acids that may be studied by diffraction methods. Paracrystalline materials such as fibers may also be used to obtain structural information. For a variety of reasons, such analyses are even more difficult than single-crystal studies. However, they have been used successfully to study fibrous proteins, rodlike viruses, and DNA. In the same year that Perutz and Kendrew received their Nobel Prize for studies of hemoglobin (Hb) and myoglobin (Mb), Crick and Watson also received this sought-after award for discovering the molecular structure of duplex DNA. They did this by analyzing X-ray diffraction patterns from fibers of DNA. Fibers of the sort studied by Crick and Watson have an ordered structure in two dimensions. Ordered is the key word; it increases the amount of structural information that can be extracted by any diffraction experiment. Last of all, diffraction of both X rays and neutrons by protein molecules in solution can also be used to extract structural information. 
The methods are called small angle X-ray scattering (SAXS) and small angle neutron scattering (SANS). In solution studies, the protein molecule is tumbling rapidly on the time scale of the measurements. The structural data are therefore spherically averaged. If the molecule is approximately spherical, the size can be obtained quite accurately. If it is not spherical, it is sometimes possible to obtain information about the shape, although often with great difficulty. Last, if the macromolecule contains more than one component, e.g., protein and nucleic acid, some information can be obtained about the relative radial distribution of the two components. This is best done with SANS and the results are obtainable only at low resolution. Such studies involve the judicious use of H2O and D2O because hydrogen and deuterium scatter neutrons quite differently. The textbook Biophysical Chemistry (Cantor and Schimmel, 1980) can be used as a starting point for further reading on SAXS and SANS.
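The spherical averaging mentioned above can be made concrete with the Debye scattering formula, in which the orientation-averaged intensity depends only on the distances between pairs of scatterers. The sketch below is an illustration rather than anything from the text: it assumes identical point scatterers with a constant scattering factor, and the function names are hypothetical. It also computes the radius of gyration, the size parameter that the Guinier approximation, I(q) ≈ I(0) exp(−q²Rg²/3), relates to the low-angle part of the curve.

```python
import math

def debye_intensity(coords, q, f=1.0):
    """Orientation-averaged intensity: sum over all pairs of f*f*sin(q*r)/(q*r)."""
    total = 0.0
    for a in coords:
        for b in coords:
            x = q * math.dist(a, b)
            # The limit of sin(x)/x as x -> 0 is 1 (self terms and q = 0).
            total += f * f * (math.sin(x) / x if x > 0.0 else 1.0)
    return total

def radius_of_gyration(coords):
    """Root-mean-square distance of equally weighted points from their centroid."""
    n = len(coords)
    center = [sum(c[k] for c in coords) / n for k in range(3)]
    return math.sqrt(sum(math.dist(c, center) ** 2 for c in coords) / n)

# Hypothetical two-scatterer "molecule", 10 length units apart.
dumbbell = [(0.0, 0.0, 0.0), (0.0, 0.0, 10.0)]
rg = radius_of_gyration(dumbbell)

q_small = 0.01
exact = debye_intensity(dumbbell, q_small)
guinier = debye_intensity(dumbbell, 0.0) * math.exp(-(q_small * rg) ** 2 / 3.0)
print(rg, exact, guinier)
```

Because only interatomic distances enter the sum, all information about the orientation of the molecule is lost, which is why shape can be recovered from solution scattering only with difficulty.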
X-RAY CRYSTALLOGRAPHY
Perutz and Kendrew started out with some crystals of Hb and Mb and ended up with a set of coordinates for all of the nonhydrogen atoms in these two proteins. The same experiments have now been repeated for many proteins. The coordinates can then be used to make a variety of stereodrawings, representations, and even space-filling pictures of the macromolecule for further study. Students who study protein structures should at least be aware of some of the important factors in crystallographic analyses. To study the details of this topic is itself a major undertaking. The definitions that follow have been carefully selected because they are relevant to understanding the final results: the protein models. For those interested in learning more about these subjects, there are numerous books written on X-ray crystallography of small molecules and a few on protein crystallography. For the former, a good general introduction can be found in the book by Stout and Jensen (1989). For protein crystallography, despite its age, the book by Blundell and Johnson (1976) is excellent. It is also good to keep in mind that protein crystallography is still a dynamic area of biophysics. New theoretical and experimental approaches have appeared every year since the early 1960s, when it began in earnest. Two more recent texts on X-ray crystallography as applied to proteins and protein complexes are those by Drenth (1994) and McRae (1993).
TABLE 2.1 Preparation of Protein Crystals
1. Precipitants
   a. Salts: Ammonium sulfate, K(Na) phosphates, citrate, NaCl, Na2SO4
   b. Polymers: Polyethylene glycol
   c. Organic: Methanol, 2-methyl-2,4-pentanediol
2. Important variables
   a. pH
   b. Protein ligands
   c. Metal ions, detergents
3. Attaining slow supersaturation (2 days to 3 weeks)
   a. Bulk crystallization
   b. Hanging drop method (vapor diffusion)
   c. Free interface diffusion
   d. Microdialysis
   e. Batch in vials
CRYSTALLIZATION
Oftentimes the most difficult part of protein crystallography is the first step: preparing suitable crystals. Some of the major factors that must be considered in the preparation of protein crystals are listed in Table 2.1. An excellent book by McPherson (1982) describes in detail all of these and many more principles to be considered in the preparation of crystals. When all is said and done, finding crystallization conditions is a trial-and-error problem with many variables. To reduce the number of trials that are necessary to find a workable procedure, Carter and Carter (1979) have described a factorial sampling method. By using microtechniques and by judicious trial-and-error experiments, crystals can often be obtained from a few to a few hundred milligrams of pure starting material. By far the most popular procedure for growing crystals is the hanging drop method. This involves the use of a simple apparatus for bringing a protein solution through a saturation point. A schematic diagram of how the trials are set up is shown in Fig. 2.1. A droplet of about 10 µl is placed on a silanized glass coverslip. The dust-free coverslip causes the droplet (which contains protein in precipitant) to round up. Using vacuum grease or some other sealant, the coverslip holding the protein droplet is sealed over a larger volume of precipitant, the concentration of which is higher than that of the protein droplet. As water leaves the protein-containing droplet, both the protein and precipitant concentrations increase slowly in the droplet. Crystals begin to grow. If the changes occur too rapidly, a shower of microcrystals may form. By altering the difference between the precipitant concentration in the droplet and well, the rate of growth can be slowed and often fewer but larger crystals obtained. A useful rule of thumb is to start with the precipitant in the protein droplet at a level two-thirds of the way to the crystallization concentration.
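The concentrating effect of the hanging drop can be estimated with simple arithmetic. The sketch below is a deliberately idealized model, not a procedure from the text: it assumes that only water moves through the vapor phase, that the well is so much larger than the droplet that its precipitant concentration does not change, and that equilibrium is reached when the droplet matches the well. The function name and the numerical values are hypothetical.

```python
def equilibrated_drop(drop_volume_ul, drop_precipitant, well_precipitant, protein_conc):
    """Return (final drop volume, final protein concentration) at equilibrium.

    Idealized model: the droplet loses water until its precipitant
    concentration equals that of the (effectively infinite) well.
    """
    if not 0 < drop_precipitant <= well_precipitant:
        raise ValueError("drop precipitant must be positive and not exceed the well")
    concentration_factor = well_precipitant / drop_precipitant
    final_volume = drop_volume_ul / concentration_factor
    final_protein = protein_conc * concentration_factor
    return final_volume, final_protein

# Example following the two-thirds rule of thumb: droplet precipitant starts at
# 1.0 M, the well is taken to be at an assumed crystallization level of 1.5 M.
volume, protein = equilibrated_drop(
    drop_volume_ul=10.0, drop_precipitant=1.0, well_precipitant=1.5, protein_conc=10.0
)
print(f"final volume ~{volume:.2f} ul, protein ~{protein:.1f} mg/ml")
```

Under these assumptions a droplet set up at two-thirds of the well concentration shrinks to two-thirds of its volume, and the protein is concentrated 1.5-fold. Narrowing the gap between droplet and well reduces that factor, which is one way to picture the slower approach to supersaturation described above.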
Once grown, protein crystals can be distinguished from organic small molecule or inorganic crystals by a simple softness test: to confirm that a protein has crystallized, try pressing on the crystal with the tip of a needle (under a microscope, of course). If it is a protein crystal, the smallest mechanical pressure will shatter it. Small molecule crystals cannot be easily crushed. A commercially available dye solution called Izit (Hampton Research, Riverside, CA 92507) has also been used to distinguish between protein and buffer crystals. Most protein crystals will also be sensitive to changes in pH and ionic strength, although they dissolve only slowly. They cannot be allowed to dry. Protein crystals must be kept in equilibrium with their mother liquor or at low temperatures, where the interstices of the crystal lattice contain either aqueous buffer or so-called vitreous ice.
Fig. 2.1 Diagram of hanging drop method trial setup.
X-RAY DATA COLLECTION
With crystals in hand, the next step is to collect the data. This process is summarized in Fig. 2.2. Note that two different diffraction patterns are shown. First, look at the basic elements in the experiment. One needs a source of X rays (not specifically shown in Fig. 2.2). It is then necessary to select a single wavelength. This is done with either a crystal monochromator or some sort of filter. The X-ray data collection system also needs a collimation system so that the X-ray beams impinging on the crystal are parallel. The combination of constructive interference and the fact that the crystal itself is a three-dimensional lattice results in scattered X rays occurring only at certain special angles. Such scattering is represented by the arrows in Fig. 2.2, and crystallographers refer to the scattered beams as reflections. The diffraction maxima are represented by the black dots contained within either the dotted or solid circles. Larger and smaller black dots or reflections are meant to indicate, respectively, more or less scattered intensity.
The large circles represent the fourth component, a detection system. This can be a piece of film, a scintillation detector, or some sort of electronic area detector. Reusable "film" is yet another form of detector and is the standard in the 1990s. These so-called image plates have a large acceptance angle, meaning that many reflections can be recorded simultaneously.
Figure 2.2 also illustrates what would happen to the diffraction pattern if the crystal were rotated slightly, going from diffraction pattern 1 to 2. Such small movements result in new reflections of different intensities occurring at different positions. The key factor in this movement is the device for orienting the crystal. This is the final major component of our X-ray experiment. The orientation device can be quite different for different instruments, and is called an X-ray camera or a goniostat. The device must be capable of describing any movement of the crystal with a precision better than 0.01°. One such orientation device is a precession camera. It moves the crystal with a precession motion, producing X-ray reflections on easily definable rows. Such precession photographs are now frequently shown in biochemistry textbooks.
Fig. 2.2 X-ray diffraction. Monochromatic X rays from a parallel source impinge on a crystal and are diffracted into a set of unique spots or reflections. The intensity and location of the reflections are recorded. Rotation of the crystal and another round of observation leads to a different pattern of reflections. The intensity and location of each of the spots change. The process is continued until all possible reflections have been measured. The solid versus dotted circles are meant to indicate how diffraction data to higher resolution are obtained.
Measuring the intensities of a three-dimensional lattice of X-ray reflections would be meaningless without a bookkeeping system. Therefore, during data collection, the investigator must also set up some system for indexing the data. Such a bookkeeping system is shown in Fig. 2.3, where each reflection is characterized by three integers, h, k, and l, called the Miller indices. As the crystal is moved from one setting to the next, new reflections occur. How is the crystallographer able to identify each reflection or assign correctly the set of Miller indices? Three pieces of information are needed: (1) the exact orientation of the crystal at the start of the experiment (each crystal has a set of three axes defining the internal organization; the orientation of the incident beam relative to the crystal axes must also be known); (2) an exact log of any motion the crystal has undergone during the experiment; and (3) the angle 2θ, as shown in Fig. 2.3. The initial orientation can be derived from a study of the positions of some preliminary X-ray reflections. Keeping a log of the crystal movements can be done by hand, but most often it is tracked by a computer linked to the crystal-orienting device. The angle 2θ is obtained by studying the position of the X-ray reflection on the detector.
Fig. 2.3 Indexing X-ray reflections to obtain h, k, and l. Multiple X-ray reflections on the face of an area detector are shown. The position of each X-ray reflection relative to the incident beam must be determined, as well as its location relative to the crystal orientation. The scattering angle 2θ (twice the Bragg angle θ) is shown, and the underlying grid has units of the Miller indices h, k, and l. Remember, the drawing is meant to simulate a diffraction experiment with the crystal in front of the page and the incident beam perpendicular to the plane of the paper.
The X-ray data from the native or parent crystal are recorded as described in Figs. 2.2 and 2.3; frequently this process is repeated for heavy atom-containing crystals or other derivatives. To obtain the X-ray diffraction data from hemoglobin (Hb) or myoglobin (Mb) in the early days (e.g., for Perutz in the 1950s) required years of labor and months of X-ray time. Today a complete data set for crystalline hemoglobin could be obtained in 1 week or less. Better crystals? No! Vastly improved instrumentation. Furthermore, new synchrotron technology makes it possible to record these patterns in minutes to milliseconds. Now let us define some of the important terms needed while studying protein crystal structures. They are worth committing to memory because they will crop up repeatedly in the discussions of protein structure.
RESOLUTION
If the radius of the circle describing the data collection device in Fig. 2.2 is increased, more X-ray data will be collected on the newly formed edges. Alternatively, if the detector is brought closer to the crystal or swung to a higher angle, reflections occurring at higher angles will be recorded. The dotted circles in Fig. 2.2 indicate such an effect. Note that some of the reflections fall outside the solid but within the dotted circle of the detector. Such reflections occur at higher angles. The term resolution is related to the reciprocal of the sine of the largest angle any reflection makes with the main X-ray beam. It can be calculated in angstroms as the variable d in the Bragg equation:
\[ \lambda = 2 d_{hkl} \sin\theta_{hkl} \tag{2.1} \]
where λ is the X-ray wavelength and θhkl is half of the angle 2θ shown in Fig. 2.3. The variable dhkl has additional meaning in terms of the crystal lattice, but that is unimportant to the concept of resolution.
When the 6-Å resolution structure of hemoglobin was calculated, large globs of electron density containing many atoms were visible. Only the subunits were resolvable, with perhaps hints of where helices were located. Later, when more higher-angle reflections were measured and the resolution improved from 6 to 2.8 Å, it was possible to trace the course of the hemoglobin polypeptide chain. To the crystallographer the resolution concept goes as follows: (1) measure reflections at higher angles or greater θhkl; (2) the maximum Bragg angle describes the best resolution, or lowest dhkl; and (3) with more reflections and better resolution, there will be increased detail in the shape of the electron density function (the angle 2θhkl is shown in Fig. 2.3).
The bad news is that the number of reflections that must be measured to obtain a higher resolution map goes up with the cube of the ratio of the resolutions. Going from 6- to 2.8-Å resolution means measuring (6/2.8)^3, or about 10, times as many reflections. The best protein structures have been obtained with relatively small molecules to a resolution of about 1.5 Å. The resulting electron density maps show nearly resolved atoms. A favorite example in the literature is to show the electron density of a phenylalanine side chain. A visible hole in the center of the electron density resulting from a phenyl ring can be seen in the map at resolutions of 1.5 Å or better. However, at 3-Å resolution, a phenylalanine ring will look like a blob indistinguishable from isoleucine, cysteine, or histidine.
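The relationship between scattering angle, resolution, and the amount of data can be made concrete with a short calculation. The following is a minimal sketch, not part of the original text: the function names are hypothetical, and the copper K-alpha wavelength of about 1.5418 Å is an assumed, commonly used laboratory value.

```python
import math

def resolution_angstrom(wavelength, two_theta_deg):
    """Return d (in angstroms) from Bragg's law, lambda = 2 d sin(theta).

    two_theta_deg is the angle between the reflection and the main beam.
    """
    theta = math.radians(two_theta_deg / 2.0)
    return wavelength / (2.0 * math.sin(theta))

def reflection_ratio(d_low, d_high):
    """Approximate factor by which the number of reflections grows when
    the resolution limit improves from d_low to d_high (both in angstroms)."""
    return (d_low / d_high) ** 3

# Assumed wavelength: Cu K-alpha, about 1.5418 angstroms.
CU_KALPHA = 1.5418

d = resolution_angstrom(CU_KALPHA, 30.0)   # reflection recorded at 2*theta = 30 degrees
ratio = reflection_ratio(6.0, 2.8)         # the hemoglobin example from the text

print(f"d at 2-theta = 30 degrees: {d:.2f} angstroms")
print(f"reflections needed, 6 -> 2.8 angstroms: about {ratio:.1f} times as many")
```

Under these assumptions, a reflection at 2θ = 30° corresponds to a spacing of roughly 3 Å, and the step from 6 to 2.8 Å costs nearly a tenfold increase in measured reflections.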
STRUCTURE FACTORS, PHASES, AND REFINEMENT
Many of the protein structures that have been determined have also been refined. The definitions of structure factors and phases are key to understanding refinement. First, the reader should think about refinement in general terms. Imagine that we measure a diffraction pattern from a lattice of objects called X. The diffraction pattern of X consists of scattered waves or reflections having both amplitude and phase. Imagine that after measuring this pattern, we can mathematically transform all of these waves back to the image of X. In X-ray crystallographic terms, each of the scattered waves is called a structure factor, Fhkl. The structure factors are complex variables having an amplitude and a phase. Amplitudes (see below) are obtained directly from the intensities of the X-ray reflections or, in terms of Figs. 2.2 and 2.3, the blackness of a spot on the X-ray detector.
Obtaining the phases is much more difficult. Several methods are available for obtaining trial phases, and they are described in general terms below. Perutz and colleagues first developed the method of multiple isomorphous heavy atom replacement. However, there is no widely accepted chemical procedure for adding small numbers of heavy atoms to crystalline proteins. The preparation of heavy atom derivatives remains a trial-and-error procedure. To work well, the crystalline protein must bind compounds containing elements with atomic numbers in the range of 70 or higher at a limited number of sites. Mercury compounds are often used, and frequently the protein-binding site is near a reactive cysteine. Once phases are obtained along with the measured amplitudes, it is possible to calculate an electron density map. The map must then be interpreted in terms of a model. The resulting preliminary model means that every ordered atom forming the protein molecule(s) in the crystal has been located.
Now, a very important principle: If it is known where all of the atoms are in the crystal lattice, structure factors may be calculated: Fhkl(calc). To be sure the model is optimized, refinement is then used to move atoms in the experimentally determined model around slightly until |Fhkl(obs)| is close to |Fhkl(calc)|. In other words, a number related to the observed intensity on the X-ray detector is nearly
the same as the (Ihkl)^(1/2) calculated from the coordinates of the protein model that was built and refined. To be complete, the refinement also keeps track of bond distances, angles, planarity, etc., of the protein model, trying not only to make the observed structure factors agree with the calculated factors, but also to make the model geometry canonically correct. So far the process has been as follows: (1) measure X-ray data and calculate the amplitudes of the structure factors; (2) experimentally determine structure factor phases; (3) calculate an electron density map; (4) build a trial model; and (5) refine the trial model so that it best agrees with X-ray measurements and the known geometry of chemical bonding.
MATHEMATICAL RELATIONSHIPS
The principles of crystallographic structural analyses have been laid down; all that remains is to look at a few of the more important variables and the mathematical relationships from which they are derived. Consider first three important equations for describing the basis of X-ray crystallography. In the preceding text, the term structure factor was used several times. It is the complex variable given the symbol Fhkl. Remember that it is a variable with both amplitude and phase (real and imaginary components, a vector).
Obtaining the Amplitude
We obtain the amplitude of the structure factor by measuring the intensity of each Bragg reflection. That is,
\[ |F_{hkl}(\mathrm{obs})| = [I_{hkl}(\mathrm{obs})]^{1/2} \tag{2.2} \]
Calculating the Electron Density
Assuming we know the phase angle for all reflections, φhkl, we can calculate the electron density ρxyz at any point x, y, z in the unit cell, using the relationship
\[ \rho_{xyz} = \sum_{-h}^{h} \sum_{-k}^{k} \sum_{-l}^{l} |F_{hkl}| \, e^{-i\phi_{hkl}} \, e^{2\pi i (hx + ky + lz)} \tag{2.3} \]
In Eqs. (2.1)–(2.3), h, k, and l are the Miller indices derived experimentally from the location of the reflection (Fig. 2.3). For each X-ray reflection, a structure factor Fhkl exists. This complex variable needs both amplitude |Fhkl| and phase φhkl to calculate the electron density ρxyz. The variables x, y, and z are fractions of the unit cell axial lengths, usually given the symbols a, b, and c. Ihkl is, once again, the measured intensity of an X-ray reflection.
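To see how the summation in Eq. (2.3) turns amplitudes and phases into density, it can help to evaluate it for a toy case. The sketch below is an illustration under stated assumptions, not a working crystallographic program: the function name and the toy data (a single point atom of unit scattering power at x = 0.25, reflections limited to h from −5 to 5) are hypothetical, and the sign convention is the one written in Eq. (2.3).

```python
import cmath
import math

def electron_density(reflections, x, y, z):
    """Sum Eq. (2.3) over a dict {(h, k, l): (amplitude, phase_in_radians)}.

    Each term is |F| * exp(-i*phi) * exp(2*pi*i*(h*x + k*y + l*z)),
    following the sign convention used in the text.
    """
    total = 0j
    for (h, k, l), (amplitude, phase) in reflections.items():
        total += (amplitude * cmath.exp(-1j * phase)
                  * cmath.exp(2j * math.pi * (h * x + k * y + l * z)))
    # When both hkl and -h-k-l are present the imaginary parts cancel.
    return total.real

# Toy data (assumed): one point atom at x = 0.25, so every reflection has
# amplitude 1 and, in this convention, phase 2*pi*(0.25*h).
ATOM_X = 0.25
toy_reflections = {
    (h, 0, 0): (1.0, 2.0 * math.pi * ATOM_X * h) for h in range(-5, 6)
}

peak = electron_density(toy_reflections, 0.25, 0.0, 0.0)
background = electron_density(toy_reflections, 0.60, 0.0, 0.0)
print(peak, background)
```

With correct phases all eleven waves add in step at the atom position, while elsewhere they largely cancel. Replacing the phases with arbitrary values destroys the peak, which is one way to appreciate why the phase problem matters.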
The Structure Factor
Finally, once we know the positions of all of the n atoms in the unit cell, we can calculate a value for the structure factor. This is the Fhkl(calc) already mentioned:
\[ F_{hkl}(\mathrm{calc}) = \sum_{j=1}^{n} f_j \, e^{-2\pi i (h x_j + k y_j + l z_j)} \tag{2.4} \]
All of the variables have already been defined except fj, which is the atomic scattering factor for the jth atom. The atomic scattering factor varies with the atomic number of
the scattering atom and represents the diffraction power of an atom at different scattering angles. The normal carbon, nitrogen, oxygen, and sulfur components of a protein have reasonably similar values because they are close to each other in the periodic table. Being able to calculate Fhkl may appear useless, because if the structure is known, why calculate the diffraction pattern? However, if it is desirable to refine the structure (optimize the accuracy) it is possible to compare the calculated and observed structure factors. Using least-squares methods, the positional parameters of each atom (xj, yj, zj) can be varied until they best agree with |Fhkl(obs)|.
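Equation (2.4) translates almost directly into code. The following sketch assumes a constant scattering factor for each atom (that is, it ignores the dependence of fj on scattering angle mentioned above), and the two-atom cell is hypothetical, chosen only so that the result is easy to check by hand.

```python
import cmath
import math

def structure_factor(atoms, h, k, l):
    """Evaluate Eq. (2.4) for atoms given as (f, x, y, z) tuples.

    f is treated as a constant scattering factor (a simplification);
    x, y, z are fractional coordinates in the unit cell.
    """
    total = 0j
    for f, x, y, z in atoms:
        total += f * cmath.exp(-2j * math.pi * (h * x + k * y + l * z))
    return total

# Hypothetical cell: two identical atoms half a cell apart along a.
atoms = [(6.0, 0.0, 0.0, 0.0), (6.0, 0.5, 0.0, 0.0)]

f_100 = structure_factor(atoms, 1, 0, 0)   # the two scattered waves cancel
f_200 = structure_factor(atoms, 2, 0, 0)   # the two scattered waves reinforce

print(abs(f_100), abs(f_200))
```

In a least-squares refinement, a function of this kind would be evaluated repeatedly as the coordinates are varied, and the resulting amplitudes compared with |Fhkl(obs)|.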
Multiple Heavy Atom Derivatives
Now we have the background to see how the multiple isomorphous replacement (MIR) method can be used to obtain the phase angle φhkl. Think about what would happen to a single structure factor when another atom is added to the unit cell. The method is shown schematically in Fig. 2.4. The effect is that the structure factor is changed:
\[ F_{hkl}(\mathrm{nat} + \mathrm{ha}) = F_{hkl}(\mathrm{nat}) + f_{hkl}(\mathrm{ha}) \tag{2.5} \]
(where nat means native, and ha means heavy atom) or, as shown in Fig. 2.4,
\[ F_3 = f_1 + F_2 \tag{2.6} \]
where the variables are complex! But using X-ray measurements from the two crystals, native and heavy atom soaked, only the amplitudes can be obtained: |F3| and |F2| in Fig. 2.4. Both the real and imaginary components of the heavy atom structure factor, f1, can be calculated if the location of the heavy atom in the unit cell has been determined. The lower half of Fig. 2.4 simply shows that the phase angle can be calculated by using what is called a Harker construction. Two circles drawn from each end of the vector f1, one with radius |F2|, the other with radius |F3|, intersect at two locations. The angle F2 makes with the ‘‘real’’ axis is the phase angle for that reflection. Note there are two such intersections and a twofold ambiguity in the correct phase angle exists, but this can be resolved with a second, new heavy atom derivative.
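The Harker construction is a geometric statement of the law of cosines, so the two candidate phases can also be computed directly. The sketch below is one possible formulation, not the author's procedure: the function name and the synthetic numbers are hypothetical, and angles are measured from the real axis as in the description of Fig. 2.4.

```python
import cmath
import math

def harker_phases(f_native_amp, f_deriv_amp, f_heavy):
    """Return the two candidate native phases (radians) for one reflection.

    Uses |F3|^2 = |f1|^2 + |F2|^2 + 2|f1||F2| cos(alpha2 - alpha1), which
    follows from F3 = f1 + F2 (Eq. 2.6). f_heavy is the complex f1.
    """
    heavy_amp = abs(f_heavy)
    heavy_phase = cmath.phase(f_heavy)
    cos_term = ((f_deriv_amp ** 2 - heavy_amp ** 2 - f_native_amp ** 2)
                / (2.0 * heavy_amp * f_native_amp))
    # Measurement error can push the ratio slightly outside [-1, 1].
    cos_term = max(-1.0, min(1.0, cos_term))
    delta = math.acos(cos_term)
    return heavy_phase + delta, heavy_phase - delta

# Synthetic check: invent a native structure factor, add a heavy atom
# contribution, then keep only the amplitudes, as an experiment would.
true_native = cmath.rect(100.0, math.radians(40.0))
heavy = cmath.rect(20.0, math.radians(110.0))
derivative_amp = abs(true_native + heavy)

candidates = harker_phases(abs(true_native), derivative_amp, heavy)
candidates_deg = [math.degrees(p) % 360.0 for p in candidates]
print(candidates_deg)
```

One of the two candidates reproduces the phase that was built in (40°); the other is the mirror solution about the heavy atom vector. Running the same calculation with a second derivative whose heavy atom vector points elsewhere would leave only the true phase in common.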
DETERMINING THE POSITIONS OF HEAVY ATOMS
Patterson methods may be used to locate the coordinates of heavy atom (HA) sites. From the |Fhkl| values alone, a Patterson map can be calculated. To locate heavy atoms, one uses the differences between the amplitudes of the heavy atom-containing crystal (FHA) and the native protein (FP).
Calculating Patterson Maps
The equation used to calculate difference Patterson maps is given below. Note the similarity of ΔPxyz to ρxyz [Eq. (2.3)], but also note that no phase information is needed. The function is written either as ΔPuvw or as ΔPxyz:
\[ \Delta P_{xyz} = \sum_{-h}^{h} \sum_{-k}^{k} \sum_{-l}^{l} |\Delta F_{hkl}|^{2} \cos 2\pi (hx + ky + lz) \tag{2.7} \]
where
\[ |\Delta F_{hkl}|^{2} = \left( |F_{HA,hkl}| - |F_{P,hkl}| \right)^{2} \]
and |FP,hkl| and |FHA,hkl| are the amplitudes from the native protein and the protein with heavy atoms, respectively.
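A one-dimensional analogue of Eq. (2.7) shows how a map computed from amplitudes alone still carries positional information. This is a sketch under an idealizing assumption that should be stated loudly: |ΔFh| is taken to be exactly the amplitude of the heavy atom contribution, whereas real isomorphous differences only approximate it. The function name and the two heavy atom positions are hypothetical.

```python
import cmath
import math

def difference_patterson_1d(delta_f_sq, u):
    """One-dimensional analogue of Eq. (2.7).

    delta_f_sq maps each index h to |delta F_h|^2; no phases are used.
    """
    return sum(value * math.cos(2.0 * math.pi * h * u)
               for h, value in delta_f_sq.items())

# Idealized toy data (assumed): two unit-weight heavy atoms at
# x = 0.1 and x = 0.4, with |delta F_h| equal to the heavy atom amplitude.
heavy_x = [0.1, 0.4]
delta_f_sq = {}
for h in range(-10, 11):
    f_heavy = sum(cmath.exp(-2j * math.pi * h * x) for x in heavy_x)
    delta_f_sq[h] = abs(f_heavy) ** 2

vector_peak = difference_patterson_1d(delta_f_sq, 0.3)   # u = 0.4 - 0.1
off_peak = difference_patterson_1d(delta_f_sq, 0.15)
origin_peak = difference_patterson_1d(delta_f_sq, 0.0)
print(origin_peak, vector_peak, off_peak)
```

The map peaks at the origin and at the interatomic vector u = 0.3, not at the atomic positions themselves, which is why a Patterson map has to be interpreted before heavy atom coordinates can be assigned.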
Locating Heavy Atom Sites from Patterson Maps
Locating the HA sites from a 3-D Patterson map requires an understanding of the symmetry in the unit cell and some trial-and-error work. This is only partly illustrated in Fig. 2.5. In this drawing, three mercury (Hg) atoms are bound to a protein crystal in one unit cell as shown by the heavy lines. If the diffraction data could be measured with no error, these three atoms would give the difference Patterson map shown in the bottom half of Fig. 2.5. The principle used to construct the hypothetical difference Patterson map shown in Fig. 2.5 is relatively simple. Take the positions of the heavy atoms and plot them in the protein unit cell. Draw a vector between all pairs of heavy atoms. Now, redraw an empty unit cell and move the origin of all of the vectors to the origin of the newly drawn unit cell. At the end of each vector, place a "Patterson atom." If there were n atoms in the original unit cell, there are n × (n − 1) Patterson atoms or vectors in the new map. Remember, though, that for an unknown system, the positions of the heavy atoms are not known. This must be worked out from the Patterson map.
Fig. 2.4 Multiple isomorphous heavy atom replacement. After soaking a heavy atom into a crystalline protein, the two unit cells remain the same size. The heavy atom position in the unit cell can be determined by difference Patterson methods as described in Fig. 2.5. Knowing the coordinates of the heavy atom, the vector f1 can be calculated. A circle of radius |F2| is drawn from the end of f1. The amplitude or radius is obtained directly from the intensity of the specified reflection h,k,l from the native protein crystal. Another circle of radius |F3| represents the amplitudes of that reflection from the protein crystals containing heavy atoms. The two circles intersect at the two possible phase values for the native protein, as shown by the angles φhkl. To use this heavy atom method, this construction must be done for every measured X-ray reflection.
Fig. 2.5 Difference Patterson or vector maps. Top: Four unit cells for a hypothetical protein crystal are shown. The black dots represent three mercury atoms bound to the protein. Bottom: A difference Patterson map between X-ray data from native and mercury-containing crystals. The vector between atoms 1 and 2 produces peaks on the Patterson map labeled 1,2 and 2,1 as shown. The other labeled peaks arise in an analogous fashion from atom pairs 1,3 and 2,3. If the positions of heavy atom-binding sites can be derived from these maps, native protein phases can be calculated as described in Fig. 2.4.
In actual practice, finding the coordinates for heavy atoms bound to a protein crystal is often difficult. This is because the measured amplitude differences can be quite small if the level of heavy atom substitution is not stoichiometric. Such being the case, measured changes in the reflections can be near the experimental noise level. Lack of isomorphism (same cell dimensions and protein orientation) also leads to noise in the difference Patterson map. And, finally, if too many heavy atom-binding sites occur, the vector map becomes too complicated to solve by inspection. Computer methods are now available to randomly search and check peaks in an experimentally determined difference Patterson map.
With the availability of tunable X rays at a synchrotron, a simpler and less error-prone procedure for obtaining phase information has been developed by Hendrickson and co-workers at Columbia University. Now widely used, it depends not on high atomic number elements but on elements that scatter X rays anomalously. That is, because of the properties of inner shell electrons, certain elements scatter X rays differently, depending on the direction. These differences are observable in reflections related by Miller indices h,k,l and −h,−k,−l. Without any anomalous scattering, |Fhkl| = |F−h−k−l|; hkl and −h−k−l reflections are called Bijvoet pairs. With elements such as selenium, X-ray wavelengths can be found where relatively large changes can be observed between Bijvoet pairs. Through recombinant methods, it is now possible to assemble proteins with selenomethionine rather than methionine, its normal sulfur-containing form. It has been shown repeatedly that the conformation of selenomethionine-containing proteins is not changed by this relatively small substitution. The positions of the selenium atoms are then obtained by Patterson methods, and phases may be obtained in a manner similar to the heavy atom method.
To summarize, if a crystal is soaked in a solution containing a heavy metal compound and the heavy ion binds at a single site on the crystalline protein, the effect can be seen in the amplitudes of the Bragg reflections of the diffraction pattern.
With the appropriate wavelength, similar changes are obtainable from a crystalline protein containing an element that scatters anomalously. The soaked crystal is called a heavy atom derivative crystal. It is then necessary to measure all of the diffraction data again (we did it once for the native protein). The coordinates of the heavy atom(s) in the unit cell must then be obtained. Finding coordinates of a few heavy atoms in the protein unit cell can be done through the use of a function called a difference Patterson function. The structure factor for the heavy atom alone can be calculated using the relationship given in Eq. (2.4). By combining this information, using the Harker construction and the amplitudes of the scattering from native and native plus heavy atom-containing crystals, phases for the native protein reflections are calculable and the model fitting can begin.
Fig. 2.6 Molecular replacement for phase determination. Phases for a crystalline protein can sometimes be derived from another homologous protein. Top: Two crystals and sets of structure factors, one of unknown conformation and a homologous protein of known coordinates. As described in text, trial phases can be calculated if the orientation and position of the probe structure in the unknown unit cell can be determined. This is done in two steps. First, the correct orientation of the probe coordinates is found by a rotation function. Next, the correct translational position of the probe coordinates in the unknown unit cell is determined by a translation function.
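The vector construction described for Fig. 2.5 (draw a vector between every ordered pair of heavy atoms, then move each vector to a common origin) can be sketched in a few lines. The site coordinates below are hypothetical and two-dimensional for simplicity, and crystallographic symmetry, which adds further vectors in a real crystal, is deliberately left out.

```python
from itertools import permutations

def patterson_vectors(sites):
    """Return the n*(n-1) interatomic vectors for fractional (x, y) sites.

    Each vector is reduced into the unit cell (components in [0, 1)),
    mirroring the step of moving every vector to a common origin.
    """
    vectors = []
    for (xa, ya), (xb, yb) in permutations(sites, 2):
        vectors.append((round((xb - xa) % 1.0, 6), round((yb - ya) % 1.0, 6)))
    return vectors

# Hypothetical fractional coordinates for three mercury sites.
mercury_sites = [(0.10, 0.20), (0.45, 0.30), (0.70, 0.80)]
peaks = patterson_vectors(mercury_sites)

for peak in peaks:
    print(peak)
```

Three sites give six vector peaks, arranged in centrosymmetric pairs such as 1,2 and 2,1. Solving a real difference Patterson map means running this reasoning in reverse: proposing sites whose predicted vectors match the observed peaks.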
MOLECULAR REPLACEMENT: CALCULATING PHASES USING A HOMOLOGOUS STRUCTURE
The use of heavy atom derivatives is just one way to obtain a set of starting phases for the calculation of electron density maps. If coordinates of a homologous protein are known, it is possible to avoid the use of heavy atom derivatives. This involves some rather complex calculations, but the principal steps are shown in Fig. 2.6. The success of the method appears to be related to the degree of conformational homology between the unknown protein and the known probe molecule. The computational steps in crystal structure determination by molecular replacement require two sets of information: (1) the coordinates of the atoms in the probe molecule, and (2) an X-ray diffraction data set from crystals of the unknown protein. The protocol for obtaining a set of phases for an unknown protein by molecular replacement is as follows:
1. Measure X-ray data from crystals of the native protein.
2. Compare the Patterson function of the unknown protein with that of a known protein to obtain a rotational transformation, placing the probe molecule in the correct orientation in the unknown unit cell; see Fig. 2.6. Remember, to calculate the Patterson function of a known crystalline structure, no phase information is needed. The relationship is similar to that described in Eq. (2.7) for a difference Patterson. Replace ΔPuvw with Puvw, and ΔFhkl with Fhkl(calc) in Eq. (2.7). The Patterson function of the probe molecule in molecular replacement is obtained from the calculated structure factors, which in turn were obtained from the coordinates of the known structure. The Patterson function of the unknown is calculated from the observed structure amplitudes, which in turn were measured from a crystal of the unknown molecule. The three-dimensional Patterson function of the probe molecule is rotated until maximum overlap is observed between it and the Patterson from the unknown crystal. This formulation is called a rotation function. It is a time-consuming calculation even on fast computers. If it works, the orientation (but not the position) of the known protein in the unknown unit cell is determined.
3. Now find the correct translational position of the properly oriented probe molecule in the unknown unit cell. This can be done by trial and error. Structure factors are calculated at different increments on a three-dimensional grid, using the atomic coordinates of the correctly oriented probe molecule. The calculated |F(probe)| values are compared with the |F(obs)| values until a good correlation is found.
4. Calculate a set of test phases, φ(test, hkl), using the oriented and translated coordinates of the probe molecule in the unknown unit cell. The amplitudes for the unknown crystal, |Fhkl(obs)|, are combined with the aforementioned phases to produce a trial electron density map:
\[ \rho_{xyz} = \sum_{-h}^{h} \sum_{-k}^{k} \sum_{-l}^{l} |F_{hkl}(\mathrm{obs})| \, e^{-i\phi(\mathrm{test},\,hkl)} \, e^{2\pi i (hx + ky + lz)} \tag{2.8} \]
A new model incorporating the unknown protein is built and refined.
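The trial-and-error translation search of step 3 can be illustrated with a deliberately small model. The sketch below is a toy under stated assumptions rather than a real translation function: the cell is one-dimensional, it contains the probe plus one inversion-related copy (without some symmetry the amplitudes would not depend on the shift at all), every atom is a unit scatterer, and all names and coordinates are hypothetical.

```python
import math

def amplitudes(coords, shift, h_max=8):
    """|F_h| for a 1-D cell holding a molecule at coords + shift and its
    inversion-related mate at -(coords + shift), with unit scatterers."""
    result = []
    for h in range(1, h_max + 1):
        f_h = sum(2.0 * math.cos(2.0 * math.pi * h * (x + shift))
                  for x in coords)
        result.append(abs(f_h))
    return result

def correlation(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)

def translation_search(probe, observed, steps=100):
    """Try shifts on a regular grid; return (best_shift, best_correlation)."""
    scored = [(correlation(amplitudes(probe, i / steps), observed), i / steps)
              for i in range(steps)]
    best_corr, best_shift = max(scored)
    return best_shift, best_corr

# Hypothetical probe coordinates; the "observed" amplitudes are simulated
# by placing the same probe at a shift of 0.20.
probe = [0.00, 0.07, 0.19]
observed = amplitudes(probe, 0.20)
best_shift, best_corr = translation_search(probe, observed)
print(best_shift, best_corr)
```

In this toy cell a shift of 0.20 and a shift of 0.70 give identical amplitudes, because moving the origin by half a cell changes only the phases, so the search may report either one. A real search runs over a three-dimensional grid and must cope with an imperfect probe and experimental error, so the best correlation is well below 1.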
TEMPERATURE FACTORS
As is described in the next section, crystallographic refinement uses least-squares methods to bring the collection of F(obs) values as close as possible to the F(calc) values. When done properly these calculations improve the accuracy of the protein coordinates. Equation (2.4), described previously for determining F(calc), also should have contained an exponential term called the temperature factor, Bj. Every jth atom has its own temperature factor. The temperature factor effectively smears the atom from a point to a sphere or ellipsoid of electron density. The smearing seems to be temperature sensitive, hence the name. Therefore a new equation for the structure factor, Fhkl(calc), is
\[ F_{hkl}(\mathrm{calc}) = \sum_{j=1}^{n} f_j \, e^{-2\pi i (h x_j + k y_j + l z_j)} \, e^{-B_j (\sin\theta_{hkl} / \lambda)^{2}} \tag{2.9} \]
\[ B_j = 8 \pi^{2} u_j^{2} \tag{2.10} \]
where uj² is the mean square displacement in the three principal lattice directions. Temperature factors are obtained only through refinement and are of biochemical interest because they appear to suggest conformational mobility. Atoms with high B values are believed to have more motional freedom in the crystal (and in the protein) than those with small B values. Protein atoms usually have high B values compared with atoms comprising a small organic molecule. Normal B values for atoms in a small molecule are 2–6 Å². This is equivalent to a mean displacement of about 0.1 Å. B values for well-behaved protein atoms range from 10 to 20 Å², corresponding to a mean displacement of between 0.15 and 0.5 Å. B values for atoms in a single protein appear
to vary according to their location in the 3-D structure, and this is discussed in more detail in Chapter 3.
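The effect of a temperature factor can be explored numerically. The sketch below assumes that Eq. (2.10) relates B to a root mean square displacement u as B = 8π²u², and it uses Bragg's law, Eq. (2.1), to write sin θ/λ as 1/(2d); the function names are hypothetical. The displacements it returns are of the same order as the rounded figures quoted in the text and need not match them exactly.

```python
import math

def rms_displacement(b_factor):
    """Invert B = 8 * pi^2 * u^2 to get u (angstroms) from B (angstroms^2)."""
    return math.sqrt(b_factor / (8.0 * math.pi ** 2))

def temperature_damping(b_factor, d_spacing):
    """Exponential term of Eq. (2.9), with sin(theta)/lambda = 1/(2d)
    taken from Bragg's law (Eq. 2.1)."""
    s = 1.0 / (2.0 * d_spacing)
    return math.exp(-b_factor * s ** 2)

u_protein = rms_displacement(20.0)
damping_low_res = temperature_damping(20.0, 6.0)    # reflection at d = 6 angstroms
damping_high_res = temperature_damping(20.0, 2.0)   # reflection at d = 2 angstroms

print(f"u for B = 20: {u_protein:.2f} angstroms")
print(f"damping at 6 angstroms: {damping_low_res:.2f}, at 2 angstroms: {damping_high_res:.2f}")
```

The same B value that barely affects a low-angle reflection cuts the contribution to a 2-Å reflection to less than a third, which suggests why mobile regions of a protein contribute little to the high-resolution data.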
REFINEMENT
In the early days of protein crystallography, an electron density map was interpreted by building a stick model into the contoured surfaces of an electron density map. No computer graphics were available to aid this process. Once the model was built, it was accepted and studied to determine how it could function in the biological world. Now all but the first reports of a protein structure are refined, and in a literature report of a crystal structure, the word refined will be used even in the title. What is crystallographic refinement and what does refinement do? Effectively, it changes x, y, z, and B for each component atom until the calculated and observed structure factors are in the best possible agreement. That is,
\[ |F_{hkl}(\mathrm{calc})| \cong |F_{hkl}(\mathrm{obs})| \tag{2.11} \]
Because there are four parameters (x, y, z, and B) per protein atom, refinement can be a monumental task. For example, if a model contains 5000 atoms, there are 20,000 variables to be optimized. Although it is partly done with a computer, many manual readjustments of the atoms must be made at several steps during refinement with computer graphics. To do this, new phases are calculated with the current atomic positions. An electron density map calculated with the new phases and the original |Fhkl(obs)| is then compared with the partially refined model and further adjustments in the conformation are made. This manual correction process removes major errors in the model. The accuracy of the model can be assessed with what is called an R factor, which is obtained numerically by carrying out the following summation over all measured reflections:
\[ R\ \mathrm{factor} = \frac{\sum \big|\, |F_{hkl}(\mathrm{obs})| - |F_{hkl}(\mathrm{calc})| \,\big|}{\sum |F_{hkl}(\mathrm{obs})|} \tag{2.12} \]
As the R factor begins to show that the refinement of the protein model is near completion, R factor ≅ 0.2, bound water molecules are then added to the list of protein coordinates. The addition of bound water molecules must be done with great care because some peaks in the electron density map may also be due to noise. Most often a water molecule is included if a peak is visible and if that peak is within hydrogen-bonding distance of a heteroatom of the protein or another water molecule. The reader should be aware that bound water molecules are found only in coordinate lists of crystal structures that have been refined!
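The R factor of Eq. (2.12) is simple enough to compute by hand for a few reflections, which makes it a useful check on one's understanding. In the sketch below the amplitudes are made up for illustration and the function name is hypothetical; a real calculation would run over thousands of reflections and normally requires the two sets of amplitudes to be placed on a common scale first.

```python
def r_factor(f_obs, f_calc):
    """Eq. (2.12): sum of | |Fobs| - |Fcalc| | divided by sum of |Fobs|."""
    if len(f_obs) != len(f_calc):
        raise ValueError("amplitude lists must have the same length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator

# Made-up amplitudes for four reflections, for illustration only.
observed = [120.0, 85.0, 40.0, 15.0]
calculated = [110.0, 90.0, 48.0, 12.0]

r = r_factor(observed, calculated)
n_parameters = 4 * 5000   # x, y, z, and B for each of 5000 atoms

print(f"R factor: {r:.3f}")
print(f"parameters to refine: {n_parameters}")
```

A perfect model would give R = 0. The value obtained for these invented numbers only demonstrates the arithmetic and says nothing about what a real structure would yield.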
ELECTRON DENSITY MAPS AND DIFFERENCE ELECTRON DENSITY MAPS
The image of any crystalline protein that can be calculated from the structure factors is a three-dimensional matrix of electron density values, ρxyz. This grid of numbers typically may contain 100 × 100 × 100 values, 10^6 numbers! To use this map, it is necessary to visualize the electron density. A visual map can be made by selecting some threshold value for ρxyz and then connecting all points of equi-electron density with sets of lines. The lines are called contours. The contours can be traced onto balsa wood, cut out, and glued together as Perutz did with hemoglobin. This is shown in Chapter 1. They can also be traced onto transparent sheets and stacked to form a 3-D map. However, computer graphics make possible a much simpler approach. To visualize the map, lines
are again drawn around all numbers of equi-electron density. If this contouring is done in three orthogonal directions, the "contoured" electron density map appears like chicken wire molded into a continuous and complex surface. The computer-displayed chicken wire electron density maps are the ones most often shown in the literature. Remember, it is some form of the electron density map that is used to build the model and obtain the crystallographic coordinates.
The interpretation of an electron density map is not a trivial pursuit. Packing of other molecules or subunits in the crystal lattice confuses the determination of a single molecular envelope. Breaks in the continuity of the electron density function may make it difficult to find the ends of polypeptide chains. Sometimes, because of resolution and noise, it may be difficult to determine which way the polypeptide chain is passing through a tube of electron density. The final stages of model building are possible only if the primary structure is available. Even getting the correct amino acid sequence aligned throughout the entire map can take a long time. Be aware that building a model of a small protein at good resolution, <2 Å, is usually a relatively easy task. But interpreting a map of a large multisubunit protein at perhaps marginal resolution, <2.8 Å, is a different matter. Usually minor errors will occur in spite of the fact that major elements of secondary structure, etc., are correct from the start.
It is also possible to calculate a difference electron density map. For example, if we measure the X-ray diffraction data from two crystals with and without a bound ligand, a map of electron density differences can be calculated, Δρxyz. It can be contoured like an ordinary electron density map. Only the electron density from atoms that are present in the first crystal, but not the second, will appear in such a map. A positive peak would represent the location of bound ligand.
A good example of how difference electron density maps may be used is in solving the problem of locating the active site of an enzyme (E) whose crystal structure is known. By soaking substrate (S) into the crystalline protein and remeasuring the X-ray data, we obtain a new set of amplitudes, $|F_{hkl}(E+S)|$. We already know $|F_{hkl}(E)|$, which represents the amplitudes from the original crystal structure determination. The amplitudes for a difference electron density map, a map containing only the positions of any newly bound substrate, would be
$$|\Delta F_{hkl}| = |F_{hkl}(E+S)| - |F_{hkl}(E)| \tag{2.13}$$
and the difference electron density map would be calculated from Eq. (2.14) [compare with Eq. (2.3)]:
$$\Delta\rho_{xyz} = \sum_{-h}^{h}\sum_{-k}^{k}\sum_{-l}^{l} |\Delta F_{hkl}|\, e^{-i\alpha(E,hkl)}\, e^{2\pi i(hx+ky+lz)} \tag{2.14}$$
The difference map should be flat or featureless except near the position of the bound substrate, where electron density should be visible. Such difference electron density maps or variations thereof can also be used to determine any conformational changes in the enzyme that accompany the binding of the substrate. The conformational change must be relatively minor because the crystalline form of the protein must remain isomorphous with the unliganded molecule. To remain isomorphous, the liganded and unliganded protein must be in the same orientation and in the same size unit cell for both crystalline forms.
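The difference synthesis described above can be tried on a toy problem. The following is only a minimal sketch, assuming a one-dimensional "crystal" with invented atom positions and scattering weights; the function names, the grid size, and the range of the index h are illustrative choices and not part of any crystallographic package. It forms the amplitude differences of Eq. (2.13), attaches the phases of the unliganded enzyme, and sums the series in the form of Eq. (2.14).

```python
import numpy as np

def structure_factors(x_frac, weights, h):
    """F_h = sum_j f_j * exp(2*pi*i*h*x_j) for a 1-D toy crystal."""
    phase = 2j * np.pi * np.outer(h, x_frac)          # shape (n_h, n_atoms)
    return (np.asarray(weights) * np.exp(phase)).sum(axis=1)

def difference_map(f_native, f_complex, h, grid):
    """1-D analogue of Eq. (2.14), using the phases of the native crystal."""
    delta_amp = np.abs(f_complex) - np.abs(f_native)  # Eq. (2.13)
    alpha = np.angle(f_native)                        # phases of E alone
    coeff = (delta_amp * np.exp(-1j * alpha))[:, None]
    waves = np.exp(2j * np.pi * np.outer(h, grid))    # shape (n_h, n_grid)
    # Terms for +h and -h are complex conjugates, so the sum is real.
    return (coeff * waves).sum(axis=0).real

# Invented positions (fractional) and weights: ten "enzyme" atoms, one "ligand" atom.
enzyme_x = np.array([0.05, 0.12, 0.21, 0.33, 0.41, 0.58, 0.66, 0.74, 0.87, 0.93])
enzyme_w = np.full(enzyme_x.size, 7.0)
ligand_x, ligand_w = 0.50, 6.0

h = np.arange(-15, 16)            # reflections -15 ... 15, including h = 0
grid = np.arange(200) / 200.0     # sampling points along the cell edge

f_e = structure_factors(enzyme_x, enzyme_w, h)
f_es = structure_factors(np.append(enzyme_x, ligand_x),
                         np.append(enzyme_w, ligand_w), h)

density = difference_map(f_e, f_es, h, grid)
flat = difference_map(f_e, f_e, h, grid)   # identical crystals: featureless map

print("highest difference peak at x =", grid[np.argmax(density)])
```

When the two data sets are identical the map is exactly flat, which is the isomorphous, unliganded limit described in the text. With the ligand present, the strongest feature is expected near the ligand position, although at reduced height, because the phases of the enzyme alone are only an approximation to those of the complex.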
SUMMARY: DETERMINATION OF THE CRYSTAL STRUCTURE OF PROTEINS
To form the groundwork for discussing the crystal structure of proteins in detail, a summary of X-ray crystallography applied to protein structure determination has been given. The reader is advised to keep a textbook on crystallography nearby while studying this chapter. Only principles have been described, and this often is not sufficient.
First, methods for growing and analyzing protein crystals have been described in outline form. Without describing theory, several terms used in crystallography have been explained, and these are critical to understanding the protein structure literature. X-ray analyses can be carried out at different resolutions. The higher the resolution, the more accurate the crystallographic model, but more data are needed. The measurement and bookkeeping of crystal diffraction data are based on the angular position of the X-ray reflection relative to the known orientation of the protein crystal.
The X-ray diffraction measurements are only half of the necessary information. To proceed, not only must the amplitudes of the reflections be measured, but phases must also be determined. Phases are derived by multiple isomorphous heavy atom replacement, multiwavelength anomalous dispersion, or molecular replacement. When amplitudes have been measured and phases have been determined, electron density maps can be calculated. At this point, a preliminary molecular model and atomic coordinates are obtained by fitting a stick model to the contoured shapes of the electron density function. After the protein coordinates have been determined, it is possible to calculate the structure factor amplitudes. Then, by comparing (indeed optimizing) the calculated and observed amplitudes, it is possible to refine the coordinates and to calculate temperature factors.
Finally, looking at the structure factor equations and those used to calculate electron density maps, always remember the cardinal principle of diffraction methods: every atom in the crystal unit cell contributes to both the phase and amplitude of each structure factor. And conversely, every structure factor contributes to each point in the electron density map.
NUCLEAR MAGNETIC RESONANCE METHODS FOR DETERMINING SOLUTION STRUCTURE
A visit to the Protein Data Bank (PDB) makes one quickly aware of the growing library of protein and oligonucleotide structures obtained by NMR methods. The PDB is a structural databank more fully described in Chapter 3. The NMR methods have the distinct advantage of eliminating the witchcraft associated with the preparation of specimens for X-ray crystallography. However, solution methods do have limitations imposed by the tumbling rate of the protein molecule, the identification of peaks, and the resolution obtainable from the spectrometer. In addition, NMR methods differ from diffraction methods in the sense that one obtains data describing the distance between specified atoms and constraints on dihedral angles present in the polypeptide chain or side chains. These distances and dihedral angle relationships must then be used in some mathematical fashion to calculate model coordinates, a procedure now referred to as distance geometry. Algorithms for distance geometry analyses have been improving rapidly but still present difficulties.
Nuclei with observable nuclear spin systems useful for biological molecules include ¹H, ¹³C, ¹⁵N, ³¹P, and ¹⁹F. Natural abundance of the isotopes makes ¹H the most advantageous nucleus (natural abundance, 99.98%). However, the preparation of recombinant proteins in microorganisms grown in isotope-enriched media has led to an increased use of ¹³C and ¹⁵N (natural abundance, 1.11 and 0.37%, respectively). Still, with thousands of active nuclei present in a protein molecule, resolution of the peaks in a
Fig. 2.7 NMR: vicinal scalar coupling. Information on the torsional angle may be obtained by determination of the vicinal scalar coupling constant ³J. Such data are obtainable for atoms separated by three covalent bonds, as numbered in the schematic drawing.
spectrum requires a number of improvements in instrumentation, and major advances in NMR methods. The feasibility of structural studies of biological macromolecules was enhanced by a series of technical and theoretical developments. Pulsed fields and Fourier methods of data collection and reduction were of fundamental importance. A good description of these methods applied to simple chemical compounds can be found in the textbook by Derome (1987). For proteins and nucleic acids, the textbook by Wüthrich (1986) should be consulted. The availability of superconducting magnets marked another major advance by increasing the resolution. The continuing progress in distance geometry methods has been vital to the field of NMR because it allows the study of larger and larger molecules. Finally, the use of both homo- and heteronuclear methods has led to additional constraints that can be used in the determination of solution conformation.
For protein and nucleic acid structure determination, the relatively short text by Güntert (1998) is excellent. In addition to a careful description of NMR methods, it also describes methods involved in distance geometry calculations, energy refinement, and the method of simulated annealing. Because of its breadth and straightforward presentation, much of the text is also applicable to refinement methods in X-ray crystallography.
In the typical NMR spectrum, various nuclei are separated owing to differences in their chemical shifts. These differences arise because each nucleus is shielded from the external magnetic field to a different degree by its local electronic environment. The next essential bit of information in the NMR method is to measure the data that characterize the nuclear Overhauser effect (NOE). The NOE is due to dipolar interactions with nearby neighbors. This involves direct transfer of magnetization from one nucleus to another.
The NOE data can be observed in a so-called NOESY spectrum, which is a 2-D experiment with a carefully contrived pulse sequence (Derome, 1987). The volume of the NOE peak is inversely proportional to the sixth power of the separation of the two atoms, r. Because of this r⁻⁶ relationship, only atoms separated by less than about 5 Å are of significance in a NOESY experiment. The mixing time of the pulse sequence must take into account the fact that in principle all hydrogen atoms form a matrix of spins coupled by dipole–dipole interactions. This must be coupled with the idea that the volume of the peak is proportional to the mixing time. Because of possible errors in setting the NOE values to a precise Å distance, it is customary not to assign exact values but rather to characterize the NOEs by dividing them into three classes. Distances with an upper bound of 2.7 Å are considered strong, those with an upper limit of 3.3 Å are medium, and values up to 5.0 Å are weak.
Although NOE data produce information about long-range interactions in a macromolecule, the determination of scalar coupling constants leads to information on local conformation (Derome, 1987; Güntert, 1998). The so-called ³J value may be used to determine the torsional angle Θ between atoms separated by three covalent bonds, as indicated in Fig. 2.7. In Fig. 2.7, the torsional angle is identical to φ. This angle is one of the three main-chain torsional angles described in more detail in Chapter 6. The coupling constants may be determined by so-called correlation spectroscopy (COSY) (Güntert, 1998). Values for ³J are especially useful for determining χ1 values, the torsional angle linking Cα to Cβ. With current enrichment techniques, it is also possible to
Fig. 2.8 Determining a solution structure from NMR data. The schematic diagram is meant to illustrate typical steps that are necessary for solution structure determination. (A) The accumulated distance constraints are assembled from a variety of NMR spectra. To show how the method is used to obtain macromolecule coordinates, three constraints of different length are shown. The distance between atoms 1 and 2 is |x|, etc. (B) A trial model is shown, in which the distance between atoms 1 and 2 is determined to be |x'|, etc. The discrepancy between |x| and |x'| is obvious, and so the process is repeated many times until it converges on a suitable model such as is shown in (C). In this model the distance constraints obtained from the NMR data agree with the atomic coordinates and an ‘‘acceptable’’ model has been determined.
derive torsional angles from ³J values for coupling through bonds involving ¹³C–¹H and ¹⁵N–¹H. Convergence of the distance geometry method is more rapid in proportion to the number of NMR constraints determined.
Assembling NMR distance constraints into a set of three-dimensional coordinates is a difficult and only partially automated procedure. A simple illustration is given in Fig. 2.8. A set of distance constraints is shown in Fig. 2.8A. A random starting model is illustrated in Fig. 2.8B. Finally, after many rounds of calculations involving estimation of the gradient for the target function followed by appropriate changes in the coordinates, an acceptable model is shown in Fig. 2.8C. Two distance geometry approaches have evolved and further development in this area of NMR is likely (Güntert, 1998). One method involves comparisons of experimentally determined distances with those observed in a trial structure of random conformation; the other is in principle the same but codes the evolving structure in terms of torsional angles. Minimizing the difference between the experimental data and the intermediate conformation is an iterative computation with a target function that includes canonical constraints such as bond lengths, bond angles, poor contacts, etc., and of course the distance constraints observed experimentally.
To prevent the function from residing in a local minimum, dynamics are included in the minimization process. Molecular dynamics involves movement of the atoms by imposing a small force on each atom and allowing the force to act for a very short time (femto- to nanoseconds). New positions for each atom can then be calculated using Newton’s equations of motion. By following the negative gradient of the potential energy function, the trial model begins to converge toward a global minimum. During the determination of a solution structure, the combination of energy minimization and dynamics is repeated multiple times.
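The iterative use of a target function and its gradient can be illustrated with a deliberately small example. What follows is a sketch under stated assumptions and not the procedure of any particular distance geometry program: the four-atom starting model, the restraint list, the step size, and the function names are all invented for illustration. The target function contains only upper-bound violations for the three NOE classes quoted earlier (2.7, 3.3, and 5.0 Å), and plain steepest descent stands in for the combination of minimization and dynamics used in practice.

```python
import numpy as np

# Upper distance bounds (in angstroms) for the three NOE classes named in the text.
NOE_UPPER_BOUND = {"strong": 2.7, "medium": 3.3, "weak": 5.0}

def target_and_gradient(coords, restraints):
    """Sum of squared upper-bound violations and its gradient."""
    target = 0.0
    grad = np.zeros_like(coords)
    for i, j, noe_class in restraints:
        diff = coords[i] - coords[j]
        dist = np.linalg.norm(diff)
        violation = dist - NOE_UPPER_BOUND[noe_class]
        if violation > 0.0:                  # only violated restraints contribute
            target += violation ** 2
            pull = 2.0 * violation * diff / dist
            grad[i] += pull
            grad[j] -= pull
    return target, grad

def refine(coords, restraints, step=0.05, n_steps=2000):
    """Steepest descent: move every atom along the negative gradient."""
    coords = np.array(coords, dtype=float)
    for _ in range(n_steps):
        _, grad = target_and_gradient(coords, restraints)
        coords -= step * grad
    return coords

# Hypothetical, deliberately poor starting model (angstroms) and three restraints.
start = np.array([[0.0, 0.0, 0.0],
                  [10.0, 0.0, 0.0],
                  [0.0, 10.0, 0.0],
                  [0.0, 0.0, 10.0]])
restraints = [(0, 1, "strong"), (1, 2, "medium"), (2, 3, "weak")]

initial_target, _ = target_and_gradient(start, restraints)
model = refine(start, restraints)
final_target, _ = target_and_gradient(model, restraints)
print("target function before and after:", initial_target, final_target)
```

A real target function would also include lower bounds, bond lengths, bond angles, and contact terms, so that the atoms cannot simply collapse toward one another as they are free to do here.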
Usually at least 20 different starting models are subjected to the distance geometry calculation. As a result, an ensemble of structures is determined. Some of the resulting models may be eliminated on the basis of unsatisfactory agreement with the distance constraints or other elements of the target function. The remaining sets of coordinates should agree in their overall structure, although small conformational differences between each member of the ensemble usually are found. To select a single model for study and/or comparison with another model, several methods have been used. The best approach is to take the average coordinates
and subject them to rounds of energy minimization. In this way, canonical values for bond lengths, bond angles, etc., are returned to the final model. A second, less acceptable, approach is to select from the ensemble the model closest to the averaged coordinates. Although crude, one model from the ensemble can be visually selected and these coordinates used for comparative purposes.
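The first two selection methods both start from the averaged coordinates, and the bookkeeping can be sketched in a few lines. This is only an illustration under two assumptions: the models in the ensemble have already been superimposed on a common frame, and the three two-atom "models" below are invented numbers. The function names are likewise hypothetical. Energy minimization of the averaged coordinates is not attempted.

```python
import numpy as np

def rmsd(a, b):
    """Root-mean-square deviation between two coordinate sets of equal size."""
    return float(np.sqrt(((a - b) ** 2).sum(axis=1).mean()))

def closest_to_average(ensemble):
    """Return the index of the model nearest the mean, the mean, and all RMSDs."""
    ensemble = np.asarray(ensemble, dtype=float)     # (n_models, n_atoms, 3)
    mean_coords = ensemble.mean(axis=0)
    deviations = [rmsd(model, mean_coords) for model in ensemble]
    return int(np.argmin(deviations)), mean_coords, deviations

# Invented ensemble: a two-atom fragment in three slightly different positions.
base = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]])
ensemble = [base,
            base + np.array([0.3, 0.0, 0.0]),
            base + np.array([0.0, -0.6, 0.0])]

best, mean_coords, deviations = closest_to_average(ensemble)
print("model closest to the average:", best)
```

Note that the averaged coordinates themselves generally have distorted bond lengths and angles, which is why the text recommends energy minimization before they are used.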
PROBLEMS
1. List the basic components of an instrument used to measure intensities from a protein crystal.
2. A single X-ray reflection has contributions from how many atoms in the crystalline protein?
3. Would the molecular replacement method or the heavy atom method require more crystalline specimens?
4. Name the two sources of data that are used to refine the crystal structure of a protein.
5. Using the PDB file for a small protein, find the highest and lowest temperature factors for the component atoms. (Note: If a coordinate set from a large protein is used, it will take more time to look through the coordinates.) Find the mean square displacement for those two atoms.
6. Describe the steps in performing a careful comparison between the coordinates of a protein derived from an X-ray study and the ensemble of structures found after an NMR analysis.
REFERENCES
Aebi, U., Fowler, W., Buhle, E., and Smith, P. (1984). Electron microscopy and image processing applied to the study of protein structure and protein–protein interactions. J. Ultrastruct. Res. 88, 143–176.
Amos, L., Henderson, R., and Unwin, N. (1982). Three-dimensional structure determination by electron microscopy of two-dimensional crystals. Prog. Biophys. Mol. Biol. 39, 183–231.
Blundell, T., and Johnson, L. (1976). “Protein Crystallography.” Academic Press, New York.
Cantor, C., and Schimmel, P. (1980). Part II. Techniques for the study of biological structure and function, in “Biophysical Chemistry.” W. H. Freeman & Company, San Francisco.
Carter, C., and Carter, C., Jr. (1979). Protein crystallization using incomplete factorial analysis. J. Biol. Chem. 254, 12219–12223.
Derome, J. (1987). “Modern NMR Techniques for Chemistry Research.” Pergamon Press, Elmsford, New York.
Drenth, J. (1994). “Principles of Protein X-Ray Crystallography.” Springer-Verlag, New York.
Güntert, P. (1998). Structure calculation of biological macromolecules from NMR data. Q. Rev. Biophys. 31, 145–237.
McPherson, A. (1982). “Preparation and Analysis of Protein Crystals.” John Wiley & Sons, New York.
McRae, D. (1993). “Practical Protein Crystallography.” Academic Press, New York.
Stout, G., and Jensen, L. (1989). “X-Ray Structure Determination: A Practical Guide,” 2nd Ed. John Wiley & Sons, New York.
Wüthrich, K. (1986). “NMR of Proteins and Nucleic Acids.” John Wiley & Sons, New York.
CHAPTER 3
Crystallographic Coordinates and Stereodrawings
INTRODUCTION
In January 1999, 9236 sets of protein coordinates were available through a structural biology database called the Protein Data Bank (PDB).¹ Included are atomic coordinates derived by X-ray analysis (7587) and nuclear magnetic resonance (NMR) methods (1442), and by model building (207). The number of known macromolecular structures continues to grow steadily. The brief description of X-ray crystallography and NMR given in Chapter 2 describes how these model coordinates are obtained directly from electron density maps in X-ray studies and from NMR distance and angle constraints in solution models.
On completion of a structural study of a protein, the investigators usually send the crystal or solution coordinates to the Protein Data Bank. Indeed, many journals require the deposition of coordinates before publication. In addition, the Protein Data Bank also keeps records on protein models that have been reported in the literature but not submitted to the databank or held for 1 year before release. The reason for withholding coordinates can be sound but is frequently controversial. Occasionally, the first publication and X-ray results are ambiguous and the investigators are uncertain about sections of the molecular model. Sometimes such uncertainties are removed during crystallographic refinement. In addition, preliminary coordinates, if not studied with special care, can be and have been used to generate completely erroneous conclusions or hypotheses. Coordinates are occasionally withheld because they are being used by a biotechnology company to generate patentable products. Even when the coordinate data are not in the PDB, it is often easy to obtain this information. Generally, after a personal request by telephone or letter, the investigators will send coordinates along with an explanation as to why they should be used with caution.
To be able to use these structural data, one must understand the syntax and organization of atom coordinates in a Protein Data Bank file. For simplicity, throughout this chapter a data file obtained from the Protein Data Bank will be called a PDB file. These files include multiple types of information. Usually at the start of the file is a series of records containing ancillary information
¹ Protein Data Bank (Chemistry Department, Brookhaven National Laboratory, Upton, NY). Relocated to Rutgers University in New Jersey in July 1999. Affiliated centers are scattered around the globe.
related to the crystallographic or NMR study. Some of these are discussed in more detail below. Next, the PDB file has a single record for each atom in the current crystallographic model, including bound water molecules, metal ions, and cofactors. If the coordinates were obtained by NMR methods, an ensemble of coordinates may be present in the PDB file. Each member of the ensemble was derived from the NMR measurements and distance geometry calculations. This makes the size of the PDB file much larger for an NMR structure than for a structure that has been derived by crystallographic means. In studying NMR-derived structures, it is necessary to choose between any one of, or an average of, about 5–10 coordinate lists.
If the structure was obtained by X-ray crystallographic methods, each atom record often contains a value for the temperature factor. As mentioned in Chapter 2, the temperature or B factor is a measure of the mobility of the atom and the molecule of which it is a part. Whether each atom in a PDB file has a unique temperature factor depends on the resolution of the diffraction data and the nature of the refinement. If individual temperature factors were not determined during the crystallographic study, a constant temperature factor will be found in this position in the atom record: often B = 20 Å² is used.
As noted already, the first records in a PDB file are not coordinates but are ancillary records that describe the biochemical properties, publications summarizing the structural studies, the amino acid sequence, etc. They can be called header records because they precede the coordinate list. Incidentally, some graphics programs may require the user to strip off these records before using the coordinate files.
For example, many crystallographic programs that can be used to manipulate or view the molecule, and require PDB files, expect to find only three preliminary records before the coordinates and will not work with a PDB file obtained directly from the databank. Each record in a PDB file begins with a descriptive word of six or fewer uppercase characters; examples include the words REMARK, HELIX, ATOM, etc. In PDB jargon, this is called the record type. Each of these records then contains specified information about the protein, its source, the structure determination, etc., and, of course, the coordinates under the record type ATOM and HETATM. To learn about these data, the simplest thing to do is to look at some of the information present in both the header records and those containing coordinates.
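Because the record type always occupies the first six characters of a record, sorting or stripping records takes very little code. The sketch below is illustrative only: the function names are invented, and the two coordinate records in the sample carry placeholder values rather than data from a deposited file.

```python
from collections import Counter

def record_type(line):
    """The record type is the word in the first six columns of a record."""
    return line[:6].strip()

def count_record_types(lines):
    """Tally how many records of each type a file contains."""
    return Counter(record_type(line) for line in lines if line.strip())

def coordinate_records_only(lines):
    """Strip the header records, keeping only ATOM and HETATM records."""
    return [line for line in lines if record_type(line) in ("ATOM", "HETATM")]

# A few records for illustration; the coordinate values are placeholders.
sample = [
    "HEADER    OXYGEN TRANSPORT",
    "COMPND    HEMOGLOBIN (HORSE,DEOXY)",
    "SOURCE    HORSE (EQUUS CABALLUS)",
    "ATOM      1  N   VAL A   1       0.000   0.000   0.000  1.00 20.00",
    "HETATM    2 FE   HEM A   1       1.000   1.000   1.000  1.00 20.00",
]

counts = count_record_types(sample)
coords_only = coordinate_records_only(sample)
print(dict(counts))
```

To process a real file, the same functions could be applied to the lines read from disk, for example with `open(path).read().splitlines()`, where `path` is whatever file name the reader has chosen.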
HEADER RECORDS OF PDB FILES
A few examples of record types and the information they contain are as follows:
HEADER This is the first record of the file. Let’s say the file contains the coordinates for hemoglobin. The HEADER record for that file is “oxygen transport.”
COMPND This record contains the name of the molecule. Wherever possible this record would include both the trivial and systematic names along with the EC (Enzyme Commission) number for enzymes. For hemoglobin COMPND was HEMOGLOBIN (HORSE,DEOXY).
SOURCE This record should contain the biological “source” from which the macromolecule has been derived. If a specific mutant has been used, this should be indicated. Carrying on with the hemoglobin example, SOURCE was HORSE (EQUUS CABALLUS).
AUTHOR Self-explanatory: the name(s) of the originator(s). This should not be taken literally, and for referencing purposes the article describing the study should be sought and used.
REMARK The REMARK records contain a variety of general commentaries about the molecule, the structural study, quaternary structure and symmetry, and the refinement status. Nearly always these records also contain references to journal articles
in which the deposited set of coordinates is described. They are self-explanatory. People using the coordinates should read each of the REMARK records carefully. If journal references are given, the articles should be read as well.
JRNL These records provide a journal reference to the structural studies.
CRYST Contains the numerical parameters describing the crystallographic unit cell. If a nonstandard space group or axial orientation was used, this would be indicated. A number denoting the number of asymmetric units in one unit cell is also given. For horse deoxyhemoglobin the CRYST record is as follows:
CRYST 76.96 81.70 92.63 90.0 90.0 90.0 C 2 2 21 8
The first three numbers are the lattice unit cell dimensions, given in angstroms. The next three are the unit cell angles α, β, and γ. For this form of crystalline hemoglobin, the angles are all 90°, and so the crystal system is orthorhombic. As indicated, the space group is C222₁. In summary definition, the space group describes all of the symmetry operations included in the crystallographic unit cell. The last number in this CRYST record is 8, the number of these operations or equivalent positions in a unit cell. For anyone interested in further information about space groups or equivalent positions, study the crystallography references given in Chapter 2. In the lattice of horse deoxyhemoglobin (our example), each unit cell for symmetry reasons must contain eight equivalent structures.
HET Indicates the presence and the nature of any nonstandard components except water molecules. For each component, the sequence identifier (if applicable) is given with both trivial and systematic names. For nucleic acids, the names of modified nucleotides or other nonstandard components would be described in an HET record.
In our example of hemoglobin, an HET record would be present and would appear as follows:
HET HEM A 1 44 PROTOPORPHYRIN IX, WITH FE2+ AND WATER
The coordinate list for deoxyhemoglobin contains a nonstandard residue called HEM belonging to subunit A; a standard residue would be SER, LYS, GLU, etc. There are 44 atoms in HEM and the compound is PROTOPORPHYRIN IX, with an iron atom and a water molecule included.
HELIX Describes the inclusive pairs of residues for each helical substructure, also indicating the kind of helix found. An example is
HELIX 1 AA SER A 3 GLY A 18 1
This record indicates that there is an α helix in subunit A from residues S3 to G18.
SHEET This is similar to HELIX, and identifies the β-sheet structures including the number of strands, the inclusive residues, the sense, and the registration. None were present in deoxyhemoglobin.
TURN Denotes those nonhelical quartets of residues that form hairpin turns (β bends); for example, some turns have a hydrogen bond linking (C–O)i to (N–H)i+3. No TURN records were found in the deoxyhemoglobin file.
SITE Defines the residues comprising any catalytic, cofactor, anticodon, regulatory, etc., sites. Site identifiers should be explained in REMARKS. No SITE records were found in the deoxyhemoglobin file.
ORIGX Describes the transformation matrix and translation vector that relate the submitted coordinates xsub, ysub, zsub to the orthogonal coordinates xsto, ysto, zsto (in angstroms) that are present in the PDB file. The inverse of this transformation is recorded in the file, and for readers unfamiliar with this notation or the use of this coordinate transformation, Vectors and Tensors in Crystallography (Sands, 1982) is an excellent resource.
ORIGX1 1.000000 0.000000 0.000000 0.000000
ORIGX2 0.000000 1.000000 0.000000 0.000000
ORIGX3 0.000000 0.000000 1.000000 0.000000
The preceding records were taken from the horse deoxyhemoglobin file. It is an identity transformation indicating that the coordinates as submitted are already orthogonal (in angstroms).
SCALE Defines a procedure for obtaining fractional crystallographic coordinates Xfrac, Yfrac, Zfrac from the submitted coordinates Xsub, Ysub, Zsub. This information could be used to calculate a transformation from the PDB coordinates to fractional crystallographic coordinates. Again drawing on the horse deoxyhemoglobin coordinates, the transformation is
SCALE1 0.000000 -0.012994 0.000000 0.232500
SCALE2 0.007258 0.000000 0.009856 0.000000
SCALE3 -0.008693 0.000000 0.006402 0.000000
Note that the preceding transformation has shifted the coordinates so that the a axis in the crystal is used to define the y coordinate in the PDB file. Presumably this has been done to facilitate examination of the quaternary structure. Again, see Chapter 5 of Sands (1982) or a source book on vector algebra for further information on coordinate transformations.
MTRIX If the structure contained in the PDB file exhibits any approximate or exact noncrystallographic symmetry, the transformation(s) are given in these MTRIX records. Horse deoxyhemoglobin is a tetrameric protein. Only two subunits are found in the PDB file. To obtain the other two subunits, change each of the coordinates in the file by the vector transformation:
MTRIX1 1 -1.000000 0.000000 0.000000 0.000000
MTRIX2 1 0.000000 1.000000 0.000000 0.000000
MTRIX3 1 0.000000 0.000000 -1.000000 0.000000
The symmetry defined by this matrix and translation vector found in the last four columns is equivalent to a molecular dyad around the y axis. This is explained in greater detail in Chapter 5 on quaternary structure. Coordinates for an αβ dimer of hemoglobin plus the aforementioned matrix are equivalent to the coordinates for the entire α₂β₂ tetramer.
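Records such as SCALE and MTRIX are applied in the same way: each row supplies three matrix elements and one translation term. The following sketch uses the numbers quoted above for horse deoxyhemoglobin. The function names are invented, the rows are split on white space rather than read by fixed columns (an assumption that suits the records as printed here), and the test point is arbitrary.

```python
import numpy as np

def parse_transform(rows):
    """Read the last four numbers of each of three rows: a 3x3 matrix and a vector."""
    values = np.array([[float(tok) for tok in row.split()[-4:]] for row in rows])
    return values[:, :3], values[:, 3]

def apply_transform(matrix, vector, coords):
    """Return matrix * r + vector for one point or for an (n, 3) array of points."""
    return np.asarray(coords, dtype=float) @ matrix.T + vector

MTRIX = ["MTRIX1 1 -1.000000 0.000000 0.000000 0.000000",
         "MTRIX2 1 0.000000 1.000000 0.000000 0.000000",
         "MTRIX3 1 0.000000 0.000000 -1.000000 0.000000"]
SCALE = ["SCALE1 0.000000 -0.012994 0.000000 0.232500",
         "SCALE2 0.007258 0.000000 0.009856 0.000000",
         "SCALE3 -0.008693 0.000000 0.006402 0.000000"]

dyad, dyad_shift = parse_transform(MTRIX)
scale, scale_shift = parse_transform(SCALE)

point = np.array([1.0, 2.0, 3.0])                   # arbitrary test point, in angstroms
mate = apply_transform(dyad, dyad_shift, point)     # symmetry-related position
back = apply_transform(dyad, dyad_shift, mate)      # a dyad applied twice is the identity
frac_origin = apply_transform(scale, scale_shift, [0.0, 0.0, 0.0])
print("dyad mate:", mate, " fractional coordinates of the origin:", frac_origin)
```

Applying `apply_transform` with the MTRIX values to every ATOM coordinate of the deposited dimer would generate the second dimer of the tetramer, as described in the text.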
CONECT The records are used to denote any key linkages that are not specified by the amino acid sequence and the secondary structures cited above. Disulfide bridges, and connectivity within HET groups and between HET groups may be indicated. Even hydrogen bonds and salt bridges may be listed. There were no CONECT records present in the deoxyhemoglobin file. However, it is useful to know how to make CONECT records because some graphics programs will use them to draw the structures of compounds other than amino acids and nucleotides. The procedure is relatively simple: Insert a CONECT record so that lines will be drawn between the atoms that are linked by covalent bonds. Let’s use n-propane as an example. It will have three carbon atoms with the atom numbers 3001, 3002, and 3003. If atom 3002 is the middle carbon atom, the CONECT record would appear as follows:
CONECT 3002 3003 3001
The first integer after the record name gives the atom number from which lines representing covalent bonds will be drawn. C-2 of propane (atom 3002) should be connected to C-1 and C-3 (atoms 3003 and 3001, respectively).
TER The presence of any discontinuities in the main chain(s) is given by the name and sequence identifier of each carboxy-terminal residue for proteins and the 3'-terminal residue for nucleic acids. These are explicit discontinuities present in the coordinates as in the case of proteins with multiple polypeptide chains. Common examples are found in proteins such as ribonuclease S and chymotrypsin. Records of this sort are not present to indicate poorly defined regions in the electron density map for which coordinates may have been omitted.
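The n-propane CONECT record described above can be generated rather than typed. This is a minimal sketch with an invented function name; it assumes the PDB layout in which the record name is followed by atom serial numbers in fields five characters wide, with at most four bonded atoms listed per record.

```python
def conect_record(atom, bonded_atoms):
    """Build a CONECT record: the central atom, then the atoms bonded to it."""
    if not 1 <= len(bonded_atoms) <= 4:
        raise ValueError("a CONECT record lists between one and four bonded atoms")
    serials = [atom, *bonded_atoms]
    # Each serial number is right-justified in a field five characters wide.
    return "CONECT" + "".join(f"{serial:5d}" for serial in serials)

# n-Propane, with the atom numbers used in the text.
middle_carbon = conect_record(3002, [3003, 3001])
end_carbon = conect_record(3001, [3002])
print(middle_carbon)
print(end_carbon)
```

Some display programs expect each bond to be listed from both of its atoms, so records for the two terminal carbons may also be needed; whether they are required depends on the program.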
Fig. 3.1 A comparison of coordinate systems. (A) Orthogonal; (B) crystallographic.
SEQRES Each record contains part of the amino acid sequence (given in the three-letter code), beginning at the N terminus. With 13 residues per record, the α chain of horse deoxyhemoglobin, which has 141 residues, would need 141/13 rounded up, or 11, SEQRES records.
Many, many more types of records can be present in a PDB file. For anyone interested in a complete description of every conceivable record type, the document Protein Data Bank Contents Guide can be obtained from the Web site: http://www.rcsb.org/pdb/docs/format/pdbguide2.2. Before choosing to print it, beware: it is more than 100 pages long.
COORDINATES
To make use of any list of coordinates, it is necessary to describe the coordinate frame and the units. As stated already, PDB coordinates are always given in a frame of orthogonal axes, the units of which are angstroms (Å). The orthogonal system usually corresponds to the crystallographic axes describing the unit cell. The x coordinate is along the a axis, y is along the b axis, and z is along the c axis whenever this is possible. However, in the case of deoxyhemoglobin, as we saw above, this was not the case. A comparison of the coordinate systems is shown in Fig. 3.1. In most instances, computer graphics or display programs require the orthogonal system, and whether the coordinates are related to those in the crystal lattice is unimportant. Occasionally it may be necessary to describe the coordinates in the crystallographic frame, and this may be done with the SCALE parameters described above. Fractional crystallographic coordinates can be more confusing because the position of each atom is defined by the fractional length of the unit cell axes and because the unit cell axes may not even be at right angles!
Nearly every textbook on crystallography describes some mathematics for the interconversion of crystallographic and orthogonal coordinates. The most complicated of these relationships occurs with the most primitive unit cell, a triclinic cell. In this case, the crystal axes are not at right angles and each is of different length. The conversion from fractional crystal to orthogonal or PDB format is complex (Sands, 1982). Once formulated using vector algebra, computer programs can be written or accessed to convert a file containing PDB coordinates for a protein back to crystal coordinates. Similarly, students interested in structural biology should be encouraged to study the coordinates with their own programs. With thousands of atoms, answers even to simple questions may require the aid of a computer.
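As an example of such a program, the conversion between fractional and orthogonal coordinates for a general (triclinic) cell can be sketched as follows. The code assumes one common convention, in which the a axis lies along x and the b axis lies in the xy plane. This is a choice made for illustration and is not the orientation used in the deoxyhemoglobin file, where the a axis defines y. The function names and the triclinic cell constants are invented.

```python
import numpy as np

def orthogonalization_matrix(a, b, c, alpha, beta, gamma):
    """Matrix taking fractional to orthogonal coordinates (angles in degrees).

    Assumed convention: a along x, b in the xy plane.
    """
    al, be, ga = np.radians([alpha, beta, gamma])
    cos_a, cos_b, cos_g = np.cos(al), np.cos(be), np.cos(ga)
    sin_g = np.sin(ga)
    volume = a * b * c * np.sqrt(1.0 - cos_a**2 - cos_b**2 - cos_g**2
                                 + 2.0 * cos_a * cos_b * cos_g)
    return np.array([
        [a,   b * cos_g, c * cos_b],
        [0.0, b * sin_g, c * (cos_a - cos_b * cos_g) / sin_g],
        [0.0, 0.0,       volume / (a * b * sin_g)],
    ])

def frac_to_orth(matrix, frac):
    return np.asarray(frac, dtype=float) @ matrix.T

def orth_to_frac(matrix, orth):
    return np.asarray(orth, dtype=float) @ np.linalg.inv(matrix).T

# Orthorhombic cell quoted in the CRYST record for horse deoxyhemoglobin.
ortho_cell = orthogonalization_matrix(76.96, 81.70, 92.63, 90.0, 90.0, 90.0)
center = frac_to_orth(ortho_cell, [0.5, 0.5, 0.5])

# Hypothetical triclinic cell, to exercise the general case.
tri_cell = orthogonalization_matrix(10.0, 12.0, 15.0, 80.0, 95.0, 110.0)
frac = np.array([0.25, 0.40, 0.75])
round_trip = orth_to_frac(tri_cell, frac_to_orth(tri_cell, frac))
print("cell center (angstroms):", center)
```

For the orthorhombic cell the matrix reduces to the three cell edges on the diagonal, which is why fractional and orthogonal coordinates differ only by a scale factor along each axis in that case.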
Even though we are to use stereodrawings and/or computer graphics to examine protein structure, it is still important to understand a few more aspects of both the coordinates and how they are presented in PDB files. To specify the position of any
atom in the orthogonal frame requires only three coordinates. But clearly, such a list of xyz values by itself would be useless. Further use of these data requires that each atom be identified in terms of atom type, residue type, and residue number. With this information, it is possible to connect all atoms correctly in the bonded polymer. Keep in mind that any program that draws line representations of bonded atoms must also be given data about the connectivity between atoms. The connections between atoms in a protein crystal structure are not implicit in the coordinates or even in the residue numbers when side-chain atoms are considered.
ATOM RECORDS IN A PDB FILE
An example of the ATOM records in a PDB file can be found in the sampling of coordinates for two residues of horse deoxyhemoglobin, which are shown in Table 3.1A. Note that each atom forms a single record in the PDB file following the header records already described. Not shown in each atom record are the last two variables: the four-letter PDB code for the protein and the record number in the PDB file. The four-letter code for horse deoxyhemoglobin is 2DHB, which, by the way, is quite often the file name in PDB libraries. The ATOM records shown in Table 3.1 contain 11 variables that are defined as follows.
1. A record containing coordinates begins with the word ATOM, the record name.
2. Next is the atom number, nearly always beginning with the amide nitrogen of the amino acid at the NH2 terminus and continuing through the thousands of atoms present.
3. This is followed by the atom name; more on this later.
4. The fourth variable is the three-letter amino acid name.
5. This is followed by the chain name, which may be absent if a single polypeptide chain is present.
6. The residue number follows; in Table 3.1A, the coordinates for amino acids W14 and R31 of deoxyhemoglobin are listed.
7–9. The next three variables are the x, y, z coordinates in angstroms (orthogonal).
10. The second from the last variable in an ATOM record is the occupancy, which is explained in Chapter 8 on polymorphic conformation. It represents the fractional number of atoms present at this site in the crystal. Therefore it will have values from 0 to 1.0.
11. The last variable is the B factor. In some coordinate files, the B factor may be the same for all atoms. This simply means that the coordinates have not been refined or that they were refined with the same B factor for all atoms.
A typical protein coordinate file may not end with the C-terminal amino acid.
Quite often the positions (coordinates) of a prosthetic group, a coenzyme, bound water molecules, etc., are found to be integral parts of the crystal structure. Coordinates have been measured for them and they usually follow the records listing the atoms of the polypeptide chain(s). Sometimes the investigator will report only the overall conformation of the protein. Such a PDB file might contain only the α-carbon coordinates.
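The 11 variables listed above can be read with a few lines of code. The sketch below assumes the fixed-column layout of the PDB format (for example, x in columns 31 to 38); the text above names the variables but not their column positions, so the slices should be checked against the format description before being relied on. The helper format_atom_record is hypothetical and exists only to build a correctly spaced example line.

```python
from typing import NamedTuple

class AtomRecord(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float
    occupancy: float
    b_factor: float

def parse_atom_record(line: str) -> AtomRecord:
    """Parse one fixed-column ATOM or HETATM record (0-based slices)."""
    if line[0:6].strip() not in ("ATOM", "HETATM"):
        raise ValueError("not a coordinate record")
    return AtomRecord(
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        res_name=line[17:20].strip(),
        chain=line[21].strip(),        # empty if no chain name is given
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
        occupancy=float(line[54:60]),
        b_factor=float(line[60:66]),
    )

def format_atom_record(serial, name, res_name, chain, res_seq, x, y, z, occ, b):
    """Hypothetical helper: build a record with the column layout assumed above."""
    padded_name = f" {name:<3}" if len(name) < 4 else name
    return (
        "ATOM".ljust(6) + f"{serial:>5}" + " " + padded_name + " "
        + f"{res_name:>3}" + " " + f"{chain:1}" + f"{res_seq:>4}" + " " * 4
        + f"{x:8.3f}{y:8.3f}{z:8.3f}{occ:6.2f}{b:6.2f}"
    )

# The alpha-carbon of W14 from Table 3.1A
line = format_atom_record(91, "CA", "TRP", "A", 14, 22.622, 15.155, -4.314, 1.00, 0.00)
print(line)
print(parse_atom_record(line))
```

Slicing by column rather than splitting on white space is deliberate: the chain name may be absent, and large or negative coordinates can leave no space between neighboring fields.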
ATOM LABELS
The atom labels are the only part of a PDB file that needs further explanation. They follow the Greek alphabet starting with the α-carbon as shown in Table 3.2. Now it is apparent how the atom names can be used to draw stick models in computer graphics studies. Atom N is always connected to CA, CA to CB, CB to either or both SG, OG,
TABLE 3.1 PDB Coordinates for Two Amino Acids in Horse Deoxyhemoglobin and One Amino Acid in Intestinal Fatty Acid-Binding Protein

Columns, left to right: record name, atom number, atom name, amino acid name, chain name, residue, orthogonal coordinates x, y, z (Å), occupancy, B factor.

A. Two Amino Acids in Horse Deoxyhemoglobin
ATOM    90  N    TRP  A   14   22.881   14.854   -2.876  1.00   0.00
ATOM    91  CA   TRP  A   14   22.622   15.155   -4.314  1.00   0.00
ATOM    92  C    TRP  A   14   23.783   15.846   -5.110  1.00   0.00
ATOM    93  O    TRP  A   14   23.800   15.897   -6.364  1.00   0.00
ATOM    94  CB   TRP  A   14   21.213   15.701   -4.614  1.00   0.00
ATOM    95  CG   TRP  A   14   20.831   15.274   -6.025  1.00   0.00
ATOM    96  CD1  TRP  A   14   21.004   15.974   -7.181  1.00   0.00
ATOM    97  CD2  TRP  A   14   20.248   14.057   -6.363  1.00   0.00
ATOM    98  NE1  TRP  A   14   20.681   15.140   -8.189  1.00   0.00
ATOM    99  CE2  TRP  A   14   20.271   14.013   -7.736  1.00   0.00
ATOM   100  CE3  TRP  A   14   19.825   12.944   -5.616  1.00   0.00
ATOM   101  CZ2  TRP  A   14   19.879   12.872   -8.445  1.00   0.00
ATOM   102  CZ3  TRP  A   14   19.414   11.769   -6.300  1.00   0.00
ATOM   103  CH2  TRP  A   14   19.442   11.733   -7.717  1.00   0.00
ATOM   214  N    ARG  A   31   16.467   -2.155  -11.004  1.00   0.00
ATOM   215  CA   ARG  A   31   16.174   -2.970   -9.786  1.00   0.00
ATOM   216  C    ARG  A   31   14.696   -3.056   -9.412  1.00   0.00
ATOM   217  O    ARG  A   31   14.307   -3.945   -8.624  1.00   0.00
ATOM   218  CB   ARG  A   31   16.892   -2.495   -8.550  1.00   0.00
ATOM   219  CG   ARG  A   31   18.378   -2.646   -8.586  1.00   0.00
ATOM   220  CD   ARG  A   31   18.813   -1.788   -7.432  1.00   0.00
ATOM   221  NE   ARG  A   31   20.276   -1.615   -7.468  1.00   0.00
ATOM   222  CZ   ARG  A   31   21.173   -2.152   -6.643  1.00   0.00
ATOM   223  NH1  ARG  A   31   22.418   -1.721   -6.717  1.00   0.00
ATOM   224  NH2  ARG  A   31   20.965   -3.346   -6.097  1.00   0.00

B. An Amino Acid from Intestinal Fatty Acid-Binding Protein
ATOM    36  N    TRP  A    6   -8.567   -5.331   10.429  1.00   7.69
ATOM    37  CA   TRP  A    6   -7.899   -6.067    9.371  1.00   4.82
ATOM    38  CB   TRP  A    6   -7.382   -5.043    8.377  1.00  10.78
ATOM    39  CG   TRP  A    6   -8.461   -4.188    7.767  1.00   6.46
ATOM    40  CD2  TRP  A    6   -9.144   -4.415    6.508  1.00   8.89
ATOM    41  CE2  TRP  A    6  -10.023   -3.350    6.290  1.00   9.33
ATOM    42  CE3  TRP  A    6   -9.119   -5.428    5.555  1.00   9.20
ATOM    43  CD1  TRP  A    6   -8.903   -2.977    8.197  1.00   5.36
ATOM    44  NE1  TRP  A    6   -9.853   -2.473    7.334  1.00  18.85
ATOM    45  CZ2  TRP  A    6  -10.815   -3.294    5.151  1.00  24.03
ATOM    46  CZ3  TRP  A    6   -9.885   -5.360    4.438  1.00  22.60
ATOM    47  CH2  TRP  A    6  -10.735   -4.301    4.229  1.00  14.32
ATOM    48  C    TRP  A    6   -6.816   -6.888   10.014  1.00  29.03
ATOM    49  O    TRP  A    6   -6.056   -6.333   10.763  1.00  13.90
TABLE 3.2 Atom Names for the 20 Standard Amino Acids

Greek symbol     Atom name
(main chain)     N, CA, C, O
Alpha, α         CA
Beta, β          CB
Gamma, γ         CG, SG, OG, OG1, CG1, CG2
Delta, δ         CD, ND1, OD1, SD, CD1, CD2, ND2
Epsilon, ε       CE, NE, OE1, CE1, NE1, CE2, CE3, NE2, OE2
Zeta, ζ          CZ, CZ2, CZ3
Eta, η           CH2, NH1, NH2, OH
OG1, CG1, or CG2, etc. Nearly all computer programs capable of drawing stick model representations have a template file that describes the proper covalent bonding for each of the amino acids. A few exceptions exist, in which lines (bonds) are drawn between atoms separated by less than a specified distance, for example, any two atoms less than 1.6 Å apart. When new compounds are bound to a crystalline protein and need to be displayed, the investigator may have to generate a template that describes the atom connectivity (see CONECT under Header Records of PDB Files, above). As a final note, it should be emphasized that X-ray studies of crystalline proteins do not identify the elements of the atoms positioned in the electron density map. The protein crystallographer always needs to use the principles of chemical bonding in proteins; moreover, he/she needs to know the amino acid sequence!
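The distance-based alternative to a template file can be sketched in a few lines. The example below applies the 1.6 Å cutoff mentioned above to the main-chain and CB atoms of W14 from Table 3.1A. It is a simplification rather than the method of any particular display program: bonds to sulfur, for instance, are longer (about 1.8 Å) and would be missed by a single cutoff.

```python
import math
from itertools import combinations

# Main-chain and CB atoms of W14, horse deoxyhemoglobin (Table 3.1A), in angstroms
ATOMS = {
    "N":  (22.881, 14.854, -2.876),
    "CA": (22.622, 15.155, -4.314),
    "C":  (23.783, 15.846, -5.110),
    "O":  (23.800, 15.897, -6.364),
    "CB": (21.213, 15.701, -4.614),
}

def bonds_by_distance(atoms, cutoff=1.6):
    """Return pairs of atom names whose separation is below the cutoff."""
    bonds = []
    for (name1, xyz1), (name2, xyz2) in combinations(atoms.items(), 2):
        if math.dist(xyz1, xyz2) < cutoff:
            bonds.append((name1, name2))
    return bonds

for first, second in bonds_by_distance(ATOMS):
    print(f"{first}-{second}: {math.dist(ATOMS[first], ATOMS[second]):.2f}")
```

Comparing every pair of atoms is adequate for a few residues; for a whole protein a program would normally restrict the comparison to atoms in the same or neighboring residues.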
STEREODRAWINGS
The coordinates that are derived from crystallographic or NMR studies represent conformation in three dimensions (3-D). It is possible to use shaded or colored diagrams to derive some small sense of the 3-D nature of a protein structure. However, such representations cannot be used to illustrate information-rich details of the three-dimensional atomic structure. Instead, there is a simple trick that can be done so that a viewer can sense true dimensionality. The trick is to produce two images in exactly the same orientation side by side, but with the rightmost image rotated approximately 6° around a vertical axis perpendicular to the viewing direction. This small rotation makes the right image appear slightly different from the left. The image rotation mimics the right and left eye images of a single object. For the viewer the trick is to look at a stereodrawing such that one image is seen by the left eye and the other by the right eye. Stereodrawings are shown in Figs. 3.2–3.9. The drawings were purposely selected to be simple and therefore contain only a few amino acids. Learning to use stereovision should also be combined with learning the appearance of the various amino acid side chains and the one-letter amino acid code. The latter is given in Table 3.3. It is critical that the student of protein structure know both. The one-letter code will be used exclusively in the remaining chapters. The ability to recognize side chains by their stick drawings is vital because it is often impossible to label every amino acid in a stereoview of even a part of a protein molecule. Such labels often obscure an important part of the
Fig. 3.2 Methionine–lysine–valine–alanine. This stereodrawing (and those in Figs. 3.3–3.9) represents a peptide fragment extracted from the PDB file on horse deoxyhemoglobin or malate dehydrogenase from Escherichia coli. The fragments therefore are missing one oxygen atom on the C terminus. All are coded in the same way. A single sphere or circle is a carbon atom. Oxygen atoms are represented by two concentric circles and nitrogen and sulfur atoms by three concentric circles. The stereodrawings have the correct chirality if viewed with stereoglasses.
Fig. 3.3 Isoleucine–glycine–glutamine.
Fig. 3.4 Tryptophan–lysine.
Fig. 3.5 Arginine–phenylalanine.
Fig. 3.6 Tyrosine–aspartic acid.
structure. It may not be labeled, so you must be able to recognize it! Finally, it should be noted that most of the stereodrawings in later chapters are simple stick representations. A few of the amino acids will not be distinguishable from “isostick” equivalents. How would you tell the difference between N and D in a stick model? You can’t! Before trying to use the stereodrawings you should have a good idea how images can be separated to fulfill the criteria described above. In fact, there are several ways of viewing such representations to produce a stereoeffect.
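The construction of a stereo pair described above amounts to one small rotation followed by two projections. The sketch below assumes that the viewer looks along the z axis and that y is the vertical axis; which sign of the angle suits stereoglasses and which suits crossed eyes depends on these conventions and on the program, and is not settled here.

```python
import math

def rotate_about_vertical(point, angle_deg):
    """Rotate (x, y, z) about the vertical y axis by angle_deg degrees."""
    x, y, z = point
    t = math.radians(angle_deg)
    return (x * math.cos(t) + z * math.sin(t),
            y,
            -x * math.sin(t) + z * math.cos(t))

def stereo_pair(points, angle_deg=6.0):
    """Return (left, right) lists of 2-D points for a side-by-side stereo pair.

    The left image is the plain projection onto the xy plane; the right image
    is the same projection after rotating the model about the vertical axis.
    """
    left = [(x, y) for x, y, _ in points]
    rotated = [rotate_about_vertical(p, angle_deg) for p in points]
    right = [(x, y) for x, y, _ in rotated]
    return left, right

# Three arbitrary points at different depths (illustrative values)
model = [(0.0, 0.0, 0.0), (1.5, 0.0, 5.0), (1.5, 1.0, -5.0)]
left, right = stereo_pair(model)
for l, r in zip(left, right):
    print(l, r)
```

Points at different depths are shifted sideways by different amounts in the right image while their heights are unchanged, and it is this horizontal disparity that the eyes interpret as depth.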
STEREOGLASSES
Special glasses are available² for viewing the stereodrawings in biochemical journals and this book. They are used by laying the drawing on a flat surface. With the glasses about 6 inches above the drawing and centered over the stereoimages, look straight down through the glasses. If you are lucky you will mentally visualize only a single image. It will be in stereo! The stereoglasses usually have low-powered lenses and the correct focal distance from the drawing can be determined by raising or lowering the glasses until the stereoimage is in focus. If you see two or more images, concentrate on looking straight down. Play with the interocular spacing of the glasses until only one image is apparent. It is best at first to view simple, “warm-up” images, as are found in this chapter.
² Stereoglasses may be obtained from various suppliers. Manufacturers include Abrams Instrument Corporation (Lansing, MI) and Hubbard Scientific Company (Northbrook, IL and Denver, CO).
Fig. 3.7 Leucine–serine–histidine.
Fig. 3.8 Threonine–cysteine–proline.
Fig. 3.9 Glutamic acid–glutamine–asparagine.
There are, in addition, at least two ways to study stereoimages without the use of glasses: the divergent and crossed-eyes methods.
DIVERGENT EYES
An article by McKeon and Gaffield (1990) describes a method for getting the right eye to see the right image, and the left eye to see the left image of a stereopair. Some individuals can do this without glasses. This is often referred to as the “walleye”
TABLE 3.3 The One- and Three-Letter Codes for the 20 Common Amino Acids

Amino acid       Three-letter code   One-letter code
Alanine          Ala                 A
Cysteine         Cys                 C
Aspartic acid    Asp                 D
Glutamic acid    Glu                 E
Phenylalanine    Phe                 F
Glycine          Gly                 G
Histidine        His                 H
Isoleucine       Ile                 I
Lysine           Lys                 K
Leucine          Leu                 L
Methionine       Met                 M
Asparagine       Asn                 N
Proline          Pro                 P
Glutamine        Gln                 Q
Arginine         Arg                 R
Serine           Ser                 S
Threonine        Thr                 T
Valine           Val                 V
Tryptophan       Trp                 W
Tyrosine         Tyr                 Y
method. If you can do it, it will be very useful since you can see stereo even when your glasses are at home. Most people find it easier to cross their eyes to visualize stereo pairs as described below.
CROSSED EYES
Unlike what dear old mother may have told you, crossing your eyes will not leave them that way permanently. In fact, oculists say it is a healthy thing to do. Many more people can observe stereo in this manner than by using the divergent eye method. To test yourself, place two nickels (any coins will do) about 6 cm apart. Hold a pencil or pointed object exactly between them. Now stare at the point of the pencil as you bring it closer to your eyes. At some point as the pencil tip is being raised toward your eyes, three nickels will appear. Practice this for a while and you will soon be able to accomplish the same thing without a pencil! If you were looking at stereoimages, the middle one would appear to be in three dimensions and you would be able to tell which atoms are close and which are farther from you. There is one problem with this method. With crossed eyes you are mentally observing the mirror image of the stereodrawing, so a right-handed helix appears left-handed. Atoms that appear close to you will be in the back for a person using stereoglasses. Beware of this when answering questions in the accompanying problem sets. One additional advantage of the crossed-eyes method is that it works equally well for viewing slides projected at some distance from you. If the slide contains stereoimages, learning to cross your eyes will make it easy to see the object in stereo. Incidentally, most computer graphics programs have a side-by-side stereo feature. Once you acquire the skill to use cross-eye or walleye stereo, it will be extremely useful for computer graphics work.
THE AMINO ACIDS
To learn from stereodrawings, it is important to be able to identify each atom in the main chain of the protein or nucleic acid. At first, therefore, ignore the side chains; look at the stereodrawings in this chapter and learn to recognize the direction of the chain: N terminal to C terminal. As you visually follow the conformation of the polypeptide chain in stereo, identify each atom: N–CA–carbonyl carbon–carbonyl oxygen–next CA. Sometimes the identification is simplified by color coding of the elements: C, N, O (C is yellow or gray, N is blue, and O is red). In the text that follows, shading has been added to aid in the identification. Nitrogen and sulfur atoms are shown as three concentric circles, oxygen atoms as two, and carbon atoms as one circle. Identifying amino acids in a stereodrawing of a protein will be simple if you practice on Figs. 3.2–3.9. The four amino acids in Fig. 3.2 are M, K, V, and A. Here is where knowledge of the one-letter code for the amino acids is necessary. The data for Figs. 3.2–3.9 were taken from the ATOM records of a PDB file, starting with the α-amino group. These will be absent in stick stereodrawings! It will be necessary to identify the polarity of the main chain by the fact that side chains come off the α-carbon and the carbonyl oxygen comes next as one moves toward the C terminus. Incidentally, in Figs. 3.2–3.9 note that there is only one oxygen atom coming from the C terminus. Once again, this is because the images were taken from a large protein file and no coordinates are available for the extra oxygen because it is not a true C-terminal amino acid. In Fig. 3.2 note the conformation of the M side chain. The sharp right-angle bend at the sulfur is characteristic of the conformation of the methionine side chain. Look carefully at the V side chain; in stick models it can be mistaken for a T. There are two ways of distinguishing V and T. If one of the β branches is near another oxygen atom,
suggesting a hydrogen bond, this must be the oxygen and the side chain is a threonine. If this is not possible the trivial approach must be taken; look at the primary sequence and correlate it with the stereostick model. Threonine has two chiral carbons: Cα and Cβ. One additional point: see if you can convince yourself that proteins are made of L-amino acids. While looking in stereo at the α-carbon from what would be the single hydrogen position, the other three bonds should be methyl (side chain)–amino (nitrogen)–carbonyl carbon (MAC) in a clockwise direction. Others use CORN: CO stands for the carbonyl carbon, R for the side chain, and N for the peptide nitrogen appearing in that order and again in a clockwise direction. Any one of the amino acids can be used for this purpose. Of course, if the stereodrawing is viewed with crossed eyes, the bonds (MAC/CORN) will be in a counterclockwise orientation! The amino acids I, G, and Q are shown in Fig. 3.3. Glycines (G) are frequently found at sharp bends in globular proteins and a bend is visible in Fig. 3.3. Incidentally, notice how the carbonyl oxygen of I12 is near the nitrogen of Q14. If the PDB coordinates were available, a distance check could be made to see if a hydrogen bond was present. Isoleucine (I), like T, also has two chiral carbon atoms: Cα and Cβ. A lysine (K) and a tryptophan (W) are depicted in Fig. 3.4. By now you should be able to determine in which direction the chain is going without looking at the amino acid sequence numbering. Is it K–W or W–K, assuming the conventional numbering from the N to the C terminal? Figure 3.5 shows both an arginine (R) and a phenylalanine (F). Like W, both R and F side chains are easily identified by their covalent bonding, even in stick models! An aspartic acid (D) and a tyrosine (Y) are shown in Fig. 3.6. Asparagines (N) can be confused with D in stereostick models.
Unlike V/T, both N and D have two potential hydrogen-bonding groups and for stereostick models, the identification must be made using the primary sequence. In Fig. 3.7, three amino acids are shown: leucine (L), serine (S), and histidine (H). The covalent and stick structure of histidine should be studied carefully. Histidine side chains frequently have important biochemical functions. The imidazole side chain serves both as a metal chelator and as a general acid/base. With a normal pK of about 6.5, it can easily change ionization states in the physiological pH range. Furthermore, two tautomers are possible; one is shown at the bottom of Fig. 3.7. To visualize the other tautomer, move the hydrogen atom attached to NE2 over to ND1, and the ND1=CE1 double bond to CE1=NE2. H87 in deoxyhemoglobin is often referred to in the literature as the “proximal” histidine, binding directly to the heme iron atom. If the deoxy- and oxyhemoglobin PDB files are checked, the NE2 nitrogen atom is bonded to the Fe2+ atom. Which of the tautomers could be expected at this position at H87 in deoxyhemoglobin? In addition to tautomeric forms, histidine side chains present even more difficulties. Electron density maps cannot be used to distinguish between two notably different orientations of the imidazole ring. Look at Fig. 3.7 and imagine rotating the imidazole ring of H48 180° around the Cβ–Cγ bond. If the difference between C and N atoms is ignored, the ring has the same appearance. The crystallographer must use chemical intuition along with nearby polar atoms to select an orientation. A peptide containing threonine (T), cysteine (C), and proline (P) is shown in Fig. 3.8. Proline is the only amino acid that has no hydrogen atom on the α-amino nitrogen when incorporated into a peptide; this prevents it from serving as a hydrogen bond donor.
A note concerning a rare event among amino acids in general: proline, more often than the other amino acids, is found with its α-amino nitrogen involved in a cis peptide bond. Finally, Fig. 3.9 gives examples of the amino acids glutamic acid (E), glutamine (Q), and asparagine (N). Like D and N, glutamic acid and glutamine are isosteric. They cannot be distinguished from each other by their shapes, even in a high-resolution electron density map. The reader should by now have become comfortable with stereodrawings. By limiting each to a few amino acids, viewing is somewhat easier. These simple stereodrawings should be used again and again until viewing with stereoglasses or by one of the other methods (crossed or divergent eyes) is as easy as reading a graph!
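The handedness test described above (MAC or CORN) can also be made numerically from the coordinates. The sketch below computes a signed volume from the three bonds leaving the α-carbon for the two tryptophans of Table 3.1. The sign convention, positive for the L configuration when the atoms are taken in the order N, C, CB, is an assumption; it is checked here only against these two residues, which come from proteins and should therefore be L.

```python
def subtract(p, q):
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]

def chiral_volume(n, ca, c, cb):
    """Signed volume (N - CA) . [(C - CA) x (CB - CA)] in cubic angstroms."""
    return dot(subtract(n, ca), cross(subtract(c, ca), subtract(cb, ca)))

# W14 of horse deoxyhemoglobin (Table 3.1A)
w14 = chiral_volume(n=(22.881, 14.854, -2.876), ca=(22.622, 15.155, -4.314),
                    c=(23.783, 15.846, -5.110), cb=(21.213, 15.701, -4.614))
# W6 of intestinal fatty acid-binding protein (Table 3.1B)
w6 = chiral_volume(n=(-8.567, -5.331, 10.429), ca=(-7.899, -6.067, 9.371),
                   c=(-6.816, -6.888, 10.014), cb=(-7.382, -5.043, 8.377))
print(f"W14: {w14:.2f}  W6: {w6:.2f}")
```

Reflecting the coordinates in a mirror (for example, changing the sign of every x value) reverses the sign of the result, which is the numerical counterpart of the clockwise order becoming counterclockwise when a stereodrawing is viewed with crossed eyes.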
VIEWING PROTEIN MODELS WITH A COMPUTER
Once the coordinates of a protein molecule are known, they can be viewed not only via diagrams and stereodrawings but also on a computer terminal. It is even possible to view the models on the color monitors of Macs or PCs. To do this it is important to become acquainted with the steps involved. The problems at the end of this chapter will require an ability to carry out each of the following four steps: (1) determine the four-letter identification code for the coordinates to be obtained from the Protein Data Bank; (2) extract the coordinates from the databank and put them on disk; (3) run any preconditioning program on the PDB coordinates and prepare the file that will actually be displayed; and (4) run the display program. The text that follows has been cribbed from various program writeups that are used in these studies. Whenever known the author is listed, but in a few instances the guides were lifted from various places and the author is unknown. Apologies to these individuals; the trail would have been too difficult to follow.
THE FOUR-LETTER IDENTIFICATION CODE
Before downloading coordinate files from the Protein Data Bank, it is useful to know the four-letter identification code for the protein of interest. In the data bank, every entry is catalogued by this four-letter code. There are several ways to obtain this key. Usually, it is included in any published article that describes the structure. In this book, for example, the accession code for conformational data used for illustrations is frequently included in the figure legends. A few examples of PDB accession codes are as follows. If a search is done for the enzyme malate dehydrogenase, eighteen entries are returned. The first two entries are given below:

1BDM
Deposited: 02/16/1993
Exp. Method: X-ray Diffraction
Resolution: 1.80 Å
Classification: Oxidoreductase (Nad(A)-Choh(D))
Compound: Malate Dehydrogenase (E.C. 1.1.1.37) Mutant With Thr 189 Replaced by Ile (T189I) Complexed With -6-Hydroxy-1,4,5,6-Tetrahydronicotinamide Adenine Dinucleotide (Referred To As (6Htn)Ad. Or Nadhx)

1BMD
Deposited: 11/10/1992
Exp. Method: X-ray Diffraction
Resolution: 1.90 Å
Classification: Oxidoreductase (Choh(D)-Nad+(A))
Compound: Malate Dehydrogenase (E.C. 1.1.1.37) (Bacterial) Complexed With Nadh

1BDM and 1BMD are the accession codes of two of the eighteen malate dehydrogenase structures. If you have the code name in hand, computer resources such as “FTP or Fetch” are excellent tools for downloading Protein Data Bank files. However, to use them easily and effectively, you must become familiar with the directory system at the PDB. Incidentally, in any scientific report where you use the coordinate data, reference should be made to the relevant publication and the four-letter accession code. PDB entries are also described in a second format for the structural data. The new format is given the acronym CIF, or crystallographic information file. The
International Union of Crystallography initiated the project in 1990. The new CIF formatting and cataloguing system is designed to be a subset of a broader form of software for archiving in any order both textual and numerical data. Development of the CIF format continues, and sometime in the future it may supersede the existing PDB format. Few have yet explored the use of CIF files, and tools to use this new format are limited.
DOWNLOADING COORDINATES WITH FTP
There are two ways of downloading macromolecular coordinate files. Direct file transfer with FTP (file transfer protocol) is more complicated, but less time is required to download the file. The second method uses all the point-and-click features of a well-developed Web site and is described briefly below. Use of the PDB Web site can be particularly slow if the communication is through a modem. First look at the FTP procedure. To access the FTP site, go to the node:
ftp.rcsb.org
You must log in with the username anonymous. General practice is to use your e-mail address for the password. Remember, users logging in as anonymous have only limited access to the server’s file system. If you choose to use anonymous FTP to bring the file to your computer, note that each entry resides in a directory containing several files. Files are added and updated on a continual basis. The four-letter accession code will help you choose the correct directory. For example, if the accession code is 1BDM, you will have to go to the directory
/pub/pdb/data/structures/divided/pdb/BD
You can then download the file pdb1bdm.ent.z with the get command. Note that the final branch of the directory tree is the same as the middle two letters of the accession code. The data are stored in a compressed format that cannot be read directly, as indicated by the filename extension .z. To decompress the file, use Stuffit Expander or any similar program. In 1995, additional coordinate files for oligomeric proteins were made available for users. The need for these files is explained in detail in Chapter 5. Often a submitted structure contains only a subset of the coordinates for a known oligomeric protein. This happens when an oligomer has point symmetry that is also used crystallographically. Because the number of coordinate files is so large and the accession codes so cryptic, it is often easiest to use the World Wide Web to obtain the code name.
The Web browsers will then also allow you to download the file you want directly or you can download the file with FTP once you know the identification code.
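The rule relating an accession code to its directory and file name can be written down as a short function. The sketch below only builds the path string and makes no network connection. It reproduces the directory and file names exactly as they are printed above, including the case of the directory name and the .z extension; the layout on the server as it exists today may differ, so the result should be treated as an illustration of the naming rule rather than a working address.

```python
BASE_DIRECTORY = "/pub/pdb/data/structures/divided/pdb"

def pdb_ftp_path(accession_code: str) -> str:
    """Build the directory and file name for a four-letter accession code.

    The final directory is the middle two letters of the code and the file
    is named pdb<code>.ent.z, following the description in the text.
    """
    code = accession_code.strip()
    if len(code) != 4:
        raise ValueError("expected a four-character accession code")
    middle = code[1:3].upper()
    return f"{BASE_DIRECTORY}/{middle}/pdb{code.lower()}.ent.z"

for code in ("1BDM", "1BMD"):
    print(code, pdb_ftp_path(code))
```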
WORLD WIDE WEB PAGE FOR PDB
The Protein Data Bank also maintains a WWW site and facilities for finding and downloading files in one step. The Web address is:
http://www.rcsb.org
Both a simple and a more complex search are possible. The latter permits searches based on the names of investigators, classes of proteins, etc. In fact, the search can be customized according to the categories listed below:
PDB Identifier
Citation Author
Contains Chain Type
PDB HEADER
Experimental Technique
Deposition/Release Date
Citation
Compound Information
EC Number
Text Search
Chain Length
FASTA search
Short Sequence Pattern
Secondary Structure Content
Resolution
Space Group
Unit Cell Dimensions
Refinement Parameters
Although slower than FTP access, the Web site is easier to use. Coordinate files downloaded directly from the Web site appear in the form of ASCII (text only) files and need not be decompressed.
DISPLAYING PDB COORDINATES
You should now have a PDB file on a Mac or PC diskette. This is a text-only file. Read through the header records carefully, making note of the information needed to prepare a computer image file. Most display programs ignore the header records in the PDB file, and so any word processing program may be used to edit/remove records. Remember: The PDB file used to display the image must be text-only or ASCII format! Many programs are available for displaying models from these PDB files. Most of them cost thousands of dollars. Fortunately two crystallographers, Drs. David and Jane Richardson (1992), have written software for displaying proteins. They have put this software in the public domain. In fact, each issue of Protein Science contains reference to so-called kinemages. Both the programs (PREKIN and MAGE) and a variety of kinemages may be downloaded from a WWW site maintained by Protein Science or directly from the program authors. For the former, use the WWW address:
http://www.prosci.uci.edu
The programs are available for both Mac and IBM-type PCs. A brief description of how to use these programs is given in the following sections. However, you are urged to read the Richardsons’ article as a prelude to using these display programs (Richardson and Richardson, 1992). PREKIN and MAGE are under continual development to increase their capabilities and newer versions appear from time to time. In addition to the Protein Science site, programs for both Macs and IBM-compatible PCs can also be obtained by FTP from
suna.biochem.duke.edu
A thank-you note to the Richardsons would be a gracious way of showing gratitude for the excitement you will now have. Send it to
[email protected]
PREKIN AND MAGE³
³ Partially taken from the online directions of Prof. David Richardson (DCR).
PREKIN is a utility program that prepares a script file of a trial kinemage from PDB formatted coordinate files. This kinemage file is read by the MAGE program to produce rotatable color stick and ribbon images of the target protein. The PREKIN user interface employs a succession of dialog boxes to define the kinemage. Multiple passes through PREKIN can build a kinemage with complicated groupings. Once a kinemage file has been formed, it can be modified by two different mechanisms. PREKIN output is intended to be edited using a word processor program to regroup, rename, and delete unnecessary items. However, PREKIN output can always
be immediately viewed and evaluated with MAGE. Early on, the new user should edit a .kin file so as to become aware of the usefulness of changing this file rather than restarting with the program PREKIN. Remember to use the text-only facility of a word-processing program to edit the .kin file. MAGE cannot understand any of the extra characters contained in a word-processing file. In addition to direct editing, files produced by PREKIN and viewed with MAGE can also be edited within the MAGE program. First, remove the undesirable portions, using the various MAGE utilities. Next, modify and write a new .kin file, using the option under File in the menu bar. It is necessary to learn to use either direct editing or the editing routines in MAGE (EDIT/PRUNE) to produce a suitably simple image. The usefulness of computer graphics derives from the ability to change the orientation of the model image and the aforementioned ability to eliminate unwanted portions of these (almost always) complex molecules. Judicious editing becomes increasingly important as the size of the molecules increases. Depending on the power of the computer, rotating more than a few hundred vectors (bonds) can become so slow that the viewer becomes quickly frustrated. There are several types of control specifications to prepare images with PREKIN. They appear through a series of dialog boxes. File selection for input and output is done as soon as PREKIN is started. Look under File at the top of the dialog screen.
1. Choice of operation, either from a menu of built-in scripts, from an external script (written for an earlier version of PREKIN), or by specifying ranges, focus, etc., individually: Built-in scripts presently include options for producing kinemages ranging from ribbon diagrams to ball-and-stick drawings. For nucleic acids, the option used to produce a Cα model gives a virtual bond pseudobackbone drawn between P, C-4', and C-1' atoms.
Option d gives the pseudobackbone, plus the bases, grouped and color coded. Option e gives all-atom backbone, sugars, and bases. Hydrogen bonds are not calculated between base pairs, but it is easy to add them in MAGE. For carbohydrates, which are treated as “hetatm” in PDB files, PREKIN will look for all possible connections between sugar residues if you set that option under Kludges in the initial Rangestart dialog box. It is slow, so use it only when needed.
2. Subunit selection: This selection controls which subunits will be considered for display according to the options mentioned above. The subunits are recognized by the chainID field between amino acid (aa) name and residue number in the PDB file, or by a line starting with MODEL n, as in multiple NMR structures. Each subunit will be put into a new group of display objects. For nucleic acids, remember to ask for enough subunits to get both strands of a DNA duplex.
3. Range controls: If Range controls is chosen, a range-control dialog box appears. For each range it is necessary to specify the starting and ending residue numbers and/or residue type (res; e.g., Lys) to set the extent of the range. Next, check the box for each kind of display object to be produced for that range (mc, main chain; sc, side chain; hb, backbone hydrogen bonds; hy, hydrogens; ca, Cα atoms; ht, nonwater heteroatoms; wa, waters; at, atom markers; lb, labels). Options will be presented: to accept that range, either going on to the next range or ending the range set; or to end and write a script file. If PREKIN does not find the expected heteroatoms or waters, look in the coordinate file to make sure the beginning of each of those lines says HETATM rather than ATOM, and also check which subunit chainID they have.
For example, to make a kinemage showing a β hairpin, the following two ranges could be entered:
14 to 35: mc, hb
24 to 25: sc
This will produce vectors connecting main-chain N, Cα, C, and O atoms, plus main-chain hydrogen bonds for residues 14–35, and side chains for residues 24–25.
As another example, it is possible to produce side-chain vectors and atom markers at the S for just the cysteine residues in a protein, plus all Cα atoms, with the following ranges:
-999 to 9999: sc, at, cys
-999 to 9999: ca
4. Focus controls: Focus controls are used to make a display list of things within some radius of a specified point. The x, y, z of the focus point can either be typed in or read from a file made by MAGE (see below), or PREKIN can find the center point of a given-numbered residue to serve as the focus point. PREKIN will then ask for radii within which it will output side chains, main chain, Cα atoms, waters, and nonwater heteroatom groups; the default values for those radii are 8, 10, 15, 10, and 10 Å. The radius is tested at each atom of the target side chain, main chain, etc.; any atom inside that radius is included. This is useful for looking at a special volume of a given molecule, for example, the region around a specified residue in an active site, a metal-binding site, or a site at a subunit–subunit interface. After all of the preceding items have been specified or defaulted, PREKIN will go to work and print messages about its progress, ending with a count of the number of triples written out and a message to select Quit or New Pass from the File menu. A new pass allows the user to do another run through script, menu, or focus selections, using the same input coordinate file and writing onto the end of the same output file.
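The focus test in item 4 can be sketched in a few lines of Python. This is only an illustration of the idea, not PREKIN's own code: the function names and the dictionary layout are hypothetical, the column positions follow the standard fixed-column PDB ATOM/HETATM record, and the default radii are the values quoted above.

```python
import math

# Default focus radii in angstroms, as quoted in the text.
DEFAULT_RADII = {"sc": 8.0, "mc": 10.0, "ca": 15.0, "wa": 10.0, "ht": 10.0}


def parse_atom_line(line):
    """Pull the fields used here out of one fixed-column ATOM/HETATM line."""
    return {
        "record": line[0:6].strip(),      # ATOM or HETATM
        "name": line[12:16].strip(),      # atom name, e.g. CA
        "resname": line[17:20].strip(),   # residue name, e.g. LYS
        "chain": line[21],                # chainID (subunit)
        "resseq": int(line[22:26]),       # residue number
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
    }


def atoms_in_focus(atoms, focus, radius):
    """Return the atoms lying within radius (angstroms) of the focus point."""
    return [a for a in atoms if math.dist(a["xyz"], focus) <= radius]
```

Reading the record name and chainID this way is also a quick check on the situation described in item 3, where expected heteroatoms are missing because their lines begin with ATOM or carry an unexpected chainID.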
Text, Captions, and Colors PREKIN writes short default text and caption fields, and it assigns default colors to each display list. The text and caption fields are easily changed in the .kin files. They are found following the @text and @caption statements, respectively. Coloring is set in the @vectorlist line by the color=xxxx text. Changing colors to help visualize the model can be very important and should be tried early in the learning stage. A list of the color names available can be found in Appendix 1 at the end of this chapter. There are many other useful commands that can be put into the .kin file. For example, to make hard copy the statement @whitebkg can be used. The command @stereoangle can be set for any angle of stereoviewing; -6 gives cross-eye stereo. The command @keepthinline draws the bonds in a thin-line format and improves rotation performance. Many of the same effects can be produced within MAGE by pressing single keys, all of which act as on–off toggles: the S key to keep stereo, the C key to switch between normal and cross-eye stereo, the T key for thin-line, and the P key for perspective.
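Recoloring a display list by hand is simple, but the same edit can be scripted. The sketch below is a hypothetical helper, not part of PREKIN or MAGE; it assumes that the list name sits in braces directly after @vectorlist and that the color parameter appears on the same line, which may not hold for every .kin file.

```python
import re


def recolor_list(kin_text, list_name, new_color):
    """Change the color= value on the @vectorlist line named list_name."""
    pattern = re.compile(
        r"^(@vectorlist\s*\{" + re.escape(list_name) + r"\}.*?color=\s*)(\w+)",
        re.MULTILINE,
    )
    # Keep everything up to the color name, then substitute the new name.
    return pattern.sub(lambda m: m.group(1) + new_color, kin_text)
```

Only the named list is touched; other @vectorlist lines and all coordinate lines pass through unchanged.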
Views An important early step in the editing process is to choose one or more good views in MAGE. This will include adjusting the zoom and zslab, and recentering the molecule, if appropriate. Press the S key to toggle in and out of stereo, and try to choose a size and orientation that is satisfactorily large in mono and that keeps all critically important parts of the picture inside the frame in stereo. To view stereo, refer to the earlier section for viewing printed stereoviews. In brief: cross your eyes and view the screen until you think you see three images. The middle image will be in stereo. The power of computer graphics rests in the ability of the user to rotate images. In MAGE, this is done by dragging the mouse as indicated in Fig. 3.10. Dragging the mouse along the top of the screen rotates the image around the z or viewing axis. Moving the mouse up and down rotates the image around the horizontal or x axis. Dragging the mouse along the bottom of the screen rotates the image around the vertical or y axis.
Fig. 3.10 Rotating kinemage with the mouse.
Slides on the right-hand side of the screen can be used to zoom back and forth, to change the thickness of the viewing slab, or to translate the image in the z direction. Boxes adjacent to the slides can be used to remove objects from the display. Simply put the pointer in the box and click. These boxes are toggles: to make the objects reappear, click on the boxes again. The center of the initial MAGE view is obtained by scaling the coordinates so that they fit into the viewing window. The scale can be changed by the zoom slider and the centermost atom can be changed. To recenter, turn on the Pickcenter box and use a mouse click to identify the atom that is to be the new center.
Adding Lines and/or Labels To add other vectors to the display, i.e., vectors not calculated by PREKIN, use the Draw New feature in MAGE. It is under the Edit menu, and provides four different tools for drawing new lines or labels. The dialog box allows you to shorten or lengthen the lines by some specified amount. For example, if you wanted to mark a certain hydrogen bond, you could draw a line between the appropriate atoms. The new line can be shortened by a specific amount so that it is not mistaken for a covalent bond. For hydrogen bonds, Professor Richardson suggests shortening it by 0.7 Å. You can undo these lines or labels successively by clicking the Eraselast button, and any of the tools can be turned off temporarily by clicking the appropriate button. Once a set of lines and/or labels has been assembled, they can be written to a file by selecting File and choosing to write a new .kin file. Although the Label command adds the specified atom name to the display window, this is easily changed by editing the modified .kin file. It is also possible to add comments to that .kin file. Comments are appended by adding text enclosed in angle brackets (< >). Such comment statements must be limited to one line. There are many other utilities available to the PREKIN/MAGE user. They are summarized in Appendix 2. Sometimes their application is obscure and the more sophisticated user is advised to obtain the entire program writeup from the authors (anonymous FTP source, suna.biochem.duke.edu).
Local Rotations MAGE also provides a method for showing symmetry-related pieces of structure that are often absent in the PDB file. This will become important when studying quaternary structure in Chapter 5. It is possible to add the commands @localrotation (a 3 × 3 rotation matrix) and @localcenter (a coordinate of the rotation operation). These operators are applied to a second copy of the original coordinates. This is an alternative to calculating the new coordinates outside of the MAGE program. Note: If something else follows the rotated group, use @endlocalrotation to separate them. If @localcenter is used by itself, it acts only as a translation operation. Its scope is ended with @endlocalcenter. To add local symmetry operators, the .kin file is edited as follows. Make a copy of the vectors (atoms) to be rotated. Add the list after the original model and after the statement
@localrotation -1. 0. 0. 0. -1. 0. 0. 0. 1.
In this example, the local symmetry would involve a 180° rotation around the z axis. It is probably best to change the color of the second subunit in the new @vectorlist statement. If other display objects are present after the to-be-rotated coordinates, add the statement @endlocalrotation.
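For readers who prefer to calculate the symmetry-related coordinates outside MAGE, the operation can be sketched as below. The function name is hypothetical, and two conventions are assumed rather than taken from the program writeup: the nine numbers are read row by row, and the matrix multiplies a column vector. For the diagonal twofold matrix used in the example neither assumption changes the result.

```python
def local_rotate(points, matrix, center=(0.0, 0.0, 0.0)):
    """Rotate each (x, y, z) point about center with a 3 x 3 matrix."""
    rotated = []
    for p in points:
        # Shift so that the rotation center sits at the origin.
        d = [p[i] - center[i] for i in range(3)]
        # Apply the rotation matrix to the shifted point.
        r = [sum(matrix[i][j] * d[j] for j in range(3)) for i in range(3)]
        # Shift back to the original frame.
        rotated.append(tuple(r[i] + center[i] for i in range(3)))
    return rotated


# The matrix from the @localrotation example: a twofold axis along z.
TWOFOLD_Z = [[-1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, 1.0]]
```

Applying TWOFOLD_Z twice returns every point to its starting position, which is a convenient check that a matrix really describes a 180° rotation.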
Making Measurements with the MAGE Display To make distance or angle measurements, MAGE includes a point-and-click facility selected under Other in the menu bar. Once Measure is turned on, clicking on an atom will begin the process. If a second atom is chosen, the distance in angstroms between the first and second atoms will be reported in the message area at the bottom of the screen. Clicking on a third atom will list the distance between the second and third atoms and the angle between the vector connecting atom 1 to atom 2 and the vector connecting atom 2 to atom 3. Choosing a fourth atom provides the value for the dihedral angle between the plane of atoms 1, 2, and 3 and the plane of atoms 2, 3, and 4. Remember, three points determine a plane and the angle between two planes is called a dihedral angle. It is possible, therefore, to pick successive points along the polypeptide backbone and see successively the values of the φ, ψ, and ω dihedral angles. These dihedral or torsional angles can be used to characterize secondary structure as discussed in Chapter 6. The Measure function draws white lines between the points it is currently using, and displays red dots that are the averages of the last two, the last three, and the last four points picked. These points are “pickable”; that is, with the Recenter function on, clicking on a red dot will recenter the image to that point. The points can be used to draw lines; it is probably best to turn off the Measures button before using those points for another purpose. For example, a helix axis can be added to the display by invoking Measures and picking four Cα atoms at one end of the helix, turning the Measures button off and Drawline on (with a “shorten line” value of, say, -2.5 Å), and picking the four-point average dot; then turn the Drawline button off and Measures back on and repeat the process at the other end of the helix. Finally, connect the two lines with the Drawline command. Such new lines may be written out to a file by saving the entire kinemage. 
The new lines that have been drawn can be erased, one at a time in reverse order, by clicking on Eraselast in the button panel. The Find function under the Other pulldown menu is useful for locating atoms to be measured. This tool lets you search for particular strings in the pointIDs. If you wanted to measure the distance between residues 10 and 25, for example, turn on the Measure tool and then use Find. First find residue 10, and then residue 25. The Find command works just like a mouse click.
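The three quantities reported by Measure can also be computed directly from coordinates. The following is a minimal sketch with hypothetical function names; it treats the angle as the one at the middle atom and uses the usual sign convention for the dihedral (positive when, looking from atom 2 toward atom 3, the far bond is rotated clockwise from the near bond). Whether MAGE reports the same sign is an assumption.

```python
import math


def sub(a, b):
    return tuple(a[i] - b[i] for i in range(3))


def dot(a, b):
    return sum(a[i] * b[i] for i in range(3))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def norm(a):
    return math.sqrt(dot(a, a))


def distance(p1, p2):
    """Distance between two points, in the units of the coordinates."""
    return norm(sub(p2, p1))


def angle(p1, p2, p3):
    """Angle in degrees at p2 between the directions to p1 and to p3."""
    v1, v2 = sub(p1, p2), sub(p3, p2)
    cosine = dot(v1, v2) / (norm(v1) * norm(v2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine))))


def dihedral(p1, p2, p3, p4):
    """Angle in degrees between the planes (p1, p2, p3) and (p2, p3, p4)."""
    b1, b2, b3 = sub(p2, p1), sub(p3, p2), sub(p4, p3)
    n1, n2 = cross(b1, b2), cross(b2, b3)   # normals to the two planes
    return math.degrees(math.atan2(norm(b2) * dot(b1, n2), dot(n1, n2)))
```

Picking N, Cα, C, and the next N along a backbone and passing them to dihedral in that order would give a ψ value for the residue, in the same way as four successive clicks in MAGE.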
Hardcopy MAGE has a function for writing PostScript files. However, PostScript printers are not always available. Instead, both Macs and IBM-type PCs have a snapshot facility. Find out how to make a snapshot of your screen with this facility. You can then edit out portions of the screen with a program such as Simpletext (Mac). This allows you to remove the sliders and header menu. The resulting screen image can then be printed with most commonly available printers.
SUMMARY Crystallographic coordinates for protein molecules are most frequently found in PDB format. Each record contains a number of variables describing the atom; the chapter explains what each of these variables means. Learn to use the Internet or the World Wide Web to obtain these coordinates. Those familiar with the use of such software may even find a shortcut for searching the database and obtaining the coordinates. Learn to extract the header records, using a text editor, and study these before proceeding to use the coordinates. To study protein structure you must learn to use stereodrawings. Practice with the simple drawings in this chapter until you can see them in three dimensions, easily and comfortably! It doesn’t matter whether you use stereoglasses, crossed eyes, or the walleye method; the important lesson is that you are able to visualize the protein in three dimensions. A few people have visual handicaps that will make this impossible. In that case, the computer graphics methods provide a suitable alternative. Next, learn to use a PC and the PDB coordinates to study the protein in question. This chapter contained an extensive description of the public domain software PREKIN/MAGE, developed by Drs. David and Jane Richardson. A few other public domain programs are available, but the author is not as familiar with their use or suitability.
PROBLEMS Learning about PDB Files 1. The PDB coordinates of horse deoxyhemoglobin in Table 3.1 contain some ATOMs with negative x, y, or z values. How can this be? 2. Using PDB standard ATOMNAMES, list the ATOMs present in the amino acids T, F, and M. 3. Calculate the length of a –C–C– bond, using the coordinates for W6:CB and CG given in Table 3.1. 4. Obtain a PDB file from the database and compare the temperature factor for the NE atom of a lysine with the CA atom of a glycine. Report the PDB code name of the protein, the residue number, and the temperature factors for both atoms.
Practicing Stereovision 5. In an α helix, the C=O bonds all point in the same direction. Does the tetrapeptide MKVA shown in Fig. 3.2 belong to an α helix? 6. Show schematically the stereochemical arrangement about the C of isoleucine (Fig. 3.3). 7. Could the two residues -R-F- in Fig. 3.5 be part of an α helix? 8. Serine side chains in proteins are often oriented so that a hydrogen bond forms between the hydroxyl and the previous carbonyl oxygen. Is that happening in Fig. 3.7?
Using Computer Graphics The first step in learning to use the programs described in this chapter is to create some computer-generated kinemage files. Because most PCs have limited power it is easiest to begin with a very small protein. Obtain the file 2ETI from the PDB; in this way you will learn to use FTP. 9. Edit the PDB file and write out the amino acid sequence, using the one-letter code. 10. Using PREKIN, create an α-carbon model of 2ETI. This is more like a peptide than a protein, but it makes learning to use the programs less painful. 11. Using MAGE and your .kin file, identify the disulfide bonds in 2ETI, a trypsin inhibitor from squash. Identify the disulfide bonds by clicking on the mates and recording the answers. 12. Use the Measure facility in MAGE to measure and record the distance between the Cα atoms of the cysteines linked by disulfide bonds. 13. Create another model that contains the main-chain atoms and hydrogen bonds between them. Identify the main-chain atoms that are hydrogen bonded to each other. 14. Identify the one bifurcated hydrogen bond and the participating atoms. 15. Create another model that includes all of the Cα atoms and the acidic and basic side chains. The charged side chains of one pair of atoms (acidic/basic) are relatively close and form an ion pair. Study the model and identify this pair. 16. Extra credit! Edit the .kin file that contains the acidic and basic residues so that the acidic side chains are red and the basic side chains are blue. Turn in to the instructor the page of the .kin file you changed, noting which statements were altered.
REFERENCES McKeon, T., and Gaffield, W. (1990). Viewing stereopictures in three dimensions with naked eyes. Trends Biochem. Sci. 15, 412–413. Richardson, D., and Richardson, J. (1992). The kinemage: A tool for scientific communication. Protein Sci. 1, 3–9. Sands, D. E. (1982). ‘‘Vectors and Tensors in Crystallography.’’ Addison-Wesley, Reading, Massachusetts.
APPENDIX 1: KINEMAGE COLOR PALETTE (DCR) MAGE has provision for 50 nameable colors in a kinemage, each having 5 different values for depth-cuing. Twenty-five of the colors are the basic set used on black backgrounds, while the other 25 are approximate equivalents on a white background. The 25 basic colors are depth-cued by intensity, that is, they darken to merge with the background as they get farther away. The white-background colors are depth-cued by saturation, so that they become whiter, or “foggier,” as they get farther away. This means that the starting color values, when at the front, need to be somewhat different for the two cases; the extreme example is that the default white becomes black when on a white background. The color names are chosen to fit the black-background colors, since that is the option strongly preferred for molecules (except perhaps in extremely high-glare situations). White background is essential for screen capture for black-and-white printing, and is also good for 3-D plots where you don’t want to lose information at the very back (see file Atkins.kin in Protein Science 1, 3). There are 20 or so basic colors that can be defined in MAGE. Twelve form a spectral color wheel: red, orange, gold, yellow, green, sea, cyan, sky, blue, purple, magenta, and hotpink. There are four low-saturation pastels: bluetint, greentint, pinktint, and yellowtint. Pink, sea, and sky function as midsaturation colors. The other three colors are white, gray, and brown. By changing the Color= statement in any .kin file, you can produce better images that will help you analyze the problem at hand.
APPENDIX 2: MAGE KEYWORDS AND PARAMETERS (DCR)
List of keywords for MAGE

Keyword                                 Meaning
@text                                   What follows is put in text window, until @kinemage or EOF
@kinemage i                             Starts a new kinemage; numbers should be unique and monotonic
@caption                                What follows goes in caption window, until another keyword

@onewidth                               Makes all lines 2 pixels wide (if omitted, width depends on z)
@thinline                               Makes all lines 1 pixel wide
@perspective                            Replaces normal orthographic projection
@whitebkg                               White background, not black (uses alternate colors)
@compare                                Makes side-by-side comparison of animate groups
@multibin i                             Improves fineness of hidden-line removal
@keepstereo                             Invokes stereo, for rest of session (can be turned off manually)
@stereoangle f                          Change stereo angle (default is 6 degrees, walleye stereo)
@keepthinline                           Invokes thin-line, for rest of session (speeds rotation, especially on PCs)
@keepperspective                        Invokes perspective, for rest of session
@noscale                                Initial scaling and centering will not be done (watch out!)
@plotonly                               Disables rotation, so stays in 2D

@zoom f                                 Scaling for startup view (1.0 nearly fills window)
@2zoom f to @9zoom f                    Scaling for View2 (and so on through View9)
@zslab i                                zslab (depth of window in and out of screen) in startup view
@2zslab i to @9zslab i                  zslab for View2 (and so on through View9)
@center f f f                           x, y, and z of center of rotation and placement (in original coordinates)
@2center f f f to @9center f f f        Center for View2 (and so on through View9)
@matrix f f f f f f f f f               3 × 3 orientation matrix for startup view
@2matrix to @9matrix f f f f f f f f f  3 × 3 orientation matrix for View2 (and so on through View9)

@plotonly (before all points)           Gives 2D plot with no rotation, no point limit
@plotonly (after all points)            Gives no-erase “kaleidoscope” image; rotation smears
@localrotation f f f f f f f f f        Rotation matrix applied just to part of file
@endlocalrotation                       Ends part to be rotated, or rotated and centered, if more follow in kin
@localcenter f f f                      With rotation: vector to axis, - before and + after rotation; alone: just +
@endlocalcenter                         Ends part to be translated, if Localcenter used without rotation

@group { }                              High-level display object (button name in {}, up to 11 characters)
@subgroup { }                           Midlevel display object (button name in {}, up to about 10 characters)
@vectorlist { }                         List of lines, low-level (button name in {}, up to about 9 characters)
{ } P x, y, z / { } L x, y, z           Individual vectorlist entry (pointID in {}, shown on pick)
@dotlist { }                            List of dots
{ } x, y, z                             Individual dotlist entry (pointID in {}, shown when point picked)
@labellist { }                          List of labels
{ } x, y, z                             Individual labellist entry (label in {} written on screen at that point)

i, Integer; f, floating point number.
List of (Optional) Parameters That Modify Keywords for MAGE
These parameters can be used in any order, between keyword and line end, for groups, subgroups, and lists.

Parameter                   Meaning
color= c (only for lists)   c value is color name
master= { }                 Master button will control all objects with same {} in master parameter
off                         Object will start out turned off
dominant                    No buttons for objects below it in hierarchy (for groups or subgroups)
nobutton                    No button for this object (especially for groups)
animate                     Group in an animation series (only for groups; button shows *)
2animate                    Group in second animated series (only for groups; button shows %)
moview= i                   Use view i (only for animate groups)
instance= { }               Repeats the list, subgroup, or group that had unique exact name in { }
< >                         Ignored comment (also OK on coordinate line, before a point triple)
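To see how a few of these keywords fit together, the sketch below writes a very small kinemage from a list of points. It is an illustration only: the function name is hypothetical, the layout is inferred from the keyword list rather than from the program writeup, and it assumes that P marks the start of a new line and L draws to the point that follows.

```python
def write_trace(points, name="trace", color="white", number=1):
    """Return kinemage text for one connected line through the points.

    points is a list of (point_id, (x, y, z)) tuples.
    """
    lines = [
        "@kinemage %d" % number,
        "@caption",
        "Illustrative trace written by a script.",
        "@group {%s}" % name,
        "@vectorlist {%s} color= %s" % (name, color),
    ]
    for index, (point_id, (x, y, z)) in enumerate(points):
        # Assumption: P starts a new line here, L draws to this point.
        code = "P" if index == 0 else "L"
        lines.append("{%s} %s %.3f, %.3f, %.3f" % (point_id, code, x, y, z))
    return "\n".join(lines) + "\n"
```

The returned string can be saved with a .kin extension using any text-only editor or a short file write. Button names should stay within the character limits given in the keyword list.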
CHAPTER 4
Properties of Biomacromolecules in the Crystalline State
INTRODUCTION Before going on to study protein structures and their biochemical implications, it is necessary to discuss the relationship between crystal and solution conformations. Nearly everyone not directly involved in X-ray crystallographic studies is troubled by doubts about the meaning of crystal conformation in terms of solution structure and function. The troublesome question is, are they the same? The answer, as was alluded to in Chapter 1, is yes. To a first approximation, most if not all globular proteins are believed to have the same structure in these two physical states. For a long time there was, of course, no direct experimental way of proving this hypothesis. The use of nuclear magnetic resonance (NMR) has changed that dramatically. The structures of a number of small proteins are known from NMR studies. Furthermore, there are a few examples of small protein structures that were derived from X-ray crystallography and from NMR studies in solution. Later in this chapter, an example of a comparison between X-ray and NMR structures is described. The NMR studies are by no means the first or the only evidence that addresses the solution-versus-crystal dilemma. Before the start of NMR analyses, there was already a large body of indirect evidence suggesting a close similarity between the crystal and solution conformations of proteins. A second but equally relevant question regarding crystal-versus-solution structure is as follows: Do subtle differences occur between these two states because of crystal packing forces? The answer in some instances is probably yes. One of the most powerful and convincing observations suggesting that packing forces can generally be ignored comes from crystallographic studies. In a few instances, X-ray structures of the same protein crystallized under dramatically different ionic conditions have been determined. These are described later in this chapter. 
Crystal packing effects can also be evaluated when multiple copies of the same protein or subunit are contained in a single asymmetric unit. Under these circumstances, the individual protein units are experiencing different lattice forces.
PROTEIN CRYSTALS On average, a protein crystal is 50% protein and 50% aqueous solvent, by volume. The concentration of protein is high, but not higher than the protein concentration in some cellular organelles, such as the inner membrane matrix of mitochondria. Under certain metabolic conditions, the concentration of protein within the matrix can reach nearly the same levels as in a protein crystal. The fraction of protein in a crystal, x, can be calculated by measuring the density of the crystal, dxtl, and the mother liquor, dml. This is usually done in density gradients of organic solvents such as xylene, bromobenzene, etc. Mother liquor is the term given to the mixture of buffer, precipitant, and protein from which the crystal was grown. The density relationship is
$$d_{xtl} = x\,d_{p} + (1 - x)\,d_{ml} \tag{4.1}$$
The density of the protein, dp, can be obtained from the partial specific volume or the amino acid composition (Creighton, 1984). Most proteins have a partial specific volume in the range of 0.71 to 0.75 ml/g. The solvent inside the crystal is readily exchangeable with the surrounding solvent. This is a corollary of the fact that most small chemical compounds can diffuse into the lattice with little or no difficulty. The interstices between protein molecules in the crystal lattice are large enough that even relatively large compounds can diffuse in readily. Nicotinamide adenine dinucleotide (NAD; MW ~800) diffuses into crystalline dehydrogenases rapidly and easily. As is shown below, crystals of enzymes carry out catalysis efficiently, provided the substrates are small enough to diffuse into the lattice. What about lattice interactions distorting the conformation? Interprotein contact in the crystal lattice involves only a relatively small percentage of the total available molecular surface area. Most of the protein surface interacts with the aqueous buffer. Typically, the solvent-accessible surface area is reduced in an oligomer by values ranging from 10 to 40% (Janin et al., 1988). Crystal contacts involve much smaller contact areas than do subunit–subunit interfaces. If one accepts the fact that the formation of an oligomer does not result in any major conformational change in the monomer, crystal contacts can be expected to have even less effect. One is forced to conclude that crystallization has less of an effect on conformation than does the formation of a homodimer. This fact, along with the high solvent content, suggests that protein crystals are ordered concentrated solutions. Methods used to calculate the accessible surface area are described in Chapter 5. Is there a way to test the effects of crystal packing forces? Again an indirect answer can be found in the literature. 
There are several examples of crystals prepared under dramatically different ionic conditions producing more than one crystal packing motif. It would seem that if protein structures were conformationally sensitive, the structure in one lattice would be different from that in another. This is not observed. Whenever structures have been determined for different crystalline polymorphs, the resultant protein conformation is essentially the same. In a sense, the same argument applies when multimeric proteins have been crystallized and their structure determined. As so often happens, identical subunits of the oligomer find themselves in different crystal environments. In other words, each subunit is subjected to different packing forces. Every such case studied to date indicates the same overall conformation for each of the subunits, although small structural differences are sometimes observed. The conformation of a protein does not seem to depend on crystal packing arrangements or the ionic conditions within the crystal.
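Returning to Eq. (4.1), the relationship can be rearranged to give the protein fraction x from the two measured densities. The sketch below is a worked illustration with hypothetical function names and made-up densities; it takes the protein density as the reciprocal of the partial specific volume, which is one common approximation.

```python
def protein_density(partial_specific_volume):
    """Protein density in g/ml from the partial specific volume in ml/g."""
    return 1.0 / partial_specific_volume


def protein_fraction(d_crystal, d_mother_liquor, d_protein):
    """Solve d_xtl = x * d_p + (1 - x) * d_ml for the protein fraction x."""
    return (d_crystal - d_mother_liquor) / (d_protein - d_mother_liquor)


# Illustrative values only, not measurements from any real crystal.
d_p = protein_density(0.74)              # partial specific volume in ml/g
x = protein_fraction(1.225, 1.10, d_p)   # crystal and mother liquor, g/ml
```

With these invented numbers x comes out close to one half, in line with the statement that a typical protein crystal is about 50% protein.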
PHYSICAL PROPERTIES Some rather obvious experimental checks on the relationship of solution and crystal structure should also include the determination of the protein size. Calculating the radius of gyration, RG, of a protein from the crystal coordinates can be done readily. For a macromolecule with n atoms:
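$$X_{CG} = \frac{\sum_{j=1}^{n} m_j X_j}{\sum_{j=1}^{n} m_j}, \quad \text{etc., for } Y_{CG} \text{ and } Z_{CG} \tag{4.2}$$
$$R_G = \left(R_G^{2}\right)^{1/2} \tag{4.3}$$
$$R_G^{2} = \frac{\sum_{j=1}^{n} m_j r_j^{2}}{\sum_{j=1}^{n} m_j} \tag{4.4}$$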
In Eqs. (4.2)–(4.4), mj is the atomic weight of atom j with coordinates Xj, Yj, Zj, and rj is the distance of atom j from the center of gravity. Because globular proteins are not spherical objects, the radius of gyration RG is the atomic weight-biased root mean square (rms) distance of all atoms from the common center of gravity, XCG, YCG, ZCG. Incidentally, the process of superimposing two homologous structures using their coordinates begins with moving their centers of gravity to the same position. This is discussed in more detail in Chapter 8. While it is relatively easy to calculate RG from a set of protein coordinates, it is more difficult to obtain an experimental value in solution. It can be obtained from small angle X-ray scattering (SAXS) or hydrodynamic measurements. Hemoglobin (Hb) is once again a useful example. The RG of Hb determined by SAXS experiments was 30 Å. Hydrodynamic measurements yield a value of 31 Å. Finally, RG calculated from the X-ray coordinates is 28 Å. Overall dimensions obtained from crystallographic studies are described as 64 × 55 × 50 Å. Similar agreement was found for lysozyme and myoglobin (Mb). Size similarity may not be the most sensitive way of looking for conformational differences. However, the RG can be measured accurately and it is known that relatively small conformational changes can be seen by SAXS studies. To date, good agreement is found between these types of solution studies and size measurements based on the crystal structure.
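The calculation in Eqs. (4.2)–(4.4) is short enough to sketch in Python. The function names and the (mass, coordinates) tuple layout are hypothetical, and rj is taken here as the distance of atom j from the center of gravity, entering the sum as its square so that the result is an rms distance.

```python
import math


def center_of_gravity(atoms):
    """Mass-weighted mean position; atoms is a list of (mass, (x, y, z))."""
    total = sum(m for m, _ in atoms)
    return tuple(sum(m * p[i] for m, p in atoms) / total for i in range(3))


def radius_of_gyration(atoms):
    """Mass-weighted rms distance of the atoms from the center of gravity."""
    cg = center_of_gravity(atoms)
    total = sum(m for m, _ in atoms)
    weighted = sum(
        m * sum((p[i] - cg[i]) ** 2 for i in range(3)) for m, p in atoms
    )
    return math.sqrt(weighted / total)
```

For a real protein the list would hold one entry per atom in the coordinate file, with the atomic weight chosen from the element. The result carries the same units as the coordinates, normally angstroms.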
CHEMISTRY OF CRYSTALLINE PROTEINS To further compare crystal and solution characteristics of proteins, one must examine properties measurable in both states. Absorption spectra can provide a sensitive tool for such comparisons. Crystalline proteins containing bound chromophores give absorption spectra similar to those found in solution. Because this spectrum is nearly always sensitive to the conformation of the protein, the similarity in spectra implies similarity in conformation. The best example of this phenomenon is Hb. Oxy-, deoxy-, and met-Hb each have the same characteristic spectrum in solution and the crystalline state. Met-Hb has the iron in the ferric state and does not bind oxygen. The transition from deoxy- to oxy-Hb occurs with a dramatic spectral and quaternary conformational change. This was observed by studies of crystals of the two forms. Predictably, when crystals of deoxy-Hb are oxygenated, the crystals shatter. Just as expected, the lattice is unable to accommodate the dramatic quaternary conformational change. However, the overall absorption spectra of the different forms of hemoglobin are the same in solution and the crystalline state. It is of further interest to note that there is evidence even of dynamic properties of proteins in the crystalline state. It is a characteristic that again reinforces the similarities between the crystalline and solution states. After some early studies of Mb and Hb were completed, several derivative forms indicated binding sites that were in buried locations. For example, the azide ion binds to metmyoglobin and methemoglobin in much the same way as O2 binds to the ferroproteins. In the crystal structure, no unobstructed channel connecting the binding site and the solvent was apparent. But azide can be made to bind equally well to the met form of the heme oxygen-binding protein both in solution and in the crystalline state. The binding process requires temporal conformational changes for the ligand to reach the binding site. Such conformational “breathing” is possible in both the crystalline and solution states, as described in an article by Gurd and Rothgeb (1979). Not only is the conformation the same in the crystalline state but motional properties are similar in both states. There is no need to believe that protein molecules in crystals are in a totally rigid state!
CHEMICAL REACTIVITY Many chemical modification reactions of proteins are known. They almost always depend on the steric availability of an amino acid to the modifying reagent. If an amino acid side chain is buried in the center of a protein, it is chemically unreactive. The most common target of modification reactions involves the –SH side chains of cysteine residues. Many proteins have been studied to determine their –SH reactivity in solution. For proteins containing multiple cysteine residues, a distribution of reactivity is often found. Some cysteinyl side chains are reactive; others are not. Most often the solution pattern of reactivity agrees with the crystal structure. Modification studies of this sort are not limited to cysteine. Chemical reagents are available to modify lysine, arginine, histidine, and the carboxyls of glutamic and aspartic acid. If the crystal structure predicts a side chain to be sterically unreactive, chemical studies have generally confirmed this. It is good to keep in mind that correlations of this type are valid only when the modification does not itself lead to partial or total unfolding of the protein. In general, patterns of chemical reactivity in solution correlate with crystal structure, and this is additional evidence that the crystal and solution structures are the same. There is one additional form of chemical reactivity that supports the similarity between crystal and solution structures. A number of amino acid side chains are ionizable and therefore have a characteristic proton dissociation constant. The negative log of this dissociation constant, or pK, of ionizable side chains is dependent on the amino acid type: carboxylate, imidazole, etc. 
Typically, aspartate (D) and glutamate (E) would have a pK in the range of pH 4–5; the pK of histidine (H) would be about pH 6.5, that of the –SH of cysteine (C) would be near pH 8.5, tyrosine (Y) and lysine (K) pKs would be about pH 10, and the pK of arginine (R) would be more than pH 12. In solution, specific pKs can be measured by NMR methods or, in the case of tyrosine, spectrophotometrically. The pK of each side chain in a protein varies somewhat because it is also affected by its stereochemical location, especially with respect to nearby charged residues. For example, a histidine side chain located near a positively charged arginine residue would ionize more easily; it would have a lower pK. In spite of the effect of neighboring charges, ionizable side chains on the surface of a globular protein usually have a pK in a relatively narrow range. One would expect that a buried ionizable side chain would have an aberrant dissociation constant. If not readily accessible to water, it predictably will not ionize with a normal pK. For example, sperm whale myoglobin has three tyrosines. In the three-dimensional structure, the three tyrosines are in quite different locations, as can be seen in Fig. 4.1. (Note: At this point the reader must take stereodrawings seriously; go back to Chapter 3 if Fig. 4.1 cannot be seen in stereo!) Note that one of the three tyrosine side chains is pointing into the core of the molecule. In the case of sperm whale myoglobin, the chemical observations agree qualitatively with the crystallographic structure. A pH titration of the tyrosines suggests that one has a normal pK, one has a slightly elevated pK, and one does not ionize in the experimentally accessible pH range. Although it is impossible to tell from spectrophotometric titration curves which tyrosine is buried, the chemical results in solution qualitatively agree with the crystal structure. Careful examination of the location of each of
Fig. 4.1 Tyrosines in Mb. This stereodiagram shows the Cα atoms of sperm whale myoglobin, along with all of the atoms for the three residues Y103, Y146, and Y151. Every tenth Cα position is numbered, beginning at the NH2 terminus. One of the tyrosines does not titrate in its characteristic pH range.
the tyrosines as shown in stereo in Fig. 4.1 should suggest whether the side chain with the abnormal pK is Y146, Y151, or Y103. The ionization properties may be so severely affected by internalization of the side chain that hydrogen ion equilibrium occurs only after denaturation or ‘‘unfolding.’’ Several of the histidine side chains in myoglobin behave in this manner. The internalized histidine becomes protonated only when myoglobin denatures or unfolds below pH 4.5. In summary, where ionization properties have been measured, there is a predictable pattern of normal and anomalous ionization constants based on the crystal structure.
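The titration behavior discussed in this section can be made concrete with the Henderson–Hasselbalch relation, a standard result that the text does not write out. The following is a minimal sketch in Python. The function name `fraction_deprotonated` and the two sample pK values (10.0 for a normal tyrosine and 11.5 for a hypothetical tyrosine with an elevated pK) are assumptions chosen for illustration; they are not measured values for myoglobin.

```python
def fraction_deprotonated(ph: float, pk: float) -> float:
    """Fraction of an ionizable group that has released its proton at a given pH.

    Henderson-Hasselbalch: pH = pK + log10([A-]/[HA]),
    which rearranges to [A-] / ([A-] + [HA]) = 1 / (1 + 10**(pK - pH)).
    """
    return 1.0 / (1.0 + 10.0 ** (pk - ph))


if __name__ == "__main__":
    # Illustrative pK values only (assumed, not measured for myoglobin).
    normal_pk = 10.0
    elevated_pk = 11.5
    for ph in (8.0, 9.0, 10.0, 11.0, 12.0):
        normal = fraction_deprotonated(ph, normal_pk)
        elevated = fraction_deprotonated(ph, elevated_pk)
        print(f"pH {ph:4.1f}: normal {normal:.3f}, elevated {elevated:.3f}")
```

A group is half ionized when the pH equals its pK, so raising the pK shifts the whole titration curve to higher pH. A side chain whose pK lies beyond the accessible pH range would show essentially no ionization in such a titration.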
ENZYMATIC AND BIOLOGICAL ACTIVITY One of the most convincing arguments for the equivalence of crystal and solution structures is biological activity. Because this activity can vary from simple processes such as ligand binding to complex enzymatic activities, it is a stringent test of the identity of crystal and solution structures. To summarize: with insignificant exceptions, every crystalline protein can be made to express its biological activity in the crystalline state. Indeed, our knowledge of atomic-level details of biological activity most often starts with a crystal or NMR structure. The exceptions are related to stereochemical problems associated with the crystal lattice. For example, it is unrealistic to expect a crystalline proteolytic enzyme to carry out hydrolysis of a large protein. The protein substrate is likely to be too large to be able to diffuse through the protease lattice to the active sites. Other cases in which ligand binding does not occur in the crystalline state of the apoprotein are also found, but frequently cocrystallization is still possible. Such is the case for repressor proteins. The ligand, a piece of DNA, is large and crystal packing leaves no space for the compound to bind. Crystals have been prepared of the DNA–protein complex. Similarly, an antibody to lysozyme could not be expected to diffuse into lysozyme crystals. However, cocrystallization is possible and also has been accomplished. Perhaps more subtle is a second aspect of crystal lattices. If the active site of a crystalline protein is sterically inaccessible by virtue of lattice interactions, the protein will appear to have no catalytic activity. This can occur even if the substrate is a small molecule and if the active site of the protein is not adjacent to a solvent channel but, rather, abuts another molecule in the crystal lattice. This has been observed for at least
4. Properties of Biomacromolecules in the Crystalline State
TABLE 4.1 Kinetic Constants for Muscle Aldolase

Enzyme                           Vmax (mM/min/mg)    Km (glyceraldehyde 3-phosphate) (mM)
Soluble aldolase, − (NH4)2SO4    9.85 ± 0.09         10.8 ± 0.6
Soluble aldolase, + (NH4)2SO4    8.72 ± 0.09         11.3 ± 0.9
Crystal                          4.42 ± 0.05         15.2 ± 1.2
one crystalline proteolytic enzyme: crystallized α-chymotrypsin behaves in this way because the active sites of adjacent molecules pack against each other in the lattice. With the exceptions noted above, numerous enzymes have been shown to be active in the crystalline state, and by a variety of experiments it can be shown that this is not the result of dissolution/recrystallization of enzyme molecules. Studies of crystalline aldolase provide a typical example of catalytic activity in a crystalline enzyme. Sygusch and Beaudry (1984) determined Michaelis constants for aldolase in the crystalline state. A few of the results are shown in Table 4.1. From Table 4.1 it appears that the Km of the crystalline enzyme is only slightly elevated relative to the Km obtained in solution. However, the Vmax is nearly halved. Apparently, the crystalline packing reduces the rate of release of product. This in turn could be due to transient conformational changes. Such changes may, not unexpectedly, be slower in a lattice than in solution. In summary, much of the structural information we have about protein–ligand interactions is derived from crystallographic studies. In the crystalline state, enzymes bind substrates and catalyze biochemical reactions; antibodies can bind antigens; lectins bind saccharides; transport proteins bind their ligands; repressor proteins bind their cognate DNA fragments; Hb binds O2. The list goes on and on and provides convincing evidence of the similarity of protein conformation in the crystalline and solution states.
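To see what the constants in Table 4.1 imply for reaction rates, the sketch below evaluates the Michaelis–Menten rate law, v = Vmax[S]/(Km + [S]), for the soluble and crystalline enzyme. It assumes simple Michaelis–Menten kinetics, which the text implies but does not write out. The function name, the dictionary layout, and the chosen substrate concentrations are illustrative assumptions; the constants are copied from Table 4.1 with the units as printed there.

```python
def michaelis_menten(s: float, vmax: float, km: float) -> float:
    """Initial rate v = Vmax * [S] / (Km + [S]) for substrate concentration s."""
    if s < 0:
        raise ValueError("substrate concentration must be non-negative")
    return vmax * s / (km + s)


# (Vmax, Km) pairs copied from Table 4.1; units as printed in the table.
ALDOLASE = {
    "soluble, - (NH4)2SO4": (9.85, 10.8),
    "soluble, + (NH4)2SO4": (8.72, 11.3),
    "crystal": (4.42, 15.2),
}

if __name__ == "__main__":
    # Substrate concentrations are arbitrary illustrative choices.
    for s in (1.0, 10.0, 100.0, 1000.0):
        rates = ", ".join(
            f"{name}: {michaelis_menten(s, vmax, km):.2f}"
            for name, (vmax, km) in ALDOLASE.items()
        )
        print(f"[S] = {s:7.1f} -> {rates}")
```

At saturating substrate the ratio of rates approaches the ratio of the Vmax values (4.42/9.85, a little under one half), while at low substrate the slightly higher Km of the crystal lowers its relative rate somewhat further.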
CRYSTAL VERSUS SOLUTION NMR STUDIES In the 1980s, improved resolution through superconducting magnets and new pulsing methods led to resolved proton resonances from proteins in solution (Wright, 1989). Coupled with methods for assigning resonances, distance geometry calculations succeeded in describing the first protein structures in solution. Although limited presently to proteins 100–250 amino acids in length, Protein Data Bank (PDB) coordinates from NMR studies are now available for many globular proteins. The NMR method differs in every respect from X-ray crystallographic methods (see Chapter 2). X-ray crystallography measures the positions of atoms directly. NMR methods measure the distance between atoms. Furthermore, all NMR data are collected from the solution form of the protein. It offers, therefore, the ultimate check on X-ray crystallographic structures. Figure 4.2A and B contain the α-carbon chain tracings of thioredoxin derived by X-ray crystallography (Fig. 4.2A; Katti et al., 1990) and by 1H NMR studies (Fig. 4.2B; Dyson et al., 1990). Even without the help of stereo, it is apparent that the two structures are essentially identical. Nevertheless, compare both conformations with stereoglasses and your previous stereoviewing experience. Begin by checking residues 1 through 21 in both the X-ray and NMR structures. A strand of β structure is followed by an α helix beginning at residue 9. Note the start of the helix in both structures is precisely at the same point. Only residue 1 at the NH2 terminus appears to be in a slightly different
Fig. 4.2 The crystal and solution structures of thioredoxin. These stereodiagrams show the Cα atoms of thioredoxin derived from Escherichia coli. (A) Conformation as derived by X-ray crystallography. (B) Conformation obtained using 1H NMR measurements as described in text and by Wright (1989) and Dyson et al. (1990). Every tenth Cα position is numbered, beginning at the NH2 terminus.
position. The second α helix begins at about residue 34 in both the NMR and X-ray structures. Try to compare the conformations between the first and second α helices. In fact, by glancing back and forth between the two stereoimages, it should be possible to make a relatively careful comparison of the entire two structures. Although Fig. 4.2A and B contain only Cα atoms, it should be clear that the structures are nearly identical. The only difference is the source of the coordinates. The X-ray structure has been derived from the thioredoxin in the crystalline state; the NMR structure from distance measurements derived from the 1H resonances in solution. The conclusion to be drawn from the comparison should be clear: the crystal structure of a protein is identical to the solution structure. Once again the statement made in Chapter 1 is worth repeating: In a few exceptional regions, small conformational differences may exist between a structure in solution versus the same structure in the solid state. As noted above, these differences can only be due to the presence of lattice interactions or to the different ionic conditions necessary to form crystals.
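The visual comparison of the two Cα tracings can be complemented by a number. Because the NMR method works from distances between atoms, one simple measure that needs no superposition is the root-mean-square difference between the two sets of inter-Cα distances. The sketch below illustrates that idea; it is not the procedure used by the cited authors. The function names and the toy coordinates are hypothetical, and real use would require equivalent Cα atoms listed in the same order in both models.

```python
import math
from itertools import combinations


def pairwise_distances(coords):
    """All inter-atomic distances for one model, in a fixed pair order."""
    return [math.dist(a, b) for a, b in combinations(coords, 2)]


def distance_rmsd(model_a, model_b):
    """Root-mean-square difference between corresponding inter-atomic distances.

    The value does not change when either model is rotated or translated,
    so no superposition step is needed.
    """
    if len(model_a) != len(model_b):
        raise ValueError("models must contain the same number of atoms")
    if len(model_a) < 2:
        raise ValueError("need at least two atoms to form a distance")
    d_a = pairwise_distances(model_a)
    d_b = pairwise_distances(model_b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(d_a, d_b)) / len(d_a))


if __name__ == "__main__":
    # Toy coordinates (hypothetical, in angstroms); not taken from any PDB entry.
    toy = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.0, 3.6, 0.0), (8.1, 4.9, 2.0)]
    # Same model rotated 90 degrees about z and then translated.
    moved = [(-y + 10.0, x - 4.0, z + 2.5) for x, y, z in toy]
    print(distance_rmsd(toy, moved))
```

One limitation of this design choice is that a distance-based measure cannot distinguish a structure from its mirror image; a coordinate RMSD after optimal superposition does not share that limitation.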
CRYSTALLOGRAPHIC TEMPERATURE FACTORS In Chapter 2, it was noted that the structure factor equation includes a modulating term for the scattering factor for each atom called the temperature factor or B value. Recall
Fig. 4.3 Mean temperature factors for adipocyte lipid-binding protein. The mean temperature factors are derived from the crystal structure of adipocyte lipid-binding protein (1ALB). The significance of side-chain and backbone values is described in text.
that the atomic scattering factor is a function of the element associated with the designated atom. The PDB file describing a refined crystal structure contains a B value for every atom in the protein, including bound water molecules. These can be informative even to the casual student of protein structure. Frequently, an investigator reporting a crystal structure will include a figure describing the temperature factors. Instead of plotting the temperature factor for each atom, averaged values are used. Frequently two plots are used. One is a plot of residue number versus the averaged B value for the main-chain atoms (CA, C, N, O). The same thing is done for all of the side-chain atoms. Separating main-chain from side-chain atoms is important because some side chains (the long ones that point away from the molecular surface) tend to have high temperature factors. For example, most lysines have high temperature factors for atoms in the side chain. An example of the temperature factors for a relatively small protein, adipocyte lipid-binding protein, is shown in Fig. 4.3. In most crystalline proteins, both the side-chain atoms and main-chain atoms near the COOH and NH2 termini tend to have high temperature factors. Hence the results shown in Fig. 4.3 are typical of terminal residues in many proteins. And, not infrequently, one or more residues on the termini are ‘‘missing’’ in terms of electron density. The implications of missing atoms are serious, more so than those of atoms with high temperature factors. When no electron density is visible and yet atoms are known to be present on the basis of chemical analyses, the crystallographer will say they are disordered, meaning that they occupy multiple positions in the crystal lattice. One other phenomenon is noteworthy. Noncovalently bound ligands may sometimes appear to have high B values. This may be due in part to the fact that occupancy is not complete.
Some of the protein molecules in the lattice do not have a bound ligand, and this may show up as an elevated temperature factor. Regions of backbone atoms with the highest mean temperature factors can be visualized from the plot given in Fig. 4.3. Such regions appear as crests both for the side-chain and backbone atoms and four such zones are illustrated in Fig. 4.4. Roughly, residues 55–60, 76–79, 86–88, and 96–99 have the highest temperature factors for backbone atoms in the adipocyte lipid-binding protein. Most often, if the mean backbone temperature factor for a residue is high, the mean side-chain temperature factor is also high. The converse is not as likely. Many side-chain temperature factors can be elevated while those of the main-chain atoms appear normal. To define where these places occur in the overall conformation, Figure 4.4 should be studied carefully. The more mobile regions of the structure are marked by solid lines and arrows, whereas the rest of the Cα model appears as dashed lines. Note that all four places are associated with tight turns on the surface of the molecule. In all but one of
Fig. 4.4 Adipocyte lipid-binding protein and regions of high temperature factors. This stereodiagram illustrates the Cα model of adipocyte lipid-binding protein as dashed lines and circles. Every tenth residue is numbered. Regions of the polypeptide chain with high temperature factors are marked by lines with arrowheads. These regions include positions 55–60 (STFKNT), positions 76–79 (DDRK), positions 86–88 (LDG), and positions 96–100 (KWKDG).
the turns, the highest mean temperature factors are associated with the very end or tip of the turn. For side-chain atoms, the highest temperature factors tend to be associated with surface polar residues. If Fig. 4.3 is examined, the side chains with the highest mobilities are K58, D76, and D87. Lysine side chains are commonly associated with high temperature factors. Again, the worst situation would be if the CE carbon and NZ nitrogen of lysine were not even visible in the electron density map (as described above). Such an occurrence would mean that the lysine side chains are disordered in the crystal lattice, or perhaps that the temperature factor is so high that the associated electron density is distributed throughout a large volume, appearing at nearly the same level as solvent in the electron density map. Temperature factors contribute some dynamic information to otherwise static crystal models. Investigators in biochemistry have tried to correlate a number of properties of proteins with crystallographic temperature factors. For example, antigenicity and epitopic segments of a protein were postulated to be related to polypeptide segments with elevated temperature factors. In a similar way, transient conformational changes accompanying catalysis may be related to regions of a protein with high temperature factors. For example, such ‘‘loose’’ conformational regions could be used by an enzyme for reorienting atoms to facilitate electronic changes occurring during a catalytic cycle. Although no definitive study has been done, the existence of more conformationally mobile segments of a crystalline protein is borne out by repeated observations such as those exemplified above. A few contiguous residues in the crystalline protein have high temperature factors and this can occur at several places in the overall three-dimensional structure.
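The per-residue averaging behind plots such as Fig. 4.3 is straightforward to reproduce. The sketch below separates main-chain atoms (N, CA, C, O, as listed in the text) from side-chain atoms and averages the B values of each group for every residue. The function name, the input layout of (residue number, atom name, B value) tuples, and the sample B values are all hypothetical; the numbers are invented to mimic a surface lysine followed by a glycine and do not come from any PDB entry.

```python
from collections import defaultdict

# Main-chain atom names as they appear in PDB files.
MAIN_CHAIN = {"N", "CA", "C", "O"}


def mean_b_by_residue(atoms):
    """Average B values per residue, separately for main-chain and side-chain atoms.

    atoms: iterable of (residue_number, atom_name, b_value) tuples.
    Returns {residue_number: (mean_main_chain_b, mean_side_chain_b)};
    an entry is None when the residue has no atoms of that kind
    (for example, glycine has no side-chain atoms).
    """
    groups = defaultdict(lambda: {"main": [], "side": []})
    for res_num, atom_name, b_value in atoms:
        kind = "main" if atom_name in MAIN_CHAIN else "side"
        groups[res_num][kind].append(b_value)

    def mean(values):
        return sum(values) / len(values) if values else None

    return {
        res_num: (mean(g["main"]), mean(g["side"]))
        for res_num, g in sorted(groups.items())
    }


if __name__ == "__main__":
    # Invented B values for illustration only.
    sample = [
        (58, "N", 20.0), (58, "CA", 22.0), (58, "C", 21.0), (58, "O", 25.0),
        (58, "CB", 30.0), (58, "CG", 38.0), (58, "CD", 46.0),
        (58, "CE", 55.0), (58, "NZ", 61.0),
        (59, "N", 18.0), (59, "CA", 19.0), (59, "C", 18.0), (59, "O", 21.0),
    ]
    for res_num, (main_b, side_b) in mean_b_by_residue(sample).items():
        print(res_num, main_b, side_b)
```

Plotting the two averages against residue number would give the pair of curves described in the text; atoms missing from the model simply do not contribute to either average.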
STRUCTURAL HETEROGENEITY IN PROTEIN CRYSTALS The fact that whole segments of a polypeptide chain may be missing in the electron density map of a crystalline protein has been mentioned several times. A missing segment occupies numerous positions in the crystal lattice, with the missing portion drifting between these multiple positions; the segment behaves like solvent. There is a second form of disorder that is discrete (Smith et al., 1986): several atoms of a side chain may occupy a limited number of positions. Such heterogeneity is generally ignored by most
crystallographers during refinement and the heterogeneity is compensated by an increase in the temperature factors for the atoms involved. It is ignored because the problem is difficult to identify and makes further refinement more complicated. Nonetheless, in the crystal structures of four different proteins, from 6 to 13% of the amino acid side chains were observed to be in multiple discrete positions (Smith et al., 1986). Such potential heterogeneity should be considered when comparing crystal and solution structures.
SUMMARY The solvent content of protein crystals makes them similar to concentrated solutions. Various physical and chemical methods suggest strongly that the conformation of most proteins is the same in the crystalline and solution states. The most sensitive way to compare crystal and solution protein structures is to monitor their biological activities, and compare structures that have been derived from NMR measurements. With a few exceptions, crystalline proteins maintain their biological activity. The structures of a growing number of proteins derived from NMR measurements have confirmed the near identity between solution and crystal structure.
PROBLEMS Practicing Stereovision
1. In Fig. 4.1, the overall conformation of Mb is shown with its three tyrosines. On the basis of this crystal structure, which tyrosine can be predicted to have an anomalously high pK?
2. Mb has eight α helices, A through H. The heme prosthetic group is located between helix E and helix F. By assigning letters to the helices shown in the stereodiagram in Fig. 4.1, number the residues that are included in these two helices. The lettering should begin with the NH2 terminus. Caution: Two α helices near the NH2 terminus are short; in fact, one contains only one turn.
Learning about PDB Files
3. The following data have been extracted from the PDB file containing the X-ray coordinates for thioredoxin, the structure shown in Fig. 4.2A. Calculate the mean temperature factors for the main-chain and side-chain atoms in these two residues. Study the location of these amino acids in Fig. 4.2A and B. Are they in notably different conformations in the X-ray versus the NMR structures? In terms of overall location and secondary structure, how do these high temperature factor regions compare with those shown for the adipocyte lipid-binding protein?
PDB coordinates from the X-ray structure of thioredoxin (columns: record type, atom serial number, atom name, residue name, residue number, x, y, z, occupancy, B value):
ATOM  224  N    ALA  29   30.044  36.84  10.182  1.00  16.648
ATOM  225  CA   ALA  29   29.216  37.76  11.003  1.00  18.797
ATOM  226  C    ALA  29   28.634  36.90  12.120  1.00  20.471
ATOM  227  O    ALA  29   29.167  35.86  12.515  1.00  19.750
ATOM  224  CB   ALA  29   30.001  38.90  11.588  1.00  17.903
ATOM  228  N    GLU  30   27.483  37.38  12.587  1.00  23.831
ATOM  234  CA   GLU  30   26.750  36.60  13.580  1.00  27.431
ATOM  235  C    GLU  30   27.458  36.46  14.926  1.00  25.880
ATOM  236  O    GLU  30   27.290  35.42  15.588  1.00  26.009
ATOM  229  OE2  GLU  30   22.960  35.95  11.594  1.00  44.119
ATOM  230  OE1  GLU  30   22.199  37.25  13.230  1.00  45.043
ATOM  231  CD   GLU  30   23.058  36.49  12.728  1.00  42.356
ATOM  232  CG   GLU  30   24.295  36.06  13.449  1.00  37.845
ATOM  233  CB   GLU  30   25.327  37.16  13.786  1.00  32.091
Using Computer Graphics
Obtain the PDB coordinates of whale myoglobin (1MBN). This is an oxygen-binding protein that exists as a monomer. It is structurally homologous with the α and β chains of hemoglobin.
4. Create an α-carbon model of myoglobin with any heteroatoms, using PREKIN. As mentioned in problem 2, myoglobin has eight α-helical segments that the original investigators labelled A through H. Identify the amino acid sequence numbers that mark the beginning and end of each of the helical segments from your computer graphics images.
5. Back to PREKIN: create a new view of the myoglobin molecule containing the α carbons, the heme, and all of the histidine side chains. Which histidine appears to interact with the heme iron atom?
6. Measure the distance from the closest atom in each of the histidines to the heme iron. Which atoms found in the histidines interact with the iron atom of heme? What is the distance for the semicovalent bond linking the protein to the iron atom?
7. Bound to the heme iron is a water molecule. Determine if this water is hydrogen bonded to a protein atom. If it is, identify the amino acid and the atom involved in the hydrogen bond.
REFERENCES
Creighton, T. (1984). ‘‘Proteins, Structures and Molecular Properties.’’ Freeman & Company, New York.
Dyson, H., Gippert, G., Case, D., Holmgren, A., and Wright, P. (1990). Three-dimensional solution structure of the reduced form E. coli thioredoxin determined by nuclear magnetic resonance spectroscopy. Biochemistry 29, 4129–4136.
Gurd, F. and Rothgeb, T. (1979). Motions in proteins. Adv. Prot. Chem. 33, 73–165.
Janin, J., Miller, S., and Chothia, C. (1988). Surface, subunit interfaces and interior of oligomeric proteins. J. Mol. Biol. 204, 155–164.
Katti, S., LeMaster, D. M., and Eklund, H. (1990). Crystal structure of thioredoxin from E. coli at 1.68 Å resolution. J. Mol. Biol. 212, 167–184.
Smith, J., Hendrickson, W., Honzatko, R., and Sheriff, S. (1986). Structural heterogeneity in protein crystals. Biochemistry 25, 5018–5027.
Sygusch, J., and Beaudry, D. (1984). Catalytic activity of rabbit skeletal muscle aldolase in the crystalline state. J. Biol. Chem. 259, 10222–10227.
Wright, P. E. (1989). What can two-dimensional NMR tell us about proteins? Trends Biochem. Sci. 14, 225–229.
CHAPTER 5
Quaternary Structure of Proteins
INTRODUCTION Hemoglobin (Hb) was more than just the first protein whose structure was determined by crystallography; it was the first oligomeric protein. So, in addition to the information that described the conformation of the α and β chains, the arrangement of subunits in a single Hb tetramer was also determined. From these early results and those of other researchers working on spherical viruses, a set of rules governing the assembly of subunits into limited aggregates evolved. The rules center around symmetry arguments, and make it possible to predict, or at least set some usable principles to describe, the quaternary structure of any one oligomeric protein. Such principles form the basis of any discussion of quaternary structure. They are described as follows:
1. The interactions that govern the association of subunits into higher aggregates are the same as those that cause proteins to fold. Entropic factors are important, as are all of the noncovalent bonding forces.
2. In oligomeric globular proteins, these thermodynamic driving forces result in the association of subunits in a constant stereochemical sense to form one unique molecule. The phrase constant stereochemical sense derives from specific noncovalent bonds that are formed to minimize the free energy of the system. This implies a form of molecular recognition in the same sense that a substrate recognizes the active site of an enzyme.
3. Association of polypeptide chains takes two forms: (a) closed association to form an oligomer and (b) continuous association to form a polymer, for example, F-actin.
ASSOCIATION OF PROTEIN SUBUNITS Figure 5.1 shows three examples of a protein protomer aggregating to form a hypothetical oligomer. The three forms are different because of the types of contacts that are formed. Each type of contact represents a different set of noncovalent bonds between the subunits. Note that in Fig. 5.1A, an oligomer forms with contacts between points u and w on each protomer. Association continues with the u–w contact beyond the dimer level; however, this form of association stops with the formation of a trimer, because steric limitations prevent the addition of a fourth subunit with u–w contacts. Although there is complementarity between the surfaces of the subunits in contact and
Fig. 5.1 Aggregation of protein monomers. Each shoe-shaped object represents a protein subunit of the same amino acid sequence and conformation. Four different surface locations are marked on each subunit by the letters u, v, w, and x. The subunits associate to form oligomers. If they do so in a nonspecific manner, the results are shown in (A) and (B). Note in (A) that subunit–subunit contacts are u to w, and association occurs until three subunits form the aggregate. In (B), subunit contact occurs via the u and v surface regions. Although not shown, continuation of the u-to-v contact would lead to a long helical polymer as described in Fig. 5.2. (C) shows the symmetrical association of the subunits to form two u-to-x contacts. The elongated dot in the center of the dimer denotes the location of the dimer dyad as described in text.
all subunit–subunit interactions are the same, the resulting oligomer is asymmetric and limited in extent. Figure 5.1B describes another form of aggregation. The same protein protomer shown in Fig. 5.1A associates to form a hypothetical dimer with an interface contact shown as u–v. Now the aggregation is not stereochemically limited. More monomers (or u–v dimers) can continue to add on to the starting dimer nucleus, producing a continuous strand of the globular protomer units (more on this in the next section). A third form of aggregate is possible. This is shown in Fig. 5.1C. It is possible to form a dimer with u–x contact(s). Again, for stereochemical reasons, when this happens two identical subunit–subunit contact regions (u–x) can be formed. The resulting oligomer, in this case a dimer, has special symmetry properties to be discussed below. It cannot aggregate further and therefore could be called a closed symmetrical oligomer.
HELICAL OR CONTINUOUS PROTEIN POLYMERS In Fig. 5.1B, a specific protein–protein contact occurred between two subunits in a nonsymmetrical manner. In this hypothetical example, there would be no reason for limiting the aggregation to a dimer. Aggregation would continue and a polymer would
Fig. 5.2 Aggregation to form a helical polymer. Constant protomer–protomer contacts labeled u–v lead to the formation of a helical polymer.
be generated. Polymerization based on the u–v contact is shown schematically in Fig. 5.2. Once again, in this diagram, each subunit makes the same noncovalent contacts with its neighbors and a polymer with helical symmetry is formed. Biological systems have many examples of globular proteins aggregating into helical polymers; examples include F-actin, flagellin, tubulin, sickle cell hemoglobin, tobacco mosaic virus, even some enzymes. Although it first appears that the three-dimensional organization of continuous aggregates is simple to describe, this is not necessarily the case. The example given in Fig. 5.2 is one of a single-strand helical polymer. In many instances, one protein subunit makes contact with several other protomers, giving the appearance of a multistranded helical polymer. This is shown schematically in Fig. 5.3, where the helical polymer appears to have two strands. The same stereochemical rules apply to helical organization whether they have one strand, two strands, or n strands. Every subunit makes identical contacts with its neighbors. In Fig. 5.3, the two-stranded helical polymer makes a minimum of two different sets of noncovalent bonds, labeled contact set 1 and contact set 2. The specific contacts made between monomers along with their directional properties also determine
Fig. 5.3 Multistranded helical polymers. The constant protomer–protomer contacts described in Fig. 5.2 are shown as Contact Set 1. One or more additional but still constant subunit–subunit contacts lead to a multistranded polymer. Here the helical polymer appears to have two strands caused by the additional subunit–subunit interactions labeled Contact Set 2.
the pitch of the helical polymer. The pitch is the translation along the helical axis per turn of the screw or helix. Other often-cited helical properties include the number of monomeric units per turn of the helix and the distance traveled (in the direction of the helical axis) per subunit. In biological systems, there are no rules limiting the number of strands in a helical polymer of globular subunits. F-actin is an example of a two-stranded helical polymer. Sickle cell Hb can be best described as a 14-stranded helical oligomer. In addition, depending on the arrangement of the strands, they may appear either as relatively solid objects like fibers or alternatively as tubes. The polymerized form of tubulin is an example of the latter. Data obtained by X-ray diffraction of polymerized specimens or by direct imaging with an electron microscope (EM) are most often used to characterize continuous aggregates. Digital processing of electron micrographs (see Chap. 12), coupled with major improvements in EM imaging, has made it possible to derive all of the parameters from a single image of a helical oligomer. The helical parameters described by the pitch or repeat distance, the radius of the polymer, and the number of strands can also be obtained from X-ray diffraction data of a fiber containing multiple aligned strands of a helical oligomer. In some instances, when the conformation of the basic subunit is known, it is possible to determine the orientation of a monomeric unit in a continuous helical aggregate as seen at very poor resolution. This involves a computational procedure that optimizes the agreement between low-resolution images seen in electron micrographs and the crystal coordinates of the monomeric unit determined in separate experiments. In the late 1980s several investigators were able to develop methods for obtaining phase information for X-ray scattering data on fibers. 
This is far more difficult than solving crystal structures, but in the end it led to a detailed atomic model of a helical rodlike virus, tobacco mosaic virus. The three-dimensional structure consists of a helical arrangement of coat protein subunits around a nucleic acid core (Namba et al., 1989).
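The helical parameters named in this section are related by simple geometry: the pitch equals the rise per subunit multiplied by the number of subunits per turn. The sketch below places subunit centers on a single-strand helix around the z axis and computes the pitch. The function names and the numerical values (radius, rise, and subunits per turn) are hypothetical and were chosen only to give round numbers; they do not describe any particular protein polymer.

```python
import math


def helix_pitch(rise_per_subunit: float, subunits_per_turn: float) -> float:
    """Translation along the helix axis for one full turn."""
    return rise_per_subunit * subunits_per_turn


def helix_positions(n_subunits, radius, rise_per_subunit, subunits_per_turn):
    """Centers of subunits on a single-strand helix wound around the z axis.

    Each successive subunit is rotated by 360/subunits_per_turn degrees and
    translated by rise_per_subunit along z, so every subunit has the same
    geometric relationship to its neighbors.
    """
    twist = 2.0 * math.pi / subunits_per_turn  # rotation per subunit, radians
    return [
        (radius * math.cos(i * twist), radius * math.sin(i * twist), i * rise_per_subunit)
        for i in range(n_subunits)
    ]


if __name__ == "__main__":
    # Hypothetical parameters: radius 20, rise 5 per subunit, 10 subunits per turn.
    centers = helix_positions(21, radius=20.0, rise_per_subunit=5.0, subunits_per_turn=10)
    print("pitch:", helix_pitch(5.0, 10))
    print("neighbor spacing:", math.dist(centers[0], centers[1]))
```

The number of subunits per turn need not be an integer; a non-integral value simply means that a subunit does not sit directly above another after exactly one turn.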
THE QUATERNARY STRUCTURE OF CLOSED AGGREGATES: OLIGOMERIC ENZYMES As mentioned already, it is the arrangement and nature of the protein–protein recognition site or sites that determine the type of oligomer formed by globular protein subunits. If the subunits can aggregate to form two identical sets of interaction sites, a symmetrical oligomer is formed. This is shown schematically by the dimer in Fig. 5.1C. The u–x contact set occurs twice in this dimer. The contribution to the free energy of dimerization is more favorable by a factor of 2! Note that the dimer appears symmetrical about a line, in this case a line perpendicular to the plane of the drawing. Some but not all forms of symmetry found in protein oligomers are the same as symmetry operators that occur in crystals. This means that the symmetry of, say, a protein dimer may be superimposed on a crystallographic operator. When this occurs, the symmetry of the oligomer is known before the structural determination is completed. When the symmetry of the oligomer is not part of the crystallographic symmetry, it is referred to as local symmetry. Two forms of symmetry most often used are called cyclic and dihedral. These are described more fully below, but several references might also be helpful (Cantor and Schimmel, 1980; Banaszak et al., 1981). A few useful definitions follow. Symmetry is a property of an array of objects in three dimensions characterized by having a regular or uniform pattern. The schematic dimers shown in Figs. 5.4 and 5.1C are symmetrical. To characterize the regular or uniform pattern, it is necessary to describe symmetry operations and symmetry elements. A symmetry element is a point, line, or plane around
5. Quaternary Structure of Proteins
Fig. 5.4 A twofold rotation symmetry element and operation. The symmetric dimer contains a single symmetry element, the twofold or dyad axis. The symmetry element is positioned as shown here. The symmetry operation involves a rotation of 180° to produce congruency, as shown in the inset.
which a symmetry operation occurs. The darkened ellipsoid in Fig. 5.4 marks a symmetry element, in this case a twofold rotation axis. A symmetry operation is the physical or mathematical transformation needed to generate a symmetrical set from a single object. In Fig. 5.4 the symmetry operation involves a rotation by 180° of all points (atoms) in subunit 1 about the twofold axis. The twofold or dyad axis is shown as a line perpendicular to the plane of the paper. Within the inset of Fig. 5.4, a subunit labeled 1 is rotated 180°. This rotation brings subunit 1 into congruency with subunit 2. Note the use of the terms twofold rotation axis and dyad. In the literature, the two terms are used interchangeably to describe the symmetry elements shown in Figs. 5.4 and 5.5. In oligomeric proteins, other forms of cyclic symmetry are possible. The rotation may be 360°/n, where n is any integer greater than 1. A trimeric protein may be characterized by a triad or threefold axis, wherein each subunit would be rotated by 360°/3 or 120° for the congruency test. As noted, this and other forms of symmetry found in oligomeric systems are described in more detail in Cantor and Schimmel (1980) and Banaszak et al. (1981).
Mitochondrial malate dehydrogenase (mMDH) is a dimeric protein of identical subunits. The stereodiagrams shown in Fig. 5.5A and B illustrate the symmetrical nature of the dimer. Study both diagrams carefully until you can match up Cα atoms about the twofold rotation axis. Trace the location of the dyads in both cases. Note that in Fig. 5.5A, the dyad is nearly perpendicular to the plane of the paper. Figure 5.5B, on the other hand, shows the same subunit–subunit interface of mMDH with the dyad or twofold rotation axis running vertically, i.e., in the plane of the drawing. As should always be the case when studying stereodiagrams, read the caption carefully. To use the two stereodrawings in Fig. 5.5, it is necessary to be able to distinguish helical fragments of one subunit from those of the symmetry-related mate. This is done in Fig. 5.5 by marking every Cα position with an arrowhead in one subunit and every fifth residue in the other subunit. With stereoglasses, it should be possible to determine how many helices are present in the subunit–subunit interface of mMDH, although the answer is given in the caption. Even though all but one of the side chains have been omitted from Fig. 5.5A, notice also that there appears to be a hole along the symmetry element. Packing of atoms in subunit–subunit interfaces is never very good near symmetry elements. Some solvent space, however small, is always visible along rotation axes in oligomeric proteins. This is partly because no atom can be closer than about one-half its van der Waals radius from the dyad. This is roughly 2 Å, since the closest atom–atom contact distance is 3.8 Å.
Fig. 5.5 Stereodiagram of the malate dehydrogenase dimer. These stereodrawings depict polypeptide segments of the two identical subunits of dimeric mitochondrial malate dehydrogenase from porcine heart. Only Cα atoms are shown. Both (A) and (B) contain the same model in different orientations. Included in the drawings are residues 10–22, 36–49, 149–159, 209–218, and 225–241. These are helical segments labeled B, C, 2F, 2G, and 3G according to standard dehydrogenase notation. Actual labels are B-1 and B-2, C-1 and C-2, etc., with each number signifying subunit 1 or 2. The two subunits are further distinguished by the presence of an arrowhead at each Cα position in subunit 1 and at every fifth residue in subunit 2. The arrowheads point from the N to the C terminal. The α-helical segments were selected because they form the backbone of the dimer interface. A single side chain, L18, is shown. This side chain is positioned close to the twofold rotation axis describing the symmetrical dimer.
Finally, for the unacquainted reader, it is worth noting that there is nothing special about the mMDH dimer or its subunit–subunit interface. In Fig. 5.5, the important idea is the symmetry. The structures of many different proteins composed of two identical subunits are known. With a few minor exceptions, all would have the same twofold rotational symmetry, although few would have the interface composed of the same arrangement of α helices. Sometimes n-fold cyclic symmetry is combined with a twofold axis perpendicular to it. The proper name for such combinations is dihedral symmetry. The combination of
Fig. 5.6 Stereodiagram of glyceraldehyde-3-phosphate dehydrogenase. A fragment of each of the four identical subunits of tetrameric GPD is shown in stereo, using only Cα atoms. The fragment was selected because it is near the center of the tetramer and includes residues 192–209. The amino acid sequence for this portion is DWRGGRGAAQNIIPSSTG. An arrowhead is present at each Cα position to facilitate identification of the residue numbers. The fragments come from the four subunits labeled red, green, blue, and yellow; this is the labeling used in the PDB file and the literature. The four subunits are arranged by a common form of symmetry which the reader should be able to determine after reading the accompanying text.
n-fold plus twofold leads to a symmetrical oligomer in which the subunits are arranged about a point. The most common form of this symmetry is called 222: three mutually perpendicular intersecting twofold axes describe a tetramer of identical subunits. Hemoglobin has such symmetry, assuming that the α and β subunits are conformationally the same, and to a first approximation that is true. On the basis of the simple symmetry rules mentioned above, one can predict the symmetry of oligomeric enzymes. Consider a hexameric protein as an example. Assuming a closed symmetrical oligomer, two possibilities come to mind. Either the hexamer has a sixfold rotation axis or it has a threefold rotation axis with a twofold axis perpendicular to the threefold. The latter generates a dihedral point group commonly labeled 32 symmetry. The protein aspartate carbamoyltransferase is most often found as a C6R6 oligomer; the molecule is composed of six catalytic (C) and six regulatory (R) subunits. Crystallographic studies showed that the molecule has 32 symmetry.
Aside from dimers, the next most common form of oligomeric proteins is the tetramer. Nearly all tetramers have the same 222 symmetry previously mentioned for hemoglobin. Examples of a 222 symmetrical tetramer include the enzyme glyceraldehyde-3-phosphate dehydrogenase (GPD) derived from lobster skeletal muscle. GPD can be described as an α4 oligomer; that is, it is composed of four identical subunits. Small segments of the polypeptide chains found in a GPD tetramer are shown in Fig. 5.6. The segments are labeled as red, blue, green, and yellow. On the basis of crystal studies, the four subunits of GPD were shown to be conformationally identical. However, each of these four subunits is in a unique location in the crystal lattice and in that sense must be labeled individually. The GPD fragments shown in Fig. 5.6 were selected because they are close to the center of the tetramer. Incidentally, it is relatively easy to estimate how close atoms are in a stereodrawing. Remember that the Cα–Cα distance in a Cα model is slightly more than 3.7 Å, and most covalently bonded atoms are about 1.5 Å apart. Examine Fig. 5.6 carefully, looking for three hypothetical lines that would mark the location of the three dyad axes. One of them is visible without stereo: the one perpendicular to the plane of the paper
at the center of the drawing. The other two dyads must intersect the aforementioned line at a point.
The most complex form of quaternary structure is seen in the assembled coat proteins of spherical viruses. In this case the point symmetry is called icosahedral, permitting the symmetrical arrangement of certain multiples of 60 subunits around a point. The symmetry operations include twofold, threefold, and fivefold rotations. Because most spherical viruses contain more than 60 subunits, Caspar and Klug (1962) showed how icosahedral symmetry could be maintained by using certain integer multiples of 60. To analyze spherical viruses as isometric particles, they introduced the idea of quasiequivalence: the coat protein shell can be constructed with the interactions between subunits either equivalent or quasiequivalent and still maintain the icosahedral symmetry. In such shell packing of protein subunits, every member has approximately the same contact area with its neighbors. In the 1980s, Harrison was able to show how this was possible in analyses of the crystal structure of tomato bushy stunt virus (Harrison, 1984). The individual subunits of the viral coat protein(s) have two domains. (Parenthetically, the domains have similar conformations in spite of a lack of amino acid homology.) Quasiequivalent contacts are possible because of flexibility in the relative orientation of the two domains belonging to one subunit or polypeptide chain. Subunit–subunit contacts are constant; domain–domain orientations differ slightly.
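The "certain integer multiples of 60" can be made concrete. In the Caspar and Klug construction the multiple is the triangulation number T = h² + hk + k², where h and k are non-negative integers, and the shell contains 60T subunits. The following is a minimal sketch that enumerates the allowed values; the function names are illustrative and do not come from the text.

```python
def triangulation_numbers(limit):
    """Return the sorted Caspar-Klug triangulation numbers T <= limit.

    T = h*h + h*k + k*k for non-negative integers h and k (not both zero).
    """
    allowed = set()
    h = 0
    while h * h <= limit:
        k = 0
        while h * h + h * k + k * k <= limit:
            t = h * h + h * k + k * k
            if t > 0:
                allowed.add(t)
            k += 1
        h += 1
    return sorted(allowed)


def subunits_in_shell(t):
    """Number of subunits in an icosahedral shell with triangulation number t."""
    return 60 * t


if __name__ == "__main__":
    for t in triangulation_numbers(13):
        print(f"T = {t:2d}: {subunits_in_shell(t)} subunits")
```

Values such as T = 2, 5, or 6 never appear in the list, which is why only certain multiples of 60 are compatible with icosahedral symmetry.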
BIOLOGICAL IMPLICATIONS OF QUATERNARY STRUCTURE
The idea that sets of two or more homologous noncovalent contacts lead to symmetrical association of subunits about a line has some interesting biochemical implications.
1. The chemistry and/or stoichiometry of a symmetrical oligomeric molecule is always related to the number of subunits, except in reference to a site or reaction that takes place very close to a symmetry element. Near the symmetry elements, for stereochemical reasons the chemistry cannot be described by n if the operator is an n-fold rotation axis. At a dyad axis, for example, a chemical reaction may be limited to a stoichiometry of 1. A good example is seen in the binding of diphosphoglycerate to the β subunits of hemoglobin. Because of the location of the binding site, a single molecule of diphosphoglycerate spans a twofold rotation axis between two β subunits. Only one binding site is present per two β subunits and per hemoglobin tetramer.
2. Another important example of the functional role of symmetry can be found in the binding of dimeric repressor proteins to palindromic sequences of duplex DNA. Cro and phage 434 proteins are examples. The twofold symmetry of the DNA-binding proteins is required to bind to two strands on the DNA. The local DNA sites have the same symmetry as the protein.
3. Cooperativity through allosterism nearly always involves oligomeric proteins. The oxy to deoxy transition of the Hb tetramer is the most studied example. For Hb and probably for most other cooperative systems, the two states involve quaternary rearrangements brought on by conformational changes in the tertiary structure of a subunit accompanying binding of some effector molecule.
4. In some oligomeric enzymes, the quaternary state is inconsequential. A monomer may be as active as the oligomer. However, in some oligomeric enzymes the quaternary state is vital to the enzymatic activity. In most cases this occurs because amino acid residues in a single active site come from more than one subunit.
5. Unexpectedly, amino acid sequence homology does not seem to play an important role in the formation of oligomers. Often a subfamily of conformationally similar proteins will maintain the same subunit–subunit interface when their amino acid sequence identity is down in the 20 to 30% range. For example, in the malate dehydrogenase dimers, the mitochondrial form shares about 20% amino acid sequence identity with the cytosolic form. Yet the dimer interface is essentially identical. Perhaps the arrangement of secondary structural elements plays a more important role in the subunit–subunit recognition process.
6. Some oligomeric systems with many types of proteins do not obey any of the symmetry principles described in this chapter. The ribosome is a good example. Composed of numerous different proteins and RNA, the ribosome appears to be asymmetric.
It is worth noting that in a few instances, departure from the symmetrical arrangements of subunits has been observed in crystal structures. One dimeric hexokinase molecule does not have twofold rotational symmetry in the crystalline state; similarly, a dimeric form of cytosolic MDH has shown a significant departure from twofold rotational symmetry. In the latter case, it is believed that crystal packing forces could account for this effect. Oligomers of identical subunits should and normally do obey the symmetry rules, but exceptions have been found.
SURFACE ACCESSIBILITY
One way of estimating the stability of subunit interactions is to calculate the molecular surface area lost from a protein molecule when the oligomer is formed. In a simple twofold symmetrical dimer, let A1 be the surface area of one subunit. The corresponding exterior of the dimer is designated A2. Then

\[ A_2 < 2A_1 \tag{5.1} \]

Methods for calculating the surface area of a macromolecule from atomic coordinates were pioneered by Lee and Richards (1971). Although other methods have been developed, the Lee and Richards formulation is still frequently used. It is described in Fig. 5.7 and Eqs. 5.2 through 5.5. The surface accessibility calculation is based on the summation of thin slices through the protein, and in Fig. 5.7 they are made perpendicular to the Z-axis. As the probe of radius Rw is moved along the surface of the slice, it is not permitted to penetrate any of the atoms. Each atom is defined by its van der Waals radius, Rp, given in Table 5.1. Values of Rp in this table are assigned so that hydrogen atoms are included (virtual atoms). Note in Fig. 5.7 that atom 2 is not on the indicated Z-section and its radius is changed to Rx:

\[ R_x = (R_p^2 - d^2)^{1/2} \tag{5.2} \]

where d is the distance of atom 2 from the center of the slice. Li is the accessible surface arc for the ith atom:

\[ L_i = \alpha_i (R_p + R_x) \tag{5.3} \]

where αi is the indicated angle in radians. The accessible surface area for each atom, Ai, and the total area, A, are

\[ A_i = L_i \, \Delta Z \quad \text{and} \quad A = \sum_i A_i \tag{5.4} \]

where ΔZ is the thickness of each section. The percent accessibility for each atom is

\[ f_i = \frac{100\, A_i}{4\pi (R_p + R_w)^2} \tag{5.5} \]
Fig. 5.7 Finding the accessible surface area of a protein. The schematic drawing contains two atoms of a multi-atom macromolecule. The z-coordinate of atom 2 is below that of atom 1, and z-slices are being used to integrate the total surface area. Therefore the "z-segment" of atom 2 is reduced from its normal van der Waals radius. The probe molecule has a radius of Rw. Usually the value of 1.4 Å is used. The probe tracks the surface of atom 1 until it overlaps another atom. It then moves to atom 2, etc., until all of the atoms in the protein have been examined. The method is described in more detail in the text.
Accessibility calculated in this manner represents the static value because no account has been taken of potential motions. Other mathematical approaches have been developed to calculate surface accessibility but are not as easy to visualize. The accessible surface area lost upon formation of an oligomer is usually small but of course depends on packing between the subunits. Often it is as small as 10%. One may visualize this in terms of an orange: the area lost by one segment of an orange is slightly greater than 10% of the overall surface area. Alcohol dehydrogenase is a homodimer with 11% of its surface area lost upon dimer formation. Lactate dehydrogenase is a homotetramer with 222 point symmetry, and 32% of its subunit surface area is lost upon oligomerization.
TABLE 5.1 Virtual van der Waals Radii Used for Accessibility Calculations

Atom                          Radius (Å)
Mainchain Cα                  1.70
Mainchain carbonyl oxygen     1.52
Mainchain carbonyl carbon     1.80
Mainchain amide nitrogen      1.55
Sidechain atoms               1.80
Water probe                   1.40 (variable)
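The idea of area lost on dimer formation (Eq. 5.1) can be tried on a toy model. The sketch below does not implement the Lee and Richards slice method described above. It uses the simpler point-counting approach of Shrake and Rupley, one of the other methods alluded to in the text, in which test points are spread over each atom's expanded sphere of radius Rp + Rw and those not inside a neighbor's expanded sphere are counted. The two-atom "subunit" and its coordinates are invented for illustration; only the radii come from Table 5.1.

```python
import math


def sphere_points(n):
    """Roughly uniform points on a unit sphere (golden-spiral construction)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        points.append((r * math.cos(golden * i), r * math.sin(golden * i), z))
    return points


def accessible_areas(atoms, probe=1.4, n_points=960):
    """Per-atom accessible surface area in square angstroms.

    atoms is a list of (x, y, z, radius). A test point on the expanded sphere
    of one atom is accessible if it lies outside the expanded sphere
    (radius + probe) of every other atom.
    """
    unit = sphere_points(n_points)
    areas = []
    for i, (xi, yi, zi, ri) in enumerate(atoms):
        reach = ri + probe
        exposed = 0
        for ux, uy, uz in unit:
            px, py, pz = xi + reach * ux, yi + reach * uy, zi + reach * uz
            blocked = any(
                (px - xj) ** 2 + (py - yj) ** 2 + (pz - zj) ** 2 < (rj + probe) ** 2
                for j, (xj, yj, zj, rj) in enumerate(atoms)
                if j != i
            )
            if not blocked:
                exposed += 1
        areas.append(4.0 * math.pi * reach ** 2 * exposed / n_points)
    return areas


def twofold_mate(atoms):
    """Rotate atoms 180 degrees about the y axis: (x, y, z) -> (-x, y, -z)."""
    return [(-x, y, -z, r) for x, y, z, r in atoms]


if __name__ == "__main__":
    # Hypothetical two-atom subunit placed next to the y axis (the dyad).
    subunit = [(2.0, 0.0, 0.0, 1.80), (3.5, 0.0, 0.0, 1.70)]
    a1 = sum(accessible_areas(subunit))
    a2 = sum(accessible_areas(subunit + twofold_mate(subunit)))
    print(f"A1 = {a1:.1f}, A2 = {a2:.1f}, lost = {100 * (2 * a1 - a2) / (2 * a1):.1f}%")
```

Expressing the loss as 100(2A1 − A2)/2A1 is an assumption about how the percentages quoted in the text are defined. Point counting was chosen because it needs no arc bookkeeping, at the cost of a statistical error that shrinks as the number of test points grows.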
Fig. 5.8 Using symmetry and coordinate transformations for generating coordinates for all atoms of an oligomer. A dimer has twofold rotational symmetry. The twofold axis or dyad is coincident with the Y-axis of the coordinate system. A crude representation of the molecules A1 and A2 is shown. Caution: the drawing makes it look as if there is a mirror plane between A1 and A2. This cannot be the case, since all proteins have L-amino acids. The representation really means rotate A1 180° around the Y-axis.
GENERATING COORDINATES FOR OTHER SUBUNITS
The viewing of oligomeric proteins from coordinates found in the PDB may require some additional effort. Take the simple case where a crystal contains a homodimer with twofold rotational symmetry. If the dyad symmetry is coincident with a crystallographic twofold rotation axis, the coordinates of only one subunit may be found in the PDB file. An example is given in Fig. 5.8. The coordinates for molecule A1 may be used to generate those of A2. To do this you must understand the nature of rotation matrices. For the simplest case as shown in Fig. 5.8, the molecular dyad is coincident with the Y-axis. Let a1 represent the x1y1z1 coordinates of molecule A1 and a2 represent the x2y2z2 coordinates of A2. Then

\[ a_2 = [M]\, a_1, \quad \text{where } M = \begin{pmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix} = \begin{pmatrix} m_{11} & m_{12} & m_{13} \\ m_{21} & m_{22} & m_{23} \\ m_{31} & m_{32} & m_{33} \end{pmatrix} \tag{5.6} \]
The 3 × 3 transformation matrix, M, can be used to transform the banked coordinates to generate the other subunit. It is possible to carry out this transformation with many PC programs such as Excel, KaleidaGraph, etc. The caveat is to be sure to output the second set of coordinates in the correct format. This is possible in the aforementioned programs. The transformation in Fig. 5.8 is relatively easy to visualize. Note that because the symmetry operator is around the Y-axis, the second subunit can be generated simply by negating the x1 and z1 values for each atom. The y values are the same in both subunits. The general form of the transformation matrix is also shown in Eq. 5.6. For oligomeric proteins, the elements of the transformation matrix may often be found in MTRIX statements in the PDB file. Sometimes generating the coordinates for another subunit of an oligomeric protein also involves a translation vector, as shown in Eq. 5.7:

\[ a_2 = [M]\, a_1 + t \tag{5.7} \]
The symbol t represents a translation vector having the components tx, ty, tz. In other terms, this means the coordinates must be translated after the rotation is applied. The components of such translation vectors are also found in the MTRIX statement in the PDB file. To use all this in a PC program, Eq. 5.8 gives an example of how to transform the coordinates of a subunit A1 to generate the coordinates of its symmetry-related mate, A2, in a dimer with twofold rotational symmetry:

\[ \begin{aligned} x_2 &= (m_{11}x_1 + m_{12}y_1 + m_{13}z_1) + t_x \\ y_2 &= (m_{21}x_1 + m_{22}y_1 + m_{23}z_1) + t_y \\ z_2 &= (m_{31}x_1 + m_{32}y_1 + m_{33}z_1) + t_z \end{aligned} \tag{5.8} \]
Values for these transformations are frequently given in the journal articles reporting the structure. Even when the complete coordinates for the oligomer are given in the PDB file, the MTRIX records may be used to superimpose the subunits of an oligomer to see how well they agree. It should also be noted that these coordinate transformations are a vital part of comparing two different but homologous proteins, as discussed in Chapter 7. The PDB website has a link to another site called the LQS (the Likely Quaternary Structure), in which coordinates may be obtained for many oligomeric proteins which exist only as substructures in the PDB. Part of the information in the LQS site is generated by analyzing the accessible surface area found in the unit cells of the crystallographic analyses. Methods for accessible surface area calculation were described previously. Last of all, many programs for molecular display have facilities for generating additional coordinate sets for display. PREKIN/MAGE has facilities for doing this, as described in Chapter 3.
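The transformation of Eqs. 5.6 through 5.8 is short enough to write out directly rather than in a spreadsheet. The following is a minimal sketch, not the procedure used by any particular program. The function names are illustrative, and the column positions used to read MTRIX records (serial number in columns 8–10, matrix elements in columns 11–40, translation component in columns 46–55) are taken from the fixed-column PDB format and should be checked against the file in hand.

```python
def apply_operator(coords, m, t=(0.0, 0.0, 0.0)):
    """Apply Eq. 5.8 to each (x, y, z) in coords: a2 = [M] a1 + t."""
    transformed = []
    for x, y, z in coords:
        transformed.append(tuple(
            m[row][0] * x + m[row][1] * y + m[row][2] * z + t[row]
            for row in range(3)
        ))
    return transformed


def read_mtrix(pdb_lines):
    """Collect MTRIX1, MTRIX2, and MTRIX3 records.

    Returns {serial: (matrix, translation)}, where matrix is a list of three
    rows and translation is a list of three components.
    """
    operators = {}
    for line in pdb_lines:
        if not line.startswith(("MTRIX1", "MTRIX2", "MTRIX3")):
            continue
        row = int(line[5]) - 1
        serial = int(line[7:10])
        matrix, translation = operators.setdefault(
            serial, ([[0.0] * 3 for _ in range(3)], [0.0] * 3)
        )
        matrix[row] = [float(line[10:20]), float(line[20:30]), float(line[30:40])]
        translation[row] = float(line[45:55])
    return operators


if __name__ == "__main__":
    # Hypothetical MTRIX records describing the twofold about the Y-axis of Eq. 5.6.
    template = "MTRIX{n} {s:3d}{a:10.6f}{b:10.6f}{c:10.6f}     {v:10.5f}"
    records = [
        template.format(n=1, s=1, a=-1.0, b=0.0, c=0.0, v=0.0),
        template.format(n=2, s=1, a=0.0, b=1.0, c=0.0, v=0.0),
        template.format(n=3, s=1, a=0.0, b=0.0, c=-1.0, v=0.0),
    ]
    m, t = read_mtrix(records)[1]
    subunit_a1 = [(1.0, 2.0, 3.0), (4.5, -0.5, 2.0)]  # invented coordinates
    print(apply_operator(subunit_a1, m, t))
```

Applying a twofold operator twice should return the starting coordinates, which is a convenient check that the matrix was read correctly.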
SUMMARY Because protein–protein recognition results in a stereochemically constant interaction between protomers, two types of polymers of globular proteins are most often observed: rodlike and closed forms. The rodlike aggregates have helical symmetry. Closed form aggregates are also symmetrical, most often using forms of cyclic symmetry. A dimeric protein composed of two identical subunits is nearly always found with twofold rotational symmetry. This symmetrical association optimizes the thermodynamic stabilization occurring between complementary surfaces of the protein subunits. Spherical viruses are assembled with the most complex form of symmetry: icosahedral point symmetry. Even in this most complicated assembly, equivalent or quasiequivalent contacts account for specific interactions forming the macromolecule. The rules for virus assembly are the same as for a dimeric enzyme. In at least one case where the symmetry rules were disobeyed in the crystalline state, the unusual result could be due to crystal packing forces.
PROBLEMS Practicing Stereovision 1. Using the stereodiagram of the mMDH dimer shown in Fig. 5.5A and B, estimate the distance between L18 of subunits 1 and 2. How far are the L18s from the twofold rotation axis? 2. Again using Fig. 5.5A and B, make a table showing which helical segment in subunit 1 is closest to helical fragments in subunit 2. For example, C-1 is close to 2G-2 and other (?) helices in subunit 2.
3. What is the symmetry of the GPD tetramer shown by the fragments contained in Fig. 5.6? Is it possible to place the positions of all of the symmetry elements (three dyads)? 4. Look again at Fig. 5.6: what two consecutive amino acids are closest to the center of the tetramer? Note that the amino acid sequence for the fragment is given in the caption.
Using Computer Graphics
Obtain the PDB coordinates for malate dehydrogenase from E. coli (PDB accession code: 1emd). This protein, abbreviated eMDH, is a dimer with a twofold rotation axis. In the crystals that were used for the X-ray studies, the dimer dyad was coincident with the y axis.
5. Using the Localrotation command, edit a .kin file of a Cα model so that the dimer is in view. Because the local rotation is around the y axis, the parameters for the Localrotation statement will be (-1., 0., 0., 0., 1., 0., 0., 0., -1.). Color each subunit differently while editing the .kin file. Now study the subunit–subunit interface and measure three of what appear to be the shortest intersubunit Cα-to-Cα spacings. Record the spacings and the numbers of the Cα atoms involved.
6. Orient the dimer so that the view is down the local dyad. Be sure to identify a few of the dyad-related segments on the two component subunits.
7. Find residues 48–51 and 148–163 on the displayed model. Are the subunit–subunit contacts in this region of the dimer part of the noncovalent interactions stabilizing the dimer?
8. Create a .kin file with only residues 48–51 and 148–163. Edit the .kin file so that these segments are present for both subunits in the dimer (review question 5). There should be two copies of this contact region in the dimer. Use the Pickcenter command to focus on only one of these two pieces. Identify the side chain–main chain hydrogen bond in this region of the interface.
9. Identify two atoms that are in van der Waals contact (Δd approximately 3.8 Å).
The following problem is more difficult because it involves displaying four subunits simultaneously. It should be doable after reading this chapter, and with a growing proficiency with PREKIN and MAGE. Obtain the PDB coordinates of ferritin with the PDB accession code 1rcc. Ferritin is a 24-mer with 432 point symmetry. The fourfold rotation axis is collinear with the crystallographic z axis. The PDB coordinates are for a single monomer. If necessary, review Chapter 3 before proceeding.
10. Using the Localrotation command, edit a .kin file of a Cα model so that the view is of the arrangement of ferritin subunits around a fourfold rotation axis. Make sure to edit the Color= command so that each subunit is a different color. The Localrotation commands will have values of (0., 1., 0., -1., 0., 0., 0., 0., 1.), (-1., 0., 0., 0., -1., 0., 0., 0., 1.), and (0., -1., 0., 1., 0., 0., 0., 0., 1.). Measure the three shortest Cα-to-Cα spacings and identify the residues forming these contacts.
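The nine-number parameter sets quoted in problems 5 and 10 are rotation matrices for an n-fold axis lying along a coordinate axis. The sketch below generates such matrices and flattens them row by row; that the Localrotation statement expects the elements in row order is an assumption here, although for the fourfold case the set of three matrices is the same whether they are read by rows or by columns. The function names are illustrative.

```python
import math


def rotation_about_axis(axis, n, step=1):
    """3 x 3 matrix for a rotation of step * 360/n degrees about 'x', 'y', or 'z'."""
    theta = 2.0 * math.pi * step / n
    # Round away floating-point noise so that exact 0 and 1 entries print cleanly.
    c = round(math.cos(theta), 10) + 0.0
    s = round(math.sin(theta), 10) + 0.0
    ns = 0.0 - s
    if axis == "x":
        return [[1.0, 0.0, 0.0], [0.0, c, ns], [0.0, s, c]]
    if axis == "y":
        return [[c, 0.0, s], [0.0, 1.0, 0.0], [ns, 0.0, c]]
    if axis == "z":
        return [[c, ns, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    raise ValueError("axis must be 'x', 'y', or 'z'")


def flatten(matrix):
    """Return the nine matrix elements row by row."""
    return tuple(value for row in matrix for value in row)


if __name__ == "__main__":
    print("twofold about y:", flatten(rotation_about_axis("y", 2)))
    for step in (1, 2, 3):
        print("fourfold about z, step", step, flatten(rotation_about_axis("z", 4, step)))
```

The same function gives the threefold operators of a trimer with n = 3, where the matrix elements are no longer limited to 0 and ±1.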
REFERENCES
Banaszak, L., Birktoft, J., and Barry, C. D. (1981). In “Protein–Protein Interactions” (C. Frieden and L. Nichol, eds.), pp. 31–128. John Wiley & Sons, New York.
Cantor, C., and Schimmel, P. (1980). “Biophysical Chemistry: The Conformation of Biological Macromolecules.” W. H. Freeman & Company, San Francisco.
Caspar, D., and Klug, A. (1962). Cold Spring Harbor Symp. Quant. Biol. 27, 1–24.
Harrison, S. C. (1984). Multiple modes of subunit association in the structures of simple spherical viruses. Trends Biochem. Sci. 9, 345–351.
Lee, B.-Y., and Richards, F. (1971). The interpretation of protein structures: Estimation of static accessibility. J. Mol. Biol. 55, 379–400.
Namba, K., Pattanayek, R., and Stubbs, G. (1989). Visualization of protein–nucleic acid interactions in a virus. Refined structure of tobacco mosaic virus at 2.9-Å resolution. J. Mol. Biol. 208, 307–325.
CHAPTER 6
Secondary Structure of Proteins
INTRODUCTION
The overall conformation of the main chain of a protein is determined by three torsional bond angles, one of which is fixed by chemical bonding rules. Therefore, the definition of two torsional bond angles for each amino acid in a polypeptide chain is sufficient to describe the entire conformation of the main chain of a protein. If these torsional angles have repeating values for a set of contiguous amino acids, the polypeptide segment is said to have secondary structure. The most common forms of secondary structure are the α helix and β strands. In fact, the globins, the first proteins studied crystallographically, are composed entirely of α helices and turns. Turns are also often considered a form of secondary structure. This is because only a limited number of turn conformations are present in most proteins. But why should one study secondary structure, since it is the complete three-dimensional (3-D) structure that is needed to understand biological activity? While it is true that the entire crystal structure is the best starting point for defining structure–function relationships, there are several reasons for doing the analyses in terms of secondary structure. They are as follows.
1. In nuclear magnetic resonance (NMR) studies, it is often possible to identify secondary structure in regions of the protein before the 3-D structure is completed.
2. Detailed knowledge of secondary structure is useful in describing and comparing crystal and NMR structures. It has led to recognition of some unexpected similarities among widely different proteins.
3. Secondary structural elements may be intermediates in protein folding. The word element is used here to refer to a segment of secondary structure in a protein of known conformation.
4. Secondary structure should be easier to predict than three-dimensional conformation.
5. It is the simplest way of dividing the three-dimensional structure of a complex protein into a catalog of identifiable pieces.
THE CHEMICAL NATURE OF A POLYPEPTIDE CHAIN
Before beginning a discussion of helices and β strands, the factors that contribute to secondary structure and some simple rules about defining main-chain torsional angles
Fig. 6.1 The three torsional angles of a polypeptide chain. The three torsional angles of a polypeptide chain are shown labeled as φ, ψ, and ω. The dotted bond between nitrogen and the carbonyl carbon is meant to indicate the double-bond character of the peptide bond. Reference angles are defined as follows: ωi = 0 when Cαi–Ci is cis to Ni+1–Cαi+1; φi = 0 when Cαi–Ci is trans to Ni–H; ψi = 0 when Cαi–Ni is trans to Ci–Oi.
should be reviewed. The torsional or Ramachandran angles mentioned above are the most important and they are defined in Fig. 6.1. Crystal structures of small peptides and refined proteins at high resolution indicate that the peptide bond is planar or very close to planar. In other words, the carbonyl carbon, oxygen, and amide nitrogen lie in a plane. The same plane is thought to include the hydrogen atom attached to the amide nitrogen. This was a predictable observation and is due to the partial double-bond character of the C–N bond. By itself this sounds like a relatively trivial observation, but it has important implications in protein structure. The arrangement around the peptide bond can be either cis or trans, as shown in Fig. 6.2. Although nearly all peptide bonds are found in the trans configuration, crystal structures have identified a few cis peptide bonds, mostly for glycines or prolines. For proline, the example given in Fig. 6.2, a cis peptide bond for residue n + 1 has the arrangement such that ω = 0° according to the definitions given in Fig. 6.1; the usual trans arrangement corresponds to ω = 180°. Both glycine and proline present additional difficulty in an NMR structural study. Glycine has two protons on the Cα carbon, causing a degeneracy in the nuclear Overhauser effect (NOE) to the proton on the peptide nitrogen. Proline has no proton on the peptide nitrogen and NOEs to side-chain atoms must be used.
Ramachandran and co-workers pointed out that because of the double-bond character of the peptide bond, the entire conformation of any protein can be defined by two rather than three torsional angles per amino acid residue. The two torsional angles are called φ and ψ. Remember, a torsional rotation involves the twisting of two groups held together with a covalent or semicovalent bond. As depicted in Fig. 6.1, the torsional angle φ is around the N–Cα bond and ψ is the torsional angle associated with the Cα–C bond. Of course, these angles must be defined in terms of some standard starting configuration and the definitions are also given in the caption to Fig. 6.1.
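Given atomic coordinates, each main-chain torsional angle can be computed from four consecutive atoms. The following is a minimal sketch using the usual sign convention (0° for the cis arrangement, 180° for trans, positive for a clockwise rotation of the far bond when looking down the central bond). The choice of atoms for φ, ψ, and ω noted in the comments follows the standard backbone definitions and is assumed to be equivalent to the reference arrangements in the caption to Fig. 6.1; the coordinates in the example are invented.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def torsion(p1, p2, p3, p4):
    """Torsion angle in degrees about the p2-p3 bond, in the range -180 to 180."""
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b3 = _sub(p4, p3)
    n1 = _cross(b1, b2)          # normal to the plane of p1, p2, p3
    n2 = _cross(b2, b3)          # normal to the plane of p2, p3, p4
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Atom choices for residue i (standard definitions, assumed here):
#   phi   = torsion(C[i-1], N[i],  CA[i],  C[i])
#   psi   = torsion(N[i],   CA[i], C[i],   N[i+1])
#   omega = torsion(CA[i],  C[i],  N[i+1], CA[i+1])

if __name__ == "__main__":
    p1, p2, p3 = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
    for label, p4 in (("cis", (1.0, 0.0, 1.0)),
                      ("trans", (-1.0, 0.0, 1.0)),
                      ("quarter turn", (0.0, 1.0, 1.0))):
        print(label, round(torsion(p1, p2, p3, p4), 1))
```

Using atan2 rather than an arccosine keeps the sign of the angle, which is what distinguishes, for example, φ = −60° from φ = +60° on a Ramachandran plot.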
Fig. 6.2 cis versus trans peptide bonds. Depicted here is a peptide bond between the Cn residue and the Cn+1 residue of a polypeptide chain. The latter is a proline. Depending on the location of Cα for the n + 1 residue, the peptide bond is either cis or trans as shown.
Fig. 6.3 Main-chain torsional angles for thioredoxin. The torsional angles φ, ψ are shown for the protein thioredoxin. They have been calculated for both the X-ray structure and one of the NMR structures. Blocked areas of the plot are the so-called allowed regions of conformational space. Glycine residues are shown by the plus symbols (+). The amino acid sequence is as follows (each number marks the last residue of its block of ten):

        10         20         30         40
SDKIIHLTDD SFDTDVLKAD GAILVDFWAE WCGPCKMIAP
        50         60         70         80
ILDEIADEYQ GKLTVAKLNI DQNPGTAPKY GIRGIPTLLL
        90        100
FKNGEVAATK VGALSKGQLK EFLDANLA
Ramachandran also noted that certain combinations of φ, ψ were not allowed except in glycines. Such conformational areas are blocked out in Fig. 6.3. With no β carbon, glycines can and do appear nearly everywhere on the φ, ψ plot. For instructive purposes, φ, ψ angles are shown for both the X-ray and NMR structures of thioredoxin. Note that the φ, ψ angles for glycine in the NMR structure of thioredoxin are somewhat more scattered than in the X-ray structure. But as was shown in Chapter 4, the two structures are nearly identical. Minor variations in the peptide torsional angles do not affect the overall conformation. For most other amino acids, there are regions of this conformational space that are disallowed. Certain combinations of these angles result in steric interference between atoms stemming from the main chain, Cβ, the hydrogen of the amide nitrogen, and the carbonyl oxygen. With important exceptions, residues appearing in the disallowed region are considered to be in error. Since disallowed torsional angles frequently mark errors in a structure, φ, ψ plots are routinely used at various stages of crystallographic refinement to test validity. For similar reasons they also play a role in NMR studies. However, even when refinement is finished a few residues may appear in disallowed regions. Quite often these are at sharp turns and generally involve an asparagine or aspartic acid. It is relatively easy to look at allowed versus disallowed torsional angles in stereo and two illustrations are given in Fig. 6.4. The reader should take the time to identify the bad contacts that occur in one of the drawings as a result of disallowed torsional angles.
Fig. 6.4 Disallowed and allowed φ, ψ torsional angles in peptides. These two stereodiagrams depict the same three amino acids removed from two subunits of a refined crystal structure. The sequence is KVP, but the N-terminal nitrogen and one oxygen on the C-terminal carboxylate are missing. In the stereodrawing, a single circle represents a carbon atom, two rings an oxygen atom, and three rings a nitrogen atom. The torsional angles for valine are as follows: top, φ = 68.1°, ψ = 140.2°; bottom, φ = −141.9°, ψ = 147.6°. Which structure would appear to be in an allowed conformation? Use both Fig. 6.3 and stereoglasses!
DEFINITIONS OF SECONDARY STRUCTURE
Much like the helical polymers described in Chapter 5 on quaternary structure, constant or repeating values of φ and ψ for a segment of a polypeptide chain lead to a repeating conformation. Two commonplace examples are the α helix and the β strand. The clustering of torsional angles around φ = −60°, ψ = −60° in Fig. 6.3 is due to the α-helical residues present in the protein. Near the torsional angles φ = −100°, ψ = 135°, β structure is found. A β strand is rarely found alone and instead is found side by side with other β strands, forming a sheetlike structure.
Although the stereodrawings and φ, ψ angles are useful in describing secondary structure, much more precise definitions are needed for cataloging purposes. Kabsch and Sander (1983) used a pattern recognition approach for identifying elements of secondary structure in crystal coordinates. Although their approach is far more detailed than is necessary for this discussion, their definitions apply equally well when simply studying stereodiagrams. Their cataloging search is based mainly on hydrogen-bonding patterns. For example, a turn depends on the presence of a hydrogen bond (distance)
Fig. 6.5 Describing handedness. (A) Four Cα atoms in a peptide chain are shown. At Cα3 the dotted structure is a hypothetical atom that would arrange Cα atoms 1–4 in the same plane. Cα4 is rotated as shown. Looking down the Cα2–Cα3 bond, Cα4 is rotated in a clockwise (+) direction. This is a right-handed rotation and corresponds to the appearance of α helices in proteins. (B) The four arrows are meant to represent β strands in a sheet structure. The thickest arrow represents the strand closest to the reader, and the view is edge on to the β sheet. In a backward direction, each β strand is twisted in a counterclockwise direction. This is a (−) or left-handed twist.
between the CO of the ith residue and the NH of the (i + n)th residue. Assuming repeats exist, a 4-turn (n = 4) is an α helix, a 3-turn (n = 3) is a 3-10 helix, and a 5-turn (n = 5) is a π helix. Stereochemical reasons prevent 1-turns (n = 1). A second form of pattern recognition is used to describe β structure; its basic elements are called bridges. A bridge is defined as a hydrogen bond between the CO of the ith residue and the NH of the (i + n)th residue, where n > 5. A combination of one or more of these bridges is defined as a ladder, and the orientation of the hydrogen-bridged atoms can be used to define whether it is parallel or antiparallel. A β sheet is then defined as a set of one or more ladders connected by shared residues.
The simplest form of chirality is exemplified by the right-handed nature of α helices. Chirality or handedness is illustrated in Fig. 6.5A. Note that each Cα is rotated in a clockwise direction around an axis running parallel to the overall direction of the polypeptide chain, the helical axis. Another form of chirality is illustrated in Fig. 6.5B. In this case, the strands of a β sheet are shown, but the direction of each strand is slightly different. In fact, if one imagines an axis running through the center of each strand of the sheet and perpendicular to the plane of the page, each of the member strands is rotated around this axis in a counterclockwise direction. The β sheet is said to have a left-handed twist. Later on, in discussions of supersecondary structure, we will see that many forms of secondary structure have their own chirality, and this must also be defined to fully describe the secondary or supersecondary structure.
Using the preceding definitions, Kabsch and Sander (1983) surveyed the Protein Data Bank and catalogued all of the secondary structural elements found in this databank of refined crystal structures. The α helices found ranged in length from 4 to about 20 residues, or 1 to 5 turns. β ladders, both parallel and antiparallel, are much shorter, usually containing about 5 residues, with 10 residues near the maximum. Only a few cases of 3-10 and π helices have been observed, and these appeared mainly at the ends of α helices. All helices have a right-handed chirality. Most β sheets have a (−) or slight left-handed twist. Remember the definitions given in Fig. 6.5 regarding the appearance of the twist or handedness. A segment of α helix is shown in stereo in Fig. 6.6. Study the conformation of this helix and convince yourself of the hydrogen-bonding pattern between the ith and
Fig. 6.6 An α helix from mitochondrial malate dehydrogenase. The helix is at the C-terminal end of the protein. It has the amino acid sequence SPFEEKMIAEAIPELKASIKKGEEFVK. It includes residues 285–311. Only the α carbons are shown for residues 285 to 304; all main-chain atoms are shown for residues 305 to 311. All of the atoms for P286 and P297 are shown. The coding for all but the Cα model has nitrogen atoms as filled circles and the oxygen atoms as partially filled circles.
(i + 4)th residues. The α helix shown in Fig. 6.6 contains two proline residues. Notice that at the N terminus of an α helix, a proline fits well; the carbonyl oxygen can still participate in the normal hydrogen-bonding pattern. However, P297 is in the middle of the α helix and causes a break in the hydrogen-bonding pattern and a bend in the helix. In addition to interrupting the hydrogen bonding, atom CG of the proline side chain cannot fit into the helical structure. It is noteworthy that while prolines cause bends in α helices, not all bent helices are the result of the presence of a proline. A number of bent α helices have been found in crystal structures that do not have a proline at the point of curvature. If by chance you viewed Fig. 6.5 by the crosseye technique, you will have observed a left-handed (counterclockwise) twist. Remember: Stereoimages prepared for glasses and walleye viewing will appear as the mirror image when viewed crosseyed!
In the 1980s, helices in proteins appeared to take on special importance in the field of structural biology, although in most cases they have been predicted rather than observed. There are two forms of α helices that appear in the literature most frequently. One is the so-called amphipathic helix, which by virtue of its amino acid sequence is thought to have a hydrophobic and a hydrophilic face or side. Since there is a nonintegral number (3.6) of amino acids per turn of the α helix, the hydrophobic surface has a tendency to twist about the helix axis. However, if every third and/or fourth residue is hydrophobic, one side of the α helix has a clear hydrophobic surface. The other sides would have polar side chains, so the helix has both a hydrophobic and a hydrophilic surface. Hence the name amphipathic. It is a popular structural element because it is thought that the hydrophobic face can then interact with the hydrocarbon tails of lipids.
This offers a hypothetical way for proteins to interact with membrane or lipoprotein lipids. Incidentally, the prediction of hydrophobic sidedness is frequently done through the graphical helical wheel. Here the helix is projected down its axis and side chains are drawn as spokes appearing every 100° (360°/3.6 amino acids per turn). For the helix to be amphipathic, spokes that lie next to one another on the wheel must carry hydrophobic residues. DNA-binding proteins also have α helices with a characteristic distribution of positively charged residues such as lysine and arginine. A second form of α helix is described as hydrophobic. Amino acid sequences of this predicted secondary structure contain mainly hydrophobic residues, and some workers believe this to be a structural element for anchoring proteins into membrane
Fig. 6.7 Six-stranded parallel β sheet. This stereodiagram shows the polypeptide backbone only of β strands of a parallel β sheet. The data are taken from the crystal structure of cytoplasmic malate dehydrogenase. The position of each β strand in the amino acid sequence can be determined from the numbering of at least one residue in each β strand.
lipid. The hydrophobic helix is thought to insert directly into the nonpolar region of a bilayer lipid membrane.
Several other properties of α helices should be noted. As is visible in Fig. 6.6, the carbonyl oxygens point slightly away from the helical axis. This apparently makes it possible to form bifurcated hydrogen bonds: one with the –NH– and another with a water molecule. Another point about protein helices: on the C-terminal end of an α helix, there are several free dipolar –C=O groups pointing approximately in the direction of the helical axis. Each carbonyl group is itself a dipole, but some investigators feel that the alignment of the groups throughout the helix magnifies the dipole moment of this portion of the protein. The helical dipole moment is thought to be nearly equivalent to a full electrostatic positive charge on the NH2-terminal end and an electrostatic negative charge on the COOH terminus. Negatively charged side chains or anionic ligands are often found near the NH2-terminal end, perhaps neutralized by the helical dipole.
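The helical wheel described above places successive residues 100° apart around the helix axis, which is easy to reproduce numerically. The sketch below is illustrative only: the set of residues treated as hydrophobic and the `sidedness` score (the length of a signed vector sum, loosely in the spirit of a hydrophobic moment) are assumptions made for this example and are not taken from the text.

```python
import math

# Assumption: an illustrative choice of "clearly hydrophobic" residues.
HYDROPHOBIC = set("AVLIMFWC")

def wheel_angles(sequence, step=100.0):
    """Spoke angle in degrees for each residue on a helical wheel."""
    return [(i * step) % 360.0 for i in range(len(sequence))]

def sidedness(sequence, step=100.0):
    """Crude amphipathicity score between 0 and 1.

    Each residue contributes a unit vector along its spoke, counted as
    +1 if hydrophobic and -1 otherwise. A helix with one hydrophobic
    face gives a long resultant; a uniform helix gives nearly zero.
    """
    if not sequence:
        return 0.0
    x = y = 0.0
    for aa, angle in zip(sequence, wheel_angles(sequence, step)):
        sign = 1.0 if aa in HYDROPHOBIC else -1.0
        x += sign * math.cos(math.radians(angle))
        y += sign * math.sin(math.radians(angle))
    return math.hypot(x, y) / len(sequence)
```

Because 18 residues at 100° per residue make exactly five turns, an 18-residue window samples the wheel evenly, which makes it a convenient length for comparing sequences with this kind of score.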
β STRUCTURE
The Kabsch–Sander definition of β structure is useful for scanning databanks, but this form of secondary structure is also easy to identify visually in a crystal structure. A β strand is most often seen as a relatively straight segment of polypeptide chain. It appears as if someone took the two ends of the polypeptide chain and pulled it taut. β strands have their –C=O and –NH groups pointing in alternate and opposite directions. The φ, ψ angles for peptides in the β conformation are not exactly −180°, 180°. Each strand, therefore, has a slight twist. Because of the twist in the strands, the resulting β sheet is also twisted.
An example of a β sheet is shown in stereo in Fig. 6.7. The example is taken from a dehydrogenase, but it has even more general significance. The segment that is shown also contains part of the β structure that is associated with nucleotide-binding proteins. There are a number of proteins that are composed nearly entirely of β structure even though their overall conformations are very different. These similarities and differences in the combination of secondary structural elements are the subject of Chapter 7 on supersecondary structure.
Fig. 6.8 A tight turn. A tight or hairpin turn, typically found in globular proteins, is shown in stereo. It contains amino acids 108–112 (ISGNE). The atoms are as in previous drawings: one circle, carbon; two nested circles, oxygen; three nested circles, nitrogen. The dotted line indicates a –C=O···H–N– hydrogen bond. The example was taken from the crystal structure of intestinal fatty acid-binding protein.
TURNS
Although not a repeating form of secondary structure, tight turns nonetheless belong in the category of secondary structure. A tight turn is assigned to segments of proteins where the polypeptide chain reverses its direction in a few residues. The Kabsch–Sander rule for defining a turn was given above. An example is shown in Fig. 6.8.
The stereochemical arrangement in Fig. 6.8 is a very common type of tight turn. Note that main-chain atoms for residues 109 through 112 are nearly in a plane. Two residues are present at the end of the U-shaped conformation, and a hydrogen bond forms between –C=O of 109 and H–N– of 112, sometimes called the ith and (i + 3)th members. A somewhat similar turn occurs when there is a glycine at i + 2, that is, in the location of the asparagine in Fig. 6.8. Glycines are found in the turns because the sharp bend requires φ and ψ values that are disallowed when a Cβ atom is present. Asparagines are also frequently found in these reverse turns. Although of unknown significance, notice by using stereo that the side-chain nitrogen of N111 is hydrogen bonded (very close) to its own carbonyl oxygen. This noncovalent bonding arrangement appears to lock the next peptide bond in the orientation that is necessary for forming the all-important turn hydrogen bond. Although clear patterns of hydrogen bonds characterize these sharp turns, many subtle variations seem possible. Richardson and Richardson (1989) have written an excellent discussion of various types of reverse turns.
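The turn rule used in this chapter (a hydrogen bond from the CO of residue i to the NH of residue i + n) lends itself to a simple lookup once the hydrogen bonds are known. The following is a minimal sketch under stated assumptions: the hydrogen bonds are supplied as a ready-made set of (i, j) residue-number pairs, "repeats" is taken to mean n-turns starting at consecutive residues, and the function names are hypothetical. It is not the Kabsch–Sander program itself, which also decides from the coordinates whether a hydrogen bond exists.

```python
# Names for repeating n-turns, following the definitions in the text.
HELIX_NAMES = {3: "3-10 helix", 4: "alpha helix", 5: "pi helix"}

def n_turns(hbonds, n):
    """Residues i whose CO is hydrogen bonded to the NH of residue i + n."""
    return sorted(i for i, j in hbonds if j - i == n)

def has_repeat(starts):
    """True if at least two turns begin at consecutive residues."""
    return any(b - a == 1 for a, b in zip(starts, starts[1:]))

def classify(hbonds):
    """Map each turn size n found in hbonds to a helix name or an isolated turn."""
    result = {}
    for n, name in HELIX_NAMES.items():
        starts = n_turns(hbonds, n)
        if starts:
            result[n] = name if has_repeat(starts) else f"isolated {n}-turn"
    return result
```

For the tight turn of Fig. 6.8, the single bond from residue 109 to residue 112 would be passed as `{(109, 112)}` and reported as an isolated 3-turn, whereas a run of i to i + 4 bonds at consecutive residues would be reported as an α helix.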
THE COLLAGEN TRIPLE HELIX
One of the most unusual forms of combined secondary and quaternary structure is found in the protein collagen. It is one of the most abundant proteins found in higher organisms such as mammals. Collagen is found in such systems as tendons and skin. A typical collagen fiber may be 3000 Å long and is characterized by a repeating consensus sequence of –G–X–Y–, where Y is frequently proline. It is a triple-stranded extended helix, with each chain having a left-handed helical arrangement.
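The –G–X–Y– consensus can be checked in any sequence by looking for glycine at every third position. The sketch below is one simple way to do this, assuming one-letter amino acid codes; the function name and the example peptides are hypothetical and are not taken from a real collagen entry.

```python
def longest_gxy_run(sequence):
    """Largest number of consecutive G-X-Y triplets found in any reading frame."""
    best = 0
    for start in range(len(sequence)):
        count = 0
        pos = start
        # A triplet needs a glycine at pos and two more residues after it.
        while pos + 2 < len(sequence) and sequence[pos] == "G":
            count += 1
            pos += 3
        best = max(best, count)
    return best

# Hypothetical peptides, for illustration only.
print(longest_gxy_run("GPPGPPGPP"))
print(longest_gxy_run("AGPPGPAGPPK"))
```

A globular protein will usually give only short runs, since an isolated glycine counts as a single triplet, whereas a collagen-like segment gives a run spanning most of its length.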
The collagen structure took on a great deal of additional significance when a number of genes were found that coded for proteins that included segments of the collagen consensus sequence mentioned above. This defined a whole new class of proteins with a globular and a fibrous portion in a single polypeptide chain. An example of a fibrous/globular system is a group of proteins called collectins. In some of the collectins, the formation of the triple helix leads to a threefold arrangement of the globular region. Such a molecule is clearly multivalent in terms of carbohydrate binding, the overall function of the lectin domain. The collectins, for example, are important in preimmune defense against microorganisms (Hoppe and Reid, 1994).
PREDICTION OF SECONDARY STRUCTURE
Since the 1970s, attempts have been made to predict the location of helices and β structure from the amino acid sequence. An excellent collection of chapters on the description and prediction of secondary structure is found in the reference edited by Fasman (1989). A critical review of predictive methods has also been written by Schulz (1988). The predictive schemes take advantage of the observation that certain amino acids are found more frequently in helices. For example, such statistical analyses suggest that A, E, L, and M are good α-helix formers and P, G, Y, and S are poor helix formers.
Although of limited use for predicting conformation, these statistical methods have other important applications. For example, when only the amino acid sequence is known, secondary structure prediction can be useful in detecting similarities to proteins of known structure. This is most useful when the amino acid sequence homology is low but other information suggests structural similarity to a protein of known conformation. Homology in the predicted secondary structure along with conservation of important residues adds credibility to any postulated similarity.
The idea of trying to predict the secondary structure of a globular protein has been around almost since the discovery of the Mb/Hb crystal structures. In the late 1980s, secondary structure prediction became even more widely used because techniques in molecular biology made it feasible to obtain primary sequences in relatively short periods of time. While it is a useful adjunct in establishing homologies, it is not totally reliable. From the point of view of structure prediction, the best method does not even use empirical methods relying on helix formers or breakers. Rather, when large tables of amino acid sequence homology are available, so-called parse points can be found (Benner, 1989).
These points in the sequence represent turns on the surface of the protein between elements of secondary structure (Benner, 1989). The intervening elements of secondary structure can then be predicted on the basis of certain patterns of hydrophobic and hydrophilic residues, along with the homology table.
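To make the idea of helix formers and poor helix formers concrete, the sketch below scores a sequence with a sliding window, using only the two residue groups named earlier in this section. It is a toy illustration and not any published predictive scheme: the scores of +1, −1, and 0, the window length of 6, and the function name are all assumptions chosen for simplicity.

```python
# Residue groups as named in the text; the numerical scores are assumed.
HELIX_FORMERS = set("AELM")
POOR_FORMERS = set("PGYS")

def helix_scores(sequence, window=6):
    """Mean helix-former score for every window of the given length.

    Entry k of the result covers residues k through k + window - 1
    (0-based). Values near +1 mark windows rich in helix formers;
    values near -1 mark windows rich in poor helix formers.
    """
    per_residue = [
        1 if aa in HELIX_FORMERS else -1 if aa in POOR_FORMERS else 0
        for aa in sequence
    ]
    return [
        sum(per_residue[k:k + window]) / window
        for k in range(len(per_residue) - window + 1)
    ]

# The helix sequence quoted in the caption to Fig. 6.6.
for k, score in enumerate(helix_scores("SPFEEKMIAEAIPELKASIKKGEEFVK")):
    print(k + 1, round(score, 2))
```

As the text cautions, a score of this kind is of limited use for predicting conformation on its own; it is better suited to comparing the predicted patterns of two possibly related sequences.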
SUMMARY
Two torsional angles called φ, ψ are associated with the conformation of a polypeptide chain. The peptide bond itself has double-bond character and is usually in a fixed configuration. When φ, ψ angles repeat along segments of a polypeptide chain, secondary structure results. Such repeats occur in part because of hydrogen bond stabilization. One of the most common forms of secondary structure is the α helix. Another form of secondary structure, the β sheet, consists of extended polypeptide chains with interstrand hydrogen bonds. By cataloging forms of secondary structure from crystal structures of proteins, it is sometimes possible to predict secondary structure from an amino acid sequence (Fasman, 1989; Benner, 1989).
Fibrous proteins such as collagen represent a nonglobular form of proteins with secondary structure. Recently, combinations of fibrous and globular proteins have been found in a single polypeptide chain.
PROBLEMS
Practicing Stereovision
1. Review Fig. 6.4 to determine which of the two conformations is probably wrong or ‘‘disallowed.’’
2. The amino acid sequence of thioredoxin is given in the caption to Fig. 6.3. Record the sequence numbers of the glycine residues. Look carefully at the disallowed φ, ψ angles in Fig. 6.3. Note that they are all glycines. Now go back to Chapter 4 and look at the thioredoxin structure in stereo and record where the glycine residues are located. What conclusion can be drawn regarding glycines and φ, ψ angles?
3. Using Fig. 6.6 if necessary, how many intrachain hydrogen bonds would be formed in an α-helical segment containing only four residues?
4. Review Fig. 6.7: Using stereo vision, determine that each β strand runs in a parallel direction. If the stereo drawing is studied carefully, it is possible to see the twist in the individual β strands. Watch the direction of the carbon–oxygen bond. Do these β strands have a left-handed or right-handed twist?
Using Computer Graphics
Obtain the coordinates of a collagen fragment (1cag) from the Protein Data Bank. Because we are going to study secondary structure, also obtain the coordinates for the heart muscle fatty acid-binding protein (1hmt) while at the PDB. Now computer graphics can be used to answer the following questions.
5. Measure three φ and ψ angles for residues in the collagen structure. Plot them on a Ramachandran graph. Make sure to identify the polarity of each chain in the triple helix. Using this model, write the consensus amino acid sequence for this type of helix. Describe a mutagenesis experiment that would prevent formation of the triple helix.
6. One of the component amino acids of collagen is formed by a posttranslational modification reaction in the cell. Look at the model and, using atom names and the covalent structure, identify the modification.
Now switch to the muscle fatty acid-binding protein. PREKIN has an option for drawing main-chain hydrogen bonds that will be useful for answering the next three questions.
7. Measure and plot three φ and ψ angles from the heart muscle fatty acid-binding protein that are in a β-strand conformation.
8. Measure and plot three φ and ψ angles from the heart muscle fatty acid-binding protein that are in an α-helical conformation.
9. Take two adjacent segments of β strands and determine the length of the secondary structure, using the Kabsch–Sander rules.
REFERENCES
Benner, S. (1989). Patterns of divergence in homologous proteins as indicators of tertiary and quaternary structure. Adv. Enzyme Regul. 28, 219–236.
Fasman, G. (ed.). (1989). ‘‘Prediction of Protein Structure and the Principles of Protein Conformation.’’ Plenum Press, New York.
Hoppe, H.-J., and Reid, K. (1994). Collectins—soluble proteins containing collagenous regions and lectin domains—and their roles in innate immunity. Protein Sci. 3, 1143–1158.
Kabsch, W., and Sander, C. (1983). Dictionary of protein secondary structure: Pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22, 2577–2637.
Richardson, J., and Richardson, D. (1989). Principles and patterns in protein structures. In ‘‘Prediction of Protein Structure and the Principles of Protein Conformation’’ (G. Fasman, ed.), pp. 1–98. Plenum Press, New York.
Schulz, G. E. (1988). A critical evaluation of methods for prediction of protein secondary structure. Annu. Rev. Biophys. Biophys. Chem. 17, 1–21.
CHAPTER 7
Domains and Supersecondary Structure
INTRODUCTION
Structural studies, either by crystallography or nuclear magnetic resonance (NMR), have revealed both the complexity and to some degree the simplicity of the conformation of globular proteins. A number of investigators have shown that even when looking at a large complex structure, it is often possible to divide it into smaller units referred to as domains. A domain is part of the overall structure that usually contains a linear segment of the polypeptide chain folded into a relatively compact region. In a few instances, a domain may be divided into two segments of the primary structure. For a hypothetical example, imagine a two-domain protein. Residues 1–25 belong to domain 1. Residues 26–75 form domain 2. To complete the structure, amino acids 76–100 again belong to domain 1. While a two-domain protein usually would be divided into residues 1–50 and 51–100, the exception noted above has domain 1 formed by two separated segments of the primary structure. Not unexpectedly, domains often are associated with a special biochemical function.
Janin and Chothia (1985) describe two principal ways of observing domains in the crystal structures of proteins: (1) domains may be apparent during visual inspection of the protein model, where a domain may appear as a lobe; and (2) distance maps may also be useful for defining domains in a protein structure. A distance map depicts a matrix of the distances between all pairs of Cα atoms in the crystal structure. An example of a distance matrix is shown in Fig. 7.1. The plot has been calculated from the crystal coordinates of malate dehydrogenase (MDH) from Escherichia coli. This enzyme contains both a dinucleotide-binding domain and a so-called catalytic domain. For the sake of simplicity, only residues 100 to 200 were included in the plot; MDH (E. coli) contains a total of 312 amino acids per subunit. Notice that the shortest distances between Cα atoms, the darkened squares, fall into two clusters: residues 100–150 and 150–200. In fact, if the distance matrix for an entire subunit were shown, two clusters would still be apparent. The first would include residues 1 through about 150, and the second would include residues 150–312. The former is the dinucleotide- or NAD-binding domain. The latter is called the catalytic domain and contains residues involved in binding the dicarboxylic acid substrate.
Another useful simplifying approach to understanding and remembering complex protein structures is the idea of supersecondary structure. Supersecondary structure is the combination of elements of secondary structure into a motif that is found repeatedly in different proteins. The simplest example is the supersecondary structure composed of an α helix, a reverse turn, and a β strand. This is referred to as an αβ structure.
Fig. 7.1 Distance diagonal plot for crystalline malate dehydrogenase (1EMD). The matrix plot is derived from distances between Cα atoms in a polypeptide chain of crystalline malate dehydrogenase. Only residues 100 through 200 are shown. The distances are coded by shading. Any Cα-to-Cα distance that is less than 1 Å or greater than 13 Å is white. The remaining distances are divided into 1-Å bins, with the closest atoms having the darkest shading.
Supersecondary structural elements are also recognizable in distance diagonal plots. In Fig. 7.1, antiparallel β structure appears as darkened (nearby atoms) regions perpendicular to the diagonal. The reader should consider the appearance of different forms of secondary structure on distance diagonal plots. It is useful to think of supersecondary elements as occurring independent of evolutionary effects or requirements for biological function. They are strictly a thermodynamically stable combination of α helices, β sheets, and turns. This contrast, thermodynamic versus functional, is what distinguishes supersecondary structures from domains. An excellent, color-illustrated introduction to the concept and cataloging of protein domains, supersecondary structure, families, and motifs can be found in the textbook by Branden and Tooze (1991). Some stereoviews of an example of supersecondary structure are presented below.
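A distance diagonal plot of the kind shown in Fig. 7.1 can be sketched from a list of Cα coordinates. The code below is a minimal illustration that assumes the coordinates are already available as (x, y, z) tuples in Å; it follows the shading rule stated in the caption to Fig. 7.1 (white below 1 Å or above 13 Å, 1-Å bins in between), but the function names and the use of an integer bin index to stand for shading are assumptions made here.

```python
import math

def distance_matrix(ca_coords):
    """All pairwise C-alpha distances, as a list of rows."""
    return [[math.dist(a, b) for b in ca_coords] for a in ca_coords]

def shade(distance):
    """Bin index for one distance, or None where the plot is left white.

    Lower indices stand for darker shading (closer atoms).
    """
    if distance < 1.0 or distance > 13.0:
        return None
    return int(distance)  # 1-angstrom bins: 1 <= d < 2 gives 1, and so on

def shade_map(ca_coords):
    """Matrix of bin indices ready to be drawn as a distance diagonal plot."""
    return [[shade(d) for d in row] for row in distance_matrix(ca_coords)]
```

The diagonal itself is white under this rule, because each atom lies 0 Å from itself. Compact units such as domains show up as square clusters of low bin indices along the diagonal.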
DOMAINS
A single polypeptide chain frequently folds into a three-dimensional structure by forming a number of what appear to be independent units. This is visually apparent if the
Fig. 7.2 Domains in the nuclear receptor family of proteins. The domains of the nuclear receptor family of proteins are shown in a schematic fashion. (A) Individual domains within a single polypeptide chain are indicated by the letters A/B, C, D, E, and F. (B) A diagram of one of the family members, the human estrogen receptor. Numbers indicate the positions of amino acids, starting from the NH2 terminus.
crystal structure is known. However, multidomain proteins are often too large to be analyzed by NMR methods. To make structural studies more feasible, an important issue in structural biology has become the problem of identifying multidomain proteins and the location of the substructures within the total amino acid sequence. When this is possible, cloning and expression of the appropriate coding segments will permit structural studies on the individual domains. Both crystallographic and NMR methods then become more practical.
There are four principal experimental approaches for characterizing domains in a polypeptide chain: (1) determining the crystal structure of the intact protein, (2) deleting segments by cloning methods, (3) treating the protein with protease and assigning biochemical functions to the fragments, and (4) identifying characteristic amino acid sequences that flag properties of that polypeptide segment. Identifying domain segments in a large protein is often a difficult problem. The most successful method is attributable to the molecular biologists. By deleting large segments of the gene and observing either phenotypic results in cells or biochemical properties in vitro, they are able to assign certain properties to subsections of a protein. A good example is the nuclear receptor proteins. Together they form a large family of proteins responsible for forms of transcriptional regulation in response to such hormones as estrogens, retinoids, and glucocorticoids. They are represented by the schematic diagram in Fig. 7.2. A domain breakdown of the nuclear receptor family is shown in Fig. 7.2A. The family is divided into five domains: A/B, C, D, E, and F. The A, B, and F domains are thought to be responsible for transcriptional modulation. The C domain has been demonstrated to bind to duplex DNA and consists of zinc fingers (an example of a zinc finger is given in Chapter 10). The E domain is the segment of the protein responsible for binding the hormone.
Finally, the D domain represents some sort of hinge region of unknown conformation.
The idea of domains can be visualized in the primary sequence of a protein once some structural information is available. A simple example is cytochrome b5, which is a microsomal membrane-bound b-type cytochrome. The intact protein has never been crystallized, but if the membrane form is treated with a protease, a soluble heme-containing protein results. The amino acid sequence of cytochrome b5 from calf liver is shown in Fig. 7.3. The crystal structure of the heme-binding domain has been determined, and structural factors related to prosthetic group binding are visible in the resulting model. The second domain of this protein stays in the membrane lipid fraction or is soluble only in the presence of detergents. It contains 40 residues, 18 of which are clearly hydrophobic, and only 5 residues are acidic or basic. By the definition given above, this cytochrome has two domains: the heme-containing portion and the membrane-anchoring part.
Fig. 7.3 The amino acid sequence of cytochrome b5. The amino acid sequence of calf liver cytochrome b5 is shown schematically as its two principal parts. The small segment on the NH2 terminus is lost by proteolysis during removal from the membrane. The numbering is based on an old scheme in which the NH2 terminus of the soluble domain was thought to be A3. The soluble domain, which has had its crystal structure determined, extends from A3 to I85. The membrane-binding domain is outlined as the C-terminal segment.
Fig. 7.4 Cytochrome b5. This stereoview contains two segments of the cytochrome b5 molecule. The first segment, from the NH2 terminal to residue 87, was determined by X-ray crystallography and the coordinates obtained from the Brookhaven Data Bank. The ‘‘membrane-binding domain,’’ from residue 87 to the COOH terminal, is hypothetical. This domain was added to the crystal coordinates in the form of an α helix. The helix was then ‘‘folded’’ so that three segments of helices were present in a compact domain. Like many representations of membrane-binding domains, it is a figment of the author’s imagination.
A stereodiagram of the structure of cytochrome b5 is shown in Fig. 7.4. The representation contains the two domains already mentioned. Up to residue 87, the conformation was determined by X-ray crystallography. The structure of this ‘‘soluble heme-binding domain’’ is accurately known. Viewed in stereo, the heme position can be seen at the top of the drawing. The membrane-binding domain, i.e., residues from
Fig. 7.5 Dinucleotide-binding domain in dehydrogenases. This schematic drawing illustrates the primary sequence of several dehydrogenases of known crystal structure. The dinucleotide-binding domain, represented by the symbols A1 and A2, is the common structural domain for this family of proteins. The other domains are labeled C, D, E, and F. They are structurally dissimilar.
position 87 to the COOH terminal, is folded in a totally hypothetical fashion as described in the caption to Fig. 7.4. Many membrane proteins are thought to contain two or more domains. Frequently, one of them is thought to act as an anchor. The other protrudes above the membrane surface, where it can interact with the cytosol or with extracellular fluid. Of course, little is known about the structure of the second domain labeled ‘‘membrane anchor’’ above.
The cytochrome b5 example used here is not unique. Crystal structures of one or more domains from a multidomain protein are known for other systems, including DNA-binding proteins and even some enzymes. There are, however, crystal structures of multidomain enzymes where the entire structure is known. A well-documented example is found in a class of enzymes called dehydrogenases. Most of these enzymes bind at least two substrates or reactants; one is the coenzyme, either NAD/NADH or NADP/NADPH. The other is a small molecule substrate. In the examples that follow, the substrates would be ethanol or malate. The distance diagonal plot shown in Fig. 7.1 illustrates the two domains as they appear in malate dehydrogenase.
In terms of the primary structure, the relationship of the dinucleotide-binding domain as it is present in several different enzymes is shown in Fig. 7.5. Notice that the common structural element, the dinucleotide-binding domain, is located in several different regions of the primary structure. In lactate and malate dehydrogenase, it is on the NH2-terminal end, and the two enzymes are structurally similar throughout. Both have a similar C domain. The NAD-binding domain in alcohol dehydrogenase is near the middle of the amino acid sequence. The remainder of the structure is very different from the two containing the C domains. It is attractive to think of the evolution of such domain-containing enzymes as resulting from the genetic assembly of different building blocks (domains).
However, the evolutionary steps must be far more complex. Although the dinucleotide-binding domains appear to be common units, they contribute only part of the structure necessary for catalytic activity. The labeling of the dinucleotide-binding domain should be noted. It has been designated in two parts labeled A1 and A2. The crystal structures of these proteins showed that crude conformational homology exists between A1 and A2. It was subsequently observed that many mononucleotide-binding proteins had a domain that appeared to be homologous with either A1 or A2. That is, they had a domain with one-half of a dinucleotide-binding domain. This is now well known as the ‘‘nucleotide-binding
Fig. 7.6 The dinucleotide-binding domain in dehydrogenases. (A) This stereoview shows the Cα model of residues 1 to 149 of malate dehydrogenase (E. coli). Every tenth amino acid is numbered. (B) In roughly the same orientation, residues 193 to 318 of the crystal structure of horse liver alcohol dehydrogenase are shown. Again, every tenth residue is numbered.
domain’’ recognizable by two, three, or four parallel β strands and an α helix (Birktoft and Banaszak, 1984). It is also sometimes recognizable in the amino acid sequences by a certain arrangement of glycine residues, i.e., –GXGXXG–, near the beginning of the nucleotide-binding domain (Wierenga et al., 1986). It appears that there is a structural domain within a domain, i.e., a subdomain. The idea that two dehydrogenases contain a structurally similar domain for binding the same coenzyme is easy to accept. After all, they carry out similar catalytic reactions. More difficult is the problem of acquiring some idea of what the word similar means in terms of protein conformation. To assess the similarity, it is necessary to study stereodiagrams carefully. Two of the enzymes described in Fig. 7.5 are malate dehydrogenase from E. coli (eMDH) and horse liver alcohol dehydrogenase (LADH). They differ widely in the location of the dinucleotide-binding domains in their respective primary sequences. Stereoviews of the dinucleotide-binding domain found in these two proteins are shown in Fig. 7.6A and B.
TABLE 7.1 Subdomains in Two Dinucleotide-Binding Proteins(a)

Subdomain     MDH             LADH
A1            1–60 (60)       193–243 (51)
Connector     61–70 (10)      244–262 (19)
A2            71–149 (79)     263–318 (56)

(a) Residue numbers are given in reference to Fig. 7.5. The numbers in parentheses give the number of amino acids in that segment.
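The segment lengths in Table 7.1 follow directly from the inclusive residue ranges. The short sketch below, with the ranges transcribed from the table and all names (`SUBDOMAINS`, `segment_length`) chosen here purely for illustration, recomputes the per-segment counts and the total size of each domain.

```python
# Inclusive residue ranges transcribed from Table 7.1.
SUBDOMAINS = {
    "eMDH": {"A1": (1, 60), "Connector": (61, 70), "A2": (71, 149)},
    "LADH": {"A1": (193, 243), "Connector": (244, 262), "A2": (263, 318)},
}


def segment_length(first, last):
    """Number of residues in the inclusive range first..last."""
    return last - first + 1


# Length of every segment, keyed by protein and subdomain name.
lengths = {
    protein: {name: segment_length(*rng) for name, rng in segments.items()}
    for protein, segments in SUBDOMAINS.items()
}

# Total number of residues in each dinucleotide-binding domain.
totals = {protein: sum(parts.values()) for protein, parts in lengths.items()}

print(lengths)
print(totals)
```

The totals, 149 residues for eMDH and 126 for LADH, are the domain sizes used in the comparison that follows.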
The NAD-binding domains of these two proteins are clearly similar. Even without stereoviewing, the ‘‘notch’’ visible at the bottom right of Fig. 7.6A and B is present in both proteins. This notch occurs near the end of the principal β strand. In eMDH, this strand begins with residue 1; in LADH, with residue 193. The reader should take the time to look carefully at both domains, counting the six parallel β strands and the interconnecting α helices. Starting from the left, the principal β strand will be the fourth. It is at the end of this strand that the consensus di- and mononucleotide-binding domain sequences are to be found. Study the amino acid sequences given below and then look at the corresponding structures in the stereoviews given in Fig. 7.6A and B.
  7  G-A-G-G-I-G   12     eMDH (dinucleotide-binding domain)
199  G-L-G-G-V-G  204     LADH
     G-X-G-X-X-G          Mononucleotide-binding domain
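Locating the consensus pattern in a sequence is a simple text search. The following is a minimal sketch, assuming one-letter amino acid codes and treating X as any residue; the function name `find_fingerprint` and its `first_residue_number` argument are hypothetical helpers introduced here, not part of any published program.

```python
import re

# Consensus fingerprint from the text: G-X-G-X-X-G, where X is any residue.
# The lookahead lets overlapping occurrences be reported as well.
FINGERPRINT = re.compile(r"(?=(G.G..G))")


def find_fingerprint(sequence, first_residue_number=1):
    """Return (start, end, segment) for every GXGXXG match in the sequence.

    first_residue_number is the residue number of sequence[0], so that the
    reported positions use the protein's own numbering.
    """
    hits = []
    for match in FINGERPRINT.finditer(sequence):
        start = match.start() + first_residue_number
        segment = match.group(1)
        hits.append((start, start + len(segment) - 1, segment))
    return hits


# The two six-residue segments quoted in the text, with their residue numbers.
emdh_hits = find_fingerprint("GAGGIG", first_residue_number=7)
ladh_hits = find_fingerprint("GLGGVG", first_residue_number=199)
print(emdh_hits)
print(ladh_hits)
```

A match of this kind only flags a candidate site. As the discussion of insertions and deletions makes clear, it says little about where the remaining secondary structural elements of the domain will fall.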
The first glycine residue in the consensus sequence is at the very end of the principal β strand. First discovered in the dehydrogenases and later in other nucleotide-binding proteins, it is at this point that the protein comes very close to the adenine ribose of nucleotides when bound to their respective proteins. It will be left to the reader to determine if the other glycines are in structurally homologous positions in eMDH and LADH. Remember, this comparison can only be done using stereoviewing! In Fig. 7.5, the schematic description of the NAD-binding domains in dehydrogenases suggests two similar subdomains; they are shown under the headings A1 and A2. Using the stereoviews in Fig. 7.6A and B, a comparison of the subdomains can be done. In both proteins, compare the rightmost three β strands and two α helices with the leftmost. Observe that the overall supersecondary structure of the two subdomains is visibly similar. However, major differences in the lengths of the secondary structural components can be found. This may be the best point for the reader to acquire a feeling for the word similarity as used by structural taxonomists. The subdomains in the two dehydrogenases as defined by the crystal structures are listed by their amino acid sequence numbers in Table 7.1. Altogether, 149 residues are present in eMDH and 126 are present in LADH. But note in Table 7.1 that insertions (or deletions) are occurring in random locations. They are not easy to find in the stereodrawings and in ribbon diagrams are nearly always lost. In eMDH, the A1 subdomain includes residues 1 through 60, and is longer by nearly 10 amino acids than that in LADH. In LADH, however, the connecting polypeptide segment is about twice as long as is found in eMDH. Finally, in the A2 subdomain,
Fig. 7.7 The domain structure of an immunoglobulin: This schematic representation of the primary structure of half of an immunoglobulin molecule also contains the location of the disulfide bonds. The second half of the molecule would be generated by rotation around the indicated twofold axis. The individual domains, some of which are shown in Fig. 7.8, are labeled as follows: VH, CH1, CH2, and CH3, the four domains belonging to the heavy chains; VL and CL, the two domains belonging to the light chains. The dotted box encloses half of an Fab fragment.
eMDH has an extra 23 residues, most of which are in the form of a bent α helix following the β strand beginning at residue 71. Study the structural components of the two enzymes, using the stereodiagrams found in Fig. 7.6, and look for these differences. Keep in mind that this is only a Cα model, and that if side chains were included the differences would be even greater. Taken together, the similarities and differences between eMDH and LADH are a warning. Imagine new results in which an amino acid sequence of a new dehydrogenase has been determined. Even after locating the consensus sequence, in this case –GXGXXG–, in the protein of unknown structure, predicting the location of other secondary structural components of the domain would still be difficult. As seen here, variations through insertions (or deletions) occur randomly along the polypeptide chains of eMDH and LADH. Some of them can be very long, as many as 23 amino acids. Finding correct sequence frames throughout a long primary structure is difficult. While this is but one example of a structural domain, similar variations would be found in comparisons of proteases, DNA-binding proteins, or any other group of proteins containing a common domain. Perhaps the best and certainly one of the most studied examples of proteins with domains is found in the family of immunoglobulins (IgGs). A short, general review of IgG structures has been written by Davies and colleagues (1988). Amino acid sequence and X-ray studies show that one form of immunoglobulin contains two identical subunits and four polypeptide chains as depicted in Fig. 7.7. One heavy chain and one light chain are connected by a disulfide bond; each domain contains one disulfide bond, and two heavy chains are linked by a disulfide bond. Since each of the domains shown in Fig. 
7.7 contains roughly 110 amino acids, a single dimeric IgG molecule includes about 1320 amino acids; six domains from one light and one heavy chain multiplied by the two subunits in a dimeric IgG molecule. Obviously the structure is very complicated. Treatment with proteases gives relatively homogeneous fragments called Fab and Fc.
Fig. 7.8 Stereodiagram of a crystalline Fab fragment. This stereodiagram shows the Cα atoms of the heavy chain from the Fab fragment of an IgG molecule labeled MC/PC 603. The coordinates were obtained from the Protein Data Bank entry 1MCP. A total of 222 residues is shown, along with the side chains for C22, C98, C148, and C206. These side chains are shown as somewhat darkened spheres.
The Fab fragment would contain the domains VH, CH1, VL, and CL as shown in Fig. 7.7. This fragment also contains the antigen-binding site. For purposes of this discussion, the Fab fragment of a heavy chain is shown in stereo in Fig. 7.8. The drawing represents only part of the crystal structure, which also contained a light chain. Two domains are clearly visible: the lower segment is a VH domain, the upper a CH1 domain. The structural homology between the two domains should be clearly evident when viewed in stereo. Note, however, that the lower domain in Fig. 7.8 is a β barrel of nine strands while the upper CH1 or constant domain is again a β barrel but has only seven strands. Overall, the crystal structure included four domains, only two of which are presented in Fig. 7.8. Consider now the relative orientation of the two domains. Note that the domains are not related to each other by any standard symmetry operation, but rather they seem to be arranged as two lobes attached by a hinge or elbow segment. Residues 122, 123, and 124 form the ‘‘elbow’’ bend that connects the two domains (Davies et al., 1988). The amino acid sequence is –SES–. The arrangement of the IgG domains is variable. The elbow bend may vary from 132 to 172° (Davies et al., 1988). Looking at Fig. 7.8, the two β barrels appear to make an angle of roughly 110° with each other, although the angular relationship appears smaller because it is viewed in projection. The basis for variation in the elbow or hinge angle is unknown. It clearly appears to be designed to accommodate conformational changes. Whether this hinge flexibility is fixed on formation of the complete IgG molecule or later on binding antigen is still unknown. Variation in domain–domain orientation has been found in proteins other than IgGs. Furthermore, in a few cases, ligands can alter the stereochemical relationship of the component domains. 
Although conformational changes are discussed in more detail in Chapter 8, domain–domain reorientation is one class of conformational change that proteins can undergo. Three examples of domains have been studied. Most membrane proteins contain an anchoring domain, usually of unknown conformation. In some enzymes, no visible lobes distinguish one domain from the other. This is the case for the dehydrogenases
(although only one domain was shown in the stereodrawing in Fig. 7.6). Immunoglobulins represent a distinctly different type of protein built from domains; in fact, structurally homologous domains that appear as lobes. Here the domains, each a seven- or nine-stranded β barrel, are connected by a segment of polypeptide chain resembling a hinge or elbow. Different IgG molecules have different hinge angles, although the variation was over a relatively narrow angular range.
Supersecondary Structure
Supersecondary structure is the combination of secondary structural elements into a motif found in widely variant proteins. It is a poorly understood aspect of protein structure, and is presently of unknown conceptual value. It addresses the question of whether there are special combinations of secondary structural elements that are themselves more thermodynamically stable than other arrangements of the same elements. To describe such stereochemical conformations three things must be considered: (1) the nature or type of secondary structure, (2) the order of connectivity of secondary structure, and (3) the handedness of the motif. Although a number of workers have contributed to our understanding of how secondary structural elements combine to give higher levels of organization, a particularly well-illustrated summary has been given by Richardson and Richardson (1989). First consider a simple case, a protein consisting of four helices. Using your imagination, you could pack these helices together in many ways. However, although there are now several examples of four-helix proteins, the orientation of the helices in all of them appears to be approximately the same. The helical axes are always nearly but not quite parallel. The four-helix proteins of known crystal structure have widely divergent biological function, and it is doubtful that they could be related in any evolutionary way. An even more frequently found form of supersecondary structure is the βα or αβ element, which consists of a β strand connected by a sharp turn to an α helix. A typical αβ segment is shown in Fig. 7.9B. Note that the α helix and the β strand run nearly parallel and are roughly the same length. Imagine a protein containing eight of these elements arranged in a nearly circular fashion. An αβ barrel of this sort represents the structure of triose phosphate isomerase and a dozen or so other enzymes of different biological function. 
A stereodiagram of the Cα model of triose phosphate isomerase is shown in Fig. 7.9A. Diagrams drawn of this family of proteins suggest a nearly parallel arrangement of the αβ units. However, as can be seen when viewed in stereo, each of the supersecondary structural units is canted relative to the others. We have seen handedness as it concerns the α helix, which is always right handed. But handedness also applies to supersecondary structures as is shown in Fig. 7.10. In this schematic drawing, a two-stranded β sheet is shown. The two strands are connected by a stretch of random coil and it can be seen that two crossover connections are possible. To decide about the handedness of the motif, the supersecondary structure must be defined first. In Fig. 7.10, the pattern is β strand–random coil–β strand. To define the handedness, notice the line drawn through the center of the supersecondary structure. This axis or center line passes from in front to in back of the drawing shown in Fig. 7.10. Now imagine a vector drawn perpendicular to the axis in the direction of the representation of the polypeptide chain. Move up the imaginary axis and draw another vector. If these vectors are rotating clockwise (looking down the axis), the supersecondary structure has a right-handed motif and vice versa. In cataloging handedness or even reading the literature, it is easy to become confused by such statements as ‘‘. . . the twisted sheet is left-handed,’’ ‘‘. . . each strand has left-handed twist,’’ ‘‘. . . the four parallel helices are twisted in a right-handed manner,’’ etc. Most often the confusion arises because the reader is uncertain as to what has chirality. Therefore, characterization of the handedness of any form of supersecondary structure must begin with a description of what elements have this chirality. For example, the β strands of a parallel sheet structure could each have a slight right-handed twist. Each strand has the twist. This means that as you look down an extended bit of polypeptide chain, the bonds connecting the carbonyl carbons to the carbonyl oxygens farther and farther down the chain rotate in a clockwise direction. Try making a crude drawing of the carbonyl bond directions for a β strand with a right-handed, clockwise twist. When these right-handed twisted strands form the β sheet, the sheet itself may have a left-handed motif. The sheet is left-handed! This means that if you look edge on to the side of the β sheet, the β strands farther and farther from you appear to be moving in a counterclockwise direction. This is illustrated in Chapter 6 (Fig. 6.5). Once again you must first decide what it is that has chirality. Then determine the handedness, using the principles shown in Fig. 7.10. For further discussion or help on this topic, study the review by Richardson and Richardson (1989).

Fig. 7.9 Supersecondary αβ element. (A) The stereodiagram contains the complete Cα model of triose phosphate isomerase. Every tenth residue is numbered. (B) A common repeating motif of an α helix–turn–β strand is shown in stereo. The segment includes residues 18 through 45 of 1TIM, chicken triose phosphate isomerase. The sequence for this section is RKSLGELIHTLDGAKLSADTEVVCGAPS.
Fig. 7.10 Handedness in supersecondary structure. This diagram contains a representation of a β–coil–β form of supersecondary structure. In the left drawing, the β–β connection is made above the sheet and in the other it is made below. Although the polarity of the β strands is of no consequence, imagine that the N- to C-terminal direction is given by the arrows on the strands. The line with the double-headed arrow represents an axis or center line, which could be used to generate the crude conformation of the polypeptide chain. Imagine a vector perpendicular to and connecting the axis to the polypeptide chain. Continue drawing vectors connecting the axis to the polypeptide chain. In the representation at the left, the lines would rotate in a clockwise direction and the leftmost motif is right handed. The other structure would have a left-handed supersecondary structure. In cataloging the supersecondary structure of crystalline proteins, it can be shown that nearly all β–coil–β elements are right handed.
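The vector construction used to assign handedness can be turned into a small calculation. The sketch below is only one possible reading of that recipe. It assumes that the chain trace is given as a list of points, that the axis is supplied by the user, and that "looking down the axis" means looking in the direction in which the chain advances. The function names are hypothetical.

```python
import math


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def handedness(points, axis_origin, axis_direction):
    """Classify a chain trace as 'right' or 'left' handed about an axis.

    axis_direction is assumed to point the way the chain advances. For each
    point the vector perpendicular to the axis is built, and the signed
    rotation between successive vectors is accumulated.
    """
    norm = math.sqrt(dot(axis_direction, axis_direction))
    u = tuple(c / norm for c in axis_direction)
    radial = []
    for p in points:
        rel = tuple(pc - oc for pc, oc in zip(p, axis_origin))
        along = dot(rel, u)
        # Remove the component along the axis, leaving the perpendicular part.
        radial.append(tuple(rc - along * uc for rc, uc in zip(rel, u)))
    total = 0.0
    for v, w in zip(radial, radial[1:]):
        total += math.atan2(dot(u, cross(v, w)), dot(v, w))
    # A positive total is a clockwise rotation for a viewer looking along u.
    return "right" if total > 0 else "left"


# A right-handed helical trace along +z and its mirror image.
right_trace = [(math.cos(0.3 * k), math.sin(0.3 * k), 0.5 * k) for k in range(10)]
left_trace = [(x, -y, z) for x, y, z in right_trace]
print(handedness(right_trace, (0, 0, 0), (0, 0, 1)))
print(handedness(left_trace, (0, 0, 0), (0, 0, 1)))
```

The answer depends entirely on which element is traced and which axis is chosen, which is the point made in the text: decide first what it is that has chirality.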
SUMMARY
Crystal structures have shown that many proteins are composed of domains. The domains usually have some functional significance. A domain is a compact portion of a globular protein containing a contiguous segment of the primary sequence. Families of proteins may have homology in one domain and not in another. In some instances, multidomain proteins may have an interconnecting polypeptide segment that permits alternative orientations of the domains relative to each other. Not to be confused with a domain, supersecondary structure is a term used to describe different combinations of secondary structure. Supersecondary structures are found frequently because they are energetically stable. Elements or forms of supersecondary structure are described by their secondary structure components, by the arrangement of these components in the primary sequence, and by any chirality that may be generated.
PROBLEMS
1. Review Fig. 7.1; list the helical residues that are present.

Practicing Stereovision
2. Look at Figs. 7.3 and 7.4; does it appear that the amino acids forming the proteolytic cleavage point for solubilization of the heme-containing domain of cytochrome b5 will be accessible?
3. Try to identify the β strands in the dinucleotide-binding domains shown in Fig. 7.6A and B. Test your stereo skills by convincing yourself of the overall handedness or twist to this β-sheet structure. Is it left handed or right handed?
4. Using Fig. 7.8, determine which residues connect the bottom and top domains of this segment of a heavy chain of an IgG molecule. Also, determine where disulfide bridges are formed, on the basis of the proximity of cysteine residues.
5. In the αβ domain shown in Fig. 7.9A, which amino acid side chains in the α helix will be nearest the β strand? List the amino acid sequence through the turn.
6. Make drawings of two three-stranded β sheets, such that one sheet has the component strands arranged with a left-handed twist and one is right handed.
Using Computer Graphics
Obtain the coordinates of dihydrofolate reductase (6dfr) from the PDB. This is one of the smallest proteins containing a nucleotide-binding domain.
7. Look at the Cα model of dihydrofolate reductase and find the five β strands that form the core of the molecule. Write down the beginning and end residue numbers for each strand. While checking the beginning and end residues, determine if they are running in a parallel or antiparallel direction.
8. Change the color of the strands identified in problem 7. Study the image to determine the handedness of the β sheet. Is it right or left handed? If the image is being viewed in stereo, press the C key. This toggles the image from Crosseye to Walleye; after changing the type of stereo, determine the handedness of the strands again. Has the handedness changed?
9. Take the longest β strand and, using the Range option in PREKIN, create a .kin file with only the main-chain atoms of a single strand of β structure in the display. Estimate the smallest (acute) angle between the C=O vectors while looking down the strand. Is there a right- or left-handed twist to the strand? (Note: When the twist per residue is close to 180° this can be a difficult question to answer!)
REFERENCES
Birktoft, J., and Banaszak, L. (1984). Structure–function relationships among nicotinamide adenine dinucleotide dependent oxidoreductases. In ‘‘Peptide and Protein Reviews’’ (M. Hearn, ed.), Vol. 4, pp. 1–46. Marcel Dekker, New York.
Branden, C.-I., and Tooze, J. (1991). ‘‘Introduction to Protein Structure.’’ Garland Publishing, New York.
Davies, D., Sheriff, S., and Padlan, E. (1988). Antibody–antigen complexes. J. Biol. Chem. 263, 10541–10544.
Janin, J., and Chothia, C. (1985). Domains in proteins: Definitions, location, and structural principles. Methods Enzymol. 115, 420–430.
Richardson, J., and Richardson, D. (1989). Principles and patterns of protein conformation. In ‘‘Prediction of Protein Structure and the Principles of Protein Conformation’’ (G. Fasman, ed.), pp. 1–98. Plenum Press, New York.
Wierenga, R., Terpstra, P., and Hol, W. (1986). Prediction of the occurrence of the ADP-binding βαβ-fold in proteins using an amino acid sequence fingerprint. J. Mol. Biol. 187, 101–107.
CHAPTER 8
Conformational States in Crystal and Nuclear Magnetic Resonance Structures
INTRODUCTION
In Chapter 7, two examples of conformational differences in proteins were described. Recall that in the family of immunoglobulin molecules, the β-barrel domains of a single polypeptide chain can have different orientations with respect to each other. A similar example was described in Chapter 5, on quaternary structure. Domain–domain differences in the coat proteins of icosahedral spherical viruses make possible the quasiequivalence of subunit–subunit interactions in the assembled virion. While these are important discoveries, other conformational states even more closely related to dynamic biological function can also be determined by comparing three-dimensional structures in multiple states. Although crystallographic studies look at time-averaged molecular structures, the method still can be used to determine conformational changes. Principles of how this can be done are described in Chapter 2. Either the two crystalline conformational states are isomorphic and the conformational change is obtained directly from a difference electron density map, or two different crystal structures are determined. Crystalline forms of the same protein are isomorphic if they have the same unit cell dimensions and the protein molecules are oriented in the same way in both lattices. This means that after a substrate is diffused into a crystal, the unit cell dimensions should agree with the native protein within less than 1%. In both cases, the X-ray analysis gives coordinates for both the liganded and unliganded states and therefore the conformational changes in the protein accompanying binding. The X-ray studies give no information about the mechanism of the change. However, careful analysis of the crystallographic coordinates for the two states often makes it possible to postulate the order of conformational changes. 
Determining major conformational differences in nuclear magnetic resonance (NMR) structures is a similar problem. Data must be acquired and analyzed under conditions for both conformational states. Whether one uses crystallographic or NMR methods, once the conformations of the two states have been determined, a major hurdle still exists. To obtain information about conformational changes, atomic coordinates for the two states must be carefully compared. This requires that the coordinates be as close to superimposed as is possible. Otherwise subtle differences may go unobserved! To superimpose two conformational states or even two homologous proteins, either a numerical or a visual method may be used.
To illustrate how this should be done, it is easiest to start with an example. One of the most complex structural changes occurs in cooperative or allosteric proteins. In such instances, the binding of a small molecule usually results in a conformational change. The alteration in molecular structure leads in turn to changes in the binding constants for other ligands or substrates or the same ligand on a symmetry-related subunit. The most studied example is the cooperative binding of oxygen to hemoglobin. The molecular structures of the oxy and deoxy forms (abbreviated HbO and deoxy-Hb, respectively) from several species have been studied. Since the conformational changes are dramatic, crystallographic analyses of different forms of Hb almost always involve the study of different crystal forms. For example, deoxyhemoglobin crystals break up or become disordered on oxygenation. A different crystalline form of HbO had to be prepared and studied independently. Significant conformational changes in a crystalline protein almost always lead to crystal degradation.
COMPARISON OF TWO CONFORMATIONAL STATES
The visual method of comparison is conceptually easy. In the transition of HbO to deoxy-Hb, visual comparisons were made even at the low-resolution stage, where all that was available was the electron density maps at 5.5-Å resolution. Such electron density maps and the resulting balsa wood model shown in Chapter 1 were used. Perutz and co-workers noticed immediately that the most obvious change was in the quaternary structure. The transition from HbO to deoxy-Hb involved a major rearrangement of the α and β subunits. It was also apparent to the hemoglobin scientists that to bring about such a dramatic quaternary change, conformational changes must occur at the level of individual subunits. Therefore to follow the oxy- to deoxy-Hb changes, it was necessary to make a careful comparison of both the α and β subunits in both states. Most of the hemoglobin studies were completed before major graphical computer systems were available. Nonetheless, to do a visual comparison, the coordinates from the Protein Data Bank of both molecules are entered into the computer and displayed as stick models, preferably just as Cα models. Most molecular graphics programs accommodate the display of at least two protein structures. By displaying two models and manipulating one of them, one can obtain relatively accurate superpositioning. Most molecular display programs have facilities for displaying in stereo and this simplifies the problem. If care is taken while superimposing the two structures, it is relatively easy then to compare the conformations and list the structural differences. Any difficulties at this step usually involve deciding which part of the conformation to use during the initial superpositioning. Obviously it should be a segment that is unchanged in the two conformational states, but this is unknown at the beginning of the comparison. 
Unfortunately, many of the PC programs that could be used in these chapters to study protein molecules are not equipped to move one molecule independently of the other. Mathematical methods for superimposing coordinate sets are necessarily more complex but less subjective. The mathematical approach is based on the fact that two coordinate sets of identical or homologous molecules in any orientations can be brought into near coincidence by a general rotation and a translation. The procedure is discussed in detail by Matthews and Rossmann (1985). The overall mathematical approach is illustrated in Fig. 8.1. For mathematical reasons, the general rotation mentioned above may also be described by three Euler rotations, but that is unnecessary for the present discussion. Two homologous protein molecules are shown in Fig. 8.1. In the numerical procedure, their conformations are adequately represented by the coordinates of the Cα atoms only. There are four elements of Fig. 8.1 that should be studied carefully: (1) The line P has direction cosines l, m, n with respect to the coordinate axes x, y, z. The line P
Fig. 8.1 Comparison of two conformational states. A way of viewing the coordinate transformation that must be done to superimpose two similar protein molecules involves a general rotation and a translation. The two molecules to be superimposed are labeled molecule 1 and molecule 2. The general rotation is done around the axis P, generating molecule 2'. The coordinates are then moved by the translation vector T. The core equations are described in text.
is the axis around which molecule 2 will be rotated; (2) molecule 2 is now rotated around line P through the rotation angle. In Fig. 8.1, the coordinates are now represented by the shaded image labeled molecule 2'; (3) next, the centers of gravity must be brought into coincidence. The centers of gravity are marked by the filled circles attached to each molecule. The vector T moves the center of gravity of molecule 2' onto the center of gravity of molecule 1. In vector notation, the coordinate transformation for molecule 2 is

$$\mathbf{X}_r = \mathbf{M}\,\mathbf{X}_2 + \mathbf{T} \tag{8.1}$$

$\mathbf{X}_r$ represents the coordinates of molecule 2 after it has been rotated and translated for superpositioning with molecule 1, and $\mathbf{X}_2$ represents its original coordinates. $\mathbf{M}$ is a 3 × 3 matrix with members $r_{ij}$ and $\mathbf{T}$ is the translation vector, as shown in Eq. (8.2).

$$\mathbf{M} = \begin{bmatrix} r_{11} & r_{12} & r_{13} \\ r_{21} & r_{22} & r_{23} \\ r_{31} & r_{32} & r_{33} \end{bmatrix}, \qquad \mathbf{T} = \begin{bmatrix} t_x \\ t_y \\ t_z \end{bmatrix} \tag{8.2}$$

$$x_r = (r_{11}x_2 + r_{12}y_2 + r_{13}z_2) + t_x, \quad \text{etc., for } y_r \text{ and } z_r \tag{8.3}$$
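The rotation about the line P and the translation T can be written out numerically. The sketch below is an illustration only: it assumes that the direction cosines l, m, n describe a unit vector, that a positive angle follows the right-hand rule about P, and it builds the matrix with the standard axis-angle (Rodrigues) formula. The function names are hypothetical.

```python
import math


def rotation_matrix(l, m, n, angle_degrees):
    """3 x 3 matrix for a rotation about the axis with direction cosines l, m, n.

    Assumes l*l + m*m + n*n == 1 and a right-handed sense of rotation.
    """
    a = math.radians(angle_degrees)
    c, s = math.cos(a), math.sin(a)
    v = 1.0 - c
    return [
        [c + l * l * v, l * m * v - n * s, l * n * v + m * s],
        [m * l * v + n * s, c + m * m * v, m * n * v - l * s],
        [n * l * v - m * s, n * m * v + l * s, c + n * n * v],
    ]


def transform(point, M, T):
    """Rotate a point by M and then translate it by T, as in Eqs. (8.1)-(8.3)."""
    return tuple(sum(M[i][j] * point[j] for j in range(3)) + T[i] for i in range(3))


def is_orthogonal(M, tol=1e-9):
    """True if M times its transpose is the identity, i.e. no distortion."""
    for i in range(3):
        for j in range(3):
            d = sum(M[i][k] * M[j][k] for k in range(3))
            if abs(d - (1.0 if i == j else 0.0)) > tol:
                return False
    return True


# Rotate one atom by 90 degrees about the z axis, then shift it by T.
M = rotation_matrix(0.0, 0.0, 1.0, 90.0)
moved = transform((1.0, 0.0, 0.0), M, (1.0, 2.0, 3.0))
print([round(c, 6) for c in moved])
print(is_orthogonal(M))
```

The orthogonality check guards against a matrix that would stretch or shear the molecule instead of simply reorienting it.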
The arithmetic for the coordinate transformation can be done using PC programs such as Excel, KaleidaGraph, etc. Obtaining the optimal values for the matrix M and the vector T is done by minimizing the sum of the squares of discrepancies between equivalent Cα atoms. This is called the method of least squares and is referred to frequently in the crystallographic and NMR literature. Formally, the variable to be minimized, written here as $E$, is defined as

$$E = \sum_{k=1}^{n} \left[ (x_{1k} - x_{rk})^2 + (y_{1k} - y_{rk})^2 + (z_{1k} - z_{rk})^2 \right] \tag{8.4}$$

The summation is done for n atoms; $x_{1k}, y_{1k}, z_{1k}$ are the coordinates of the kth atom of molecule 1, and $x_{rk}, y_{rk}, z_{rk}$ are those of the corresponding atom of the rotated and translated molecule 2. The least squares condition is

$$\frac{\partial E}{\partial r_{ij}} = 0 \quad \text{and} \quad \frac{\partial E}{\partial t_i} = 0 \tag{8.5}$$
There are several ways to obtain the transformation matrix and translation vector that satisfy the least squares condition. However, the solution must be obtained such that the resulting matrix is orthogonal; in other words, it must not result in any change in the reference frame (angles and dimensions).
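One of the several ways to satisfy the least squares condition while guaranteeing an orthogonal matrix is Horn's unit-quaternion method. The sketch below uses that method as a stand-in; it is not claimed to be the procedure of Matthews and Rossmann (1985). It assumes that the equivalent atoms are already paired, and it finds the required eigenvector by simple power iteration, which is adequate for a demonstration but not tuned for production use. All function names are hypothetical.

```python
import math


def centroid(points):
    """Center of gravity of a list of (x, y, z) points, equal weights assumed."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(3)]


def apply_transform(points, M, T):
    """Rotate every point by M and translate it by T."""
    return [[sum(M[i][j] * p[j] for j in range(3)) + T[i] for i in range(3)]
            for p in points]


def superpose(fixed, moving, iterations=2000):
    """Return (M, T) that best map `moving` onto `fixed` in the least squares sense."""
    cf, cm = centroid(fixed), centroid(moving)
    # Correlation sums over centered coordinates: S[i][j] = sum(moving_i * fixed_j).
    S = [[0.0] * 3 for _ in range(3)]
    for f, m in zip(fixed, moving):
        for i in range(3):
            for j in range(3):
                S[i][j] += (m[i] - cm[i]) * (f[j] - cf[j])
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = S
    N = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    # Shift the spectrum so all eigenvalues are positive, then power-iterate
    # toward the eigenvector of the largest eigenvalue (the optimal quaternion).
    shift = sum(abs(v) for row in N for v in row) + 1.0
    q = [1.0, 0.7, 0.4, 0.1]  # arbitrary starting vector
    for _ in range(iterations):
        q = [sum(N[i][j] * q[j] for j in range(4)) + shift * q[i] for i in range(4)]
        norm = math.sqrt(sum(v * v for v in q))
        q = [v / norm for v in q]
    q0, qx, qy, qz = q
    M = [
        [q0*q0 + qx*qx - qy*qy - qz*qz, 2*(qx*qy - q0*qz), 2*(qx*qz + q0*qy)],
        [2*(qy*qx + q0*qz), q0*q0 - qx*qx + qy*qy - qz*qz, 2*(qy*qz - q0*qx)],
        [2*(qz*qx - q0*qy), 2*(qz*qy + q0*qx), q0*q0 - qx*qx - qy*qy + qz*qz],
    ]
    # The translation brings the rotated center of gravity onto that of `fixed`.
    T = [cf[i] - sum(M[i][j] * cm[j] for j in range(3)) for i in range(3)]
    return M, T


# Build molecule 1 from molecule 2 with a known rotation and translation,
# then check that the fit recovers the superposition.
molecule_2 = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0],
              [0.0, 0.0, 3.0], [1.0, 1.0, 1.0]]
known_M = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
known_T = [5.0, -3.0, 2.0]
molecule_1 = apply_transform(molecule_2, known_M, known_T)

M, T = superpose(molecule_1, molecule_2)
fitted = apply_transform(molecule_2, M, T)
worst = max(abs(a - b) for p, r in zip(fitted, molecule_1) for a, b in zip(p, r))
print(worst)
```

Because the matrix is built from a unit quaternion it is always a proper rotation, so the fit cannot introduce a mirror image or change any interatomic distance.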
This same method can be used to optimize the superposition of two molecules that are only partly similar. This adds a complication: the numerical method must include a procedure for choosing the similar parts. Such additional complications are also discussed in the article by Matthews and Rossmann (1985). The complexity of devising algorithms for choosing similar parts of partially homologous proteins can be circumvented by visual comparisons of amino acid sequences or crystallographic models. The idea is to decide ahead of time which atoms are homologous and then apply the numerical method.

There are now a number of graphics programs that allow two structures to be compared visually and then apply the least squares criterion mathematically to selected parts. Dealing only with freeware, the program SwissPDB is available for both Macintoshes and Windows PCs. It can be downloaded from www.expasy.ch. If a Unix machine is available, the Swedish group that built the O program has an interactive program called Lsqman. Freeware again, it may be downloaded from http://alphaZ.lmc.uu.se/asf. Setting up either the visual or the mathematical approach will require learning the new software, but the process of superpositioning is a vital aspect of structural biology!

The numerical methods involving a least squares procedure always produce a single number for the root mean square (rms) difference between the two segments used in superimposing the coordinates. This is a value, in angstroms, defining how well two coordinate sets agree after they are overlaid, and is described in Eq. 8.6. If the superposition of one molecule onto another worked perfectly and the two proteins were conformationally totally homologous, the rms difference would be 0.0 Å.

$$\mathrm{rms} = \frac{\left[\sum_{j=1}^{N} (d_j)^2\right]^{1/2}}{N-1} \qquad (8.6)$$

where $d_j$ is the vector distance between the corresponding $j$th atoms of the two superimposed molecules and N atoms are being compared. A value of exactly 0.0 Å is, of course, not possible with two different proteins, so rms distances or differences range upward from 0.0 Å.

What can one expect for rms values when comparing two proteins? Two different proteins that have an rms difference of 1.0 to 1.5 Å are still conformationally highly homologous. In fact, two crystal structures of the same protein in different lattices may have rms differences ranging from 0.5 to 1.0 Å. Two proteins are still considered to be topologically equivalent if the rms value for Cα atoms is 2.5 Å. When the rms value is greater than 3.5 Å, the similarity becomes questionable (Matthews and Rossmann, 1985).

In the final evaluation, it is best to write to a file the new coordinates of the molecule that was moved and rotated. Once this is done, your favorite graphics program may be used at leisure to see how close the conformations are to one another and to pinpoint the major conformational differences.
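Once two coordinate sets have been overlaid, the rms difference takes only a few lines to compute. The sketch below is illustrative: the function name and the three-atom coordinate sets are hypothetical, and it assumes the conventional normalization, the square root of the summed squared distances divided by N. If a program reports the value with a different normalization, such as the one written in Eq. 8.6, only the last line of the function needs to change.

```python
import math

def rms_difference(coords_a, coords_b):
    """Rms distance (in the units of the input) between paired atoms.

    coords_a and coords_b are equal-length lists of (x, y, z) tuples that
    have already been superimposed; atom j of one list pairs with atom j
    of the other.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    # Assumed convention: divide by N before taking the square root.
    return math.sqrt(total / len(coords_a))

# Hypothetical coordinates: the second set is the first shifted 1 Å along z.
set_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
set_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0)]

print(rms_difference(set_a, set_a))  # identical sets give 0.0
print(rms_difference(set_a, set_b))  # every pair is 1 Å apart, so the rms is 1.0
```

The function deliberately does no fitting of its own; it assumes the rotation and translation have already been applied, which mirrors the advice to write the moved coordinates to a file first.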
THE OXYGENATION OF HEMOGLOBIN: TWO CRYSTAL CONFORMATIONS

The mechanism of oxygenation of hemoglobin is functionally important because cooperativity provides a biological mechanism for the physiological reactions occurring in the lungs and peripheral tissues. If one looks at the oxygenation curves as the partial pressure of oxygen increases, the HbO/(HbO + deoxy-Hb) ratio changes in a sigmoidal rather than hyperbolic fashion. The sigmoidal shape of the oxygenation curve facilitates unloading of O2 in peripheral tissue and reoxygenation in the lungs. Myoglobin is a monomeric homolog of hemoglobin. It oxygenates in a chemically normal hyperbolic fashion. Historically, therefore, it was thought that the cooperativity reflected in the sigmoidal oxygenation curve of hemoglobin was related to its oligomeric state.
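The difference between a cooperative and a noncooperative oxygenation curve can be made concrete with the empirical Hill equation, Y = pO2^n / (P50^n + pO2^n). This equation is not part of the chapter and is introduced here only as an illustration; the P50 values and Hill coefficients below are assumed, textbook-style numbers, not measurements taken from this text.

```python
def fractional_saturation(p_o2, p50, hill_n):
    """Hill equation: fraction of sites occupied at oxygen pressure p_o2.

    p50 is the pressure giving half saturation; hill_n = 1 describes a
    noncooperative binder, hill_n > 1 a cooperative (sigmoidal) one.
    """
    return p_o2 ** hill_n / (p50 ** hill_n + p_o2 ** hill_n)

# Assumed, illustrative parameters (pressures in torr).
HEMOGLOBIN = {"p50": 26.0, "hill_n": 2.8}
MYOGLOBIN = {"p50": 2.8, "hill_n": 1.0}

for pressure in (5.0, 10.0, 20.0, 40.0, 100.0):
    y_hb = fractional_saturation(pressure, **HEMOGLOBIN)
    y_mb = fractional_saturation(pressure, **MYOGLOBIN)
    print(f"{pressure:6.1f} torr   Hb {y_hb:.2f}   Mb {y_mb:.2f}")
```

With these assumed parameters the hemoglobin curve rises steeply only between the low pressures typical of peripheral tissue and the high pressures of the lung, whereas the myoglobin curve is already close to saturation at low pressure.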
8. Conformational States in Crystal and Nuclear Magnetic Resonance Structures
Fig. 8.2 Subunit–subunit contacts in the hemoglobin tetramer. The four symbols α1, α2, β1, and β2 represent the four subunits of the hemoglobin tetramer. The arrows between the subunit symbols are meant to indicate subunit–subunit contacts in the tetramer. The α1 subunit interacts with β2, α2, and β1; β2 interacts with α1, β1, and α2; and so on. In the crystal structure, the strongest contacts (thick arrows) are made between the α1 and β1 subunits, and between the α2 and β2 subunits.
Partly because of its physiological importance and partly because hemoglobin was one of the first proteins whose crystal structure was determined, its oxygenation has been the focus of intense study for many years. Much is known about the kinetics, thermodynamics, and even the mechanism of this transition. Here, however, we want to use it to study two conformational states obtained by crystallographic analyses. The reader should be prepared to study both subtle and large conformational changes, using the accompanying stereodrawings. The conformational differences are complex and so are the accompanying drawings. The use of stereoviewing is required!

Why do crystals of deoxyhemoglobin shatter on oxygenation? Why, indeed, when only one to four molecules of oxygen (two to eight atoms!) are bound to a molecule containing thousands of carbon, nitrogen, and oxygen atoms? The answer lies in the fact that the binding of oxygen causes a small structural change in the conformation of the component polypeptides, but a large conformational change in the arrangement of subunits in the tetrameric molecule. It is the relationship of this small tertiary conformational change to the quaternary conformational change that is the basis of hemoglobin's cooperativity. The sequence of conformational changes beginning with oxygenation and leading to a tertiary and subsequent quaternary conformational change is complex. Such comparisons represent the ultimate use of structural methods in comparing conformational states. Some of the highlights of the comparisons and ideas about the basis for the different states are described below. Many more details of this structural transition can be found in a monograph by Perutz (1990).

First, some background information on the structure of hemoglobin must be considered. (Note: Refer again to Chapter 5 if you have forgotten the definitions of symmetry operators and point groups.) Both HbO and deoxy-Hb are tetramers with approximate 222 point symmetry. How can that be if the protein contains two different subunit types? Early in the crystallographic studies of hemoglobin, Perutz showed that the conformation of an α subunit was nearly identical with that of the β subunit. This was done both visually and by the numerical methods described above. Figure 8.2 is a schematic drawing of the hemoglobin tetramer. The amino acid sequences, some crystal coordinates, and notes on the conformations are contained in the appendix at the end of this chapter.
Fig. 8.3 Quaternary structure of HbO and deoxy-Hb. The quaternary rearrangement of Hb subunits that accompanies oxygenation is shown. Each subunit is represented by a sphere and labelled according to the notation described in the text. The exact twofold rotational symmetry axes are shown by the arrows, one for HbO and one for deoxy-Hb. During oxygenation, the rotation of the α2β2 dimer occurs about an axis p, which is coming out of the plane of the paper. The favorable contacts between the α2 and β2 subunits allow them to rotate as a unit. The rotation around p has a magnitude of θ. The new twofold axis relating the αβ dimers moves through an angle of θ/2.
In any tetramer with 222 point symmetry, subunit–subunit contacts are generally but not necessarily made between all four polypeptide chains. The lines connecting the symbols α1, α2, β1, and β2 as shown in Fig. 8.2 are meant to illustrate subunit contacts in the hemoglobin tetramer. Oftentimes in this sort of tetramer, some of the subunit–subunit contacts are stronger than others. This is the case in Hb, where the subunit–subunit contacts between α1 and β1 are many. In a sense, Hb can be described as a dimer of stable αβ dimers, and this has important implications for the conformational change that accompanies oxygenation.

The next step in the crystallographic analysis of the transition from deoxy-Hb to HbO was to compare the coordinates from the two crystal forms. Historically and logically, the analysis of the oxygen transition begins with the observation that the tetrameric α2β2 molecule has two different subunit arrangements. This means that some or all of the subunit–subunit contacts must change. The relationship of the two quaternary conformations is shown in Fig. 8.3. Perutz and co-workers realized that the subunit rearrangement involved the rigid body movement of the α1β1 and α2β2 pairs, as can be seen in Fig. 8.3.

In this type of cooperative conformational change, perhaps the most important factor is that the symmetry did not change. HbO and deoxy-Hb both have true twofold rotational symmetry and approximate 222 point symmetry. The true molecular dyads for both conformational states are shown by the vertical arrows in Fig. 8.3. Because of the reorientation of the α1β1 dimer relative to α2β2, the location of this symmetry axis changes. The conformational change can be described by the rotation of the α2β2 subunits by an angle θ around an axis p that is perpendicular to the plane of the drawing. The true dyad moves from the indicated vertical position to a new position θ/2 away.

The rearrangement of subunits, as shown schematically in Fig. 8.3, offered a crude explanation of how ligand binding in one subunit affected the chemistry at a site in another subunit. The effect of oxygen binding is somehow transmitted to the surface and to the subunit–subunit contacts. New, more stable contacts arise and the quaternary structural change takes place.
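A rigid body rotation of the kind just described, in which one dimer turns about an axis perpendicular to the plane of the drawing, can be sketched in a few lines. Everything specific in the example is an assumption made for illustration: the axis is placed along z to stand in for axis p, the 15° angle is an arbitrary value, and the two atom positions are hypothetical.

```python
import math

def rotate_about_z(point, angle_deg, origin=(0.0, 0.0, 0.0)):
    """Rotate a point about an axis parallel to z that passes through origin."""
    theta = math.radians(angle_deg)
    x = point[0] - origin[0]
    y = point[1] - origin[1]
    x_rot = x * math.cos(theta) - y * math.sin(theta)
    y_rot = x * math.sin(theta) + y * math.cos(theta)
    return (x_rot + origin[0], y_rot + origin[1], point[2])

def distance(a, b):
    """Euclidean distance between two (x, y, z) points."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

# Two hypothetical atoms belonging to the dimer that is rotated as a unit.
atom_1 = (12.0, 3.0, -4.0)
atom_2 = (18.5, -2.0, 1.5)
ASSUMED_ANGLE_DEG = 15.0  # illustrative value only

moved_1 = rotate_about_z(atom_1, ASSUMED_ANGLE_DEG)
moved_2 = rotate_about_z(atom_2, ASSUMED_ANGLE_DEG)

# Distances inside the rotated dimer are unchanged; only its contacts with
# atoms that were not rotated can differ.
print(round(distance(atom_1, atom_2), 6), round(distance(moved_1, moved_2), 6))
```

Because every distance within the rotated unit is preserved, all of the structural change in such a model is concentrated at the interface with the dimer that stays fixed.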
Fig. 8.4 An α subunit of deoxy- and oxyhemoglobin. These stereodrawings illustrate the crystal structure of an α subunit of human hemoglobin. Cα atoms are shown along with the heme group and three amino acid side chains: Y140, H87, and L91. Deoxy-Hb is shown in the top panel (A), HbO in the bottom panel (B). Included in the stereodrawing is an oxygen molecule as it is bound to the crystalline form of the protein. The two different crystal structures are in nearly identical orientations and it should be possible to see the conformational differences. However, some differences are small and nearly indiscernible.
In explaining the transition from HbO to deoxy-Hb, the next problem that had to be faced was the question of how extensive the tertiary conformational changes were and how O2 binding caused the tertiary change. To answer it, it is far easier to study only one subunit in stereo and in the two conformational states. Stereodrawings of the α chains in human hemoglobin are shown for deoxy-Hb in Fig. 8.4A and for HbO in Fig. 8.4B. Although only a Cα model and the heme group are shown, the entire amino acid sequence is given in Table 8.1. In addition, the secondary structure of Hb is given in Table 8.2. The reader should study the stereodrawings of the α subunits in Fig. 8.4A and B and see if the secondary structure appears at the correct locations. Pay particular attention to the location of the F helix, since it contains H87 (often referred to as histidine F8). By means of a coordinate covalent bond, this amino acid is linked to the heme iron. Figure 8.4 also depicts atoms from H87, L91, and Y140.

As a first step, the reader should locate the bound O2 molecule in Fig. 8.4B. Note that it binds directly to the iron atom of the heme group. The binding of the O2 molecule must be the initial event in the cooperative transition between deoxy-Hb and HbO. In analyzing further the two conformational states, the next step is to tabulate all of the conformational differences between Fig. 8.4A and B. In fact, the conformational differences had to be documented for the entire Hb tetramer. Perutz and co-workers then had to see if it was possible to trace a cause-and-effect structural change from the oxygen-binding site to the surface of the subunits. The differences in structure between crystalline deoxy-Hb and HbO, once ordered as a series of cause-and-effect events, suggest a temporal mechanism for the quaternary transition.

TABLE 8.1 Human Hemoglobin (a)

     1          11         21         31         41
α    VLSPADKTNV KAAWGKVGAH AGEYGAEALE RMFLSFPTTK TYFPHFDLSH
β    VHLTPEEKSA VTALWGKVNV DEVGGEALGR LLVVYPWTQR FFESFGDLST

     51         61         71         81         91
α    GSAQVKGHGK KVADALTNAV AHVDDMPNAL SALSDLHAHK LRVDPVNFKL
β    PDAVMGNPKV KAHGKKVLGA FSDGLAHLDN LKGTFATLSE LHCDKLHVDP

     101        111        121        131        141
α    LSHCLLVTLA AHLPAEFTPA VHASLDKFLA SVSTVLTSKY R
β    ENFRLLGNVL VCVLAHHFGK EFTPPVQAAY QKVVAGVANA LAHKYH

(a) The number above each block of ten gives the sequence position of its first residue and can be used to identify the location of any residue in the amino acid sequence.

To begin with, the changes in the α subunits, as revealed by the two crystalline conformational states, are shown in Fig. 8.4. Of course, similar conformational comparisons were done on the β subunits. Not easily visible in Fig. 8.4 is the fact that histidine F8 of the α subunits moves toward the heme prosthetic group on oxygenation. It is not seen even in stereo because the change is small. The reader should calculate the change in the distance between the iron atom and the histidine atom that binds to it. Study Fig. 8.4 first, since it is necessary to decide which atom of the histidine is closest and ligating to the iron atom of the heme. This is probably the second step of the conformational transition; O2 binding must occur first.

At this point it may be worthwhile to study Table 8.2 to become familiar with globin nomenclature, although it was used in Chapter 4. Remember that the globins are composed of eight α helices labelled A through H. The heme is held in a pocket between two helices, E and F. The side of the heme nearest the E helix is where the O2 binds. On the other side of the heme, a histidine coordinates with the heme iron. This histidine is called the proximal histidine or F8. It is the eighth residue counting from the N-terminal side in the sixth or F helix. Many crystallographic studies are described in terms of some nomenclature invented by the investigator. It is always necessary to study the nomenclature before reading the text or studying the figures in any crystallographic report!

TABLE 8.2 Secondary Structure of Hemoglobin

                        Amino acid sequence span
Helix   Serial no.      First residue   Last residue

α Subunit
A       1               SER 3           GLY 18
B       2               HIS 20          SER 35
C       3               PHE 36          TYR 42
D       4               HIS 50          GLY 51
E       5               SER 52          ALA 71
F       6               LEU 80          ALA 88
G       7               ASP 94          HIS 112
H       8               THR 118         SER 138

β Subunit
A       9               THR 4           VAL 18
B       10              ASN 19          VAL 34
C       11              TYR 35          PHE 41
D       12              THR 50          GLY 56
E       13              ASN 57          ALA 76
F       14              PHE 85          CYS 93
G       15              ASP 99          HIS 117
H       16              THR 123         HIS 143

Study Fig. 8.4 carefully, until you are confident that you understand the changes that are occurring in the α subunits on binding oxygen. The stereodiagrams contain only a few of the minor changes, which together represent the transition. What happens to the F helix during the oxy to deoxy transition? It moves and "drags" other parts of the structure with it. Find the F helix in both the oxy and deoxy forms. Keep in mind that it is necessary to find conformational changes that will be transmitted to the subunit surfaces and hence lead to the quaternary structural change. Note especially the change in conformation of Y140 in the two crystalline forms.

Why does the F8 histidine move? In deoxy-Hb the iron atom is 5 coordinated and the heme group is slightly dome shaped. In HbO, the iron atom is 6 coordinated and the crystallographic structure shows that the heme group is flatter. Although the distance from the iron atom to the porphyrin nitrogens changes by less than 0.1 Å, this is enough to cause a notable movement in the main protein coordination site, histidine F8. Once again, only a few of the conformational changes are shown in Fig. 8.4A and B. But now it is possible to understand the next step in the quaternary rearrangement. Small movements of a helix, the F helix, are transmitted to other elements of the tertiary structure that are closer to the surface and perhaps involved in subunit–subunit interactions.

To complete our analyses of the two conformational states of hemoglobin, it is now necessary to see if the changes found in the crystal structures at the heme group can be traced to surface residues and the reorientation of subunits. An α1β2 subunit pair is shown in Fig. 8.5. This, of course, is only half of the hemoglobin molecule, but if the complete tetramer were shown the drawing would be far too complicated to view. Remember from our earlier discussion that the α1β1 subunits move as a unit without changing their own relative orientation during the oxy-to-deoxy transition. Therefore, the major conformational change must occur at the α1β2 interface.
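The distance calculation suggested earlier for the proximal histidine can be carried out directly from the partial coordinates listed in the appendix to this chapter. The sketch below copies the α subunit values for the two ring nitrogens of H87 and for the heme iron; the dictionary layout and function names are hypothetical conveniences, not part of any particular program.

```python
import math

# Alpha subunit coordinates (Å) copied from the chapter appendix.
DEOXY = {"ND1": (5.314, 10.60, -14.877),
         "NE2": (6.381, 8.60, -14.811),
         "FE": (8.136, 7.395, -15.038)}
OXY = {"ND1": (5.977, 11.77, -14.763),
       "NE2": (7.344, 9.99, -14.755),
       "FE": (9.021, 9.034, -14.848)}

def distance(a, b):
    """Euclidean distance between two (x, y, z) points."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def closest_nitrogen(model):
    """Return (atom name, distance to Fe) for the nearer ring nitrogen."""
    candidates = {name: distance(model[name], model["FE"])
                  for name in ("ND1", "NE2")}
    name = min(candidates, key=candidates.get)
    return name, candidates[name]

deoxy_atom, deoxy_dist = closest_nitrogen(DEOXY)
oxy_atom, oxy_dist = closest_nitrogen(OXY)

print(f"deoxy-Hb: {deoxy_atom}-Fe = {deoxy_dist:.2f} Å")   # NE2-Fe = 2.14 Å
print(f"HbO:      {oxy_atom}-Fe = {oxy_dist:.2f} Å")       # NE2-Fe = 1.93 Å
print(f"change on oxygenation: {oxy_dist - deoxy_dist:+.2f} Å")
```

With these coordinates the nearer nitrogen is NE2 in both states, and the iron to nitrogen distance shortens by roughly 0.2 Å on oxygenation, in line with the statement that histidine F8 moves toward the heme. The same few lines can be reused for the β subunit by substituting its coordinates.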
Fig. 8.5 The α1β2 subunit interface of human oxy- and deoxyhemoglobin. The stereodrawings depict a Cα model of the α1β2 interface of both deoxy-Hb (top) and HbO (bottom). In addition, side chains are shown for the proximal histidines of both subunits, αH87 and βH92, and the heme prosthetic group plus O2 in HbO. Side chains are also shown for residues 91–95 of the α subunits (–L R V D P–) and for residues 93–100 of the β subunits (–D K L H V P E–). Y140 of the α subunits is also shown. Note that the β subunits have been renumbered as required by most drawing programs. Many of these programs require that unique sequence numbers be assigned to every residue. The renumbering then proceeds as follows: The α subunits are numbered from 1 through 141. The heme belonging to the α chain is numbered 142. The β chain then begins with 143. To obtain the correct amino acid sequence number (Table 8.1), subtract 142 from the numbers shown here for the β chain.
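The renumbering scheme in the caption is simple arithmetic, and a small helper makes it harder to misread a drawing label. This is a sketch only: the function name is hypothetical, the β chain length of 146 is taken from Table 8.1, and the number 289 for the β heme follows the residue number given for that heme in the appendix.

```python
ALPHA_LENGTH = 141   # alpha residues keep their own numbers, 1 through 141
ALPHA_HEME = 142     # heme belonging to the alpha chain
BETA_OFFSET = 142    # subtract this from drawing numbers in the beta chain
BETA_LENGTH = 146    # beta chain length, from Table 8.1
BETA_HEME = BETA_OFFSET + BETA_LENGTH + 1  # 289, as listed in the appendix

def drawing_to_sequence(number):
    """Map a residue number used in the drawing to (chain, sequence number)."""
    if 1 <= number <= ALPHA_LENGTH:
        return ("alpha", number)
    if number == ALPHA_HEME:
        return ("alpha heme", None)
    if BETA_OFFSET < number <= BETA_OFFSET + BETA_LENGTH:
        return ("beta", number - BETA_OFFSET)
    if number == BETA_HEME:
        return ("beta heme", None)
    raise ValueError(f"{number} is outside the renumbered range")

print(drawing_to_sequence(238))  # ('beta', 96), the leucine discussed in the text
print(drawing_to_sequence(241))  # ('beta', 99), the aspartate at the interface
```

Keeping the offset in one named constant also makes it easy to adapt the helper if a drawing program uses a different renumbering.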
According to Perutz and colleagues, the largest conformational changes occur in helix F, in segment FG, and in residues G1–4, H18–21, and HC1–3 (see Table 8.2 for the definitions of secondary structure). Again, for ease of viewing, only the side chains for the FG region are shown in Fig. 8.5. Note the position of αL91 in both forms of the protein. Movement of this residue is small, but by looking at it in stereo it is possible to note its position relative to the proximal histidine and the shift in position it undergoes. Flattening of the heme group is a second factor in initiating the subunit–subunit transition, since it causes the small movement in αL91. Again using Fig. 8.5, look at the homologous leucine in the β chain (βL96, numbered 238 in the drawing). Study its position relative to the proximal histidine and small conformational differences should be apparent. The flattening of the porphyrin system and the movement of the iron atom occur in both α and β subunits and appear to bring about similar conformational changes in both types of subunits. This is not surprising since they are structurally homologous.

The difficult part now is to try to detect differences in the α1β2 interface. Remember: The two models are in the same orientation, so that if a side chain appears to have changed, it is a result of the conformational transition. Although it is possible to discover many small differences, look at the relative orientations of αR92 and αD94 in Fig. 8.5. Are they the same or different? In the β chain, the conformational differences for the amino acids that are shown in Fig. 8.5 are relatively small. Compare the orientation of βD99 (residue 241 in Fig. 8.5) in the oxy and deoxy forms. In particular, it should be possible to detect the reorientation of the oxygen atoms belonging to the βD99 side chain. According to Perutz, the movement of the F helix and FG corner in the β subunits is transmitted to the first few residues of the G helix and is dissipated beyond G5. With Tables 8.1 and 8.2 at hand, and Fig. 8.5, this part of the G helix in the β chain should be identifiable. The reader should be able to see the three-dimensional orientation of the G helix in the α1β2 interface.

To summarize, the crystallographic analysis suggests the following mechanism for the transition from deoxy-Hb to HbO: (1) The iron coordination changes; (2) the heme shape changes, going from dome shaped to flat; (3) the proximal histidine moves closer to the heme; (4) the movement of the F8 histidine pulls the F helix, the FG turn, and G1. Residues αG1–4, αH18–21, and αHC1–3 also move; and (5) in the β chain, the effects are similar. Movement of a number of surface residues, particularly βD99, destabilizes the α1β2 contact and the quaternary structural change takes place.

Conformational changes, particularly complex ones involved with allosterism, remind one of the skeletal system. Movements as simple as the bending of a finger involve a complex interaction between the skeletal and muscular components. It is unlike mechanical motions in that the order of changes is difficult to analyze. In addition, it is noteworthy that as more and more crystallographic studies of other allosteric proteins are completed, a general principle appears to be developing. In phosphorylase, phosphofructokinase, and aspartate transcarbamylase, crystallographic results have shown that binding of small molecule effectors also causes quaternary conformational changes (Perutz, 1990). The quaternary changes are dramatic but based on relatively subtle changes in the tertiary structure. The key element is the fact that the subtle tertiary changes are transmitted to some but not all of the subunit interfaces. Remember that in hemoglobin, it was almost exclusively the α1β2 interface that was affected; the α1β1 interface experienced little change.
CONFORMATIONAL STATES IN OTHER CRYSTALLOGRAPHIC ANALYSES

Hemoglobin was chosen as an example of different conformational states mainly because of the complexity and its close relationship to the cooperativity observed in its biological function. Many different examples of conformational states have been observed in crystallographic analyses. The following list is surely incomplete, but it gives some idea of what variations are possible.

Simple tertiary change. A polypeptide loop in lactate dehydrogenase changes its position on binding the coenzyme, NAD+. The largest movement is about 8 Å and the structural change may be important to the catalytic activity.

Domain changes. The two domains in hexokinase change their relative orientation on binding glucose. Catalytic activity appears to be related to this change. A general rotation of one domain relative to the other of 12° has been documented.

Quaternary changes. Often associated with allosterism, the hemoglobin example described here was the first to be documented (Baldwin and Chothia, 1979). However, a number of other allosteric proteins have also been studied by crystallographic methods. Small tertiary conformational changes accompanying ligand binding are transmitted to protein–protein interfaces, leading to quaternary changes. Without describing these in detail, the principle that the initiating tertiary changes are relatively small appears to be commonplace. Most often they involve the small movement of protein atoms at a binding site simply to accommodate a ligand. These small changes are propagated to a subunit–subunit interface.

The various conformational states revealed by crystallographic studies indicate that the movement of elements of secondary structure is sometimes involved. Often some sort of hinge or pivot point is present in the three-dimensional structure. To put such changes into a broader perspective, no conformational changes have been observed that involved a major unfolding of a protein. No changes have yet been observed that involved a major change even in secondary structure.

It is important to emphasize that the conformational changes described here take place between reasonably well-defined conformational states. In addition, all of the large macromolecules are continuously experiencing conformational fluctuations, some of which may have functional implications. Crystallographic B factors offer some evidence of relatively small motions that are taking place continuously in a macromolecule. However, more rapid fluctuations of conformation can be described by NMR methods or molecular dynamics calculations. There are even suggestions of conformational "modes" of motion in protein molecules. Rather than undergoing rapid random fluctuations, linked elements of the structure move in a concerted and repeating fashion.
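To get a feeling for the size of the motions implied by a B factor, the standard isotropic relation B = 8π²⟨u²⟩ can be inverted to give an rms displacement. The relation is quoted here as general crystallographic background rather than from this chapter, the function name is hypothetical, and the result should be read as an upper bound on thermal motion, since a refined B factor also absorbs static disorder and model error. The two B values are taken from the appendix.

```python
import math

def rms_displacement(b_factor):
    """Rms displacement (Å) implied by an isotropic B factor (Å squared).

    Uses B = 8 * pi**2 * <u**2>, where <u**2> is the mean-square
    displacement of the atom about its average position.
    """
    if b_factor < 0:
        raise ValueError("B factor must be non-negative")
    return math.sqrt(b_factor / (8.0 * math.pi ** 2))

# B factors from the appendix: the backbone N of the alpha proximal
# histidine in deoxy-Hb, and the CB of the beta proximal histidine in HbO.
for label, b in (("deoxy alpha H87 N", 15.37), ("oxy beta H92 CB", 74.94)):
    print(f"{label}: B = {b:5.2f}  ->  rms displacement {rms_displacement(b):.2f} Å")
```

Even the larger of these values corresponds to a displacement of about 1 Å, which gives a sense of scale for the continuous small motions mentioned above.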
SUMMARY

Two ways to establish conformational changes through the use of protein crystallography have been described: one visual, one mathematical. Both methods can also be used to study homologous proteins. Conformational changes revealed by observing conformational states have been found by X-ray analyses of proteins almost since the inception of protein crystallography. Some dynamic and other conformational states have been examined by NMR methods. When there are many changes, it is difficult but not impossible to guess the chain of conformational events if the primary event is the attachment of the ligand. Such was the case for the hemoglobin transition discussed in detail in this chapter. The transition from HbO to deoxy-Hb is a very complicated change in state. It is more representative of the type of conformational change one would expect of an allosteric protein than of a small conformational change involving, for example, the binding of a substrate. By studying the two conformations of Hb, it has been possible to describe the most probable chain of conformational changes that links the binding of a small ligand, through movements of surface atoms, to a subsequent rearrangement of subunits.
PROBLEMS

Practicing Stereovision

1. Using the amino acid sequences given in Table 8.1 and the stereodrawings of hemoglobin, optimize the alignment of the amino acid sequences of the α and β subunits. What is the percent identity between α and β subunits?
2. Using either Fig. 8.4A or B together with Table 8.2, identify the eight component α helices in the globin family. What is the significance of the E and F helices to the physiological function of hemoglobin?
3. Calculate the distance between the iron atom and the two nitrogen atoms of the F8 histidine in both HbO and deoxy-Hb for α and β subunits. Which nitrogen atom has a semicovalent bond to the iron atom? How is this distance affected by the presence of oxygen? (Hint: Compare the values in HbO and deoxy-Hb.)
4. Generate the positions of the other iron atoms, assuming there is a twofold rotation axis congruent with the y axis (if necessary, refer again to Chapter 5). Use the positions generated to determine the distance between all four iron atoms in the Hb tetramer.
Using Computer Graphics

Problems 5–9 challenge your ability to assemble the appropriate atoms to test a hypothesis, and your ability to edit and modify .kin files or any other graphics image. The questions are about the protein azurin, a tetramer for which several different crystal structures have been determined. Two studies, in particular, show a subtle conformational difference. Obtain the coordinates of azurin for these studies:

4azu    pH 5.5
5azu    pH 9.0

Azurin is a tetramer, but only a single-subunit comparison is required to answer the questions. Using the Range command, assemble a .kin file for one subunit, the Cα model, and the main-chain and side-chain atoms for residues 35–41 and 86–91. It will be necessary to concatenate the two .kin files after assembling them. While doing this, assign different colors to the pH 5.5 and pH 9.0 models so that the differences will be obvious on viewing.

5. Displaying only the Cα models, which residue in the two models appears to have the largest discrepancy? How far apart are they?
6. Examine the main-chain atoms carefully: a peptide flip occurs at one residue. Where does this occur?
7. Which atom moves during this conformational change? How far?
8. Can you find the salt bridge connecting the two segments of side chains being displayed?
9. Both intrachain and side chain–main chain hydrogen bonds connect the two tight turns; find at least one example of each.
REFERENCES

Baldwin, J., and Chothia, C. (1979). Haemoglobin: The structural changes related to ligand binding and its allosteric mechanism. J. Mol. Biol. 129, 183–191.
Matthews, B., and Rossmann, M. (1985). Comparisons of protein structures. Methods Enzymol. 115, 397–420.
Perutz, M. (1990). "Mechanisms of Cooperativity and Allosteric Regulation in Proteins." Cambridge University Press, Cambridge.
APPENDIX

A. Partial Crystal Coordinates of Human Deoxyhemoglobin

Orthogonal coordinates x, y, and z are in Å.

Record  Atom no.  Atom  Amino acid  Chain  Residue         x         y         z  Occupancy  B factor

α Subunit
ATOM         641  N     His         A           87     2.506     11.41   -16.243       1.00     15.37
ATOM         642  CA    His         A           87     2.150     10.16   -15.586       1.00     17.57
ATOM         643  C     His         A           87     0.626     10.07   -15.345       1.00     15.83
ATOM         644  O     His         A           87     0.067      8.97   -15.395       1.00     15.41
ATOM         645  CB    His         A           87     2.922     10.02   -14.254       1.00     17.54
ATOM         646  CG    His         A           87     4.391      9.63   -14.518       1.00     18.32
ATOM         647  ND1   His         A           87     5.314     10.60   -14.877       1.00     15.66
ATOM         648  CD2   His         A           87     5.057      8.40   -14.474       1.00     15.15
ATOM         649  CE1   His         A           87     6.503      9.96   -15.052       1.00     13.35
ATOM         650  NE2   His         A           87     6.381      8.60   -14.811       1.00     14.28
HETATM      1071  FE    Hem         A           14     8.136     7.395   -15.038       1.00     18.06

β Subunit
ATOM         805  N     His         B           92     3.039   -12.607    18.536       1.00     11.94
ATOM         806  CA    His         B           92     3.041   -11.192    18.144       1.00     13.60
ATOM         807  C     His         B           92     1.587    -10.64    17.949       1.00     15.42
ATOM         808  O     His         B           92     1.349     -9.44    18.122       1.00     13.17
ATOM         809  CB    His         B           92     3.795    -11.05    16.825       1.00      9.94
ATOM         810  CG    His         B           92     5.319    -10.95    17.008       1.00     13.74
ATOM         811  ND1   His         B           92     6.099    -12.07    17.203       1.00     17.57
ATOM         812  CD2   His         B           92     6.132     -9.84    16.989       1.00     17.06
ATOM         813  CE1   His         B           92     7.389    -11.57    17.324       1.00     15.93
ATOM         814  NE2   His         B           92     7.418    -10.24    17.189       1.00     15.44
HETATM      2238  FE    Hem         B          289     9.347    -9.320    17.376       1.00     15.54
B. Partial Crystal Coordinates of Human Oxyhemoglobin
Record  Atom    Atom  Amino acid  Chain  Residue         x        y        z  Occupancy  B factor
name    number  name  name        name                 (Å)      (Å)      (Å)
α Subunit
ATOM    641     N     His         A      87          2.956    12.09  -15.949       1.00     26.25
ATOM    642     CA    His         A      87          2.892    10.70  -15.405       1.00     23.93
ATOM    643     C     His         A      87          1.467    10.24  -15.090       1.00     26.55
ATOM    644     O     His         A      87          1.185     9.07  -15.314       1.00     20.77
ATOM    645     CB    His         A      87          3.736    10.65  -14.155       1.00     35.39
ATOM    646     CG    His         A      87          5.244    10.64  -14.441       1.00      8.93
ATOM    647     ND1   His         A      87          5.977    11.77  -14.763       1.00     18.08
ATOM    648     CD2   His         A      87          6.072     9.56  -14.457       1.00     19.36
ATOM    649     CE1   His         A      87          7.283    11.36  -14.946       1.00     25.22
ATOM    650     NE2   His         A      87          7.344     9.99  -14.755       1.00     19.60
HETATM  1071    FE    Hem         A      142         9.021    9.034  -14.848       1.00     27.37
HETATM  1114    O1    Hem         A      142        10.492    8.284  -14.973       1.00     17.77
HETATM  1115    O2    Hem         A      142        11.206    7.297  -14.988       1.00     33.84
β Subunit
ATOM    807     N     His         B      92          2.730   -12.44   15.377       1.00     52.64
ATOM    808     CA    His         B      92          2.670   -11.03   15.027       1.00     56.64
ATOM    809     C     His         B      92          1.280   -10.71   14.510       1.00     31.62
ATOM    810     O     His         B      92          0.897    -9.54   14.465       1.00     48.20
ATOM    811     CB    His         B      92          3.645   -10.73   13.907       1.00     74.94
ATOM    812     CG    His         B      92          5.168   -10.78   14.159       1.00     32.79
ATOM    813     ND1   His         B      92          5.914   -11.94   14.173       1.00     30.87
ATOM    814     CD2   His         B      92          6.030    -9.76   14.407       1.00     24.82
ATOM    815     CE1   His         B      92          7.207   -11.60   14.390       1.00     30.80
ATOM    816     NE2   His         B      92          7.327   -10.26   14.540       1.00     17.53
HETATM  2240    FE    Hem         B      289         8.969   -9.034  -14.806       1.00     36.73
HETATM  2283    O1    Hem         B      289        10.369   -7.811  -14.951       1.00     42.12
HETATM  2284    O2    Hem         B      289        11.190   -6.946  -14.606       1.00     37.44
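Coordinates such as those in part B are most often used to measure interatomic distances. The following is a minimal sketch, not part of the original table, that computes two distances in the α subunit from the values listed above. The function name `distance` and the variable names are illustrative assumptions.

```python
import math

# Orthogonal coordinates (Å) copied from part B of the table, alpha subunit.
HIS87_NE2 = (7.344, 9.99, -14.755)
HEME_FE = (9.021, 9.034, -14.848)
HEME_O1 = (10.492, 8.284, -14.973)


def distance(a, b):
    """Euclidean distance between two points given as (x, y, z) tuples."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


fe_ne2 = distance(HEME_FE, HIS87_NE2)  # iron to the imidazole nitrogen of His 87
fe_o1 = distance(HEME_FE, HEME_O1)     # iron to the first atom of the bound O2

print(f"Fe to NE2 (His A 87): {fe_ne2:.2f} Å")
print(f"Fe to O1:             {fe_o1:.2f} Å")
```

With the tabulated values, both distances come out a little under 2 Å, as expected for atoms coordinated to the heme iron.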
CHAPTER 9
Hydrogen Bonds and Water Molecules in Crystalline Proteins
INTRODUCTION
Before the era of refinement of crystalline proteins, small volumes of electron density were unaccounted for in the X-ray maps. Often this electron density was near the surface of the crystalline protein, adjacent to an atom of nitrogen or oxygen. During this time, and extending into the late 1970s, computing power for the crystallographic refinement of protein molecules was not readily available (see Chapter 2 for a definition of refinement). As a consequence, the small regions of additional electron density that could not be accounted for by protein atoms were largely ignored.
A few exceptions occurred. The ferric form of myoglobin does not bind oxygen, but the crystalline protein appeared to have electron density near the iron atom at the sixth coordination site. This electron density was assigned to a water molecule. The presence of this particular water molecule could be confirmed chemically, since its ionization at high pH produced spectral changes in the visible region of the heme spectrum. At this point, with the few exceptions noted, little was known about the structure of water in crystalline proteins.
When it became possible to refine crystallographic coordinates, these additional volumes of electron density persisted. They were clearly part of the protein molecule but not covalently bonded. Investigators began to assign such sites to bound water molecules, and the presence of these additional atoms began to be accepted by the crystallographic community. Since they are not part of the covalent structure, their presence has remained difficult to prove except by X-ray methods, which necessarily include crystallographic refinement.
Hydrogen atoms are generally not visible in electron density maps because X-ray scattering is proportional to the atomic number. Hydrogen atoms, compared with carbon, nitrogen, or oxygen, would be expected to have one-sixth, one-seventh, or one-eighth the electron density, respectively, in an X-ray map. Later, however, neutron diffraction methods were applied to a few crystalline proteins. Neutrons are scattered by a different physical mechanism, which made it possible to visualize hydrogen or deuterium atoms in maps of a crystalline protein. In general, the neutron diffraction studies, discussed briefly below, confirmed the presence of bound water molecules in the crystals.
Because the water molecules are not covalently linked to the protein and because the electron density associated with crystalline waters is generally weak, their presence must be interpreted with caution. Without certain criteria, water molecules might be added to the list of crystalline coordinates at locations that may be only noise peaks in the electron density map. During X-ray crystallographic refinement, coordinates for water molecules are added to the protein model if the following conditions are met.
1. The additional electron density must be within hydrogen-bonding distance of a heteroatom of the protein or another water molecule. Remember, hydrogen bonds are postulated in crystal structures when a donor or acceptor is "about" 2.8 Å from another donor or acceptor. To be more exact, most often no information is available about the position of a hydrogen atom. Therefore a hydrogen bond is postulated on the basis of the heteroatoms involved in bridging, as shown by an example in Fig. 9.1. A more detailed description of atoms in proteins that form hydrogen bonds and the alignment of the atoms is given below.
2. The newly entered coordinates of the water molecule must maintain a reasonable temperature factor (or occupancy) during subsequent refinement. (Temperature factors are discussed in Chapter 2.)
There is one last but important point about water molecules attached to crystalline proteins. Since hydrogen atoms are not visible, the scientist interpreting any electron density map has no information as to the ionic form of the water molecules. In most cases, they probably exist as H2O, but in exceptional cases, they may be either OH− or H3O+.
Fig. 9.1 Definition of a hydrogen bond in crystal structures. A hydrogen bond is postulated to exist in a crystal structure if the heteroatom (N or O) covalently bonded to a hydrogen atom is within 3.5 Å of another electronegative atom (N or O). The graph presents equienergy levels based on the electrostatic or dipolar interaction calculated for a main-chain hydrogen bond between −C=O and H−N−. The variable d represents the distance between the two heteroatoms; the angular variable is the departure from linearity. The contours and relative energy levels are labeled 1–4 (additional contours are not labeled because of a lack of space). Contours 7 and 8 are the most favorable and have values of −3.5 and −4.0 kcal/mol.
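The two conditions for accepting a water site can be written as a simple filter. The sketch below is only an illustration of the logic, not a description of any refinement program. The 3.5 Å limit is the one quoted in the caption to Fig. 9.1. The temperature factor threshold `MAX_B_FACTOR` is a hypothetical number chosen for the example, since the text asks only for a "reasonable" value, and the function name `accept_water` is likewise an assumption.

```python
import math

HBOND_CUTOFF = 3.5    # Å, heteroatom-to-heteroatom limit quoted in Fig. 9.1
MAX_B_FACTOR = 60.0   # Å^2, hypothetical threshold; the text gives no number


def accept_water(peak_xyz, refined_b, polar_atoms, accepted_waters=()):
    """Return True if a candidate water peak passes both criteria.

    peak_xyz        -- (x, y, z) of the unexplained electron density peak
    refined_b       -- temperature factor of the site after refinement
    polar_atoms     -- (x, y, z) positions of protein N and O atoms
    accepted_waters -- (x, y, z) positions of waters already in the model
    """
    partners = list(polar_atoms) + list(accepted_waters)
    # Criterion 1: within hydrogen-bonding distance of a protein
    # heteroatom or of another water molecule.
    near_partner = any(math.dist(peak_xyz, p) <= HBOND_CUTOFF for p in partners)
    # Criterion 2: the temperature factor stays reasonable on refinement.
    reasonable_b = refined_b <= MAX_B_FACTOR
    return near_partner and reasonable_b


carbonyl_oxygen = (0.0, 0.0, 0.0)
print(accept_water((2.8, 0.0, 0.0), 25.0, [carbonyl_oxygen]))   # close, low B
print(accept_water((6.0, 0.0, 0.0), 25.0, [carbonyl_oxygen]))   # too far away
print(accept_water((2.8, 0.0, 0.0), 95.0, [carbonyl_oxygen]))   # B too high
```

Passing previously accepted waters as possible partners lets a site that touches only another water molecule be retained, which is how second-shell waters enter the model.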
HYDROGEN-BONDING POSITIONS IN PROTEINS
With the hypothesis that any bound water molecule must be hydrogen bonded to a polar atom in the protein or another water molecule, any potential hydrogen bond donor or acceptor within a protein molecule could be a binding site. The basis for this dipole–dipole interaction is quite simple. Hydrogen covalently bonded to nitrogen or oxygen atoms results in a slight positive charge associated with the hydrogen atom. This can interact with negative dipolar formations associated with sp2 or sp orbitals also found on oxygen or nitrogen atoms that are covalently part of a protein. A somewhat broader definition of a hydrogen bond is that it is assumed to exist if the two heteroatoms are separated by a distance governed by an angular dependence of the dipolar alignment.
Fig. 9.2 Hydrogen bond donor/acceptor groups in proteins. The dotted arrows are used to indicate whether an atom is a donor (arrow points away) or an acceptor (arrow points toward it). Note that all hydrogen atoms covalently attached to polar atoms are donors, by definition. Top: Hydrogen atoms from the peptide nitrogen; nitrogen atoms in the side chains of W, H, N, Q, R, and K; and nitrogen in the N terminus are shown as potential donors. Middle: Oxygen atoms that can serve as either a donor or acceptor are shown. This includes oxygen atoms found in the side chains of Y, S, T, and water itself. Bottom: Oxygen atoms serving only as hydrogen bond acceptors are depicted. This includes oxygen atoms in the peptide bond; the side chains of D, E, N, and Q; and the C terminus. A nitrogen atom in the imidazole side chain of histidine may also act as a hydrogen bond acceptor. The orbital types of the heteroatoms are also indicated.
Kabsch and Sander (1983), before describing a computational method for assigning secondary structure, developed an equation for analyzing the relative strength of hydrogen bonds between the carbonyl oxygen and the hydrogen linked to the nitrogen of peptides. It is instructive even if it does not apply directly to water molecules hydrogen bonded to proteins. A graphical representation of the relative energies of main-chain hydrogen bonds is also shown in Fig. 9.1. Note: These are energies tied to the electrostatic interactions, not the free energy of stabilization attributable to the hydrogen bond. The energy contours, starting at −0.5 kcal/mol, depend on the separation distance and on the angle, which is defined as the angular departure from linearity of the −N−H to O=C− vector.
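The electrostatic expression of Kabsch and Sander (1983) is usually quoted as E = q1 q2 f (1/r_ON + 1/r_CH − 1/r_OH − 1/r_CN), with q1 = 0.42e, q2 = 0.20e, f = 332, distances in Å, and E in kcal/mol. A hydrogen bond is assigned when E is below −0.5 kcal/mol. The sketch below evaluates this expression for a perfectly linear C=O···H−N arrangement. The bond lengths used to place the atoms (1.23 Å for C=O and 1.00 Å for N−H) and the helper `linear_geometry` are assumptions made for the example, so the exact distance at which the −0.5 kcal/mol limit is crossed depends on them.

```python
import math

Q1 = 0.42       # partial charge on the C=O group, in units of e
Q2 = 0.20       # partial charge on the N-H group, in units of e
F = 332.0       # dimensional factor giving kcal/mol for distances in Å
CUTOFF = -0.5   # kcal/mol, the weakest energy counted as a hydrogen bond


def hbond_energy(c, o, n, h):
    """Electrostatic energy (kcal/mol) between a C=O and an N-H group."""
    r_on = math.dist(o, n)
    r_ch = math.dist(c, h)
    r_oh = math.dist(o, h)
    r_cn = math.dist(c, n)
    return Q1 * Q2 * F * (1 / r_on + 1 / r_ch - 1 / r_oh - 1 / r_cn)


def linear_geometry(d, co_len=1.23, nh_len=1.00):
    """Place C=O...H-N on the x axis with an O...N separation of d Å.

    The bond lengths are assumed typical values, not taken from the text.
    """
    o = (0.0, 0.0, 0.0)
    c = (-co_len, 0.0, 0.0)
    n = (d, 0.0, 0.0)
    h = (d - nh_len, 0.0, 0.0)
    return c, o, n, h


for d in (2.8, 2.9, 3.5, 4.5, 6.0):
    energy = hbond_energy(*linear_geometry(d))
    label = "hydrogen bond" if energy < CUTOFF else "no hydrogen bond"
    print(f"d = {d:.1f} Å  E = {energy:6.2f} kcal/mol  {label}")
```

Because the four terms nearly cancel at large separations, the energy falls off much faster than a single Coulomb term would, which is why the contours in Fig. 9.1 close at a finite distance.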
Perfect alignment of the dipolar charges results in favorable energy even at a distance of 5.2 Å. At the normally used distance definition for a hydrogen bond, 2.8 Å, Fig. 9.1 shows that for main-chain interactions the angular alignment may be off by as much as 60°. Atoms in proteins that can form hydrogen bonds with water molecules or other protein atoms are shown in Fig. 9.2. An excellent early and more detailed review describes the situation in the mid-1980s (Baker and Hubbard, 1984).
Note that some side chains found in a protein can act as either a hydrogen bond donor or acceptor. These are shown in the center row of Fig. 9.2. They include the oxygen atoms of serine (S), threonine (T), and tyrosine (Y) side chains. Y is somewhat different from S and T because the –OH group has some double-bond character. A water molecule, also shown in Fig. 9.2, is both a donor and acceptor. This bivalent characteristic leads to the possible formation of water networks that have been identified in many crystal structures. In these networks, a number of water molecules are linked together via their acceptor/donor ability. Nearly always, such a water network is eventually hydrogen bonded to a polar group on the protein itself.
Hydrogen bond donors are shown in the top row of Fig. 9.2. The most commonly encountered is the hydrogen on the peptide nitrogen, since aside from proline, there is one such group per amino acid in the protein. In addition to the peptide hydrogen, potential hydrogen bond donors derive from the side chains of asparagine (N), glutamine (Q), and arginine (R). The guanidino group of arginine is special since it has three nitrogens, a positive charge, and the ability to form tautomers. One potential hydrogen bond donor is associated with the –NH– group in the indole nucleus of tryptophan. This is similar to one of the nitrogens of the imidazole side chain of histidine. However, in its unprotonated state, the histidine side chain also has the ability to act as a hydrogen bond acceptor, as is shown in the bottom row of Fig. 9.2. The ε-amino group of lysine and the N-terminal amino group may also behave as hydrogen bond donors and/or water-binding sites.
All oxygen atoms in proteins are potential hydrogen bond acceptors. The most common, of course, is the carbonyl oxygen of the peptide bond. However, potential hydrogen bond acceptors are also found in the side chains of aspartate (D), glutamate (E), asparagine (N), and glutamine (Q). In addition, the oxygen atoms forming the carboxy terminal of the protein may serve as hydrogen bond acceptors. Oxygen atoms found in S, T, and Y are also potential hydrogen bond acceptors and water-binding sites. The general geometry of the carbonyl oxygen acceptor should be apparent in Fig. 9.2, as is the geometry of other polar groups. The donor/acceptor geometry should give the reader some expectations about hydrogen bond orientations found in crystalline proteins. Since a water molecule is both a donor and acceptor, each of the atomic arrangements shown in Fig. 9.2 could serve as a potential water-binding site.
There is one additional factor that should be considered about hydrogen bonding and water sites. A number of cases of bifurcated hydrogen bonds have been found in proteins and small molecules, and a schematic drawing is shown in Fig. 9.3. Bifurcation occurs when a single hydrogen atom appears to associate with two or more acceptors. Note in Fig. 9.3 that the indicated peptide hydrogen atom is hydrogen bonded to two different oxygen atoms. In most of the documented cases, the hydrogen bond appeared to be slightly different in each of the bifurcation limbs. The difference is characterized by the distances between heteroatoms, as mentioned previously (Baker and Hubbard, 1984).
Fig. 9.3 Bifurcated hydrogen bonds. In some proteins, a single bonded hydrogen atom serves as a donor to two or more acceptors. Here, one hydrogen atom forms two hydrogen bonds with two different oxygen atoms (Baker and Hubbard, 1984). Although not shown, the resultant hydrogen bonds may have different characteristics, as exemplified in the crystallographic distances found between the two heteroatoms.
NEUTRON DIFFRACTION
In spite of the difficulties associated with neutron diffraction studies of proteins, the results have been vitally important in confirming hydrogen positions that were only surmised from X-ray diffraction studies. The difficulties associated with neutron crystallography are twofold: (1) a nuclear reactor is needed as the radiation source used for diffraction; and (2) neutrons are not easily detected, so the collection of diffraction data can take weeks to months. An introduction to neutron protein crystallography has been written by Kossiakoff (1983). It describes the theory and experimental procedures for neutron crystallography as applied to crystalline proteins.
To recap only the principles: elements scatter neutrons differently from the way they scatter X-rays. Hydrogen atoms have a negative scattering coefficient and they appear as negative scattering density in maps of crystalline proteins. Deuterium, on the other hand, scatters about as well as carbon, nitrogen, and oxygen. Recall that the scattering ability of atoms in X-ray diffraction is related to their atomic number. In an X-ray diffraction experiment, hydrogen, with an atomic number of 1, has less than 20% of the scattering capacity of carbon.
The large difference in neutron scattering coefficients between hydrogen and deuterium adds a powerful tool to neutron crystallography. By equilibrating a protein crystal in D2O before data collection, all readily exchangeable hydrogens are replaced with deuterons. Nonexchangeable hydrogen atoms show up as negative peaks and deuterons as positive peaks in a neutron density map. By comparing results from crystals in mother liquor containing water or D2O, not only is it possible to check the water sites, but other useful information is obtainable. Many covalently bonded hydrogen atoms are exchangeable, and these sites will be distinguishable from slowly exchanging or nonexchangeable hydrogen atoms in the neutron scattering maps. Comparisons of the locations of water molecules bound to crystalline proteins, as determined by both neutron and X-ray diffraction methods, have indicated close agreement.
For example, when comparing water molecules associated with the X-ray and neutron maps of trypsin, a good correspondence was found; 53 of 58 water sites were the same in both crystallographic studies (Kossiakoff et al., 1992). Other interesting results also occur. Some proteins contain water molecules that are buried within the three-dimensional structure. That is, they are not accessible to the bulk solvent. However, when H2O/D2O maps are compared, the buried water molecules are clearly exchangeable (Kossiakoff et al., 1992). Because of normal dynamic motion of segments of the protein, a location inside the molecule is still accessible to the bulk solvent.
By using neutron diffraction it has been shown that most protons associated with heteroatoms in the protein are exchangeable. This includes hydrogen atoms associated with the peptide nitrogen and side chains of histidine, lysine, etc. Since neutron diffraction studies along with hydrogen/deuterium exchange data have been done on other proteins besides trypsin (including myoglobin, ribonuclease, lysozyme, crambin, and pancreatic trypsin inhibitor), the exchangeability factor can be tabulated for many forms of secondary structure. The results were summarized in another review article by Kossiakoff (1985). By examining the positions of hydrogen atoms that exchange readily, along with the positions of those that do not, it is possible to get some feeling for conformational fluctuations within the otherwise static protein structure derived from crystal diffraction studies.
To generalize from the combined results obtained from the proteins listed above, it appears that the ability to exchange correlates with the location of individual hydrogen atom sites within the overall secondary structure of the protein. Kossiakoff notes that nonexchangeable hydrogen atom sites cluster in β-sheet regions. Other, somewhat unexpected results were found. The peptide protons in an α helix are hydrogen bonded to carbonyl oxygens. Because of hydrogen bonding, it could be expected that they would not be readily exchangeable, particularly those bordering on a hydrophobic core. This was not the case. Most of the peptide hydrogen atoms located in α helices were found to be exchangeable.
The exchange factor on soaking a crystalline protein in D2O did appear to have a packing or steric factor associated with it. If one calculates the number of atoms within a 7-Å sphere of the hydrogen atom in question, those with the highest packing factor tended to be nonexchangeable. The packing units used for descriptive purposes are somewhat unusual: the number of neighboring atoms found within a 7-Å sphere (1436 Å³). For example, the average packing density for peptide hydrogens appeared to be about 117 packing units. The upper boundary is about 180 packing units. When the value for a peptide hydrogen reached about 160, that position was generally not exchangeable. Clearly, the presence of nearby atoms can prevent the hydrogen/deuterium exchange that must occur through diffusing water molecules.
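The packing unit described above is simply a neighbor count, and it can be sketched in a few lines. The function names are illustrative assumptions, as is the choice to exclude the hydrogen atom itself from the count. The threshold of 160 is the approximate value quoted in the text and should be read as a rule of thumb.

```python
import math

RADIUS = 7.0  # Å, radius of the sphere used to define a packing unit


def sphere_volume(radius):
    """Volume of a sphere in cubic Å."""
    return 4.0 / 3.0 * math.pi * radius ** 3


def packing_units(hydrogen_xyz, atom_coords, radius=RADIUS):
    """Count atoms within `radius` Å of the hydrogen, excluding itself."""
    return sum(
        1 for xyz in atom_coords
        if 0.0 < math.dist(hydrogen_xyz, xyz) <= radius
    )


def likely_nonexchangeable(units, threshold=160):
    """Apply the approximate threshold quoted in the text."""
    return units >= threshold


print(f"Sphere volume: {sphere_volume(RADIUS):.0f} Å^3")

# Toy coordinates: the hydrogen itself, one atom inside the sphere,
# one exactly on its surface, and one just outside.
hydrogen = (0.0, 0.0, 0.0)
atoms = [hydrogen, (1.0, 0.0, 0.0), (0.0, 7.0, 0.0), (0.0, 0.0, 7.1)]
print("Packing units:", packing_units(hydrogen, atoms))
```

For a real structure, `atom_coords` would hold every atom in the coordinate file, and the count would be repeated for each peptide hydrogen of interest.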
WATER MOLECULES OBSERVED IN CRYSTALLINE PROTEINS
One of the more powerful aspects of crystallographic analyses of proteins is the information they give about associated noncovalently bonded molecules. The most common of these are water molecules. To a first approximation, refined crystal structures usually include somewhere between one-third and one-half bound water molecules per amino acid. These water molecules play a role in every aspect of protein structure. Since solvent molecules bridging two subunits in an oligomeric protein have been found in crystal structures, such water clearly plays an important role in quaternary structure.
Similarly, water molecules are frequently found associated with certain aspects of secondary structure. The simplest one to envision is the polar atom constellation in an α helix. Recall that the carbonyl oxygen of a peptide is hydrogen bonded to the hydrogen atom attached to the peptide nitrogen of the fourth amino acid toward the C terminus (Chapter 6). However, at both ends of the α helix are clusters of unfilled polar atoms. At the N terminus, there are three such hydrogen atoms belonging to the first three peptide nitrogens of the helix. Similarly, at the C-terminal end of the α helix there will be three carbonyl oxygens with no continuing hydrogen bonds. Both ends of an α helix are potential water-binding sites because of the presence of these polar atoms.
Another example can be found near the end strands of a β sheet. These strands cannot form hydrogen bonds with adjacent strands. On one side of each strand on the sheet edge there are no secondary structural elements to satisfy the polar atoms. As a result, the carbonyl oxygens or peptide nitrogen/hydrogens at these positions have unfilled hydrogen-bonding sites that are often occupied by bound water molecules. Furthermore, it is not uncommon to have a bend in an α helix breaking the normal hydrogen-bonding pattern. Such sites also usually have a bound water molecule in the crystal structure.
Water molecules are also frequently found associated with protein-bound metal ions. A variety of metal-binding sites have been documented in crystalline proteins. Nearly always, one or more of the metal ligands fulfills the criteria discussed above for a water molecule: it is within the appropriate semicovalent bonding distance of the metal ion, and electron density is visible that may not be part of the covalent protein structure. On the basis of the varied positions of water molecules bound to crystalline proteins, they appear to be an important part of all levels of protein conformation. Finally, bound water sites may also play an important role in enzyme catalysis. A number of enzyme structures contain a water molecule at a stereochemical location that suggests it participates in the catalytic mechanism.
THE DISTRIBUTION OF PROTEIN-BOUND WATER
The location of water molecules hydrogen bonded to polar atoms in proteins is much like what would be predicted on the basis of small molecule structural studies and orbital considerations. For example, in Fig. 9.4A, water molecules bound to the main-chain carbonyl oxygens for a number of crystalline proteins are shown in stereo. To produce this image, the three atoms C, CA, and O, also shown in Fig. 9.4A and B, have been taken from coordinates of a number of crystalline proteins and superimposed. The three superimposed atoms then form the framework for the stereoimage of the water distribution (Thanki et al., 1988). If viewed in stereo, Fig. 9.4A shows that the bound water molecules form a shell around the carbonyl oxygens. This shell of water locations does not contain a uniform distribution. Rather, a bimodal preference results from the general location of the oxygen atoms of the bound water and the peptide carbonyl oxygen. This preference is probably due to the sp2 orbitals of the carbonyl oxygen aligning with the O–H bond of water. The latter alignment optimizes the dipolar electrostatic interactions. More careful analyses of the angular distribution can be found in the original paper of Thanki et al. (1991).
As can be seen in Fig. 9.4B, the overall distribution of water molecules bound to carbonyl oxygens of α-helical structures in crystalline proteins is different. Not unexpectedly, one of the nodes in the distribution is missing. The second or missing sp2 site is involved in the intramolecular hydrogen bonding characteristic of the α helix. At the C-terminal end of the α helix, one would expect to find additional waters associated with the carbonyls that cannot participate in the helical hydrogen bonding. This appears to be the case (Thanki et al., 1991). The >N–H segment of the polypeptide chain is also a site of water binding.
The distribution of water sites around the peptide nitrogen in the sampled crystal structures is shown in stereo in Fig. 9.5. The unimodal appearance represents a site attributable to the hydrogen bonds forming between the oxygen of a water molecule and the hydrogen belonging to the peptide nitrogen. In this case the mean distance appears to be somewhat longer than that attributable to the >C=O to water interactions. Furthermore, most of the water sites appear to lie in a plane with the peptide bond. The plane includes the carbonyl carbon and oxygen, and the nitrogen atom. The water positions shown in Fig. 9.5 would be forbidden in an α-helical segment and probably in most amino acids involved in β structure. However, at the N termini of α helices, at least three amino acids have peptide nitrogens that do not participate in helical hydrogen bonding and are therefore potential water sites.
Taken together, Figs. 9.4 and 9.5 show that main-chain surface atoms are sites of water binding in the crystalline state. For hydrogens belonging to peptide nitrogens, the binding of water molecules is generally not possible in the presence of secondary structure. However, this is not the case for the carbonyl oxygen of the main chain. The sp2 arrangement may permit hydrogen bonding as part of the secondary structure and still include attractions for a water molecule. The multiple donor/acceptor nature of both main-chain and side-chain atoms permits the formation of the bound water networks frequently observed in crystalline protein structures, which are discussed in the next section.
In Fig. 9.2, many of the hydrogen bond donors and acceptors were atoms belonging to protein side chains. Included in this category were the side chains of K, R, H, D, E, N, Q, S, T, Y, and W. As already noted, hydrogen-bonding capability goes along with participation in water binding. To illustrate one example of water molecules bound to side-chain atoms, Fig. 9.6 describes an analysis of the area around the –OH of tyrosine side chains in crystalline proteins (Thanki et al., 1991). When viewed in stereo, the positions of the waters from refined crystal structures are as expected, with sites concentrated near either the unpaired electrons or the covalently attached hydrogen.
Taken together, the structures of proteins have many potential water-binding sites, and numerous examples have been found by X-ray crystallography. In fact, any polar atom of the protein offers the potential for binding water. In general, the water sites do not appear in the same positions relative to the protein atom. Rather, they appear to be distributed, with the most populated position related to the covalent position of the hydrogen atom and the nonbonded electron acceptors. Since water is both a hydrogen bond donor and acceptor, water-binding sites are often found in networks in crystalline proteins.
Fig. 9.4 Main-chain carbonyl oxygens and water molecules. The distribution of water molecules bound to a sampling of proteins of known crystal structure is shown in stereo. The coordinates for these water molecules were supplied by J. Thornton and co-workers, and represent an extensive study as described (Thanki et al., 1988, 1991). They were selected because they are within hydrogen-bonding distance of oxygen atoms belonging to the main-chain carbonyl group. The dashed line to a single water molecule is labeled with the distance (in angstroms) to a representative water molecule. (A) The distribution describes all water molecules near the oxygen of the main chain. (B) In this stereodrawing, only water molecules in crystalline proteins that are near the carbonyl oxygens in α-helical conformations are shown.
Fig. 9.5 Water molecules and the peptide −NH in crystalline proteins. This stereodrawing describes the distribution of water molecules hydrogen bonded to the peptide nitrogen in crystalline proteins. The data are derived from the studies of Thornton and co-workers as described in Fig. 9.4.
Fig. 9.6 The tyrosine side chain and bound water molecules. This stereodrawing illustrates the location of water molecules hydrogen bonded to the −OH group of tyrosine as found in a sampling of refined crystal structures. (See also the captions to Figs. 9.4 and 9.5.)
Fig. 9.7 Water networks in crystalline proteins. This schematic drawing illustrates a water network in a crystalline protein. Protein atoms are labeled C, O, N, or H. The filled circles show the position of water molecules, with the dashed lines marking the positions of hydrogen bonds. Five different water molecules are shown (W1 through W5).
WATER NETWORKS IN CRYSTALLINE PROTEINS
As is true of many of the protein side chains, the aromatic –OH of tyrosine can be either a hydrogen bond donor or acceptor, or both. If it acts as both a donor and acceptor to water molecules, it is participating in a so-called water network. Collections of interacting water molecules have been found in numerous proteins and are illustrated schematically in Fig. 9.7. Because of their widespread presence, they are likely to be a factor in nearly all levels of protein structure. By satisfying a polar atom of the protein on the surface, or even an internal atom, they contribute to the tertiary structure. Since in some cases the network may form between two subunits of an oligomeric protein, it may also contribute to the quaternary structure.
The network shown in Fig. 9.7 is hypothetical. Nonetheless, networked waters have been found in nearly every crystalline protein. In Fig. 9.7, look first at the water molecule labeled W2, and note that it participates in three hydrogen bonds. Two hydrogen bonds involve protein atoms and one is to another water molecule, W3. The second water, W3, interacts only with other water molecules. Going back to W2, it is hydrogen bonded to a carbonyl oxygen from a side chain of glutamine (or asparagine) and accepts a hydrogen bond from the side chain of a serine (or threonine). Both waters W1 and W4 form hydrogen bonds only with other water molecules. In a sense, W1 and W4 belong to a second hydration shell but are stabilized sufficiently to appear on an electron density map.
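A water network of the kind drawn in Fig. 9.7 can be treated as a graph in which waters and polar protein atoms are nodes and postulated hydrogen bonds are edges. The sketch below is a minimal illustration under stated assumptions: a hydrogen bond is postulated whenever two heteroatoms lie within 3.5 Å, at least one partner must be a water, and waters are recognized by a label beginning with "W". The coordinates and labels are invented for the example and do not come from Fig. 9.7.

```python
import math
from collections import deque

HBOND_CUTOFF = 3.5  # Å, heteroatom-to-heteroatom distance limit


def is_water(label):
    """Labelling convention assumed for this example only."""
    return label.startswith("W")


def hbond_graph(atoms, cutoff=HBOND_CUTOFF):
    """Build an adjacency map from {label: (x, y, z)}.

    Pairs of protein atoms are skipped so that only contacts
    involving at least one water molecule become edges.
    """
    labels = list(atoms)
    graph = {label: set() for label in labels}
    for i, a in enumerate(labels):
        for b in labels[i + 1:]:
            if not (is_water(a) or is_water(b)):
                continue
            if math.dist(atoms[a], atoms[b]) <= cutoff:
                graph[a].add(b)
                graph[b].add(a)
    return graph


def networks(graph):
    """Return the connected components of the graph as a list of sets."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        component, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbor in graph[node] - seen:
                seen.add(neighbor)
                queue.append(neighbor)
        components.append(component)
    return components


# Hypothetical coordinates (Å): three linked waters, one serine
# hydroxyl oxygen anchoring the chain, and one isolated water.
atoms = {
    "W1": (0.0, 0.0, 0.0),
    "W2": (2.8, 0.0, 0.0),
    "W3": (5.6, 0.0, 0.0),
    "SER OG": (2.8, 2.8, 0.0),
    "W9": (20.0, 0.0, 0.0),
}
for component in networks(hbond_graph(atoms)):
    print(sorted(component))
```

A component that contains several waters and at least one protein atom corresponds to a network anchored to the protein, whereas a water that appears alone has no partner within the cutoff.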
SUMMARY
Although X-ray and neutron diffraction studies have shown unambiguously the presence of bound water molecules in crystalline proteins, their positions relative to protein atoms can be discussed only in terms of a three-dimensional distribution. Unlike covalent bonding, hydrogen bonding appears to be energetically favorable even if strict steric conditions are not maintained. Since a large number of polar atoms within any protein are capable of being either a hydrogen bond donor or acceptor, refined crystal structures have a large number of water molecules in their coordinate lists. The number of water molecules ranges upward from one-third of the number of amino acids in the protein. A 100-residue protein would be expected to have at least 33 bound water molecules in the crystal coordinates. These bound water molecules are usually found with more than one hydrogen bond and frequently are found in networks. The networks are webs of interconnected polar protein atoms and water molecules.
PROBLEMS
Practicing Stereovision
1. Using Fig. 9.4B, try to estimate the distance distribution between the carbonyl oxygen and the bound water molecules: look at the drawing with stereoglasses and the ruler (2.84 Å). Is the mean distance ± 2, 1.5, 1.0, or 0.5 Å?
2. Wherever hydrogen bonds form between H2O and a nitrogen or an oxygen in the protein, there is no X-ray information to determine which is the donor atom (the nitrogen or oxygen in the protein, or the H2O). Using Figs. 9.5 and 9.6, draw all atoms including hydrogen atoms to depict the water-binding site. Hint: For Fig. 9.6 you will need two drawings!
3. Draw all of the atoms in the guanidino side chain of arginine and illustrate all possible hydrogen-bonding sites.
Using Computer Graphics
Go to the PDB and obtain the coordinates for the enzyme ribonuclease T1 (1rga). Near residues 63–68 is a network of water molecules. These bound waters include the numbers 107, 110, 111, 113, 115, etc. The waters are partially buried in a crevice between the turn at positions 63–68 and 9–22.
4. Show, by measuring the distance between them, that these water molecules form a network. If the distances correspond to hydrogen bonds, draw the network. Waters are colored red by default. To make them easier to see, it is useful to edit the .kin file to make them yellow or some more visible color.
5. Beginning with the water number 113, trace the solvent network to the bound Ca2+ ion.
6. Using the Measure command, show how these waters help stabilize the interactions between residues 63–68 and 9–22.
7. Now center the graphics display on the bound Ca2+ ion. What protein atoms are involved in the calcium-binding site? Describe the complete chelation shell for the bound ion.
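For readers who prefer a script to interactive graphics, the distance measurements asked for in problem 4 can be sketched as follows. This is a hedged illustration, not a required method. It assumes the coordinate file follows the standard fixed-column PDB layout, that waters carry the residue name HOH with an oxygen atom named O, and that water residue numbers are unique within the file. The function names and the local filename in the closing comment are hypothetical.

```python
import math


def parse_waters(pdb_lines):
    """Return {residue number: (x, y, z)} for water oxygens.

    Column positions follow the fixed-width PDB ATOM/HETATM layout.
    If two chains reuse a water number, the later record wins.
    """
    waters = {}
    for line in pdb_lines:
        if line[0:6].strip() not in ("ATOM", "HETATM"):
            continue
        if line[17:20].strip() != "HOH" or line[12:16].strip() != "O":
            continue
        res_seq = int(line[22:26])
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        waters[res_seq] = xyz
    return waters


def water_contacts(waters, cutoff=3.5):
    """List (water_i, water_j, distance) for pairs within cutoff Å."""
    ids = sorted(waters)
    pairs = []
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            d = math.dist(waters[a], waters[b])
            if d <= cutoff:
                pairs.append((a, b, round(d, 2)))
    return pairs


# Example use with a hypothetical local copy of the coordinate file:
#
#     with open("1rga.pdb") as handle:
#         waters = parse_waters(handle)
#     for a, b, d in water_contacts(waters):
#         print(f"HOH {a} to HOH {b}: {d} Å")
```

Pairs reported within the cutoff are candidates for hydrogen bonds only; as noted earlier in the chapter, the X-ray data do not show which partner is the donor.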
REFERENCES
Baker, E., and Hubbard, R. (1984). Hydrogen bonding in globular proteins. Prog. Biophys. Mol. Biol. 44, 97–179.
Kabsch, W., and Sander, C. (1983). Dictionary of protein secondary structure: Pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22, 2577–2637.
Kossiakoff, A. (1983). Neutron protein crystallography. Annu. Rev. Biophys. Bioeng. 12, 159–182.
Kossiakoff, A. (1985). Neutron protein crystallography. Annu. Rev. Biochem. 54, 1195–1227.
Kossiakoff, A., Sintchak, M., Shpungin, J., and Presta, L. (1992). Analysis of solvent structure in proteins using neutron D2O–H2O solvent maps: Pattern of primary and secondary hydration of trypsin. Proteins Struct. Funct. Genet. 12, 223–236.
Thanki, N., Thornton, J. M., and Goodfellow, J. M. (1988). Distribution of water around amino acid residues in proteins. J. Mol. Biol. 202, 637–657.
Thanki, N., Umrania, Y., Thornton, J., and Goodfellow, J. (1991). Analysis of protein main-chain solvation as a function of secondary structure. J. Mol. Biol. 221, 669–691.
CHAPTER 10
Protein and Nucleic Acid Complexes
INTRODUCTION
In the 1950s, Watson and Crick deduced the structure of DNA from X-ray diffraction studies of fibers drawn from DNA gels (Watson and Crick, 1953; Crick and Watson, 1954). The actual fiber diffraction photographs were recorded by a number of other workers (see, e.g., Wilkins et al., 1953). The X-ray methods used to interpret fiber diffraction photographs are similar only in principle to those described in Chapter 2. In simplest terms, hypothetical models of the polymer are constructed until one is found that satisfies the diffraction pattern. The model that Watson and Crick developed satisfied the fiber diffraction pattern and they produced the first molecular model of double-stranded DNA. It quickly became the paradigm for a new subject called molecular biology, and a new era in biology began. However, the fiber diffraction model addressed only the crude, overall parameters for this form of DNA. It was many years before single crystals of oligonucleotides were obtained and their structures determined by X-ray crystallography. Later, solution structures were also obtained by nuclear magnetic resonance (NMR) methods. The model built by Watson and Crick is now the generalized structure of double-stranded B-form DNA. It has the popular name double helix. The double-stranded structure closely resembles the oligonucleotide shown in stereo in Fig. 10.1. The strands run in opposite directions, and each strand of the helical molecule has a right-handed twist and 10 base pairs per helical turn. Each base pair is stacked 3.4 Å apart and is located in the central part of the double helix. The structure visible in stereo in Fig. 10.1 is a pseudo-backbone model. The prongs sticking out from the backbone indicate where the bases are located. The strands are antiparallel, as is apparent from the labeling. Strand B has the 5' end at the top whereas strand A has the 3' end at the same position. 
The large groove visible to the right is referred to as the major groove. To hold the double helix together, Watson and Crick proposed specific interstrand hydrogen bonds between the base pairs adenine (A) and thymine (T), and between guanine (G) and cytosine (C). The precise stereochemical arrangement of base pairs has been shown to vary somewhat. Of utmost importance was the fact that the B-DNA model, with two nucleic acid chains running in opposite directions, immediately suggested a possible mechanism for copying this genetic material. The complementarity of the bases in the duplex can be used to generate a daughter strand. Copying both strands results in two identical copies of the original duplex precursor. Although stunning in its simplicity, it raised many more questions about how duplex DNA is used in replication, transcription, and the regulation of gene expression. The important genetic information contained in the base sequence is located in a somewhat unexposed area near the center of the molecule. The ribose–phosphate main
Fig. 10.1 A pseudo-backbone structure of B-DNA. This stereodrawing shows the crude overall conformation of the most common form of DNA. The two strands, A and B, run in opposite directions and are labeled accordingly. The ‘‘bonds’’ protruding from the backbone of each chain illustrate the locations of glycosidic bonds and therefore the positions of the bases. The drawing was made by using the crystallographic coordinates of a dodecaoligonucleotide (12-mer) with the accession code 2bna (Wing et al., 1980). This oligonucleotide is self-complementary and has the sequence 5'-CGCGAATTCGCG-3'.
chain lies on the periphery of the macromolecule, while the so-called genetic coding information, the base pairs, lies at the center of this linear polymer. In crystal structures of oligonucleotides that contain the B-DNA base pairing there are water molecules interacting with some of the phosphodiester regions. Since the double-stranded structure appears to be stable, a number of proteins have evolved to unwind the duplex and to maintain single-stranded forms.
STRUCTURAL DATA DESCRIBING DNA
Because the Watson–Crick structure was deduced from fiber diffraction data, it represents only an averaged model of B-DNA. The samples used in the experiments contained DNA molecules that had nearly the same orientation, with the double-helical axis parallel with the long axis of the fiber. However, the azimuthal orientation is random, so that the fiber specimens are ordered only in two dimensions. In addition, each DNA molecule contained in the fiber varied in length and sequence. Some were bent slightly in varying directions. Therefore, the resulting fiber diffraction data represented a sequence-averaged, molecule-averaged structure of B-DNA. The A–T and G–C hydrogen-bonding patterns vital to the “genetic code” were not observed directly but were surmised from the model building. A better understanding of the actual base interactions and torsional angles in the ribose–phosphate backbone was delayed for several decades, until the first crystal structures were derived in the 1980s (Wing et al., 1980). Dickerson and colleagues (Wing et al., 1980) described one of the earliest examples of a detailed structure of a double-stranded oligonucleotide. Using a self-complementary dodecamer, crystals were obtained and analyzed to determine their molecular structure. The nucleotide sequence of the dodecamer is given in the caption to Fig. 10.1. A stereoimage of the crystallographic coordinates is shown in Fig. 10.2. This structure and others that followed have been studied in detail with regard to the orientation of base pairs. Since these oligonucleotides represent fragments of DNA, they have been subjected to criticism based on crystal packing and “end effects.” Clearly
Fig. 10.2 The crystal structure of the Dickerson dodecamer. This stereoview shows all of the atoms (except water molecules) of the DNA fragment in the B conformation (Wing et al., 1980). The drawing was placed in roughly the same orientation as that in Fig. 10.1. The positions of the phosphorus atoms of the diester linkages can be seen along the backbone adjacent to the five-membered ribose rings.
at the end of each strand, there may be a tendency toward “fraying,” but this aside, these crystal and NMR structures are a rich source of information regarding the B-form of DNA. In the crystal structure displayed in Fig. 10.2, a variety of important details are visible if viewed in stereo. The planes of the purine and pyrimidine bases are not exactly perpendicular to the axis of the double helix. Atoms belonging to the bases are more accessible through the major groove of the molecular structure than through the minor groove. Because of the nucleotide sequence in Figs. 10.1 and 10.2, there is a twofold rotational symmetry axis perpendicular to the helical axis located between A6 and T7, and between T7' and A6'. The potential local symmetry that may be generated by certain duplex base sequences plays a major role in the interaction of proteins with DNA. This will become more apparent by observing the molecular structures of some complexes important to modulation of DNA in phenomena such as replication and transcription. The crystallographic and NMR results for double-stranded oligonucleotides provided other conformational data. For the dodecamer described in Figs. 10.1 and 10.2, the double helix is somewhat distorted, although the helical parameters of the center hexamer best coincide with the standard B-DNA conformation. For example, the endmost base pairs show deviations in base pair overlap. Furthermore, the conformation of the deoxyribose sugar is not constant over the length of the oligonucleotide. Since the five-membered sugar ring is part of the DNA backbone, changes in its conformation may have noticeable effects on the DNA structure. Puckering in the ribose moiety is described schematically in Fig. 10.3, where only the C-2 and C-3 endo conformations are shown. In the C-2 endo conformation, the C-2 carbon is puckered out of the plane of the ring in the direction of C-5. 
In the corresponding exo conformation, the C-2 carbon is moved out of the plane of the pentose ring, away from the C-5 carbon.
Fig. 10.3 Potential conformational variation in the deoxyribose ring of DNA. Carbon positions are numbered in the conventional manner. Phosphodiester bonds are formed through the oxygens (not shown) at the C-3 and C-5 positions. Endopuckering of the ribose ring involves distortion in the direction of the bond between C-4 and C-5. Exopuckering at either the C-2 or C-3 position is in the opposite direction.
Other, even more subtle differences along the double helix can be found. For example, differences were observed in the propeller twist parameters for a deoxyribosylguanine–deoxyribosylcytosine (dG–dC) base pair relative to a dA–dT pair. The propeller twist is seen as the angular difference between the planes of Watson–Crick paired bases. Last of all, the dodecamer duplex displays a 19° bend over the length of its helix. Not unexpectedly, the crystal structure demonstrated that B-DNA does not have a rigid “rodlike” shape, but instead shows the ability to flex. With some flexibility, relatively subtle sequence-specific variation in its conformation could be present along the double helix. It is also important to emphasize the fact that other forms of DNA were observed even in the 1950s. A-DNA, a form typically found at low humidity, was described and some of its properties are listed in Table 10.1. Later crystallographic analysis showed that double-stranded DNA can adopt a third conformation that was called Z-DNA. The three different forms of DNA can be described by their helical parameters, which are also listed in Table 10.1. A-DNA, a right-handed double helix like B-DNA, differs from B-DNA by having a deep major groove and shallow minor groove. Z-DNA, however, differs from both B-DNA and A-DNA because it is a left-handed double helix. The fact that DNA shows some subtle but sequence-specific conformational properties was a clue as to how protein–DNA interactions could occur. Furthermore, even though the bases are located near the center of the duplex, some of the atoms belonging to the bases are accessible through both the major and minor grooves. The reader should study Fig. 10.2 and try to evaluate the accessibility of polar atoms in the member bases. Remember that for DNA–protein interactions to occur, the protein must recognize the cognate DNA sequence by noncovalent interactions. Incidentally, the inner base pair
TABLE 10.1 Comparison of DNA Conformations

Property                B-DNA                                           A-DNA                                             Z-DNA
Overall shape           Smooth, long and thin, bases perpendicular      Smooth, short, and fat, bases tilted              Zig–zag
Grooves                 2; major wide and deep, minor narrow and deep   2; major narrow and deep, minor wide and shallow  1; very deep
Helix sense             Right handed                                    Right handed                                      Left handed
Residues per turn       10                                              11                                                12
Rise per base pair (Å)  3.4                                             2.6                                               3.8
Stacking                Base–base in each strand                        Base–base in each strand                          Interstrand base–base
Sugar conformation      C-2' endo                                       C-3' endo                                         C-3' endo, C-2' endo
sequence, d(G/AATTC), of the Dickerson dodecamer contains the recognition site for the restriction endonuclease EcoRI. The ‘‘slash’’ is the site of cleavage.
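The helical parameters in Table 10.1 can be combined into two derived quantities that are not listed in the table: the helical pitch (rise per base pair multiplied by residues per turn) and the average twist per base pair (360° divided by residues per turn). The short sketch below works these out from the tabulated values. The sign convention for handedness and the function names are illustrative choices made here, not notation from the text.

```python
# Helical parameters copied from Table 10.1 (rise in angstroms).
# "sense" is an illustrative convention: +1 right handed, -1 left handed.
FORMS = {
    "B-DNA": {"residues_per_turn": 10, "rise": 3.4, "sense": +1},
    "A-DNA": {"residues_per_turn": 11, "rise": 2.6, "sense": +1},
    "Z-DNA": {"residues_per_turn": 12, "rise": 3.8, "sense": -1},
}


def pitch(form):
    """Length of one helical turn in angstroms: residues per turn x rise."""
    p = FORMS[form]
    return p["residues_per_turn"] * p["rise"]


def twist_per_base_pair(form):
    """Average rotation per base pair in degrees, signed by helix sense."""
    p = FORMS[form]
    return p["sense"] * 360.0 / p["residues_per_turn"]


for name in FORMS:
    print(f"{name}: pitch {pitch(name):.1f} A, "
          f"twist {twist_per_base_pair(name):.1f} deg per base pair")
```

For B-DNA the pitch works out to about 34 Å per turn, with an average twist of 36° per base pair.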
INTERACTION BETWEEN DNA AND SITE-SPECIFIC PROTEINS
The sequence-specific structure of nucleic acids is of great importance in understanding the regulation of gene expression. Numerous biochemical, genetic, and structural studies have now established that the most common mechanism is for a protein to bind to duplex DNA. Sometimes binding is very sensitive to a specific duplex DNA sequence; other times specificity may be broad. For example, the lactose repressor, a 38-kDa tetramer, recognizes and has the greatest affinity for 26 base pairs in a genome of 4.2 × 10⁶ bases (Wilkins et al., 1953). The recognition or binding of the repressor protein prevents transcription of the adjacent genes. For this system and many others, a small molecule can reverse the DNA binding by forming a complex with the regulatory protein. This adds another level of complexity to protein–DNA systems and the control of gene expression.
DNA-BINDING MOTIFS
Because of the intense interest in cell differentiation and other aspects of cell biology involving transcription, many different regulators have been discovered. Several have been studied by either X-ray crystallography or NMR, or both. From the structural analyses, a relatively small number of motifs is emerging. To start with, many of the recognition elements in DNA have palindromic or pseudopalindromic sequences. By combining the base sequences from two strands and the structure of B-DNA, it is possible to generate regions of the DNA with twofold rotational symmetry. Consider the following oligonucleotide segment:

hs1
5'–––GAAAGTTTG–
3'–––CTTTCAAAC–
hs2

Crossing strands at the central GC pair, as marked by both the underlines and crossthrough marks, results in a palindromic segment. Assembled on B-DNA, the region will have twofold rotational symmetry. The symbols hs1 and hs2 represent the two half-sites of the recognition element. If a cognate DNA sequence is palindromic and regulated by a binding protein, the binding protein should also have twofold rotational symmetry. As discussed in Chapter 5, twofold symmetry in globular proteins is likely if the protein contains two, four, six, etc., identical subunits. However, the symmetry argument, although general, provides no information about the specific noncovalent interactions that can occur between protein and nucleic acid. Many such questions are apparent. Does the protein bind along the major or minor groove of DNA? Or does it use both? Is there a particular conformation or motif that allows certain amino acids to mediate recognition and binding? What is the nature of the protein side chains that are found at an interaction site? As the number of crystal and solution structures grows, it currently appears that only a limited number of conformational motifs have evolved for transcriptional regulation. A tabulation may be found in the review article by Pabo and Sauer (1992). 
In 1992, at least six well-established families were identified. The six motifs were as follows: (1) helix–turn–helix, (2) homeodomains, (3) zinc fingers, (4) steroid/nuclear receptors, (5) leucine zippers, and (6) the helix–loop–helix proteins. A few examples
Fig. 10.4 The helix–turn–helix motif with B-DNA. The helix–turn–helix motif from the Cro repressor (3cro) is shown in stereo as a Cα model along with a segment of B-DNA to which it binds. Note the twofold rotational symmetry between the two motifs, each derived from a separate subunit of the repressor protein. Both strands of the oligonucleotide are labeled on the 3' and 5' ends. The helix interacting in the major groove is labeled H1 (Mondragon and Harrison, 1991).
along with some stereodrawings are given below, and Appendix 2, at the end of this book, contains the accession codes of a list of DNA, DNA processing enzymes, and protein–DNA complexes. Once again, the amount of structural data available in this area is overwhelming.
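The palindrome argument made earlier in this section can be tested on any duplex sequence: a recognition element has exact twofold symmetry when one strand, read 5' to 3', is identical to its own reverse complement. The sketch below is an illustration written for this purpose, with function names chosen here. It is applied to the Dickerson dodecamer from Fig. 10.1, to the EcoRI site it contains, and to the nine-base segment quoted above.

```python
# Watson-Crick partners for the four DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the partner strand, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def is_palindromic(seq):
    """True when the duplex has exact twofold sequence symmetry."""
    return seq == reverse_complement(seq)


def symmetry_mismatches(seq):
    """Count positions where a strand differs from its reverse complement."""
    partner = reverse_complement(seq)
    return sum(1 for a, b in zip(seq, partner) if a != b)


for seq in ("CGCGAATTCGCG", "GAATTC", "GAAAGTTTG"):
    print(seq, reverse_complement(seq),
          is_palindromic(seq), symmetry_mismatches(seq))
```

The dodecamer and the EcoRI site pass the strict test. The nine-base segment does not: its AAA and TTT half-sites are symmetric, but the central G–C pair and the two end pairs are not, which is the kind of sequence the text calls pseudopalindromic. A segment with an odd number of base pairs can never pass the strict test, because the central base cannot be its own complement.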
HELIX–TURN–HELIX
The helix–turn–helix binding motif was first identified by the crystal structures of three transcriptional regulatory proteins: Cro, CAP, and λ repressor (Anderson et al., 1981; McKay and Steitz, 1981; Pabo and Lewis, 1982). All three were first crystallized and studied in the absence of DNA. Comparison of the three-dimensional structures of these DNA-binding proteins with amino acid mutational studies identified regions of the protein that were necessary for binding. The sequence identification was then used in conjunction with the crystal structures and the B-DNA conformation to show how the helix–turn–helix motif could hypothetically interact with the double helix of DNA. The Cro, CAP, and λ repressor proteins differ in size and tertiary structure but all bind to B-DNA as a dimer. Each contains a helix–turn–helix motif that is positioned along one face of the B-DNA at its major groove. The arrangement of DNA and protein helices for Cro (phage 434) is shown in stereo in Fig. 10.4. The helix–turn–helix segment contains approximately 20 residues per subunit, but remember, it is the dimeric form that is important for binding. As can be seen in Fig. 10.4, one of the α helices is located within the major groove of the DNA recognition element. Although the two helices are a major part of the DNA–protein interaction site, other elements apart from this secondary structure also play a role. Specific amino acid–nucleotide base interactions occur. In the twofold symmetrical dimer, the distance between the two recognition helices is 35 Å. This corresponds to one full turn of double-helical B-DNA.
PHAGE 434 REPRESSOR–DNA COMPLEX
The bacteriophage 434 repressor controls transcription by binding to a set of six similar 14-base pair DNA operator sites in the genome. The binding of the phage 434 repressor, like that of the phage Cro repressor, serves as a regulatory switch, in this case between lysogeny and lytic growth of the bacteriophage. The crystal structure of the complex includes amino acid residues 1–69 of the binding domain of the phage 434 repressor and a 20-base pair operator site, as is shown in Fig. 10.4. The significance of this structure is that it shows that the helix–turn–helix does form specific contacts with the major groove of DNA. The crystal structure of the DNA-binding domain of the phage 434 repressor contains residues 1–69. By using stereo, it should be apparent that phage 434 repressor is a five α-helical bundle protein that includes the helix–turn–helix motif. One of the helices is short. While a monomer in solution, it binds to DNA as a dimer. A monomer of residues 1–69 does not display any large conformational change in its structure when bound to DNA in comparison with the unbound monomer. However, several amino acid side chains that interact with DNA do show minor rearrangement on binding to DNA. Coordinate files for the holoprotein are listed in the Protein Data Bank (accession numbers 3cro, 1per, and 1rpe; protein with no associated DNA, 2cro and 1rb9). The stereodiagram in Fig. 10.4 illustrates the interactions between a segment of DNA and the phage 434 binding protein. The protein helix sitting in the major groove has most of the side chains that interact with atoms in the B-DNA. These include Q28, T27, Q29, and Q33, all belonging to helix H1 in Fig. 10.4. The major groove recognition helix is positioned so that its helical axis is nearly perpendicular to the plane of the drawing. In addition to the contacts visible in Fig. 10.4 between protein and DNA, the two monomers make protein–protein contacts at the dimer interface. 
So the dimerization on binding to DNA is further stabilized by the weak protein–protein contacts. These are formed by a patch of hydrophobic residues, a salt bridge between an arginine and a glutamic acid, and a hydrogen bond between the side chains of two other arginine residues. Overall, these protein–protein interactions formed at the dimer interface serve to fix the orientation of the two recognition sites on the protein dimer and therefore are as important as the protein–DNA interactions.
A LEUCINE ZIPPER
One form of DNA-binding motif is referred to as a leucine zipper. An example of a leucine zipper is a domain of protein GCN4. GCN4 is a 281-amino acid polypeptide, of which only the last 33 residues are required for dimerization. The C-terminal end of GCN4 binds to the major groove of B-DNA at a recognition site involved in the control of amino acid biosynthesis in yeast (Hinnebusch, 1984). The crystal structure of this C-terminal segment was first determined without any DNA present (O’Shea et al., 1991). Later, a crystal structure was determined with the protein motif bound to a fragment of B-DNA (Keller et al., 1995). The resemblance of the motif to a common zipper is striking and the motif is often referred to as bZip. A stereodiagram of a leucine zipper protein–DNA complex is shown in Fig. 10.5. The bZip motif, like the helix–turn–helix motif, interacts with DNA as a homodimer with twofold rotational symmetry. The DNA recognition element is a 20-mer. This site is a pseudopalindrome of two 4-base pair half-sites that overlap at a central G–C base pair. Each of these half-sites is contacted by one protein monomer. The DNA conformation in the complex is straight B-form DNA. No deviations are seen in its conformation along the portion that binds protein.
Fig. 10.5 A leucine zipper with a recognition element. The coordinates have an accession code 1YSA and are derived from a crystallographic study.
The protein dimer is formed by interaction of the side chains in the coiled coil zipper. An N-terminal region has a high composition of basic residues that bind to the DNA in the major groove. The leucine zipper sequence is characterized by a crude heptad repeat, (abcdefg)n, with the occurrence of hydrophobic residues at positions a and d. Conserved leucines occur at position d over the length of the helix. In an imperfect fashion, the hydrophobic residues are the teeth of the zipper. The protein segment shown in Fig. 10.5 contains 56 amino acids. Remember that this is only a small portion of GCN4. The dimeric form containing both the dimerization domain and the basic DNA-binding region without the cognate DNA is shown in the stereodiagram presented in Fig. 10.6. In the GCN4 peptide structure, the dimerization motif itself is a pair of coiled coil α helices. The amino acid sequence for this domain is as follows:

227          240       250       260       270       280
PAALKRARNTEAARRSRARKLQRMKQLEDKVEELLSKNYHLENEVARLKKLVGER

Note the large number of basic residues in the N-terminal segment, from position 227 to 245. In fact, 6 of the 18 residues are basic amino acids. This has been referred to as the basic region, and as seen in Fig. 10.5, it is the portion of the protein in direct contact with the DNA. The dimerization motif packs with hydrophobic residues near the center of the coiled coil, although such packing is not nearly as systematic as the name “leucine zipper” implies. In fact, as is visible in Fig. 10.6, one of the leucines is located on the surface of the coiled coil. Furthermore, the “teeth” of the zipper do not interleave but rather face each other across the twofold rotation axis of the coiled coil. The open end of the zipper appears to end near residue a L5. In the α-helix coiled coil, there is a twist to the helical segment leading to a supercoil. In the complex with
Fig. 10.6 A leucine zipper with side chains (see legend to Fig. 10.5).
DNA, this coil twisting no longer occurs. This should be demonstrable with a molecular graphics program. Start with a relatively long line along the helical axis. By moving this line up and down through the coil, it should be possible to demonstrate where the supercoiling of a single α helix appears to end.
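The heptad repeat described for the GCN4 segment can be examined directly from the one-letter sequence. The sketch below is only an illustration: it assumes that the first letter of the sequence is residue 227, as in the numbering printed with the sequence, and it simply groups leucines that fall seven residues apart. It does not by itself prove that these residues occupy position d of the heptad.

```python
# One-letter sequence of the GCN4 segment as printed in the text.
GCN4_SEGMENT = "PAALKRARNTEAARRSRARKLQRMKQLEDKVEELLSKNYHLENEVARLKKLVGER"
FIRST_RESIDUE = 227  # assumed residue number of the first letter


def leucines_by_register(seq, first_residue):
    """Group leucine residue numbers by heptad register (number mod 7)."""
    registers = {r: [] for r in range(7)}
    for offset, amino_acid in enumerate(seq):
        if amino_acid == "L":
            number = first_residue + offset
            registers[number % 7].append(number)
    return registers


def best_heptad_register(seq, first_residue):
    """Return the leucines in the register that holds the most of them."""
    registers = leucines_by_register(seq, first_residue)
    return max(registers.values(), key=len)


print(len(GCN4_SEGMENT))
print(best_heptad_register(GCN4_SEGMENT, FIRST_RESIDUE))
```

Under the numbering assumption, the register with the most leucines holds residues 253, 260, 267, and 274, each seven residues from the next. The remaining leucines in the segment fall in other registers, in keeping with the remark that the packing is less systematic than the name suggests.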
SUMMARY
Complexes between DNAs and proteins take two general forms: (1) sequence specific, as would be expected for transcriptional regulators, and (2) nonsequence specific, as might be predicted for DNA-packaging proteins such as the histones. Because of their importance in regulating gene expression, it is the first category for which there are the most conformational data. Most of the complexes that have been studied by either X-ray or NMR methods involve B-DNA. In this form of DNA, the major groove of the duplex is large enough to accommodate protein secondary structure the size of an α helix. To be a regulator of DNA transcription, the protein motif must recognize specific nucleotide sequences. Such sequences have been called the recognition element. Often the recognition element has a duplex palindromic sequence. Some are pseudopalindromic. Such sequences when viewed perpendicular to the helical axis of B-DNA have twofold symmetry. The result of these rather simple features is that the regulator protein is a dimer with twofold symmetry. The twofold axis of the dimeric protein is arranged colinearly with the pseudotwofold axis of the duplex DNA at the sequence palindrome. At present there appear to be six different motifs that are involved in complex formation, several but not all involving α helices. Some of these structural observations are easily explained while others are not. For the B-form of DNA in the absence of any major deformation, atoms in the bases near the major groove are the most accessible. In these cases, most but not all of the protein–nucleic acid interactions take place through the major groove. In addition, some DNA deformations have been seen in the structures of the protein complexes. The interactions with a protein can twist or change some of the commonly observed B-DNA properties, such as the propeller twist of the bases. In addition, water molecules have been observed to bind to specific sites on both DNA and protein. In fact, in some instances, they may form the link between nucleic acid and protein. Water, therefore, may play a significant role in complex formation and therefore in the regulation process itself.
PROBLEMS
Practicing Stereovision
1. Using Fig. 10.1 or 10.2, determine how many turns of B-DNA are present.
2. Using Fig. 10.4, estimate the length of DNA that appears to be “covered” by the phage Cro repressor. How many turns of the B-DNA double helix would this be? Visually identify five α helices in the DNA-binding domain.
3. View Fig. 10.5 in stereo and determine the approximate position of the twofold rotation axis relating both the protein subunits and the DNA recognition element.

Using Computer Graphics
4. One of the motifs mentioned at the beginning of this chapter is a zinc finger. An example of this motif is represented by the crystallographic study of zif268. It has the PDB accession code 1zaa. The motif contains a bound zinc atom(s). Obtain the coordinates and study the structure. Determine the consensus sequence for the zinc-binding segment. 1zaa contains two such domains. Compare the sequences.
5. Determine whether the principal interactions between protein and nucleic acid occur in the major or minor groove.
6. Find an amino acid side chain that interacts directly with the bound oligonucleotide. Is this amino acid the same for both zinc fingers? Where is it with respect to the consensus sequence for the zinc-binding site?
7. Can you identify the recognition element in the bound nucleotide?
REFERENCES
Anderson, W. F., Ohlendorf, D. H., Takeda, Y., and Matthews, B. W. (1981). Structure of the cro repressor from bacteriophage lambda and its interaction with DNA. Nature (London) 290, 754.
Crick, F. H. C., and Watson, J. D. (1954). Proc. R. Soc. (London) Ser. A 223, 80.
Hinnebusch, A. G. (1984). Evidence for translational regulation of the activator of general amino acid control in yeast. Proc. Natl. Acad. Sci. U.S.A. 81, 6442.
Keller, W., Konig, P., and Richmond, T. (1995). Crystal structure of a bZIP/DNA complex at 2.2 Å. J. Mol. Biol. 254, 657–667.
McKay, D. B., and Steitz, T. A. (1981). Structure of catabolite gene activator protein at 2.9 Å resolution suggests binding to left-handed B-DNA. Nature (London) 290, 744.
O’Shea, E. K., Klemm, J. D., Kim, P. S., and Alber, T. (1991). X-ray structure of the GCN4 leucine zipper, a two-stranded parallel coiled-coil. Science 254, 539.
Pabo, C. O., and Lewis, M. (1982). The operator-binding domain of lambda repressor. Nature (London) 298, 443.
Pabo, C., and Sauer, R. (1992). Transcription factors: Structural families and principles of DNA recognition. Annu. Rev. Biochem. 61, 1053–1095.
Watson, J. D., and Crick, F. H. C. (1953). Molecular structure of nucleic acids. A structure of deoxyribose nucleic acid. Nature (London) 171, 737–738; Genetical implications of the structure of deoxyribonucleic acid. Nature (London) 171, 964–967.
Wilkins, M., Stokes, A., and Wilson, H. (1953). Molecular structure of deoxypentose nucleic acids. Nature (London) 171, 738–740.
Wing, R., Drew, H., Takano, T., Broka, C., Tanaka, S., Itakura, K., and Dickerson, R. E. (1980). Nature (London) 287, 109.
CHAPTER 11
Metal Ions Bound to Proteins
INTRODUCTION
Metal ions play an important role in the structure and function of many proteins. In fact, approximately one-third of all known proteins bind metal ions that serve catalytic, regulatory, or structural roles. Both hemoglobin and myoglobin have a heme iron as part of their oxygen transport function. The first crystal structures described the location of the metal ion in terms of the protein atoms. Heme is a tetrapyrrole cyclic compound that binds iron very tightly. In addition to serving as a cofactor in the transport of oxygen, it is also present in a wide variety of proteins in both plants and animals, where it serves as part of redox chains alternating between Fe²⁺ and Fe³⁺. In the early structural studies of hemoglobin and myoglobin, the iron atom was found at a site that indicated a semicovalent bond between it and a histidine side chain. In other proteins, such as some of the cytochromes, moieties on the tetrapyrrole nucleus may also form covalent bonds with side chains of the proteins. Metal ions played another especially important role in protein crystallography. In Chapter 2, the use of heavy atom derivatives was described. The formation of these protein derivatives was necessary for the derivation of phase information before an electron density map could be obtained and studied. Compounds such as CH₃HgCl, p-chloromercuriphenylsulfonic acid, and others were used to obtain derivatives with reactive –SH groups. Other popular compounds were PtCl₄²⁻, PtCl₆²⁻, and lead and gold compounds. Books on protein crystallography generally contain a list of compounds that have been used successfully as heavy atom derivatives. Finding binding sites for these heavy metal compounds is generally a trial-and-error process. It is usually done by soaking a protein crystal in a solution containing the metal ion and searching for binding by observing changes in the intensities of the X-ray reflections. 
Since the heavy atom sites rarely had anything to do with the function of the protein, most were not studied in detail once electron density maps became available.

How does one identify a metal site in an electron density map of a crystalline protein? In deriving atomic coordinates for a polypeptide chain from electron density maps, the amino acid sequence is indispensable. Electron density for a polypeptide chain generally offers no clues as to the type of element present at any site. It is the shape of the density that provides the key information for fitting carbon, nitrogen, sulfur, or oxygen atoms to points in the map. Metal ions in protein maps present a similar problem. The protein crystallographer must rely on chemical analyses to provide information as to the type of metal ion present, although sometimes hints are obtainable from the coordination geometry. There is one distinguishing characteristic in studying the electron density map for bound metal ions. A fully occupied metal site in a crystalline protein is easily identifiable from the level of electron density. This is illustrated in Fig. 11.1. In this schematic representation of an electron density map, threshold contouring is started at 3σ. Two higher thresholds of 4σ and 20σ are also shown. A segment of electron density described by the volume around a –C–N– segment is clearly visible at 3σ, as is the density for a nearby iron atom. As the level is raised to 4σ, the electron density around the polypeptide chain begins to disappear, and at 20σ it is gone. However, the electron density around the iron atom is still present at much higher levels and can be seen even at 20σ. The electron density essentially follows the rule given in Chapter 2: the X-ray scattering potential of metal ions is proportional to their atomic number.

Fig. 11.1 Identifying metal ions in electron density maps. This schematic drawing is a two-dimensional representation of electron density levels near a metal site in a protein. The electron density contours are shown at various levels as described more fully in text. At low threshold levels, electron density is visible for carbon, nitrogen, and iron atoms. The density for carbon and nitrogen disappears when higher threshold levels are selected.

Incidentally, levels of electron density in a protein map are arbitrary. Generally, they are not calculated on an absolute scale (e/Å³). Instead, they are scaled such that σ = 1. This is defined in Eqs. (11.1) and (11.2), where j = xyz as described in Chapter 2.

\[ \sigma = \left[ \frac{\sum_j \rho_j^2}{N - 1} \right]^{1/2} \tag{11.1} \]

\[ \sigma' = k \left[ \frac{\sum_j \rho_j^2}{N - 1} \right]^{1/2} \cong 1.0 \tag{11.2} \]

Both sums run over all grid points j, where N is the number of points in the electron density map and ρ_j is the value of the electron density at each of the calculated grid points. σ is the standard deviation for the calculated electron density map, and σ' = 1 after scaling the map with the constant k.
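The σ scaling and threshold contouring described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of any particular crystallographic program: the function names (`map_sigma`, `scale_to_unit_sigma`, `points_above`) are hypothetical, the one-dimensional "map" is invented toy data, the mean density is assumed to be close to zero, and σ is taken as the square root of the sum of squared densities divided by N − 1.

```python
import math

def map_sigma(rho):
    """Standard deviation of the grid densities, assuming a mean near zero."""
    n = len(rho)
    return math.sqrt(sum(r * r for r in rho) / (n - 1))

def scale_to_unit_sigma(rho):
    """Return (k, scaled_map), with k chosen so the scaled map has sigma = 1."""
    k = 1.0 / map_sigma(rho)
    return k, [k * r for r in rho]

def points_above(rho_scaled, level):
    """Indices of grid points whose density exceeds `level` (in sigma units)."""
    return [i for i, r in enumerate(rho_scaled) if r > level]

# Invented toy map: low-level background, three light atoms, one metal site.
rho = [0.1 if i % 2 == 0 else -0.1 for i in range(2000)]
for i in (500, 501, 502):      # light atoms (C, N, O) on an arbitrary scale
    rho[i] = 1.0
rho[1000] = 8.0                # fully occupied metal site, same arbitrary scale

k, scaled = scale_to_unit_sigma(rho)
low_contour = points_above(scaled, 3.0)    # light atoms and the metal
high_contour = points_above(scaled, 20.0)  # only the metal survives
print(low_contour, high_contour)
```

Raising the contour level from 3σ to 20σ removes the light atoms from the list and leaves only the metal site, which mirrors the behavior sketched in Fig. 11.1.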
COORDINATION OF METALS IN PROTEINS

Nuclear magnetic resonance (NMR) and crystal structures of a large number of proteins containing bound metals are now known and a number of reviews exist. A small sampling of these references covers both structural and functional principles of metal sites in proteins (Holm et al., 1996; Glusker, 1991; Tainer et al., 1992). In addition, thousands of crystal structures have been determined for inorganic and organometallic complexes. These data are readily accessible as coordinates through the Cambridge Structural Database, a database resembling the PDB but for small-molecule structures (see www.ccdc.cam.ac.uk).

A word of caution is necessary. In metal coordination jargon, the word ligand is reserved for chemical moieties that interact directly with the metal ion. Hence in the metalloprotein field, the metal ion is not referred to as the ligand. Rather, the atoms that belong to the protein and coordinate with the metal ion are the ligands! What
metal ligands exist in proteins? In general, they are the same groups that were listed in Chapter 9 as forming hydrogen bonds: amides of amino acids N and Q, carboxylates of D and E, nitrogens in the imidazole ring of H, nitrogens in the amino group of K, nitrogens in the guanidino end of R, the sulfur atoms of C and M, oxygens and nitrogens belonging to the C- and N-terminal amino acids, the nitrogen in the side chain of W, and main-chain oxygen and nitrogen atoms.

Metalloprotein interactions are comparable to those seen in other organic and inorganic complexes. They are described by their coordination number and geometry. The coordination number represents the number of atoms interacting directly with the metal ion. Distances between a metal ion and a liganding atom are generally in the range of 1.8 to 2.4 Å. Every graphics program provides a means of determining the distance between atoms while viewing the structure, and this facility may be used to identify the liganding groups in the coordinates of metalloproteins. In hemoglobin, six atoms interact with the heme iron, and thus the coordination number is 6. If all of the atoms that ligate the iron are connected, an octahedron is formed; the geometry of iron in hemoglobin is octahedral. Other coordination states and geometries of iron exist in unrelated proteins, and hence it is misleading to extrapolate from one metalloprotein to another.
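The distance criterion above lends itself to a short calculation. The following Python sketch counts the atoms lying within the 1.8 to 2.4 Å range of a metal ion. It is an illustration only: the function names (`distance`, `find_ligands`), the atom labels, and the coordinates are all hypothetical, chosen to place six atoms 2.0 Å from an iron atom at the origin. In practice the coordinates would come from a PDB entry.

```python
import math

def distance(a, b):
    """Euclidean distance between two (x, y, z) points."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def find_ligands(metal_xyz, atoms, min_d=1.8, max_d=2.4):
    """Return (label, distance) for atoms within [min_d, max_d] of the metal."""
    hits = []
    for label, xyz in atoms:
        d = distance(metal_xyz, xyz)
        if min_d <= d <= max_d:
            hits.append((label, round(d, 2)))
    return hits

# Invented coordinates in angstroms: one atom along each of +/-x, +/-y, +/-z,
# plus a seventh atom that is too far from the metal to be a ligand.
iron = (0.0, 0.0, 0.0)
atoms = [
    ("N1", (2.0, 0.0, 0.0)), ("N2", (-2.0, 0.0, 0.0)),
    ("N3", (0.0, 2.0, 0.0)), ("N4", (0.0, -2.0, 0.0)),
    ("N5", (0.0, 0.0, 2.0)), ("O1", (0.0, 0.0, -2.0)),
    ("C9", (2.5, 2.5, 0.0)),
]

ligands = find_ligands(iron, atoms)
coordination_number = len(ligands)
print(coordination_number, ligands)
```

The 1.8 to 2.4 Å window is exposed as parameters because the appropriate range depends on the metal and the liganding atom; sulfur ligands, for example, often sit at the long end of the range or slightly beyond it.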
FUNCTIONAL REASONS FOR METAL ION BINDING

Although not always the case, metal ion binding to proteins and other biological macromolecules appears to fall into five or six functional categories (Holm et al., 1996).

Structural: The metal ion is needed for the correct conformation of a region of a protein. It may even serve a regulatory role.
Transport and storage: The metal ion is bound to a protein as a means of transport or storage.
Electron transfer: The metal ion serves as a redox center for the transfer, storage, or uptake of electrons. The redox potential of the metal ion is partially defined by the protein.
Dioxygen binding: The metal ion transiently binds dioxygen (O₂). Hemoglobin is an example, but other iron-containing proteins in this category may not include a heme group.
Catalytic: The metal ion is needed for substrate or water binding, substrate activation, or developing appropriate transition states.
Other: Metal-binding sites have been identified in proteins that appear to satisfy none of the above five tasks. Therefore, this is a questionable category. Nonetheless, nature occasionally appears to evolve a metal-binding site on a protein with no function whatsoever. Such nonfunctional sites have been identified in crystal structures.

The most frequently found metal-binding sites in proteins are for iron, copper, zinc, and calcium. All but Ca²⁺ belong to the first transition series, beginning with Sc (Z = 21) and including Ti, V, Cr, Mn, Fe, Co, Ni, Cu, and Zn (Z = 30). In addition to calcium, Mg²⁺ (a group IIa element) is frequently found bound to proteins. In a few instances, K and Na have also been identified. The transition metals favor nitrogen and sulfur atoms of the protein, typically from histidine and cysteine side chains. The IIa elements favor oxygen atoms derived from aspartate and glutamate side chains. In the PDB examples of metalloproteins, one or more water molecules frequently occupy sites on the protein-bound metal ion.
Incidentally, the first transition series metals are generally not good enough X-ray scatterers to serve as heavy atom derivatives. Rather, elements in the third transition series are more commonly used (La, Hf, W, Re, Os, Ir, Pt, Au, and Hg).
Fig. 11.2 A zinc finger with a DNA fragment. This stereodrawing depicts the crystal structure of a zinc finger motif with a segment of DNA. The drawing was made using the coordinates with the PDB accession number 1ZAA. The oligonucleotide is shown as a backbone with the protrusions representing the location of bases. There are three zinc fingers in this segment of the protein. The three metal sites are labeled Zn1, Zn2, and Zn3, respectively. The four amino acids responsible for the binding of the zinc ion at each of the sites are labeled and include C7, C12, H25, and H29 (site 1), C37, C40, H53, and H57 (site 2), and C65, C68, H81, and H85 (site 3).
A METAL ION PROVIDING A CONFORMATIONAL FUNCTION

An example of a protein-bound metal ion serving a conformational role is represented by the zinc finger motif. Here the metal ion appears to lock a segment of the polypeptide chain into a conformation that is able to recognize certain segments of DNA. Multiple zinc finger motifs in a single polypeptide chain have been found in a variety of proteins that are transcriptional regulators. Three zinc finger motifs in a single polypeptide chain are shown in Fig. 11.2. When viewed in stereo, the Zn²⁺ coordination number should be readily apparent. Similarly, with stereoviewing the geometry at the metal-binding site should be visible. Is it square planar or tetrahedral?

The zinc finger forms a globular domain composed of two antiparallel β strands and an α helix that are precisely oriented with respect to each other by a Zn²⁺ ion. The two cysteine residues are located in the β-sheet region, while the two histidine residues are found in the carboxy-terminal region of the α helix. The coordination sphere of the Zn²⁺ ion is complete, and therefore it has no direct role in binding to the recognition element on DNA. Formation of the metalloprotein is essential for transcriptional regulation, but the metal ion serves mainly as a conformational determinant.
METALLOPROTEINS FOR TRANSPORT AND STORAGE

A number of proteins bind metal ions either to transport them through physiological media or to store the metal ion for later use by the organism. Ferritin is a multisubunit
protein with a hollow core whose function is to store iron for the subsequent synthesis of redox and oxygen transport proteins. In terms of molecular structure, ferritin is a very complicated molecule containing hundreds of iron atoms in an assumed disordered array. To choose a simple example that may be studied in stereo, look at the molybdate-binding protein (ModA) found in the periplasmic space of Escherichia coli (Hu et al., 1997; Rech et al., 1996). It is responsible for the uptake of molybdenum by the microorganism. The molybdenum (not as molybdate) is needed for the active sites of several enzymes. A Cα model is shown in Fig. 11.3a, with the details of binding given in the stereo enlargement shown in Fig. 11.3b. Note the two-domain structure of the polypeptide chain. The MoO₄²⁻ is located between the two domains, and since it is completely surrounded by protein atoms, the domain structure may provide for some form of flexing motion permitting the entry and exit of the molybdate.

The MoO₄²⁻ is a dianion. Surprisingly, it is not girdled by basic side chains such as those of amino acids H, K, and R. Instead, a series of neutral polar groups forms the immediate surroundings, and hydrogen bonds are formed with the oxygen atoms of the metal anion. Computer graphics provide a mechanism for rotating this site, making it easier to identify the hydrogen-bonding atoms. At least eight are present. Look for the –OH side chain of Y170, the peptide hydrogen and –OH of S12 and S39, and the peptide –NH groups of A11, A125, and V152. Thus eight noncovalent links are formed between the protein and the MoO₄²⁻. Incidentally, the anion-binding site on ModA is able to distinguish between different anions. Molybdate and tungstate have binding constants in the micromolar range, whereas those of sulfate and phosphate are on the millimolar scale.
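The difference between micromolar and millimolar binding can be put in numbers. The Python sketch below is a rough illustration that treats the quoted binding constants as dissociation constants and uses round order-of-magnitude values (1 µM for molybdate, 1 mM for sulfate) that are assumptions rather than measured figures. The function names are hypothetical. It computes the fraction of protein occupied at a given free anion concentration and the free energy difference that a 1000-fold ratio represents.

```python
import math

R = 8.314462618  # gas constant, J/(mol K)

def fraction_bound(ligand_conc, kd):
    """Fractional occupancy of a single site: [L] / (Kd + [L])."""
    return ligand_conc / (kd + ligand_conc)

def selectivity_kj_per_mol(kd_weak, kd_tight, temperature_k=298.15):
    """Free energy difference between two ligands, RT ln(Kd_weak / Kd_tight)."""
    return R * temperature_k * math.log(kd_weak / kd_tight) / 1000.0

# Assumed order-of-magnitude dissociation constants, in mol/L.
kd_molybdate = 1e-6
kd_sulfate = 1e-3
anion_conc = 1e-5   # 10 micromolar free anion, an arbitrary choice

occ_molybdate = fraction_bound(anion_conc, kd_molybdate)
occ_sulfate = fraction_bound(anion_conc, kd_sulfate)
ddg = selectivity_kj_per_mol(kd_sulfate, kd_molybdate)
print(round(occ_molybdate, 3), round(occ_sulfate, 4), round(ddg, 1))
```

Under these assumptions the site is about 91% occupied by molybdate but only about 1% occupied by sulfate at the same concentration, and the 1000-fold ratio corresponds to roughly 17 kJ/mol at 25 °C, comparable to a small number of hydrogen bonds.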
Unexpectedly, just beyond the first level of ligating polar groups is a series of methylene or methyl carbon atoms deriving from A10, A11, A58, V123, P124, A125, and V152. These surroundings place the metal anion within a second level of nonpolar atoms. Last of all, it is noteworthy that the MoO₄²⁻ site serves as another example of sequence degeneracy in structural biology. More than half of the first-level interactions with the anion involve the hydrogens from peptide nitrogens. So long as major changes in the main-chain conformation do not occur, the residues at these important sites may be replaceable with a variety of other amino acids with little effect on MoO₄²⁻ binding.
METALLOPROTEINS AS REDOX INTERMEDIATES

Iron- and copper-containing proteins are found where transitory changes in the metal oxidation state are part of their biological function. They participate in a wide variety of biochemical redox reactions. The redox potential of the metal ion is linked not only to its own properties (iron versus copper, etc.) but also to the surrounding protein conformation. Iron-binding proteins appear to be the most common. The iron-binding proteins can be divided into two classes: heme and nonheme. The cytochrome c proteins are an example of redox proteins with a heme iron. Ferredoxins are nonheme iron-containing proteins with a redox function, but there are also redox proteins that contain copper. Two structures will serve as examples of redox metalloproteins: cytochrome c and plastocyanin. The former is an iron/heme protein, and the latter is a copper-containing protein.

Cytochrome c is an electron transfer protein that is located on the mitochondrial membrane. It contains a heme group that accommodates iron in both its Fe²⁺ and Fe³⁺ oxidation states. Cytochrome c not only serves as a useful model of a redox protein, but it is also interesting from the evolutionary point of view. Depending on the species source, it is approximately 110 amino acids long. Several crystallographic structures, and many amino acid sequences, were determined in the 1970s and 1980s. With this accumulation of sequences and structures, an evolutionary link was demonstrated
Fig. 11.3 Molybdate-binding protein. These two stereoviews are derived from a single set of coordinates (PDB accession code 1AMF). (a) Cα coordinates for the entire protein; (b) an enlargement of the molecular structure close to the molybdate-binding site. A number of important residues are labeled, as is the metal anion.
between the animal, microbial, and plant worlds. This important link was made by comparing the crystal structures of the cytochrome c proteins from organisms in the different kingdoms.
Fig. 11.4 Crystal structure of rice ferricytochrome c. This stereoview of the crystal structure of ferricytochrome c from rice depicts the conformation as seen from a Cα model. Side chains interacting directly with the heme group include M88 and H26. C22 and C25 form covalent thioether linkages with the vinyl side chains of the heme moiety. The C and N terminals are labeled CT-111 and NT-1, respectively. The coordinates for the drawing were obtained from the PDB with the accession code 1CCR. Compare the heme binding in cytochrome c with that in hemoglobin and myoglobin (Ochi et al., 1983).
Cytochrome c from rice is shown in Fig. 11.4. Other crystal structures (including that from tuna heart, the first to be crystallized) were compared, and along with the amino acid sequences, structural and evolutionary homology was clear. The family of cytochrome c proteins also offers another example of the degeneracy in the link between amino acid sequence and structure. The conformational similarities among the cytochrome c proteins from different species are found even when as few as 10 residues are invariant. That's right: 10 out of approximately 110 residues.

As is visible in Fig. 11.4, the heme and heme iron are located near the center of the protein. Unlike hemoglobin and myoglobin, in which the heme is held in place by a single semicovalent bond to a histidine, covalent bonds are formed between two cysteine side chains and the vinyl groups extending from the protoporphyrin nucleus. Also differing from the oxygen-carrying heme proteins is a sulfur atom from a methionine side chain that forms one of the six iron coordination sites. Because of the thioether links with the protoporphyrin nucleus, a characteristic recognition sequence for this type of cytochrome is –CXXCH–.

With cytochrome c and, for that matter, many other redox proteins, the site of electron storage, the central iron atom, is not easily accessible. This means that another protein or enzyme transferring the electron in a chain of reactions does not have a direct approach to the redox site. The area of electron movement to and from redox centers has been the subject of intense study. The mechanism for this movement is still unclear, but it is believed that some form of electron tunneling occurs. Most biochemistry textbooks contain a description of redox potentials that describe the relative electron transfer potential. The more negative the potential, the better the reducing agent. Cytochrome c has a redox potential of +220 mV. Ferredoxin, on the other hand, has a redox potential of −430 mV. In both cases, the redox reaction involves the interconversion of Fe²⁺ and Fe³⁺.
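To see what a difference in redox potential means energetically, the standard relation ΔG = −nFΔE can be applied to the two potentials quoted above. The Python sketch below is purely illustrative: it takes the cytochrome c potential as +220 mV and the ferredoxin potential as −430 mV, and it treats ferredoxin as the electron donor and cytochrome c as the acceptor in a hypothetical one-electron transfer, although the two proteins are not being proposed as direct physiological partners. The function name is hypothetical.

```python
FARADAY = 96485.332  # Faraday constant, C/mol

def delta_g_kj_per_mol(e_donor_mv, e_acceptor_mv, n_electrons=1):
    """Free energy change for electron transfer, dG = -n F (E_acceptor - E_donor)."""
    delta_e_volts = (e_acceptor_mv - e_donor_mv) / 1000.0
    return -n_electrons * FARADAY * delta_e_volts / 1000.0

# Hypothetical pairing: ferredoxin (donor) passes one electron to cytochrome c.
dg = delta_g_kj_per_mol(e_donor_mv=-430.0, e_acceptor_mv=220.0)
print(round(dg, 1))  # about -62.7 kJ/mol
```

The negative sign shows that electrons flow spontaneously from the carrier with the more negative potential to the one with the more positive potential, which is the sense in which the more negative potential marks the better reducing agent.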
A second example of a redox protein, this time containing a cupric ion (Cu²⁺), is shown in Fig. 11.5. Called plastocyanin, it acts as a redox protein in oxygenic photosynthesis, carrying electrons from cytochrome f to a protein in photosystem I (Redinbo et al., 1993). Note first the overall conformation; the motif consists of an eight-stranded β barrel. The copper ion, cycling between Cu⁺ and Cu²⁺, is the redox center. Its ligands include two nitrogen atoms from histidine side chains and the sulfur atoms from a cysteine and a methionine. Unlike cytochrome c, the redox center is close to the surface of the protein, suggesting that a complex with another protein would permit relatively close contact for the transfer of the electron.

Because of its relatively small size, plastocyanin is a good choice for studying metal coordination. With stereodrawings, it is generally more difficult to determine the geometry than the coordination number. A copper coordination number of 4 suggests it is either square planar or tetrahedral. By examining Fig. 11.5 in stereo, the ligating side chains appear to be at the apices of a tetrahedron. However, the metal-to-ligand bond distances are not identical. With a graphics program, each of the ligand-to-copper ion distances should be measured. When that is done, the distance from the sulfur atom of M92 to the copper (about 2.9 Å; see Fig. 11.5) is found to be longer than those from H37, H87, and C84 (about 2.0 Å). The tetrahedron is distorted, lengthened in the direction of the sulfur atom of M92.

Both cytochrome c and plastocyanin are capable of one-electron transfers. There is, in fact, a relatively large family of redox proteins involved in one-electron transfer reactions. They are the iron-containing ferredoxins and are characterized by the presence of sulfur. Many of the ferredoxins contain four iron atoms and four inorganic sulfur atoms. Imagine the iron atoms on the corners of a cube, with none adjacent to each other along an edge. Variations of this redox center include the 3Fe/4S form.
The iron atoms in these cagelike Fe–S centers are usually bonded to the sulfur of cysteine side chains in the protein.
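Deciding between square planar and tetrahedral geometry for a four-coordinate site is easier with numbers than with stereo drawings. One simple approach, sketched below in Python, is to compute the six ligand–metal–ligand angles: an ideal tetrahedron gives six angles near 109.5°, whereas a square plane gives four near 90° and two near 180°. The function names, the 150° cutoff, and the idealized coordinates are all assumptions made for illustration; real sites are distorted and deserve a look at the full list of angles.

```python
import math
from itertools import combinations

def angle_deg(metal, a, b):
    """Angle a-metal-b in degrees."""
    va = [p - m for p, m in zip(a, metal)]
    vb = [p - m for p, m in zip(b, metal)]
    dot = sum(x * y for x, y in zip(va, vb))
    na = math.sqrt(sum(x * x for x in va))
    nb = math.sqrt(sum(x * x for x in vb))
    cos_value = max(-1.0, min(1.0, dot / (na * nb)))
    return math.degrees(math.acos(cos_value))

def ligand_angles(metal, ligands):
    """Sorted list of all ligand-metal-ligand angles."""
    return sorted(angle_deg(metal, a, b) for a, b in combinations(ligands, 2))

def classify_four_coordinate(metal, ligands, trans_cutoff=150.0):
    """Crude test: any near-linear (trans) pair suggests a square plane."""
    largest = ligand_angles(metal, ligands)[-1]
    return "square planar" if largest > trans_cutoff else "tetrahedral"

metal = (0.0, 0.0, 0.0)
# Idealized ligand positions (arbitrary units), invented for this example.
tetrahedron = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]
square = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0)]

print(classify_four_coordinate(metal, tetrahedron), ligand_angles(metal, tetrahedron)[0])
print(classify_four_coordinate(metal, square), ligand_angles(metal, square))
```

Because only angles are used, lengthening one metal-to-ligand bond along its own direction, as with a distant methionine sulfur, leaves the classification unchanged. Bond lengths should therefore be inspected separately.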
METALLOPROTEINS THAT BIND DIOXYGEN

The binding of dioxygen to certain heme-containing proteins was described in Chapter 8. In lower animals such as worms, a protein containing two iron atoms without the protoporphyrin ring will also bind and transport oxygen. A typical example is the protein hemerythrin (see 2HMQ in the PDB). Hemerythrin has a very simple motif. It consists of four helices grouped together in a bundle, in which each of the helices is nearly parallel to the others.
Fig. 11.5 Coordination of a copper ion in the protein plastocyanin. This stereoview depicts the conformation of plastocyanin (PDB accession code 2PLT) from green algae. To make it easier to view the copper ion, lines have been drawn between it and the nitrogen atoms from H37 and H87, and sulfur atoms from C84 and M92. The distances were all 2.0 Å except for that from copper to methionine, which are separated by 2.9 Å.

METALLOPROTEINS THAT SERVE A CATALYTIC FUNCTION

Metal ions often have a vital role in the active sites of a variety of enzymes. Metalloenzymes are important in numerous redox reactions, including such widespread phenomena as photosynthesis and nitrogen fixation. Other metalloenzymes are necessary catalysts in detoxification reactions, including the cytochrome P-450 proteins and the superoxide dismutases. There are metalloproteases, and there are metalloenzymes in the oxidative phosphorylation pathway. The list of metal ions in active sites is very long. In the 1990s, the ability to achieve site-directed mutagenesis led to many attempts to insert active metal sites in non-metal-containing proteins. The potential for the development of catalytically active metal sites in proteins is great and will continue to be important in structural biology.

Here a simple example of an enzyme containing a metal ion in the active site is used. One of the fastest enzymes known (high turnover number) is carbonic anhydrase. It catalyzes the reaction shown in Fig. 11.6. The reversible hydration of CO₂ to bicarbonate, HCO₃⁻, by carbonic anhydrase requires a Zn²⁺ in the active site. Binding of a water molecule to the protein-bound zinc ion lowers the pK of the water molecule by about 6 or 7 units, giving it a pK of about 7. The resulting OH⁻ is an excellent nucleophile capable of attacking the carbon atom of CO₂ to form HCO₃⁻, as shown in Fig. 11.6.

Crystallographic studies of carbonic anhydrase from human erythrocytes have been carried out with a variety of inhibitors and with a very high concentration of bicarbonate (for an example, see Kumar and Kannan, 1994). The structural results are readily reconciled with the mechanism described above. To avoid viewing difficulties, only the active site, including the bound zinc ion, is shown in stereo in Fig. 11.7. The Zn²⁺ is bound within the active site pocket by three histidine side chains: H94, H96, and H119. What is the geometry at the site? One oxygen atom of the bicarbonate is also
Fig. 11.6 Reaction mechanism for a metalloenzyme: carbonic anhydrase. The metalloenzyme carbonic anhydrase catalyzes the formation of bicarbonate from CO₂ and water. The active site contains a Zn²⁺ ion that is bound within a relatively deep funnel-shaped cleft in the 3D structure. Figure 11.7 illustrates the stereochemistry of the metal ion. In roughly tetrahedral geometry, the Zn²⁺ ion binds a water molecule in the fourth coordination site. The metal-bound water molecule has a lowered pK and is present as an OH⁻ ion. The zinc-bound hydroxyl ion is a good nucleophile and attacks the carbon atom of CO₂ with the formation of bicarbonate.
Fig. 11.7 The stereochemical arrangement at the active site of carbonic anhydrase from human erythrocytes. This stereodrawing is taken from the crystal structure of human carbonic anhydrase (PDB accession code 1HCB). Many forms of crystalline carbonic anhydrase have been studied. In this case, the bicarbonate ion is present in the active site. The coordination of the Zn²⁺ ion involves the imidazole side chains of three histidine residues, H94, H96, and H119. An oxygen atom belonging to the bicarbonate is close to one of the four sites. In the direction of bicarbonate formation, this site would be filled by a water molecule as described in the caption to Fig. 11.6 and in text.
visible, close to the zinc atom. The proximity of an oxygen atom from the bicarbonate and the stereochemistry in general agree well with the proposed reaction mechanism. As already mentioned, the mechanism is thought to involve an OH⁻ formed by the binding of a water molecule to the active site zinc ion. The nucleophilic OH⁻, in close proximity to what is believed to be the CO₂-binding site, attacks the carbon atom of CO₂ and bicarbonate is formed.

The simple reaction mechanism for carbonic anhydrase is not always typical of redox enzymes. Many are believed to have mechanisms that are far more complicated than that of carbonic anhydrase. Nonetheless, reaction mechanisms for metalloenzymes are frequently understood in far more detail than those of their non-metal-containing counterparts. This is partly due to the host of spectroscopic methods that can be used to study metalloenzymes and their transition state intermediates.
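The consequence of lowering the pK of the zinc-bound water can be estimated with the Henderson–Hasselbalch relation. The Python sketch below compares the fraction of water present as OH⁻ at pH 7 for a zinc-bound water with a pK of about 7 and for a water molecule with an unperturbed pK, taken here as roughly 14. Both pK values are approximations and the function name is hypothetical.

```python
def fraction_deprotonated(ph, pk):
    """Fraction of an acid present in its deprotonated form at a given pH."""
    return 1.0 / (1.0 + 10.0 ** (pk - ph))

PH = 7.0
zinc_bound = fraction_deprotonated(PH, pk=7.0)    # assumed pK of zinc-bound water
free_water = fraction_deprotonated(PH, pk=14.0)   # assumed pK of unperturbed water

print(zinc_bound)    # half of the zinc-bound water is present as hydroxide
print(free_water)    # on the order of one part in ten million
print(zinc_bound / free_water)
```

With these assumed values, the zinc ion raises the proportion of nucleophilic OH⁻ at neutral pH by several million-fold, which is the essence of its catalytic contribution in this mechanism.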
SUMMARY

Metalloproteins are commonplace in biological systems. The metal sites are generally preformed in the folded protein. When analyzing a metal site within a known protein structure, the coordinates may be used to obtain the coordination number and geometry. These two properties appear to be very similar in organometallic complexes and in their protein counterparts. In most metalloproteins, the bound metal ion imparts a specific
biological function. The biological function is sometimes related to the binding affinity. Hence there are some proteins that must exchange their metal ion reversibly to carry out their function. Ca²⁺ ions as part of cell signaling systems usually behave in this manner. Active site metal ions generally have a very high affinity for the protein and do not exchange rapidly with metal ions in solution.
PROBLEMS

Practicing Stereovision

1. Using the fact that the Cα-to-Cα distance is about 3.8 Å, estimate the distance of the redox center in cytochrome c to the surface of the molecule. Compare with the distance of the redox center in plastocyanin (Figs. 11.4 and 11.5).
2. Draw the protein–prosthetic group interactions for cytochrome c. (Hint: It may help to look up the covalent structure of the heme group.)
3. Are the motifs of the two domains in the molybdate-binding protein similar? Describe each of them.
4. You won't need stereovision for this one: how do you identify metal-binding sites in electron density maps?
Using Computer Graphics

5. Obtain the plastocyanin coordinates (2PLT) and measure the distance of the copper ion to the closest protein surface.
6. Go to the PDB and retrieve the coordinates for two redox proteins. One should be a 2Fe2S protein and another a 4Fe4S protein. Describe the coordination sites for both redox centers, using a drawing.
7. This will take some graphics editing skills. Obtain the coordinates for the protein xylose isomerase (1XLK) and derive the coordination number and geometry for the manganese ions. Use only one subunit. Some graphics programs permit the calculation of the model structure around a given x,y,z coordinate. This may be a simple way of editing the viewed segment before studying the coordination.
REFERENCES

Glusker, J. (1991). Structural aspects of metal liganding to functional groups in proteins. Adv. Protein Chem. 42, 1–76.
Holm, R., Kennepohl, P., and Solomon, E. (1996). Structural and functional aspects of metal sites in biology. Chem. Rev. 96, 2239–2314.
Hu, Y., Rech, S., Gunsalus, R., and Rees, D. (1997). Crystal structure of the molybdate binding protein modA. Nature Struct. Biol. 4, 703–708.
Kumar, V., and Kannan, K. (1994). Enzyme–substrate interactions. Structure of human carbonic anhydrase I complexed with bicarbonate. J. Mol. Biol. 241, 226–232.
Ochi, H., Hata, Y., Tanaka, N., Kakudo, M., Sakurai, T., Aihara, A., and Morita, Y. (1983). Structure of rice ferricytochrome c at 2.0 Å resolution. J. Mol. Biol. 166, 407–418.
Rech, S., Wolin, C., and Gunsalus, R. (1996). Properties of the periplasmic ModA molybdate-binding protein of E. coli. J. Biol. Chem. 271, 2557–2562.
Redinbo, M., Cascio, D., Choukair, M., Rice, D., Merchant, S., and Yeates, T. (1993). The 1.5 Å crystal structure of plastocyanin from the green alga Chlamydomonas reinhardtii. Biochemistry 32, 10560–10567.
Tainer, J., Roberts, V., and Getzoff, E. (1992). Protein metal binding sites. Curr. Opin. Biotechnol. 3, 378–387.
C H A P T E R
12 Lipid–Protein Interactions

INTRODUCTION

The interaction of lipids with proteins is important to a broad spectrum of biological phenomena. Triglycerides, through their metabolic breakdown to fatty acids and then coenzyme A (CoA) derivatives, are a vital source of energy during postprandial periods. Phospholipids, in one or another of their many head group variations, are a critical component of biological membranes. Cholesterol and certain fatty acids are precursors for a variety of cell-signaling compounds. Chlorophyll is an integral component of the photosynthetic process. All of these compounds have in common the physical property that they are sparingly soluble in aqueous solutions. While insoluble in aqueous solutions, many of the lipid types can form polydisperse systems such as micelles and liposomes.

Proteins embedded in membranes tend to share physical properties with lipids. Such proteins are not very soluble in aqueous buffers, have a tendency to form nonspecific aggregates, and are generally difficult to deal with by common biochemical methods. These properties have hindered efforts to obtain detailed structural data describing the conformation of membrane proteins and certain soluble lipoproteins. Typically, membrane proteins require detergents to prepare monodisperse solutions. Many appear to favor specific detergents in an essentially unpredictable manner. Unless the membrane protein specimen has a fixed number and fixed position of bound detergent molecules, crystals may be difficult to obtain. Proteins that are soluble in aqueous buffers but still contain lipid-like compounds are easier to purify and can be treated much the same as other proteins and enzymes. Nonetheless, a subclass of soluble lipoprotein systems has proved to be as difficult as membrane proteins, insofar as obtaining structural data is concerned. For example, the serum lipoproteins, although soluble in aqueous solutions, are not readily studied by any diffraction or nuclear magnetic resonance (NMR) method. This is because they are heterogeneous with respect to lipid and sometimes protein composition.

In spite of all these difficulties, two lipid-containing macromolecules have been examined and conformations were obtained by diffraction methods more than 20 years ago. In the soluble lipoprotein area, Fenna and Matthews (1975) were able to determine the crystal structure of a protein containing seven chlorophyll molecules. In the membrane protein class, Henderson and colleagues (1990) obtained a low-resolution structure of bacteriorhodopsin by electron microscopy (EM) and digital image reconstruction. Both structures were landmark achievements. The early work on bacteriorhodopsin showed that it contained seven helices running roughly perpendicular to the plane of the membrane. The subunits interacted to form a trimer with threefold rotational
Fig. 12.1 Digital reconstruction of membrane protein conformation, using electron diffraction and imaging. This schematic drawing contains a representation of the steps involved in recovering two- and three-dimensional images with an electron microscope. For the method to be successful, the protein under study must be in an ordered two-dimensional array, i.e., a two-dimensional crystal. In the electron microscope, a tilting stage makes it possible to change the orientation of the crystalline array. The development of a low-temperature stage allowed radiation damage to be reduced and permitted images of the crystal without heavy metal stains for contrast. At first only images were obtained and digitized. By taking the Fourier transforms, isolating the diffraction maxima resulting from the lattice, and calculating the inverse transform, a filtered image was obtainable. Even corrections for astigmatism and under- or overfocusing can be made during the digital processing. To obtain a three-dimensional image, multiple ‘‘tilts’’ must be recorded, digitized, and recombined. When using negatively stained specimens, all of the data are obtained from a set of images. To obtain higher resolution, lenses in the EM are turned off and a diffraction pattern recorded. The data may be indexed just as they are in X-ray crystallography and digitized to obtain amplitudes: |Fhkl| in Chapter 2. Phases are then obtained from low-dose images, either through short exposures or aided by the low temperature of the cold stage in the electron microscope.
symmetry. It took many years to develop methods to increase the resolution, and thereby demonstrate the interconnection between the seven helices and fit the amino acid sequence to the map.
ELECTRON MICROSCOPY, ELECTRON DIFFRACTION, AND MEMBRANE PROTEIN STRUCTURE

The first structures of membrane proteins were obtained by electron microscopy studies of two-dimensional crystalline arrays of the protein contained within a bilayer lipid matrix. In a few instances such as bacteriorhodopsin, ordered two-dimensional arrays were found in situ. Small two-dimensional crystalline patches were found on the membrane surface of the microorganism Halobacterium halobium. These could be isolated and placed on an EM grid. Viewed perpendicular to the membrane surface, the results produced an image of the molecule in projection. Later, as low-temperature, tilting stages became available, other views of the two-dimensional crystals were obtained. Methods very similar to those described in Chapter 2 for analyzing X-ray crystallographic data were used to reconstruct three-dimensional models. The method is shown schematically in Fig. 12.1.

First developed by Nobel Prize winner Aaron Klug and colleagues at the Medical Research Council (Cambridge, UK), digital image reconstruction was used to analyze virus structures. Recall that a virus is an assembly of protein molecules arranged with high symmetry, the icosahedral point group. Because of this
high point symmetry, a single EM view of a virus particle contains images of the protein component(s) in multiple orientations. Figure 12.1 shows how the method of digital reconstruction is applied to the study of membrane proteins in a two-dimensional crystalline array. In the absence of a tilting stage, as was the case for the earliest studies, only a single projection contained in a negatively stained image was obtainable. The negative stain, usually uranyl acetate, provided extra contrast between aqueous interstices and the protein and lipid. The electron beam used to obtain the images is very destructive, and ordering in the lattice was usually visible only to a resolution of about 20 Å. By digitizing the EM negative and calculating the Fourier transform, maxima were visible that corresponded to the lattice of membrane protein molecules. A "filtered" projection was obtainable by calculating the inverse Fourier transform, using only the data at the maxima due to the lattice. This method corresponds to the lower pathway in Fig. 12.1. Later, with a tilting stage and improved specimen preservation by keeping the specimen at low temperatures, multiple images and diffraction patterns were obtainable. Each new image contains information on a segment of the reciprocal lattice (hkl; Chapter 2) passing through the origin. The diffraction data are in effect collected in three dimensions by tilting the EM stage. This resembles X-ray crystallographic data collection, in which the crystal is repeatedly reoriented. In the context of EM, by scaling and combining all of the diffraction patterns and by using phases from their corresponding images, a three-dimensional map was obtainable and the 3D conformation derived.
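The Fourier filtering step described above can be illustrated with a short Python sketch. It is only a toy model under stated assumptions: the "micrograph" is a synthetic 64 x 64 array, the lattice is square, aligned with the pixel grid, and repeats every 8 pixels, and the function name fourier_filter is invented for this illustration. Real micrographs need lattice indexing, unbending, and focus corrections that are not attempted here.

```python
import numpy as np

def fourier_filter(image, period):
    """Keep only the Fourier coefficients that lie on the reciprocal lattice.

    Assumption: a square lattice aligned with the pixel grid, with a repeat
    of `period` pixels that divides the image size exactly.
    """
    ny, nx = image.shape
    if ny % period or nx % period:
        raise ValueError("image size must be a multiple of the lattice period")
    transform = np.fft.fft2(image)
    # A repeat of `period` pixels places the lattice maxima at every
    # (size / period)-th Fourier coefficient along each axis.
    on_lattice_y = np.arange(ny) % (ny // period) == 0
    on_lattice_x = np.arange(nx) % (nx // period) == 0
    mask = on_lattice_y[:, None] & on_lattice_x[None, :]
    # Zero everything between the maxima, then calculate the inverse transform.
    return np.fft.ifft2(np.where(mask, transform, 0.0)).real

def rms(a, b):
    """Root mean square difference between two images."""
    return float(np.sqrt(np.mean((a - b) ** 2)))

rng = np.random.default_rng(0)
motif = rng.random((8, 8))                # hypothetical projected unit cell
clean = np.tile(motif, (8, 8))            # ideal two-dimensional crystal
noisy = clean + rng.normal(scale=0.5, size=clean.shape)
filtered = fourier_filter(noisy, period=8)

print("rms error before filtering:", rms(noisy, clean))
print("rms error after filtering: ", rms(filtered, clean))
```

Because the periodic signal is concentrated at the lattice maxima while the noise is spread over the whole transform, discarding the coefficients between the maxima removes most of the noise and leaves the repeating motif intact.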
MEMBRANE PROTEIN STRUCTURE BY ELECTRON MICROSCOPY AND X-RAY METHODS
The method described in the preceding section led to the first structure of a membrane protein, bacteriorhodopsin from Halobacterium halobium. Although the first map was at low resolution, seven "tubes" of density were clearly present and each had the conformational features of an α helix. The studies were later extended to higher resolution (3.5 Å) by Henderson and co-workers (1990) and the results are shown in stereo in Fig. 12.2. When viewed in stereo, the seven α helices that make up the molecule should be clearly visible. In addition, bacteriorhodopsin contains a retinoid, usually retinaldehyde or retinol, that serves as part of the transduction function of bacteriorhodopsin. The protein is necessary for pumping protons against a concentration gradient, using light as an energy source. For a variety of reasons, the polypeptide segments connecting the transmembrane helices were not visible in the map and are not present in the coordinates used to prepare Fig. 12.2. However, when viewed in stereo, the interconnections may be easily visualized by connecting the helices according to the amino acid numbering scheme. The axes of the seven helices, on the basis of their overall orientation relative to the unit cell and membrane patch, are nearly perpendicular to the plane of the membrane. Note that although this is a membrane protein contained within a lipid bilayer, no lipid molecules were visible in the EM map. For them to show up, the lipid would have to be ordered in the bilayer as precisely as the protein molecules. Later, three-dimensional protein crystals of bacteriorhodopsin from another microorganism were prepared and were suitable for X-ray study. The molecular structure was determined to 2.3-Å resolution and the coordinates are available with the PDB accession code 1BRX (Luecke et al., 1998). Using the higher resolution electron density map, most of the interhelical connections were visible.
The ability to purify membrane proteins by biochemical methods based on a variety of detergents opened a new era in structural biology. Through trial and error, a
Fig. 12.2 Molecular model of bacteriorhodopsin derived by electron diffraction methods. This stereodrawing is one of the first conformational determinations of a membrane protein. It is derived from the PDB coordinates 1BRD. The Cα model shows the location of the seven α helices that make up the structure. A few individual Cα positions are labeled with their amino acid type and number. All of the interconnecting segments were missing in the map. However, the position numbering should make it possible to trace the pathway of the chain. The protein is embedded in a phospholipid bilayer that would be oriented horizontally. Bacteriorhodopsin contains a bound retinoid that is also labeled and should be visible near the center of the molecule.
detergent is frequently found that stabilizes the membrane protein in a monodisperse form and permits the preparation of three-dimensional crystals suitable for X-ray diffraction analyses. Although this is still a very difficult approach, investigators have succeeded with a number of membrane proteins. One of the first membrane protein structures to be determined in this way was the so-called photosynthetic reaction center (Deisenhofer et al., 1985). The success of this structural study led to Nobel Prizes for the investigators. However, this is a very large protein and consequently difficult to view in its entirety. Since it is commonplace in the biochemical literature to anticipate α-helical anchors as part of membrane proteins, it is important to view some conformations that are far removed from this form of secondary structure. The conformation of the membrane-anchoring domain of OmpA is an example containing essentially all β structure (Pautsch and Schulz, 1998). A stereodiagram describing its conformation is shown in Fig. 12.3. OmpA is found in the outer membrane of Escherichia coli. Viewed in stereo, the motif is clearly antiparallel β structure. The strands are twisted and then close to form a barrel-like structure. The thought of a membrane-embedded protein suggests a very hydrophobic surface. In Fig. 12.3, this could be illustrated by displaying only the hydrophobic side chains. Even more striking is the distribution of polar residues on the twisted β barrel. Therefore, Fig. 12.3 also contains all of the side chains for the amino acids T, Q, N, K, R, H, D, and E, the polar amino acids. When viewed in stereo, an unusual distribution is apparent. Aside from the ends of the β barrel and with only a few exceptions, all of the polar side chains are located in the internal cavity of the barrel. This is particularly true in the middle portion of the barrel. The outer surface, which faces the lipid rather than the aqueous solvent, is covered with hydrophobic side chains.
Although the fold suggests a porelike structure, the cavity of the barrel is mostly filled with side-chain atoms and apparently has no visible channel through the length of the barrel.
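A display like the one described for Fig. 12.3 starts by selecting the polar residues from a coordinate file. The Python sketch below is one possible way to do this, assuming the standard fixed-column layout of PDB ATOM records. The function names and the four demonstration records are invented for illustration; they are not taken from 1BXW.

```python
# Three-letter codes for T, Q, N, K, R, H, D, and E, the residues shown in Fig. 12.3.
POLAR = {"THR", "GLN", "ASN", "LYS", "ARG", "HIS", "ASP", "GLU"}

def polar_ca_atoms(pdb_lines):
    """Return (residue name, residue number, (x, y, z)) for Cα atoms of polar residues."""
    selected = []
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        atom_name = line[12:16].strip()   # columns 13-16: atom name
        res_name = line[17:20].strip()    # columns 18-20: residue name
        if atom_name == "CA" and res_name in POLAR:
            res_seq = int(line[22:26])    # columns 23-26: residue number
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            selected.append((res_name, res_seq, xyz))
    return selected

def make_atom_record(serial, name, res_name, chain, res_seq, x, y, z):
    """Build a minimal ATOM record for the demonstration (hypothetical helper)."""
    return (f"ATOM  {serial:5d} {name:<4s} {res_name:>3s} {chain}{res_seq:4d}"
            f"    {x:8.3f}{y:8.3f}{z:8.3f}")

# Made-up coordinates, used only to exercise the parser.
demo_records = [
    make_atom_record(1, " N  ", "LYS", "A", 12, 1.0, 2.0, 3.0),
    make_atom_record(2, " CA ", "LYS", "A", 12, 1.5, 2.5, 3.5),
    make_atom_record(3, " CA ", "LEU", "A", 13, 4.0, 5.0, 6.0),
    make_atom_record(4, " CA ", "GLU", "A", 14, 7.0, 8.0, 9.0),
]
polar = polar_ca_atoms(demo_records)
for res_name, res_seq, xyz in polar:
    print(res_name, res_seq, xyz)
```

With real coordinates, the selected positions could be compared with the barrel axis to judge whether a polar side chain points into the cavity or out toward the lipid.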
Fig. 12.3 The conformation of the membrane-binding domain of OmpA. This stereodrawing is derived from the PDB coordinates 1BXW. The protein motif is a twisted antiparallel β barrel. In this orientation, the N terminal and C terminal are visible at the top and labeled accordingly. The Cα model has side chains shown for all T, Q, N, K, R, H, D, and E amino acids (Pautsch and Schulz, 1998).
LIPID-METABOLIZING ENZYMES
Many of the lipid-metabolizing enzymes are associated in some way with organelle or cellular membranes. A few, like the pancreatic lipases, are secreted and eventually end up in the digestive tract. Here they aid in the breakdown of lipids such as the triglycerides. Some of the enzymes involved in lipid metabolism deserve special attention since they must deal with lipid phases rather than monodisperse substrate. They must carry out the catalytic reaction at a lipid–water interface formed between two immiscible phases. Triglycerides, for example, are present in physiological fluids either as lipoproteins or in the form of lipid droplets or micelles. The lipid uptake process begins with the action of a lipase on the water-insoluble triglyceride. The lipase hydrolyzes the ester bond at either the sn1 or sn3 position of the glycerol, resulting in a fatty acid and diacylglycerol. The catalytic reaction falls into the category of interfacial processes because the initial hydrolysis occurs fastest at the interface of the aqueous physiological fluid and the surface of a micelle or lipoprotein that contains the target triglyceride. The enzyme is thought to reside literally on the surface of a lipid phase. An example of a triacylglycerol lipase is shown in stereo in Fig. 12.4 (Bourne et al., 1994; Carriere et al., 1998). This pancreatic protein is secreted into the gut and like most secreted proteins contains a number of disulfide bonds, six in all. They are labeled in the drawing. Notice that S–S6 involves the C-terminal amino acid. Parenthetically, this is a good structure for the study of disulfide bonds, a segment of structural biology that has been left untouched until now (see problems 2 and 4 at the end of this chapter).
Fig. 12.4 The conformation of horse pancreatic lipase. This stereoview contains a Cα model of horse pancreatic lipase. The model was drawn from the coordinates 1HPL (Bourne et al., 1994). The disulfide bonds throughout the protein are included with the Cα model. They are labeled as follows: S–S1 = C-4 to C-10; S–S2 = C-90 to C-101; S–S3 = C-237 to C-261; S–S4 = C-285 to C-296; S–S5 = C-299 to C-304; and S–S6 = C-433 to C-449. Also present in the stick drawing and labeled are the side chains for S-152, D-176, and H-263. These are the active site residues. The word FLAP marks a segment of polypeptide chain, approximately residues 237–261, that covers the active site in the conformational state shown here.
Horse pancreatic lipase (HPL) is clearly a two-domain enzyme. In Fig. 12.4, the domain near the top of the illustration is the N-terminal domain and contains the active site. The C-terminal domain at the bottom is formed by β strands. Try to identify the eight strands that form an antiparallel β barrel in the C-terminal domain. Because of the size of the protein they may be difficult to find. The side chains of three residues are also shown in their respective positions in the N-terminal domain: S152, D176, and H263 in Fig. 12.4. These same three residues in much the same orientation are found in a family of proteins called the serine proteases (Carriere et al., 1998). However, there is little structural homology between the serine proteases and the lipases. In other words, this so-called catalytic triad appears in several protein families that do not appear to be homologous or evolutionarily related. This is perhaps an example of convergent evolution. It is currently unknown how the lipase is oriented on a triglyceride micelle. However, the conformation shown in Fig. 12.4 is essentially inactive. Look carefully at the conformation using stereo. Notice a loop labeled the flap. This region of the polypeptide chain is blocking easy access to the active site, the catalytic triad. In other crystallographic studies, the flap takes a different steric position interacting with the C-terminal domain. This second conformation, with changes only in a relatively small part of the enzyme, is believed to be the active form. The conformational changes are thought to be the result of binding to the interface. In a few other examples, another
Fig. 12.5 Lipid storage protein: bacteriochlorophyll protein. This stereodrawing was made using the PDB coordinates with the accession code 4BCL, a refined crystal structure (Fenna and Matthews, 1975). The Cα model describes the overall conformation with three short segments not visible in the electron density map (residues 59–61, 170–172, and 213–216). Seven molecules of chlorophyll are contained within the molecule. The bacteriochlorophyll protein is a trimer but only one subunit is shown.
protein may be involved in the conformational change, with the second protein acting like a coenzyme.
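The statement that the catalytic triad has much the same orientation in lipases and serine proteases can be tested by a least-squares superposition of the two sets of active site atoms (see Chapter 7 and problem 5). The Python sketch below uses the Kabsch algorithm, a standard choice that is assumed here and not necessarily the method used by any particular graphics program. The function name superpose and the coordinates are invented for illustration; the second set is simply a rotated and translated copy of the first.

```python
import numpy as np

def superpose(mobile, target):
    """Least-squares fit of `mobile` onto `target` (Kabsch algorithm).

    Both arguments are N x 3 arrays of equivalent atoms in the same order.
    Returns the transformed mobile coordinates and the rms difference.
    """
    mobile = np.asarray(mobile, dtype=float)
    target = np.asarray(target, dtype=float)
    target_center = target.mean(axis=0)
    p = mobile - mobile.mean(axis=0)      # center both sets on the origin
    q = target - target_center
    u, _, vt = np.linalg.svd(p.T @ q)     # SVD of the 3 x 3 covariance matrix
    # Guard against an improper rotation (a mirror image).
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    fitted = p @ rotation.T + target_center
    rmsd = float(np.sqrt(np.mean(np.sum((fitted - target) ** 2, axis=1))))
    return fitted, rmsd

# Hypothetical coordinates (in Å) standing in for equivalent triad atoms.
triad_a = np.array([[0.0, 0.0, 0.0],
                    [3.8, 0.0, 0.0],
                    [3.8, 5.1, 0.0],
                    [0.0, 2.5, 4.2]])
angle = np.radians(40.0)
turn = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                 [np.sin(angle),  np.cos(angle), 0.0],
                 [0.0,            0.0,           1.0]])
triad_b = triad_a @ turn.T + np.array([10.0, -4.0, 2.5])

fitted, rmsd = superpose(triad_a, triad_b)
print("rms difference after superposition:", rmsd)
```

For real structures the rms difference will not be zero, and its size gives a quantitative measure of how well the two active sites overlap.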
LIPID TRANSPORT AND STORAGE PROTEINS
A third category of lipid–protein systems is important in all biological systems: these are the proteins involved in lipid transport or storage. Members include both intra- and extracellular proteins. In higher organisms, typical examples found extracellularly are high- and low-density serum lipoproteins, serum albumin, and serum retinoid-binding proteins. The serum lipoproteins of all classes are heterogeneous with respect to lipid content and type. Because of this heterogeneity, conformational data have been difficult to obtain. However, the crystal structures of both serum albumin and a serum retinoid-binding protein have been completed and coordinates may be found in the PDB. The bacteriochlorophyll-containing protein, mentioned in the introduction, is shown in Fig. 12.5. This water-soluble protein is obtained from green photosynthetic bacteria, where it is believed to serve as an antenna for light energy to be used later in photochemical reactions. The early crystallographic studies could not be completed for lack of an amino acid sequence. This was commonplace in structural biology research before cDNA sequencing. Crystal structures were obtained, and models were built without side chains and described in the literature. Later, if amino acid sequence data became available, a complete model would be built and X-ray refinement completed. Incidentally, a number of attempts were made in this period to assign an "X-ray sequence" based on the shape and size of side-chain electron density. These sequences were unreliable but good first guesses. The flip side to the failure to identify side chains and amino acid sequences in electron density maps is the fact that many crystallographic studies identify sequencing errors. In 4BCL, residue 117 is identified in the X-ray study as a serine whereas the original determination of the amino acid sequence listed a glutamine at this position.
Fig. 12.6 A lipid transport protein: liver fatty acid-binding protein. This stereodrawing describes the crystal structure of rat liver fatty acid-binding protein. The depiction contains all of the main-chain atoms with short lines describing the positions of main-chain hydrogen bonds. Both the N and C terminals are labeled. The structure contains two molecules of bound oleic acid labeled in the drawing as oleic acid 1 and 2. The protein belongs to a family called the iLBPs. All except the liver form bind a single molecule of lipid roughly in the position of oleic acid 2. The coordinates are derived from the PDB file with accession number 1LFO.
The bacteriochlorophyll protein is a trimer, but only a single subunit is shown in Fig. 12.5. As expected, the subunits of the trimer are related by a threefold rotation axis. Chlorophyll with a Mg²⁺ in the center of the porphyrin rings is basically a lipid molecule. Examining the structure in stereo will show that the protein component is barrel-like, with the internal cavity formed by a series of antiparallel β strands, not all of which follow each other in the amino acid sequence. The lipid-like chlorophyll molecules pack together such that there are numerous lipid–lipid contacts in addition to interactions with the protein. The hydrophobic phytyl side chains (hydrocarbon) are in different conformations serving as packing aids. The chlorophyll ring systems are nearly parallel to each other. The β-barrel motif visible in Fig. 12.5 is used on a smaller scale in a family of proteins known as the intracellular lipid-binding proteins or iLBPs. The iLBPs represent a highly homologous group found in the cytosol of a variety of cell types. They are responsible for shuttling lipid-like molecules such as fatty acids and retinoids around the cell to target organelles. They are relatively small monomeric proteins (approximately 130 amino acids) and many structures have been determined by both NMR and X-ray methods. An example of this family is shown in Fig. 12.6 (Thompson et al., 1997). One of the keys to these β-barrel motifs is the interstrand hydrogen bonds. These are important polar interactions between the carbonyl oxygen of one strand and the hydrogen attached to the peptide nitrogen in another strand (single lines in Fig. 12.6). Showing the main-chain hydrogen bonds gives the Cα model the appearance of fishnet. Aside from the residues in the turn regions, the main-chain network is largely continuous. Notice how the network is complete in spite of the twisting of the strands or staves of the barrel.
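The main-chain hydrogen bonds drawn in Fig. 12.6 can be located approximately from coordinates alone. The Python sketch below uses a simple distance test between the peptide nitrogen of one residue and the carbonyl oxygen of another. The 3.5-Å cutoff, the function name, and the coordinates are assumptions made for illustration. The criterion is deliberately cruder than the electrostatic energy definition of Kabsch and Sander (1983; see Appendix 1), and it is not claimed to be the method used to prepare the figure.

```python
import math

def backbone_hbonds(residues, max_distance=3.5, min_separation=2):
    """List likely main-chain hydrogen bonds from N...O distances.

    `residues` is a list of dicts holding the 'N' and 'O' coordinates (in Å)
    of consecutive residues. Returns (donor index, acceptor index, distance)
    for every pair at least `min_separation` residues apart whose N...O
    distance does not exceed `max_distance`.
    """
    bonds = []
    for i, donor in enumerate(residues):
        for j, acceptor in enumerate(residues):
            if abs(i - j) < min_separation:
                continue  # skip the residue itself and its direct neighbors
            distance = math.dist(donor["N"], acceptor["O"])
            if distance <= max_distance:
                bonds.append((i, j, round(distance, 2)))
    return bonds

# Made-up coordinates: only N of residue 0 and O of residue 2 are close.
demo_residues = [
    {"N": (0.0, 0.0, 0.0), "O": (0.0, 20.0, 0.0)},
    {"N": (10.0, 0.0, 0.0), "O": (10.0, 20.0, 0.0)},
    {"N": (20.0, 0.0, 0.0), "O": (2.9, 0.0, 0.0)},
]
hbonds = backbone_hbonds(demo_residues)
print(hbonds)
```

Applied to a β barrel, pairs found this way between residues on neighboring strands trace out the fishnet pattern described above.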
In this particular family, the β barrel has an internal cavity. The internal cavity is filled with water molecules in structures with and without lipid. For most of the family members, only one lipid-like ligand is bound. The liver iLBP shown in Fig. 12.6 is an exception. It binds 2 mol of fatty acid per mole of protein. If other family members may be used as a guide, the principal binding site would be the one labeled oleic acid 2 in Fig. 12.6. While one would expect the cavity to be lined with hydrophobic residues, this is not the case. The lipid-binding cavity is in fact lined with both hydrophilic and hydrophobic side chains and contains a number of water molecules even with the bound lipid. Perhaps this is an important lesson in considering other lipid–protein systems. The situation is in contrast to the bacteriochlorophyll-binding protein, where the core contains the hydrophobic chlorophyll molecules (Figs. 12.5 and 12.6).
SUMMARY
The interactions of proteins and lipids involve at least three major classes: (1) membrane proteins, (2) lipid-metabolizing enzymes, and (3) lipid storage and transport proteins. Electron microscopy and digital image reconstruction have been particularly useful in the study of membrane proteins. The nature of the secondary structure found in these various proteins includes both α helices and β structure. Forms of antiparallel β barrels appear to be the most frequently found motif in both membrane and soluble lipid–protein systems. In a few instances, bound lipid molecules are highly ordered and visible in electron density maps. Although not discussed in detail, NMR methods have also been used to obtain structure and to study dynamic properties. The concept of lipid seeking only a hydrophobic environment is not always true. The use of electron diffraction and new methods for obtaining single crystals in the presence of detergents is leading to an increasing number of known structures from biological membranes.
PROBLEMS
Practicing Stereovision
1. Use Fig. 12.3 to determine the handedness or twist of the OmpA β barrel.
2. In Fig. 12.4, the disulfide bonds are found in different sections of the enzyme structure. Study the stereodrawing and decide whether any of the disulfide bridges involve the flap.
Using Computer Graphics
3. Obtain the X-ray coordinates for bacteriorhodopsin (PDB, 1BRX). Prepare a display that can be used to study the distribution of hydrophobic and hydrophilic side chains. What can be concluded about the distribution of polar versus nonpolar side chains on the outer surface of this membrane protein?
4. Obtain the coordinates of horse pancreatic lipase or any other disulfide-containing protein. Display the protein as a Cα model with the disulfide bonds. Consider the torsional angle around the -S-S- bond. Survey a number of disulfide bonds in the lipase or any other protein and see whether the torsional angle favors a right- or left-handed configuration.
5. An extra-difficult problem: extract the coordinates of the catalytic residues from horse pancreatic lipase and bovine chymotrypsin. Catalytically important residues are usually described in the header of the PDB files. Using the graphics program of your choice, superimpose the two sets of active site side chains (S, H, D) and describe how well they overlap stereochemically.
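For problem 4, the torsional angle around the -S-S- bond can also be calculated directly from the coordinates of the four atoms Cβ, Sγ, Sγ', and Cβ'. The Python sketch below is a minimal version using only the standard library. The function names and the test coordinates are invented for illustration, and the sign convention assumed is the usual one in which a positive angle corresponds to a right-handed disulfide.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees about the p1-p2 bond, from -180 to +180."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)   # normals to the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def disulfide_handedness(cb1, sg1, sg2, cb2):
    """Classify a disulfide by the sign of its Cβ-Sγ-Sγ'-Cβ' torsion angle."""
    angle = dihedral(cb1, sg1, sg2, cb2)
    return ("right-handed" if angle > 0 else "left-handed"), angle

# Idealized test geometry, not taken from a real structure.
hand, angle = disulfide_handedness((1.0, 0.0, 0.0), (0.0, 0.0, 0.0),
                                   (0.0, 0.0, 1.0), (0.0, 1.0, 1.0))
print(hand, angle)
```

Running the function over all six disulfides of the lipase, with coordinates read from the PDB file, gives the survey the problem asks for.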
REFERENCES
Bourne, Y., Martinez, C., Kerfelec, B., Lombardo, D., Chapus, C., and Cambillau, C. (1994). Horse pancreatic lipase. The crystal structure refined at 2.3 Å resolution. J. Mol. Biol. 238, 709–732.
Carriere, F., Withers-Martinez, C., van Tilbeurgh, H., Roussel, A., Cambillau, C., and Verger, R. (1998). Structural basis for the substrate selectivity of pancreatic lipases and some related proteins. Biochim. Biophys. Acta 1376, 417–432.
Deisenhofer, J., Epp, O., Miki, K., Huber, R., and Michel, H. (1985). Structure of the protein subunits in the photosynthetic reaction centre of Rhodopseudomonas viridis at 3 Å resolution. Nature (London) 318, 618–624.
Fenna, R., and Matthews, B. (1975). Chlorophyll arrangement in a bacteriochlorophyll protein from Chlorobium limicola. Nature (London) 258, 573–577.
Henderson, R., Baldwin, J., Ceska, T., Zemlin, F., Beckmann, E., and Downing, K. (1990). Model for the structure of bacteriorhodopsin based on high-resolution electron cryo-microscopy. J. Mol. Biol. 213, 899–920.
Luecke, H., Richter, H., and Lanyi, J. (1998). Proton transfer pathways in bacteriorhodopsin at 2.3 Å resolution. Science 280, 1934–1937.
Pautsch, A., and Schulz, G. (1998). Structure of the outer membrane protein A transmembrane domain. Nature Struct. Biol. 5, 1013–1017.
Thompson, J., Winter, N., Terwey, D., Bratt, J., and Banaszak, L. (1997). The crystal structure of the liver fatty acid-binding protein. A complex with two bound oleates. J. Biol. Chem. 272, 7140–7150.
APPENDIX
1 Extra Reading in Structural Biology Abseher, R., Schreiber, H., and Steinhauser, O. (1996). The influence of a protein on water dynamics in its vicinity investigated by molecular dynamics simulation. Proteins 25, 366– 378. Berger, J., Gamblin, S., Harrison, S., and Wang, J. (1996). Structure and mechanism of DNA topoisomerase II. Nature (London) 380, 179. Braun, W. (1987). Distance geometry and related methods for protein structure determination from NMR data. Q. Rev. Biophys. 19, 115–157. Brodsky, B., and Shah, N. (1995). The triple-helix motif in proteins. FASEB J. 9, 1537–1546. Clarke, J., Henrick, K., and Fersht, A. R. (1995). Disulfide mutants of barnase. I. Changes in stability and structure assessed by biophysical methods and x-ray crystallography. J. Mol. Biol. 253, 493–504. Clarke, J., Hounslow, A., and Fersht, A. (1995). Disulfide mutants of barnase. II. Changes in structure and local stability identified by hydrogen exchange. J. Mol. Biol. 253, 505–513. Clarke, N. D. (1995). Covariation of residues in the homeodomain sequence family. Protein Sci. 4, 2269–2278. Clore, G., and Gronenborn, A. (1991). Two-, three-, and four-dimensional NMR methods for obtaining larger and more precise three-dimensional structures of proteins in solution. Annu. Rev. Biophys. Biophys. Chem. 20, 29–63. Creighton, T. (1988). Disulphide bonds and protein stability. Bioessays 8, 57–63. Fields, B., Goldbaum, F., Dallacqua, W., Malchiodi, E., Cauerhff, A., Schwarz, F., Ysern, X., Poljak, R., and Mariuzza, R. (1996). Hydrogen bonding and solvent structure in an antigen–antibody interface—crystal structures and thermodynamic characterization of three Fv mutants complexed with lysozyme. Biochemistry 35, 15494–15503. Fish, W., Reynolds, J., and Tanford, C. (1970). Gel chromatography of proteins in denaturing solvents. J. Biol. Chem. 245, 5166–5168. Foisner, R., and Wiche, G. (1987). Structure and hydrodynamic properties of plectin molecules. J. Mol. Biol. 198, 515–531. 
Garboczi, D., Ghosh, P., Utz, U., Fan, Q., Biddison, W., and Wiley, D. (1996). Structure of the complex between human T-cell receptor, viral peptide and HLA-a2. Nature (London) 384, 134–141. Gibrat, J., Madej, T., and Bryant, S. (1996). Surprising similarities in structure comparison. Curr. Opin. Struct. Biol. 6, 377–385. Gonzalez, L., Brown, R., Richardson, D., and Alber, T. (1996). Crystal structures of a single coiled-coil peptide in two oligomeric states reveal the basis for structural polymorphism. Nature Struct. Biol. 3, 1002–1010. Grigorieff, N., Ceska, T., Downing, K., Baldwin, J., and Henderson, R. (1996). Electron-crystallographic refinement of the structure of bacteriorhodopsin. J. Mol. Biol. 259, 393–421. [Review] Hagerman, P., and Amiri, K. (1996). Hammering away at RNA global structure. Curr. Opin. Struct. Biol. 6, 317–321. Hagerman, P., and Tinoco, I. (1996). Nucleic acids from sequence to structure to function— editorial overview. Curr. Opin. Struct. Biol. 6, 277–280.
Harvey, S., and Tan, R. (1992). Teaching macromolecular modeling. Biophys. J. 63, 1683–1688. Havel, T. (1991). An evaluation of computational strategies for use in the determination of protein structure from distance constraints obtained by nuclear magnetic resonance. Prog. Biophys. Mol. Biol. 56, 43–78. Hulmes, J., Miedel, M., Li, C., and Pan, Y. (1989). Primary structure of elephant growth hormone. Int. J. Peptide Protein Res. 33, 368–372. Kabsch, W., and Sander, C. (1983). Dictionary of protein secondary structure: Pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22, 2577–2637. King, G. F. (1996). NMR spectroscopy and x-ray crystallography provide complementary information on the structure and dynamics of leucine zippers. Biophys. J. 71, 1152–1153. Kjeldgaard, M., Nyborg, J., and Clark, B. (1996). The GTP binding motif: Variations on a theme. FASEB J. 10, 1347–1368. Kuntz, I., Thomason, J., and Oshiro, C. (1989). Distance geometry. Methods Enzymol. 177, 159–205. Lerner, L., and Horita, D. (1993). Teaching high-resolution nuclear magnetic resonance to graduate students in biophysics. Biophys. J. 65, 2692–2697. Montal, M. (1996). Protein folds in channel structure. Curr. Opin. Struct. Biol. 6, 499–510. [Review] Murphy, J. (1996). Protein engineering and design for drug delivery. Curr. Opin. Struct. Biol. 6, 541–545. Murzin, A., Brenner, S., Hubbard, T., and Chothia, C. (1995). SCOP: A structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247, 536–540. Ohlendorf, D. H. (1994). Accuracy of refined protein structures. II. Comparison of four independently refined models of human interleukin 1β. Acta Crystallogr. D50, 808–812. Qi, P., Beckman, R., and Wand, A. (1996). Solution structure of horse heart ferricytochrome c and detection of redox-related structural changes by high-resolution H-1 NMR. Biochemistry 35, 12275–12286. Reynolds, J., and Tanford, C. (1970).
The gross conformation of protein-sodium dodecyl sulfate complexes. J. Biol. Chem. 245, 5161–5165. Rini, J. (1995). X-ray crystal-structures of animal lectins. Curr. Opin. Struct. Biol. 5, 617–621. Schellman, J. (1987). The thermodynamic stability of proteins. Annu. Rev. Biophys. Biophys. Chem. 16, 115–137. Skinner, M., and Terwilliger, T. (1996). Potential use of additivity of mutational effects in simplifying protein engineering. Proc. Natl. Acad. Sci. U.S.A. 93, 10753–10757. Stroud, R., and Fauman, E. (1995). Significance of structural changes in proteins: Expected errors in refined protein structures. Protein Sci. 4, 2392–2404. Turner, D. (1996). Thermodynamics of base pairing. Curr. Opin. Struct. Biol. 6, 299–304. Wuthrich, K. (1989). Protein structure determination in solution by nuclear magnetic resonance spectroscopy. Science 243, 45–50. Zhang, X., Wozniak, A., and Matthews, B. (1995). Protein flexibility and adaptability seen in 25 crystal forms of T4 lysozyme. J. Mol. Biol. 250, 527–552. Zhao, D., and Jardetzky, O. (1994). An assessment of the precision and accuracy of protein structures determined by NMR—dependence on distance errors. J. Mol. Biol. 239, 601–607.
APPENDIX
2 Macromolecular Structure Information Resources1
RESOURCES OF PRIMARY DATA
PDB: Rutgers University; http://www.rcsb.org/
NDB: Nucleic Acid Database, Rutgers University (USA); http://ndbserver.rutgers.edu
BMRB: Protein, Peptide, and Nucleic Acid NMR Spectroscopy Database; http://www.bmrb.wisc.edu
USEFUL COLLECTIONS OF DATA
HSSP: Homology-Derived Secondary Structure of Proteins; http://www.embl-heidelberg.de/pub/databases/protein extras/hssp
3D ALI: Sequence alignments of structurally superposed proteins; http://www.embl-heidelberg.de/argos/ali/ali.html
SITES TO BROWSE AND SEARCH ON THE WORLD WIDE WEB
MOOSE: Macromolecular Structure Query of the PDB FTP archive; http://www.sdsc.edu/moose
Molecules R US: Forms interface to search an index of the PDB; http://molbio.info.nih.gov/cgi-bin/pdb
PDBSummaries: Summary information and images of each PDB entry; http://www.biochem.ucl.ac.uk/bsm/pdbsum/index.html
SCOP: Structural classification of proteins; http://scop.mrc-lmb.cam.ac.uk/scop
CATH: Protein structure classification, domains; http://www.biochem.ucl.ac.uk/bsm/cath
LPFC: Library of Protein Family Cores; http://camis.stanford.edu/projects/helix/LPFC
SWISS-3DIMAGE: High-quality pictures of biological macromolecules; http://expasy.hcuge.ch/sw3d/sw3d-top.html
Entrez: Access now MMDB, a database of protein structures; http://www.3.ncbi.nlm.nih.gov/Entrez
Protein Motions: A database of domain, loop, and subunit motions; http://bioinfo.mbb.yale.edu/MolMovDB
1 This list of WWW sites was tested and active in April 1999. It is an edited list taken from Trends Biochem. Sci. 21, 252–256, 1996.
STRUCTURE DATABASES
WPDB: Compressed and indexed PDB under MS Windows, downloadable freeware; http://www.sdsc.edu/CCMS/Packages/wpdb.html
P/FDM: Object-oriented database management, freeware but for any database; http://www.csd.abdn.ac.uk/~pfdm
ProLink: Relational database under Sybase; http://bmerc-www.bu.edu/plforms/forms-toc.html
OTHER DATABANKS/DATABASES
BMCD: Biological Macromolecule Crystallization Database; http://ibm4.carb.nist.gov:4400/bmcd/bmcd.html
SWISS-PROT: Annotated Protein Sequence Databank; http://expasy.hcuge.ch/sprot/sprot-top.html
ATLAS: Retrieval program to access sequences databases at MIPS; http://www.mips.biochem.mpg.de
GenBank: Genetic Sequence Databank; http://www.3.ncbi.nlm.nih.gov/Entrez
ENZYME: Enzyme Nomenclature Databank; http://www.expasy.ch/enzyme/
Enzyme Structures Database: Another enzyme database; http://www.biochem.ucl.ac.uk/bsm/enzymes/index.html
ProDom: A database of protein or domain families; http://www.sanger.ac.uk/Pfam/
PROSITE: A databank of biological sites, patterns, and profiles; http://expasy.hcuge.ch/sprot/prosite.html
PRINTS: Protein Motif Fingerprint Database; http://www.biochem.ucl.ac.uk/bsm/dbbrowser/PRINTS/PRINTS.html
MOTIFS: Searching protein sequence motifs; http://www.genome.ad.jp/SIT/MOTIF.html
Structural Information Resources: The compilation of resources listed in this table; http://www.ucmb.ulb.ac.be/StructResources
Index Adipocyte lipid-binding protein, temperature factors, 57–58 Alcohol dehydrogenase, domains, 91–93 Aldolase, activity in crystals, 55 Alpha helix, see Secondary structure Amino acid identification in stereo drawings, see Stereo drawings ionization of side chains in protein crystals versus solution, 53–54 one-letter codes, 32, 36 temperature factors of side chains, 57–58 Aspartate carbamoyltransferase, quaternary structure, 67 ATLAS, database access, 161 Azurin, conformational change analysis, 111
Bacteriochlorophyll protein, crystal structure, 154–155 Bacteriophage 434 repressor, DNA recognition, 132 Bacteriorhodopsin computer display, 156 crystal structure, 148–149 electron microscopy, 150 Beta sheet, see Secondary structure Bijvoet pair, analysis, 16 Biological Macromolecule Crystallization Database (BMCD), access, 161 BMCD, see Biological Macromolecule Crystallization Database BMRB database, access, 160 Bragg equation, resolution calculation, 11–12 Bridge, secondary structure recognition, 79
CAP, see Catabolite activator protein Carbonic anhydrase reaction mechanism, 144–145 zinc binding and function, 145–146 Catabolite activator protein (CAP), DNA recognition, 131 α-Chymotrypsin, activity in crystals, 55 Collagen displaying of fragments, 84 triple helix, 82–83 Computer display, protein structures, 39 Conformational states comparison of conformational states mathematical superimposition of coordinate sets least squares analysis, 101 principle, 100 root mean square differences, 102 transformation, 100–101 visual guidance, 102 software, 102 visual comparison of maps, 100 globin changes in crystals versus solution, 52–53 pancreatic lipase, 153–154 temporal resolution of changes, 110 X-ray crystallography analysis hemoglobin, comparison of oxygenated and deoxygenated states α helices, 106 α subunit, 105–107 biological significance, 102, 109 FG region movement, 107, 109 histidine F8 movement, 105–107, 109 mechanistic conclusions, 109 quaternary changes, 103–104 tertiary changes, 103–107
Conformational states (continued) Y140 movement, 107 hexokinase, 110 lactate dehydrogenase, 109 overview, 99 quaternary changes, 110 Cro, DNA recognition, 131 Crystal, protein density, 51 diffusion of molecules, 51 heterogeneity regions of structure, 58–59 mother liquor, 51 packing effects, 50–51, 54–55 preparation hanging drop method, 8 membrane proteins, 148 testing for protein versus small molecule crystals Izit stain, 9 softness test, 8–9 variables, 8 proton exchange studies, 118–119 water content, 50–51 water networks, 123–124 X-ray diffraction, see X-ray crystallography Cytochrome b5, domains, 88–90 Cytochrome c function, 141, 144 heme-binding site, 143 redox potential, 143 structure, 141–143
Dihydrofolate reductase, nucleotide-binding domain, 98 DNA A-DNA structure, 129 B-DNA structure, 126–127, 129, 134–135 Dickerson dodecamer protein recognition, 129–130 structure, 127–129 fiber diffraction, 7, 126 hydrogen bonding, 126 twist, 129, 135 water interactions, 127 Z-DNA structure, 129 DNA-binding proteins conformational motifs in recognition helix–turn–helix, 131–132 leucine zipper, 132–134 overview, 130–131 zinc finger, 135, 140 recognition element symmetry, 130, 135 specificity, 130
Domain crystal structure delineation, 86 cytochrome b5 domains, 88–90 definition, 86 dehydrogenase domains and subdomains, 86, 90–93 deletion mutant analysis, 88 functional element, 87–88, 97 immunoglobulin G domains, 93–95 proteolytic analysis, 88
Electron density calculation, 13, 18 difference maps, 20 mapping, 19–21 metals, 137–138 Electron diffraction applications in protein structure, 7–8 membrane proteins, 149–150 three-dimensional imaging, 7, 149–150 Electron microscopy bacteriorhodopsin, 150 helical polymers, 64 Entrez, access, 161 Enzyme Structures Database, access, 161 ENZYME, database access, 161
Ferritin, iron binding, 140–141
GCN4, DNA recognition, 132–133 GenBank, database access, 161 Glyceraldehyde-3-phosphate dehydrogenase, quaternary structure, 67
Helical polymer examples, 63–64 imaging, 64 multistrand systems, 63–64 overview, 62–63 pitch, 63 Helix–turn–helix, DNA recognition motif, 131–132 Hemerythrin, oxygen binding, 144 Hemoglobin 222 point symmetry, 103 conformational change in solution versus crystals, 52–53 conformational states, comparison of oxygenated and deoxygenated states α helices, 106
α subunit, 105–107 biological significance, 102, 109 FG region movement, 107, 109 histidine F8 movement, 105–107, 109 mechanistic conclusions, 109 quaternary changes, 103–104 tertiary changes, 103–107 Y140 movement, 107 crystal structure determination, 1–2, 7, 11–12 dihedral symmetry, 66 oxygen binding site, 105 partial crystal coordinates deoxyhemoglobin, 112 oxyhemoglobin, 112–113 quaternary structure, 61, 103–104 radius of gyration calculation, 52 Hexokinase, crystallographic analysis of conformational changes, 110 Hydrogen bond bifurcation, 117 definition, 115–116 distance, 117 DNA, 126 donors/acceptors in proteins, 117, 121, 123 energy, 116–117
IgG, see Immunoglobulin G Immunoglobulin G (IgG), domains, 93–95 Ionization, side chains in protein crystals versus solution, 53–54
Lactate dehydrogenase, crystallographic analysis of conformational changes, 109 Lactose repressor, specificity, 130 Lambda repressor, DNA recognition, 131 Leucine zipper, DNA recognition motif, 132–134 Likely Quaternary Structure (LQS), website, 71–72 Lipase, see Pancreatic lipase Lipid metabolic enzyme features, 152–154 transport and storage proteins, 154–156 Liver fatty acid-binding protein, crystal structure, 155–156 LQS, see Likely Quaternary Structure
MAGE development, 41–42 downloading of program, 41
drawing lines and labels, 44 editing, 42 hard copy, 46 keywords, 48 kinemage color plate, 47–48 local rotations, 45 measurement of distances and angles, 45 parameters for keyword modification, 49 text, captions, and colors, 43 use with PREKIN, 42, 47 views, 43–44 Malate dehydrogenase domains, 86, 90–93 quaternary structure cytosolic enzyme, 69 Escherichia coli enzyme, 73–74 mitochondrial enzyme, 65–66 Membrane proteins crystallization, 148 difficulty of study, 148 structure bacteriorhodopsin, 148–150 OmpA, 150 Metal ions, protein-bound binding site elucidation in crystallography, 137 bond lengths, 139 coordination and ligands, 138–139 electron density maps, 137–138 functions catalysis, 144–146 conformation, 140 overview, 137, 139 oxidation-reduction intermediates, 141–144 oxygen binding, 144 transport and storage, 140–141 heavy atoms in crystal structure analysis, 1–2, 12, 14, 137 types of metals, 139 water association, 120, 124–125 Miller indices, reflection data, 10 MIR, see Multiple isomorphous replacement Molecular replacement, X-ray crystallography, 17–18 Molybdate-binding protein, structure and function, 141 MOTIFS, database access, 161 Multiple isomorphous replacement (MIR), X-ray crystallography, 1–2, 12, 14 Myoglobin conformational change in solution versus crystals, 52–53 crystal structure determination, 1–3, 7, 11
Myoglobin (continued) display with PREKIN, 60 water assignment in crystal structure, 114
NDB, see Nucleic Acid Database Neutron crystallography applications, 7 buried water exchange studies, 118 difficulties, 117–118 hydrogen atom visualization, 114, 118 proton exchange studies, 118–119 scattering coefficients, 118 NMR, see Nuclear magnetic resonance NOE, see Nuclear Overhauser effect Nuclear magnetic resonance (NMR), protein structure determination chemical shifts, 22 complementary studies, 4 crystal structure comparison biological activity analysis, 54–55 chemical reactivity of side chains, 53–54 conformational changes, 52–53 radius of gyration calculation, 51–52 similarity, 3, 50 thioredoxin, 55–56 distance geometry analysis, 21, 23–24 history of development, 3, 22 3J, torsional angle determination, 22–23 nuclear Overhauser effect classification, 22 distance effects, 22 origin, 22 peak volume, 22 nuclei and isotopic enrichment, 21–22 size limitations, 6 Nuclear Overhauser effect (NOE) classification, 22 distance effects, 22 origin, 22 peak volume, 22 Nucleic Acid Database (NDB), access, 160
OmpA, crystal structure, 150
Pancreatic lipase active site comparison with chymotrypsin, 156–157 computer display, 156–157 conformational states, 153–154 interfacial catalysis, 152
structure, 152–153 Patterson maps calculation, 14 difference map, 15 heavy atom site localization, 14–17 PDB, see Protein Data Bank P/FDM, database access, 161 Plastocyanin, copper binding and function, 144, 147 PREKIN control specifications, 42–43 development, 41–42 downloading of program, 41 text, captions, and colors, 43 use with MAGE, 42, 47 PRINTS, see Protein Motif Fingerprint Database ProDom, database access, 161 ProLink, database access, 161 PROSITE, database access, 161 Protein Data Bank (PDB) access, 5, 39–41, 160 ATOM records, 30 atom labels, 30, 32 contents, 25 coordinates and conversions, 29–30, 70–72 display programs, see MAGE; PREKIN file transfer protocol, 40–41 header records of files AUTHOR, 26 COMPND, 26 CONECT, 28 CRYST, 27 HEADER, 26 HELIX, 27 HET, 27 JRNL, 27 MTRIX, 28, 71 ORIGX, 27–28 REMARK, 26–27 SCALE, 28–29 SEQRES, 29 SHEET, 27 SITE, 27 SOURCE, 26 TER, 28–29 TURN, 27 identification codes for proteins, 39 publication guidelines, 25 record type, 26 size of files, 26 temperature factors, 26 World Wide Web pages, 41, 160
Protein Motif Fingerprint Database (PRINTS), access, 161 Protein Motions, database access, 161
Quaternary structure, see Subunits
Refinement, X-ray crystallography, 12–13, 19, 115 Resolution, X-ray crystallography, 11–12, 21 R factor, X-ray crystallography, 19 Ribonuclease T1, water association, 124–125
Secondary structure α helix amphipathic helix, 80 helix wheels, 80 hydrophobic helix, 80–81 structure, 78–81 β strand, structure, 78–79, 81 bridges, 79 chirality, 79 definition, 75 Homology-Derived Secondary Structure of Proteins resource, 160 pattern recognition in crystal coordinates, 78–79 peptide bonds, cis versus trans, 76 prediction by statistical methods, 83 proton exchange influence, 119 rationale for study, 75 supersecondary structure, see Supersecondary structure torsional angles allowed versus disallowed, 77 α helix, 78 β strand, 78 definition, 76 triple helix of collagen, 82–83 turns, 82 water binding sites α helix, 119 β sheet, 119–120 Small angle neutron scattering, applications, 7 Small angle X-ray scattering applications, 7 radius of gyration calculation, 52 Stereo drawings amino acid identification asparagine versus aspartate, 38 glutamine versus glutamate, 38
histidine, 38 isoleucine, 38 methionine, 37 proline, 38 stereochemistry, 38 tryptophan, 38 valine versus threonine, 37–38 generation, 32 importance, 3–4, 32, 46 stick drawings, 32, 34 viewing techniques crossed eyes, 37–38 divergent eyes, 36–37 practice problems, 46, 59, 73, 84, 97–98, 110–111, 124, 135, 147, 156 stereoglasses, 34–35 Structure factor, calculation, 12–14, 18 Subunits biological implications of quaternary structure, 68 constant stereochemical sense, 61 contact types, 61–62 coordinate generation for other subunits from PDB files, 70–72 crystallographic analysis of quaternary structure conformational changes, 103–104, 110 enzymes aspartate carbamoyltransferase, 67 glyceraldehyde-3-phosphate dehydrogenase, 67 malate dehydrogenase cytosolic enzyme, 69 mitochondrial enzyme, 65–66 helical polymers examples, 63–64 imaging, 64 multistrand systems, 63–64 overview, 62–63 pitch, 63 interaction forces, 61, 64 Likely Quaternary Structure website, 71–72 sequence prediction of quaternary structure, 68 surface accessibility calculation, 69–70 symmetry cyclic symmetry, 64, 73 dihedral symmetry, 64, 66 elements, 64–65 local symmetry, 64 rotation axis, 65–66 32 symmetry, 67 virus coat proteins, 67–68, 73
Supersecondary structure α structure, 86, 95 crystal structure delineation, 87 definition, 86, 95, 97 handedness, 95–96 Surface accessibility, calculation for subunits, 69–70 SWISS-PROT, database access, 161 Symmetry asymmetry of ribosomes, 69 cyclic symmetry, 64, 73 dihedral symmetry, 64, 66 DNA-binding protein recognition elements, 130, 135 elements, 64–65 local symmetry, 64 rotation axis, 65–66 32 symmetry, 67 virus coat proteins, 67–68, 73
Temperature factor, X-ray crystallography, 18–19, 56–59 Thioredoxin crystal versus solution structure, 55–56 torsional angles, 77 Torsional angle allowed versus disallowed angles, 77 α helix, 78 β strand, 78 definition, 76 determination from 3J, 22–23
Water assignment in crystal structures, 114 binding sites α helix, 119 β sheet, 119–120 metals, 120, 124–125 catalysis role, 120 criteria for addition in refinement, 115 distribution of protein-bound water, 120–121, 123 DNA interactions, 127
ionic forms, 115 networks in crystalline proteins, 123–124 numbers associated with proteins, 124 ribonuclease T1 association, 124–125 WPDB, database access, 161
X-ray crystallography, see also Crystal, protein balsa wood models, 2 Bijvoet pair analysis, 16 complementary studies, 4 diffraction pattern, 9–10 electron density calculation, 13, 18 difference maps, 20 mapping, 19–21 metals, 137–138 helical polymers, 64 instrumentation, 9–10 Miller indices of reflection data, 10 molecular replacement, 17–18 multiple isomorphous replacement, 1–2, 12, 14 Patterson maps calculation, 14 difference map, 15 heavy atom site localization, 14–17 phase problem, 1, 12 refinement, 12–13, 19, 115 resolution, 11–12, 21 R factor, 19 solution structure comparison biological activity analysis, 54–55 chemical reactivity of side chains, 53–54 conformational changes, 52–53 radius of gyration calculation, 51–52 similarity, 3, 50 thioredoxin, 55–56 structure factor calculation, 12–14, 18 temperature factors, 18–19, 56–59 time requirements for data acquisition, 11 Xylose isomerase, manganese binding, 147
Zinc finger, DNA recognition, 135, 140