FOREWORD TO THE SECOND EDITION
The five years since the first edition of this remarkably useful book have been marked by a number of significant advances in protein crystallography. Synchrotron radiation and cryocrystallography now routinely give data of much higher resolution and superior quality than the average datasets available in the past. The use of multiple wavelength anomalous dispersion and improved methods of locating heavy atoms has made high-quality experimentally determined phases much more common. Serious progress has been made on the direct phasing problem, to the extent that structures containing some metal atoms and data to better than 1.2 Å resolution have a realistic chance of being solved with one dataset. Triclinic lysozyme, which contains 1000 atoms (all light atoms), has been solved ab initio by two methods. The increasing occurrence of atomic resolution diffraction data for protein molecules is changing the methods used for refinement. It is now becoming evident that the mathematical and physical model that describes the diffracting contents of a protein crystal is essentially the same as that which has stood the test of time for small molecules. The focus of this book, and one of the keys to its popularity, is the word "Practical" in the title. Duncan McRee has an excellent ability to strip a problem to its essentials and then cast those essentials into computer programs that can be used by people who are not experts, without sacrificing the power and features required by people who are experts. His program XtalView, introduced with the first edition of this book, has rendered the arcane comprehensible. It has been demonstrated time and again that it is not sufficient for computer programs to be comprehensive and correct. If the
task is complex, the user interface must be well designed or the program becomes a barrier instead of an aid. XtalView provides a clear, interactive, modular interface to the complex tasks of macromolecular crystallography. It thus gets the apparatus out of the way of the science and becomes a powerful tool both for the beginner learning the techniques and for the expert seeking higher productivity. The technical advances in macromolecular crystallography, particularly the advent of higher and higher resolution, improved phasing methods, and improved methods for highlighting problem areas of a structure, have required supporting changes in the software. In most cases these make the underlying science more discernible. XtalView has been enhanced to support atomic resolution structures, to extend its phasing methods to include MAD (multiple-wavelength anomalous dispersion) phasing, and to greatly improve the ease with which models of new structures can be built. XtalView's comprehensive analysis tools have also proven useful beyond the crystallographic community; the program is increasingly popular with molecular modelers. XtalView, which is described in this book, is distributed through the Computational Center for Macromolecular Structure (CCMS) at the San Diego Supercomputer Center. Readers can obtain copies of the program through the CCMS web page at http://www.sdsc.edu/CCMS. Support services are available by e-mail at
[email protected]. XtalView provides a modern user interface and runs on very inexpensive computer systems, including PC platforms running Linux. Because XtalView is free to academic users, the total cost of the crystallographic workstation is that of a personal computer with a reasonably powerful graphics card and a good-quality monitor. With XtalView, every postdoctoral researcher and graduate student in a laboratory can have his or her own graphical workstation, with all of the corresponding increases in productivity. This point cannot be made too strongly. The productivity we are talking about here is the removal of barriers to the interactive exploration of ideas. I have great faith in the creativity of students. I consider it my great good fortune to be able to participate in the development and distribution of a tool that is capable of changing a whole field of science.
Lynn F. Ten Eyck
San Diego Supercomputer Center
University of California, San Diego
PREFACE TO THE SECOND EDITION
This book is a practical handbook for anyone who wants to solve a structure by protein crystallography. It should prove useful both to new protein crystallographers and to old hands. The topics covered in this book are well-tested, robust methods commonly used in our laboratory and in others. The well-informed crystallographer will note, however, that many techniques and methods are not mentioned or are mentioned very briefly. These have been omitted because of space, less common use, difficulty of application, the need for special expertise, or perhaps oversight on the author's part. The exclusion of a method should not be taken in any way as disapproval; one book cannot possibly include everything. For the second edition, a new chapter on cryocrystallography, written by Peter R. David, has been added. Other topics that have been added or expanded are very high-resolution refinement, MAD phasing, a tutorial section on XtalView, and material on CCP4 software and mmCIF (macromolecular crystallographic information file), as well as minor changes throughout the book. When I wrote the first edition of this book in 1993, there were a few hundred entries in the Protein Data Bank; in 1999, the number is rapidly approaching ten thousand. I hope that the first edition had some small part in this explosion of protein structures solved by crystallography. Now we are looking forward to the day when the structures of most of the proteins in an entire genome will be solved. If you are reading this book, then you want to be part of this coming revolution in structural biology.
One exciting development is the dramatic drop in the cost of the computers needed for solving structures. An IBM PC running Linux is more than adequate for all of the tasks of solving a protein structure and can be purchased at minimal cost. Often an existing Windows computer can be converted to a dual-boot Windows/Linux machine and structures solved with freely available software. The book is divided into six chapters: (1) Laboratory Techniques, (2) Data Collection Techniques, (3) Computational Techniques, (4) XtalView Tutorial, (5) Protein Crystallography Cookbook, and (6) Cryocrystallography, by Peter David. There is no need to read the book in any order; chapters that use information from other chapters will reference them when necessary. In fact, the best use of the book may be to read the cookbook first and then refer to the other chapters as needed. Chapters 4 and 6 are new to the second edition, and there is new material in all of the chapters reflecting the advances since the first edition. In particular, freezing techniques, better detectors, and the wide availability of synchrotron beamlines have increased the resolution of the average protein structure considerably. To cover this, new material on very high-resolution structure refinement and analysis has been added. In Chapter 1, only cursory information on crystal growth is presented because several excellent texts on crystal growth already exist. A fair amount of attention is given to protein sample handling, because proper attention to the sample can often mean the difference between success and failure. Proteins are delicate materials that demand special handling and are very difficult to purify in large quantities. Protein crystals are also very delicate and require special handling techniques different from those of small molecule crystals. Chapter 2 bridges the gap between the laboratory and the computer.
Special emphasis is placed on the newer techniques using area detectors and synchrotron sources. Because of the high cost of these area detectors, the user will probably have access to only one. Experience has shown that the user will then come to regard the available detector as best and will defend it vehemently against all others. Emphasis is placed less on a specific system and more on general techniques relevant to all area detectors. Chapter 3 provides information about using computers and file systems and how theory translates into actual methods. Rules of thumb are provided throughout to serve as a guide. Like all rules, these are made to be broken and should not be taken too literally. The variety of software used by different groups is enormous and no book could hope to cover even a small portion of it. General information that should be applicable to most techniques is given. Although this book can never substitute for the individual manuals
of each program, it does give guidelines that will allow the reader to make intelligent choices among the program options. To allow discussion of specifics, the XtalView system and CCP4 are used. Information on obtaining XtalView, CCP4, and other crystallographic software can be found at this book's web site at http://ppcII.scripps.edu. Chapter 4 is a tutorial for using XtalView to perform common crystallographic tasks and covers everything from making Patterson maps to automated map fitting. Chapter 5 contains examples, drawn from the experience of the author and his colleagues, of protein structures solved by various methods. These examples can be used as guides for the user's own projects and to give a feel for how to apply the varied methods. Real numbers are given as a basis for interpreting the user's own data. By following the examples in multiple isomorphous replacement, users can, with luck and perseverance, solve their own structures. For the second edition, new material on a MAD phasing example has been added. Chapter 6 covers the now standard method of collecting data on frozen crystals, including the apparatus needed and many practical pointers. Appendix A contains formulas commonly used in protein crystallography, but with a twist: the formulas are coded in both FORTRAN77 and C. These two languages easily account for 99% of all protein crystallographic software. This will be of great aid to users in writing their own software and in understanding other software. Also, for those of us who understand a computer language better than we do math, this appendix explains the formulas. One goal of this book is to provide enough information for the computer neophyte to write a simple program to reformat or filter data. Unfortunately, because of the incredible variety of software available and the consequent large variety of file formats, this is a necessary skill.
Appendix A provides information for writing programs that will continue to be useful with different operating systems and for other projects.
Suggested Reading
Crystal Growth:
McPherson, A. (1998). Crystallization of Biological Macromolecules. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York.
Crystallography:
Stout, G. H., and Jensen, L. H. (1989). X-Ray Structure Determination: A Practical Guide, 2nd Ed. Wiley, New York.
Ladd, M. S. B., and Palmer, R. A. (1985). Structure Determination by X-Ray Crystallography, 2nd Ed. Plenum, New York.
McKie, D., and McKie, C. (1986). Essentials of Crystallography. Blackwell Scientific, Oxford.
Protein Crystallography:
Blundell, T. L., and Johnson, L. N. (1976). Protein Crystallography. Academic Press, San Diego.
Wyckoff, H., ed. (1985). Diffraction Methods for Biological Macromolecules, Methods in Enzymology, Vols. 114 and 115. Academic Press, San Diego.
Carter, C. W., Jr., and Sweet, R. M., eds. (1997). Macromolecular Crystallography A and B, Methods in Enzymology, Vols. 276 and 277. Academic Press, San Diego.
Drenth, J. (1994). Principles of Protein X-Ray Crystallography. Springer-Verlag, New York.
ACKNOWLEDGMENTS
My thanks go to a number of people for their assistance in preparing the manuscript for this book. In particular, I thank Emelyn Eldredge for giving me the opportunity and impetus to do a second edition. I thank David Stout, Yolaine Stout, Michele McTigue, and Pamela Williams for critically reading the manuscript and making many helpful suggestions. Any errors in the book are, of course, my responsibility. I also thank the large number of patient people in the Structural Biology Group at the Scripps Research Institute who have beta-tested XtalView for me. I hope this book will answer some of their questions. In particular, I thank David Goodin, John Tainer, Elizabeth Getzoff, Ian Wilson, Jack Johnson, Ethan Merritt, Chris Bruns, Cliff Mol, Jean-Luc Pellequer, Marieke Thayer, Yi Cao, Sheri Wilcox, Rabi Musah, Gerard Jensen, Melissa Fitzgerald, G. Sridhar Prasad, Brian Crane, Andy Arvai, John Irwin, Alex Shah, Gary Gippert, Arno Pahler, Nicole Kresge, Jacek Nowakowski, Robin Rosenfeld, John Blankenship, Nathalie Jourdan, Ward Smith, Paul Swepston, Phil Bourne, John Badger, and John Rose for their numerous comments and suggestions over the years. I thank Mark Israel and the folks at CCMS for several years of dedicated support of XtalView users while always remaining cheerful. I especially thank Mike Pique for keeping me up to date on programming and computer graphics. I have worked with many people over the years who have provided the intellectual stimulus, knowledge, and help that made this book possible. David Richardson was my Ph.D. advisor and started me on the right path. Jane Richardson has been a major inspiration over the years. Wayne Hendrickson took me under his wing and taught me much about anomalous
scattering, phasing, refinement, scaling, and critical thinking. Bi-Cheng Wang and Bill Furey taught me all about phase modification; Fred Brooks taught me the virtues of a good user interface and the true powers of computers. John Tainer and Elizabeth Getzoff completed my education in protein crystallography while I worked as a postdoctoral fellow in their lab. Lynn Ten Eyck taught me all about the mathematics of crystallography and provided many good ideas. George Sheldrick has been an inspiration to keep programming despite the academic consequences, and his encouragement has been invaluable. XtalView distribution through the Computational Center for Macromolecular Structure (www.sdsc.edu/CCMS) at the San Diego Supercomputer Center is funded by Grants BIR 93-31436 and 96-16114 from the National Science Foundation. Finally, I thank my wife, Janice Yuwiler, for her dedication and support of her sometimes trying husband, and her father, Art Yuwiler, for always believing in me. And last, but not least, I want to thank my three children, Alisa, Kevin, and Alex (one more than the first edition!), for putting up with me while I spent so many evenings late at work. Peter R. David, the author of Chapter 6, thanks Duncan McRee and Elspeth Garman for critically reading the chapter, Roger Kornberg and Michael Levitt for their support and encouragement on difficult problems, Kerstin Leuther for many things, especially for asking why, and finally, Mike Blum for starting him out on crystallography.
LABORATORY TECHNIQUES
1.1 PREPARING PROTEIN SAMPLES
A protein sample must be properly prepared before it can be used in a crystal growth experiment. There are many ways that this can be done to accomplish the same goal: to put the protein at a high concentration in a defined buffer solution. Methods of preparing a protein sample that have worked well in our laboratory are outlined here, but if you know of a quicker, easier method then by all means use it.
History and Purification
Since it is not uncommon for one batch of protein to crystallize while the next will not, it is vital to keep a history of each sample and to track each batch separately. Your records may provide the only clue to the differences between samples that produce good crystals and samples that are unusable. For example, there have been several cases where the presence of a trace metal is needed for crystallization. The most famous case is insulin. It seemed that the only insulin that would crystallize was that purified from material collected in a galvanized bucket. It was eventually discovered that zinc was required, and later it was added directly to the crystallization mix. In our lab
any sample received is logged into a notebook with a copy of any letters or material sent with the sample. Samples should be shipped to you on dry ice and kept frozen at -70 °C until they are ready for use. Ask the person sending the sample to aliquot the protein into several tubes and to quick-freeze each one. This way you can thaw one aliquot at a time without having to repeatedly freeze-thaw the entire sample, which can damage many proteins. Keep a small portion of the sample apart and save it for future comparison with samples that do not crystallize or crystallize differently. Always keep protein samples at 4 °C in an ice bucket to prevent denaturation and to retard bacterial growth. Perform all sample manipulations at 4 °C, either in a cold room or on ice. When the samples are finally set up, they can be brought to room temperature. Proteins are usually stabilized by the presence of the precipitants used in crystallization, and agents are added to retard microbial and fungal growth. A common antimicrobial agent is 0.02% sodium azide. Other broad-spectrum cocktails sold for use with tissue culture are quite effective.
Exchanging Buffers
If the desired buffer of the sample is not already known, the sample should be placed in a weak buffer near neutrality. A good choice is 50 mM Tris-HCl at pH 7.5 with 0.02% sodium azide. Some proteins will not be stable at low ionic strength, so a small portion should be tried first to see if a precipitate forms. Be sure to wait several hours before deciding if the sample is stable. Observe the sample in a clear glass vial and hold it near a bright light to detect any cloudiness in the sample. There are two methods for exchanging the buffer solution of the protein sample: dialysis and use of a desalting column. The desalting column is the fastest method and, if a disposable desalting column such as a PD-10 from Pharmacia-LKB is used, it is very convenient. A single pass through the column will remove 85-90% of the original salt in the sample and, if this is not enough, two passes can be used. Every time the sample is passed over the column it will be diluted about 50%. Unless the sample is very concentrated to begin with, it may be necessary to concentrate the sample after desalting. Dialysis on small volumes is best carried out in a finger-shaped dialysis membrane or with a Microcon (Fig. 1.1). Always soak the membrane first in the buffer to remove the storage solution. A minimum of two changes of dialysis buffer 12 h apart is recommended. The dialysis buffer should be stirred and should be at least 100 times the sample volume. Use membranes with a molecular weight cutoff less than half the sample molecular weight;
otherwise you risk losing a substantial portion of the sample. When removing the sample, wash the inside of the membrane with a small amount of buffer to recover the sample completely.
FIG. 1.1 Dialyzing samples to exchange buffers (labeled: dialysis membrane, flask, solution).
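The buffer-exchange figures above can be turned into a quick estimate of how much of the original salt is left in the sample. The sketch below is back-of-the-envelope arithmetic, not a model of any particular column or membrane: it assumes that each desalting pass removes a fixed fraction of the salt present (85-90% per the text), that each dialysis bath reaches equilibrium against fresh buffer, and it ignores ions bound to the protein. The function names are illustrative only.

```python
"""Rough estimates of residual salt after buffer exchange (illustrative sketch)."""


def residual_after_desalting(removed_per_pass: float, passes: int) -> float:
    """Fraction of the original salt left after `passes` desalting-column passes.

    Assumes every pass removes the same fraction of whatever salt is present.
    """
    return (1.0 - removed_per_pass) ** passes


def residual_after_dialysis(buffer_to_sample_ratio: float, changes: int) -> float:
    """Fraction of the original small-molecule solute left after `changes` baths.

    Assumes each bath of fresh buffer (buffer_to_sample_ratio times the sample
    volume) is left long enough to reach equilibrium, so the solute spreads
    evenly over the sample plus buffer volume each time.
    """
    per_bath = 1.0 / (1.0 + buffer_to_sample_ratio)
    return per_bath ** changes


def max_membrane_cutoff(protein_mw: float) -> float:
    """Upper limit for the membrane molecular weight cutoff: half the protein MW."""
    return protein_mw / 2.0


if __name__ == "__main__":
    for removed in (0.85, 0.90):
        for passes in (1, 2):
            left = residual_after_desalting(removed, passes)
            print(f"desalting, {removed:.0%} removed per pass, {passes} pass(es): "
                  f"{left:.2%} of the salt left")
    left = residual_after_dialysis(100.0, 2)
    print(f"dialysis, 2 baths at 100x sample volume: {left:.4%} of the salt left")
    print(f"30-kDa protein: use a membrane cutoff below {max_membrane_cutoff(30000):.0f} Da")
```

Under these assumptions, two column passes leave roughly 1-2% of the original salt, while two dialysis baths at 100 times the sample volume leave on the order of 0.01%. This is one reason a second column pass is sometimes needed, whereas two dialysis changes are usually ample.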
Concentrating Samples
First measure the starting concentration of the sample. The simplest way is to measure the absorbance of a 50-times dilution of the sample at 280-nm wavelength and assume that the concentration in mg/ml is simply the absorbance times 50. While this method is not very accurate, it is reproducible, and the sample should be pure enough to warrant the assumption that all absorption is due to the protein.¹ Always use buffer as a blank, and check the buffer against distilled water to be sure it does not have significant absorption. Some buffers absorb significantly at 280 nm, which can greatly reduce the accuracy of absorbance measurements. The diluted absorbance must be below 1.0 or the measurement will be inaccurate.
Concentrate the sample to 10-20 mg/ml. If you have enough sample, it is better to concentrate to 30 mg/ml, wash the concentrator with one-half the sample volume, and then add the wash to the sample to make a final concentration of 20 mg/ml. A Centricon is one of the best ways to concentrate the sample. An Amicon will also work well. Another method is to dialyze against polyethylene glycol 20,000 (PEG-20K) using a finger-shaped dialysis membrane. The advantages of this method are that it can be combined with dialysis and that the same membrane used to dialyze the sample can be transferred directly to the PEG-20K for concentrating. The dialysis tube can be put directly onto solid PEG-20K. The water in the sample will be quickly removed, so check the sample often. However, be aware that PEG is often contaminated with salts and/or metals, and this may or may not be desirable. In many cases, though, such contamination has actually contributed to crystallization. You may want to keep a sample of the particular batch of PEG you use. If, in the future, a new batch causes problems, you can analyze the differences.
Another method often used is precipitation with ammonium sulfate. If an ammonium sulfate step has already been used in the purification procedure, this may be an easy way to achieve a high concentration. You will want to use a high level of ammonium sulfate to ensure that the entire sample is precipitated. The ammonium sulfate should be added slowly to the solution while it is kept cold. Let the solution sit for at least 30 min after all the ammonium sulfate has dissolved. Spin down the precipitate and remove the supernatant. The pellet can be redissolved in a small amount of buffer. However, since the pellet will contain some salt, a dialysis step will be needed before the protein is ready.
It is not uncommon for a protein to precipitate at high concentrations. If this happens while you are concentrating, add buffer back slowly until all the sample has dissolved.
¹For further information on determining protein concentration, see Scopes, Robert K., and Cantor, Charles R., eds. (1994). Protein Purification, Principles and Practice, 3rd Ed. Springer-Verlag, New York.
Raising the level of salt by using a more concentrated buffer or by adding sodium chloride can often help stabilize protein solutions. If a precipitate forms, examine it carefully to make sure it is not crystalline. Amorphous precipitates are cloudy and have a matte appearance. Crystalline precipitates are often shiny and, if from a colored protein, are brightly colored with little cloudy appearance. Two proteins have been crystallized in our lab accidentally during concentration. One was found in an ammonium sulfate precipitation step and the other during concentration on an Amicon to lower the salt concentration.
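The two calculations in this section, the A280 estimate of concentration and the 30-to-20 mg/ml wash, are simple arithmetic. The sketch below assumes, as the text does, that all absorbance at 280 nm comes from the protein, and that an A280 of 1.0 corresponds to roughly 1 mg/ml; that conversion is only a rule of thumb, since the true value depends on the protein's aromatic content. The function names and the example reading are hypothetical.

```python
def concentration_from_a280(a280_diluted: float, dilution_factor: float = 50.0) -> float:
    """Approximate concentration (mg/ml) of the undiluted sample.

    Rule-of-thumb assumption: A280 = 1.0 for a 1 mg/ml protein solution,
    measured against a buffer blank.
    """
    if not 0.0 < a280_diluted < 1.0:
        # Readings at or above 1.0 are unreliable; dilute further and re-measure.
        raise ValueError("diluted A280 must be between 0 and 1.0")
    return a280_diluted * dilution_factor


def concentration_after_wash(concentration: float, wash_fraction: float) -> float:
    """Concentration after adding a wash of wash_fraction times the sample volume."""
    return concentration / (1.0 + wash_fraction)


if __name__ == "__main__":
    start = concentration_from_a280(0.12)  # hypothetical reading on a 1:50 dilution
    print(f"starting concentration: about {start:.1f} mg/ml")
    final = concentration_after_wash(30.0, 0.5)
    print(f"30 mg/ml plus a half-volume wash: {final:.1f} mg/ml")
```

Concentrating to 30 mg/ml and then adding a wash of half the sample volume gives 30 / 1.5 = 20 mg/ml, which is the target quoted above.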
Storage of Samples
The entire sample will not be used all at once, and the remaining protein solution should be aliquoted and stored frozen at -70 °C. Divide the sample into 100- to 200-µl aliquots in freezer-proof tubes (not glass, which becomes brittle and shatters at low temperatures) and quick-freeze the tubes in an acetone-dry ice or liquid nitrogen bath (Fig. 1.2). Label each tube with the date, a code to identify the sample and the particular batch of the sample, and your initials. Cover the label with transparent tape to prevent the ink from rubbing off when you handle the frozen tubes later. Place the tubes in a cardboard box and store them in the freezer. Freeze-thawing harms protein samples; although they may often withstand several cycles of freeze-thawing, it is best not to find out the hard way. Thaw the samples in an ice bucket or the cold room when they are to be used. If some sample is left over and it will be used the next day, it can usually be stored at 4 °C overnight.
FIG. 1.2 Storage of samples. The procedure involves (1) aliquoting samples into several tubes, then (2) quick-freezing each sample in an acetone-dry ice bath and storing at -70 °C.
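The storage routine above (aliquots of 100-200 µl, each labeled with the date, a sample and batch code, and your initials) is easy to plan ahead with a few lines of code, which also keeps the labels consistent with your notebook. In this sketch the label layout and the example sample code are made up for illustration; use whatever convention your laboratory's records follow.

```python
import math
from datetime import date


def plan_aliquots(total_ul: float, max_ul: float = 200.0) -> list[float]:
    """Split a sample into equal aliquots no larger than max_ul microliters.

    With the default of 200 ul, any sample of 100 ul or more gives aliquots
    in the 100-200 ul range.
    """
    count = max(1, math.ceil(total_ul / max_ul))
    return [total_ul / count] * count


def tube_labels(sample_code: str, batch: str, initials: str,
                count: int, made_on: date) -> list[str]:
    """Hypothetical label format: date, sample-batch code, initials, tube number."""
    return [f"{made_on.isoformat()} {sample_code}-{batch} {initials} {i}/{count}"
            for i in range(1, count + 1)]


if __name__ == "__main__":
    volumes = plan_aliquots(1050.0)
    labels = tube_labels("LYS", "B3", "DEM", len(volumes), date(1999, 3, 1))
    for volume, label in zip(volumes, labels):
        print(f"{label}: {volume:.0f} ul")
```

Splitting into equal aliquots, rather than filling tubes to 200 µl and leaving a small remainder, means every tube thawed later holds the same amount.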
Ultrapurification
While it is beyond the scope of this handbook to cover purification techniques, the crystallographer has one special technique for further improving the sample that is usually not tried by others: recrystallization (Fig. 1.3). We will assume that you have succeeded in finding conditions that will grow small crystals but are having trouble growing larger ones. It may be worthwhile to recrystallize the sample to improve the purification. A large sample of the protein can be set up in the crystallization mixture and seeded with a crushed crystal. After crystals have grown, you may wish to add slightly more salt to push more protein into the crystalline state. Gently centrifuge down the crystals, or allow them to settle by gravity, and remove the supernatant. To wash the crystals without redissolving them, resuspend them in a mother liquor about 10% higher in precipitant, and then remove the supernatant. Resuspend the crystals in distilled water to dissolve them. If you have a large amount of precipitate present with the crystals, this method will not remove the precipitate unless the precipitate settles more slowly than the crystals. In these cases, the crystals can be resuspended in 2 ml of artificial mother liquor in a petri dish, and individual crystals can then be picked up with a capillary and manually separated from the precipitate. For the crystals to redissolve well, they should be freshly grown. Old crystals that still diffract well often will not redissolve even in distilled water because the surface of the crystal has become cross-linked. This is especially true of crystals grown from polyethylene glycol.
FIG. 1.3 Redissolving crystals.
1.2 PROTEIN CRYSTAL GROWTH
Several excellent texts have been published on methods for growing protein crystals (see Suggested Reading in the preface) and I will not repeat
this material here except briefly, to add some of our own experience. Like fine wine, protein crystals are best grown in a temperature-controlled environment. Most cold rooms have a defrost cycle that makes them especially poor places to grow crystals. Investing in an air conditioner for a small room to keep it a few degrees colder than the rest of the laboratory is the best way to keep a large area at a constant temperature for crystal growth. To ensure that the room is tightly regulated, get a unit with a capacity larger than needed. Another alternative is to use a temperature-controlled incubator. However, a room is best because you will need to examine your setups periodically at a microscope, and in a room everything can be kept at the same temperature. Invest in a good dissecting stereomicroscope and remove the lightbulb in the base. Substitute a fiber-optic light source so that the base does not heat up and dissolve your crystals as you observe them. Even with the fiber-optic source, be careful not to put the setup down near the fiber-optic light source, which gets hot during operation. Have a Plexiglas base built over the dissecting scope base (Fig. 1.4) to provide a large surface on which to place setups so that they do not fall off the edges during examination. This will also provide a base to steady your hands during delicate mounting procedures.
FIG. 1.4 Modified dissecting scope. A Plexiglas base is put over the scope to make a larger area, providing a place to rest your hands during mounting operations and preventing hanging-drop plates from tipping over the edge. A fiber-optic light source is used instead of the built-in light to prevent the base from heating and damaging crystals.
Protein Solubility Grid Screen
Before embarking on crystal trials of a protein, we routinely screen it for solubility with a grid screen invented by Enrico Stura.² The grid consists of a number of common precipitants at a wide range of concentrations and a wide range of pH values (Fig. 1.5). To make the grid screen, we make up 100 ml of stock solution of each precipitant at its highest concentration (row D) in the buffer listed and store the solutions in light-tight bottles (wrap with aluminum foil or keep in a dark place). Make up a 4 × 6 Linbro plate³ with 1 ml in each well of the solutions as shown in Fig. 1.5, diluting the stock solution with buffer as appropriate toward the top of the grid. The top edges of the wells are liberally coated with vacuum grease to make a seal with a 22-mm siliconized circular glass coverslip to be added in the next step. Make a hanging drop of protein solution over each well by mixing 5 µl of protein solution (10-20 mg/ml) with 5 µl of well solution in the center of a coverslip, then quickly invert the coverslip with forceps, place it over the well, and press it into the grease seal. Make sure the slip is sealed completely by looking for air gaps in the grease. After preparing all 24 wells, place the tray in a dark place with a constant temperature. We use large incubators⁴ set at 17-22 °C for crystal trials in our lab, although a well-air-conditioned room can also be used. Check the plates for precipitation soon after setting up and again the next day by observing the drops on the coverslips with a dissecting microscope. As you scan the series for each precipitant, look for those in which relatively clear drops turn cloudy with precipitate as the precipitant concentration increases. The midpoint of this transition is where you want to start crystal trials. You want to avoid precipitants in which every drop is fully precipitated; these precipitants may specifically interact with the protein.
You also want to avoid precipitants that never precipitate the protein. If you are lucky, you may find a well with crystals. Armed with this solubility information, you can make intelligent choices of precipitant concentrations to start with. Unstable proteins will form a precipitate in every condition regardless of precipitant or concentration. In this case, you need to find conditions that stabilize the protein for at least 24 h or you stand little chance of growing crystals. Possible stabilizers are lower temperature, metal ions, cofactors, non-ionic detergents, glycerol, and ligands. If a number of possible variants of a protein are candidates for crystal trials, such as different species or mutant constructs, the grid screen can be used to select the best candidates for further trials. Look for proteins that exhibit sharp transitions from clear to cloudy drops. We have used the grid to screen the solubility of various constructs of a membrane-attached protein that was being engineered to be soluble, testing hypotheses about how the protein was attached to the membrane. In our initial attempts all the drops were cloudy, and after many rounds of mutagenesis we found a mutant that showed a clear transition from soluble to precipitate in high salt. Prior constructs precipitated within 24 h in all the conditions and never produced crystals. The soluble mutant eventually produced large crystals that we subsequently used to solve the structure.
²Stura, E. A., Satterthwait, A. C., Cairo, J. C., Kaslow, D. C., and Wilson, I. A. (1994). Reverse screening. Acta Crystallogr. D50, 448-455.
³These plates and other crystallization supplies, as well as excellent material on crystallization, can be obtained from Hampton Research, Irvine, California, http://www.hamptonresearch.com. Another source is Emerald Biosciences.
⁴These incubators need to be of the type that can both cool and heat, to keep a constant temperature so close to room temperature. Heat-only incubators designed for bacterial growth at 37 °C are not suitable.

FIG. 1.5 Enrico Stura's grid screen.

Column  Precipitant  Buffer                            Row A    Row B    Row C    Row D
1       PEG 600      0.2 M imidazole malate, pH 5.5    15%      24%      33%      42%
2       PEG 4K       0.2 M imidazole malate, pH 7.0    10%      15%      20%      25%
3       PEG 10K      0.2 M imidazole malate, pH 8.5    7.5%     12.5%    17.5%    22.5%
4       (NH4)2SO4    0.15 M sodium citrate, pH 5.5     0.75 M   1.0 M    1.5 M    2.0 M
5       PO4          NaH2PO4/K2HPO4, pH 7.0            0.8 M    1.32 M   1.6 M    2.0 M
6       Citrate      10 mM sodium borate, pH 8.5       0.75 M   1.0 M    1.2 M    1.5 M
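Two pieces of arithmetic come up with the grid screen: how much stock and buffer go into each 1-ml well, and where the clear-to-cloudy transition falls for a given precipitant. The sketch below takes the concentrations from Fig. 1.5, treats row D as the undiluted stock (as the text describes), and uses simple C1V1 = C2V2 dilution. The row labels A-D (lowest to highest), the function names, and the yes/no scoring of drops are illustrative assumptions; in practice drops are usually scored on a finer scale.

```python
from __future__ import annotations

# Concentrations from Fig. 1.5, listed lowest to highest (rows A-D).
# PEG values are percentages; salt values are molar. Row D is the stock.
GRID = {
    "PEG 600":   [15.0, 24.0, 33.0, 42.0],
    "PEG 4K":    [10.0, 15.0, 20.0, 25.0],
    "PEG 10K":   [7.5, 12.5, 17.5, 22.5],
    "(NH4)2SO4": [0.75, 1.0, 1.5, 2.0],
    "PO4":       [0.8, 1.32, 1.6, 2.0],
    "Citrate":   [0.75, 1.0, 1.2, 1.5],
}
ROWS = "ABCD"


def well_recipe(target: float, stock: float, well_ul: float = 1000.0) -> tuple[float, float]:
    """Microliters of stock and of buffer for one well (C1V1 = C2V2)."""
    stock_ul = well_ul * target / stock
    return stock_ul, well_ul - stock_ul


def transition_midpoint(concentrations: list[float], cloudy: list[bool]) -> float | None:
    """Midpoint between the last clear drop and the first cloudy drop.

    Returns None when no drop is cloudy (the precipitant never precipitates
    the protein) or when the lowest-concentration drop is already cloudy.
    """
    if True not in cloudy:
        return None
    first = cloudy.index(True)
    if first == 0:
        return None
    return (concentrations[first - 1] + concentrations[first]) / 2.0


if __name__ == "__main__":
    for name, concs in GRID.items():
        stock = concs[-1]
        parts = []
        for row, conc in zip(ROWS, concs):
            stock_ul, buffer_ul = well_recipe(conc, stock)
            parts.append(f"{row}: {stock_ul:.0f} stock + {buffer_ul:.0f} buffer")
        print(f"{name:10s} " + "; ".join(parts))
    # Hypothetical observation: clear at 15% and 24%, cloudy at 33% and 42% PEG 600.
    print(transition_midpoint(GRID["PEG 600"], [False, False, True, True]))
```

The two `None` cases correspond to the precipitants the text advises avoiding: those that precipitate every drop and those that never precipitate the protein.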
Initial Trials
In all aspects of protein crystallography except initial crystal trials, the more past experience you have, the better. Beginner's luck is definitely a factor in finding conditions for crystallizing a protein the first time. This is partly because beginners are more willing to try new conditions and will often do naive things to the sample, thus finding novel conditions for crystal growth. It is also because no one can predict the proper conditions for crystallizing a new protein. Some conditions are more successful than others, but using these exclusively means that you will never grow crystals of proteins that are not amenable to them. So fiddle away to your heart's content. What is needed is to observe carefully what does happen to your
sample under different conditions and to note the results carefully. The least experienced part-time student can outperform the most expensive crystallization robot because he or she has far more powerful sensing faculties and reasoning abilities. Leave the "shotgun" setups to the robots. Having said all this, I present in Table 1.1 a recipe to use for initial trials. The most commonly used methods for initial crystal trials are the hanging-drop and sitting-drop (Fig. 1.6) vapor-diffusion methods. The batch method can actually save a great deal of protein if done properly. In the hanging-drop method many different drops are set up. Most of these will never crystallize; the hope is that just the right conditions will be hit upon in a few of the drops. A method that I have used successfully for many years is to place a small amount of protein in a 1/4-dram shell vial with a tightly fitting lid (caplug). Small aliquots of precipitant are added slowly. After each addition, the shell vial is tapped to mix the samples and then held up to a bright light. When the protein reaches its precipitation point, it will start to scatter light as the proteins form large aggregates that cause a faint "opalescence." Make the additions slowly, waiting several minutes after each one to avoid overshooting the correct conditions. If two vials are used, they can be leapfrogged so that when one reaches saturation, the other will be just below. For example, set up the first vial with 20 µl of protein plus 2 µl of precipitant and the second with 20 µl of protein plus 4 µl of precipitant. Then add 4 µl to the first and 4 µl to the second, and so forth, so that one of the vials is always 2 µl ahead of the other. When opalescence is achieved in one vial, put both away overnight to be observed the next day. If the precipitation point is overshot, a small amount of water may be added to clear the precipitate
TABLE 1.1 Conditions for Initial Trials
Precipitant                    Concentration range    Additives
Polyethylene glycol 4000       10-40% w/v             0.1 M Tris, pH 7.5
Polyethylene glycol 8000       10-30% w/v             0.2 M ammonium acetate
Ammonium sulfate, pH 7.0       50-80% saturation
Ammonium sulfate, pH 5.5       50-80% saturation
Potassium phosphate, pH 7.5    0.5-2.5 M
2-Methyl-2,4-pentanediol       15-60%
Low ionic strength             Dialysis^a             50-200 mM potassium phosphate, pH 7.8
Sodium citrate                 0.5-2.5 M
FIG. 1.6 Crystal setup using ACA plates. A cross section through a single well is shown. (A) The lips of the wells are greased where the coverslips will later be placed. High-vacuum grease can be used alone or mixed with about 20% silicone oil. The addition of oil makes the grease less viscous so that it flows more easily. (B) The lower coverslip is pressed into place. Make sure there are no gaps in the grease for air to leak through. (C) Place the reservoir solution in the well. (D) Put the protein solution onto the lower coverslip. (E) Carefully layer the precipitant (often some of the reservoir solution) onto the protein. (F) Mix the two layers together quickly by drawing up and down with an Eppendorf pipette. (G) Put on the upper coverslip to seal the well completely. Again check that there are no gaps in the grease. Wait several hours to several weeks for crystals to appear.
and the precipitation point can be approached again more slowly. This method is less wasteful because it allows a finer search of conditions in just two shell vials, substituting for the large number of hanging drops needed for an equally fine scan. It also encourages more careful observation of the samples. Finally, some proteins do not fare well during the evaporation that occurs in hanging drops. It is impractical, if not impossible, to scan systematically every precipitant that has been used for growing protein crystals. Therefore, another approach is an incomplete factorial experiment.⁵ A small subset of all possible conditions is scanned in a limited number of experiments by combining a subset of solutions. These drops are scanned for crystals or promising precipitates. If anything is found, a finer scan can be done to find better growth conditions. A particularly successful version of this method was developed by Jancarik and Kim⁶ and has been optimized to 50 conditions combining a large number of precipitants and conditions. A kit is available from Hampton Research that contains all 50 solutions premixed, so all one has to do is set up 3-5 µl of protein sample with each of the solutions. This method recommends that you first dialyze the protein against distilled water to allow better control over pH and other conditions. Try this on a small sample first: many proteins will not tolerate distilled water and will precipitate (or sometimes crystallize). Use as low a concentration of buffer as you can. Phosphate buffers will give phosphate crystals in several of the drops that contain divalent cations. We have used this method with some success. While you may not get usable crystals on the first trial, you may get some good leads. Some of the drops may stay clear for a couple of weeks. You can raise the precipitant concentration in the drop by adding saturated ammonium sulfate to the reservoir (but not to the drop).
This will cause the drop to dry up somewhat. More ammonium sulfate can be added until the drop either precipitates or crystallizes. Another crystallization method, not often tried, approaches the crystallization point from the other end by first precipitating the protein and then slowly adding water until the critical point is reached. Often microcrystals are formed when the protein is precipitated. As the precipitant concentration is lowered, protein redissolves, and crystals large enough to see may grow out of the precipitate, using these microcrystals as growth centers. Also, if the excess solution is removed from the precipitate, the result is a high concentration of the protein, which may force crystals. This can be done on a micro basis using a variation of the hanging drop. For example, to use this method with ammonium sulfate, make a drop of 10 µl of protein sample and 3 µl of saturated ammonium sulfate and set it over a reservoir of saturated ammonium sulfate. The drop should dry slowly and the protein precipitate, giving a final protein concentration about three times higher than at the start. Every few hours add some water to the well to lower the ammonium sulfate concentration. Keep careful track of the amount added. If the drop starts to clear, slow the additions to once a day and add water very slowly. I have grown a large number of crystals using this method. Although they are rarely suitable for diffraction, they can be used as seeds to grow better crystals. This method allows searching a large number of conditions with a small amount of sample. Never give up on a setup unless it is completely dried; it may take several months for crystals to appear. Proteins that are not stable in buffer are often stabilized by high precipitant concentrations. Also, the presence of precipitate in a setup does not preclude crystallization. Often a crystal will grow from the precipitate. Nucleation is a rare event and may require a very long time to occur if you are near, but not right on, the correct crystallization conditions.

⁵ Carter, C. W., Jr., and Carter, C. W. (1979). Protein crystallization using incomplete factorial experiments. J. Biol. Chem. 254, 12,219-12,223.
⁶ Jancarik, J., and Kim, S.-H. (1991). Sparse matrix sampling: A screening method for crystallization of proteins. J. Appl. Crystallogr. 24, 409-411.
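Both the two-vial leapfrog titration and the precipitate-then-dilute drop described in this section rest on simple volume bookkeeping. The sketch below makes that arithmetic explicit under simplifying assumptions: volumes are additive, precipitant is expressed as a fraction of the stock (1.0 = saturated), and during vapor diffusion only water leaves the drop until it matches the reservoir. The function names are illustrative, not from any package. It shows that alternating 4-µl additions between two vials scans the precipitant in 2-µl steps overall, and that a 10 µl + 3 µl drop over saturated ammonium sulfate shrinks to about 3 µl, consistent with the roughly threefold concentration quoted above.

```python
"""Volume bookkeeping for precipitant titrations (simplified sketch).

Assumptions: volumes are additive, precipitant concentration is expressed as a
fraction of the stock (1.0 = saturated stock), and during vapor diffusion only
water leaves the drop until its concentration matches the reservoir.
"""


def stock_fraction(v_stock, v_other):
    """Fraction of stock concentration after mixing v_stock with v_other (same units)."""
    return v_stock / (v_stock + v_other)


def leapfrog(v_protein=20.0, start_a=2.0, start_b=4.0, step=4.0, rounds=3):
    """Concentrations in two leapfrogged vials, in the order the additions are made.

    Vial A starts with start_a and vial B with start_b of precipitant; each round
    adds `step` to A and then to B, so B stays (start_b - start_a) ahead of A.
    """
    history = []
    added_a, added_b = start_a, start_b
    for _ in range(rounds):
        history.append(("A", added_a, stock_fraction(added_a, v_protein)))
        history.append(("B", added_b, stock_fraction(added_b, v_protein)))
        added_a += step
        added_b += step
    return history


def equilibrated_volume(v_drop, drop_conc, reservoir_conc):
    """Drop volume once its precipitant concentration equals the reservoir's.

    Precipitant is conserved (only water moves), so
    v_drop * drop_conc == v_final * reservoir_conc.
    """
    return v_drop * drop_conc / reservoir_conc


if __name__ == "__main__":
    for vial, added, conc in leapfrog():
        print(f"vial {vial}: {added:.0f} ul precipitant -> {conc:.3f} of stock")

    # Reverse method: 10 ul protein + 3 ul saturated ammonium sulfate over a
    # saturated (1.0) reservoir.
    v_final = equilibrated_volume(13.0, stock_fraction(3.0, 10.0), 1.0)
    print(f"final drop ~{v_final:.1f} ul; protein ~{10.0 / v_final:.1f}x the original sample")
```

Real drops rarely equilibrate completely, and saturation fractions are not strictly additive, so these numbers are estimates for planning additions rather than predictions.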
Growth of X-Ray Quality Crystals
The elation that you experience following the appearance of the first crystals of a new protein can be short-lived. It is often discovered that the first crystals grown are of insufficient quality for data collection. A long series of experiments may be needed before large, single crystals can be obtained. The first step is to try a very fine scan of conditions nearest those used initially, to find the optimal conditions for growing only a few large crystals in a single setup. Vary the precipitant concentration, the protein concentration, the pH, and the temperature. You may also want to try varying the buffer and its concentration. Using different types of setup will vary the equilibration rate, which can often lead to improved growth. What is needed are conditions where nucleation is rare and crystal growth is not too rapid. Do not look at your setups too frequently (once a day is enough), since disturbing them can result in the formation of extra nuclei. Leave fine-scan plates alone for a week before disturbing them. Since nucleation is a stochastic process, preparing a large number of identical setups will often yield a few drops that produce nice crystals by chance. This is useful only if you have sample to spare. If nucleation is unreliable, seeding is often the answer. Two methods of seeding are used: microseeding and macroseeding. In microseeding, small seeds obtained by crushing, or those usually present in large numbers
in old setups, are introduced into a fresh drop of preequilibrated protein. Seeds will usually grow in conditions where nucleation will not occur. An extreme example is photoactive yellow protein, where seeds will grow in ammonium sulfate solutions at 71% of saturation but nucleation will not occur at concentrations less than about 100%. The microseeds are diluted until only a few will be introduced. This usually requires serial dilutions and can be very difficult to control. Another method is to place a very small amount of solution from a drop in which a crystal has been crushed at one point of a fresh drop, without any mixing. A mass of crystals will grow at this point, but often a few seeds will diffuse to another part of the drop where a large crystal may develop. The author has found that 30-100 µl sitting drops are good for this technique. The steps involved in microseeding are illustrated in Fig. 1.7A. The first step in microseeding is to establish the proper growth conditions. Drops with increasing concentrations of precipitating agent are set up and preequilibrated overnight. Then a crystal is crushed with a needle so that the entire drop fills with microscopic seeds. A whisker or eyelash glued to a rod is then dragged through the solution to pick up a small amount of the liquid containing microcrystals. The whisker is then streaked or dipped into the preequilibrated drops. After several hours or days, crystals should grow in the drops with sufficiently high precipitant concentration. To prevent unwanted nucleation, it is desirable to use the lowest concentration that will sustain crystal growth. When proper growth conditions are established, several drops are preequilibrated to this concentration. A crystal is then crushed as before, and a few microliters of the mother liquor in the drop are pipetted into the first of a series of test tubes containing stabilizing solution and mixed well.
These are then serially diluted about 10- to 20-fold so that each successive tube contains fewer microcrystals. A few microliters from each tube are then put into the preequilibrated growth drops, which are examined for growth after several days (Fig. 1.7B). Each drop should contain progressively fewer crystals. The goal is to find a dilution that will provide just a few crystals per drop. If the microcrystals are stable enough, it may be possible to seed many drops from the same tube to grow many large crystals. In macroseeding, a single seed is washed and placed in a fresh, preequilibrated drop (Fig. 1.8). The seeds need to be well washed in 2 ml of artificial mother liquor in a plastic petri dish. The dish is gently swirled to dilute any microseeds. The seed is then transferred with a minimum amount of solution to another dish with a precipitant concentration (found by experiment) in which crystals slowly dissolve. This produces a fresh growth surface on the seed and dissolves any microcrystals. The crystal is then transferred with a small amount of solution and placed in a fresh setup. Microseeds can be
FIG. 1.7 Microseeding techniques. Panel A, "Find correct concentration": (1) crush crystal; (2) wet whisker; drops of increasing concentration.
broken off by mechanical disturbances. Because protein crystals are soft and fragile, a gentle technique is necessary for this method to work. Let the crystal fall into the fresh drop and settle of its own accord. Do not disturb the crystal after placing it in the growth drop. Often the less-dense dissolving solution will layer on top of the drop and mix only slowly, allowing any microseeds to dissolve. Several problems can arise with this technique,
FIG. 1.8 Macroseeding method of growing larger crystals. First, two solutions are prepared in small petri dishes: a storage solution (usually a few percent higher in precipitant than the growth concentration) in which the crystals are stable for a long time, and an etching solution in which the crystals will slowly dissolve over several minutes (usually a few percent lower in precipitant than the growth concentration). About 2 ml of each is needed, and the dishes should be kept covered to prevent evaporation. (1) Using a thin capillary, draw up several small but well-formed seed crystals. (2) Transfer these crystals into the petri dish with the storage solution. (3) Gently rock and swirl the storage solution petri dish to disperse the seeds throughout the dish. This dilutes microseeds and separates the crystals from each other and from any precipitate that might have been transferred with them. (4) Pick up a single seed from the storage solution and transfer it to the etching solution, bringing with it as little of the storage solution as possible. (5) While observing the crystal through a microscope, let it sit and occasionally rock the dish gently. (6) The corners of the crystal should start to round and the faces may etch, leaving scars and pits. (7) Pick up the crystal with as little solution as possible and gently transfer it to a fresh drop of protein preequilibrated to the growth conditions overnight. Often the crystal will fall out of the transfer capillary of its own accord so that no solution need be added to the drop. (8) Over several hours or days the crystal should grow larger.
and it will not work in every case. Sometimes the crystal may be so fragile that a trail of microcrystals is left in its wake. Often the seed will not grow uniformly; instead, spikes form on the seed surface. At other times the new growth will not align perfectly with the seed, causing a split diffraction pattern. In this case try using a smaller seed that will not contribute substantially to the overall scattering. Often the dislocation between the old and new crystal can be seen, and it may be possible to expose only a region of the crystal away from the old seed. The seed should be freshly grown and well formed; if imperfect seeds are used, you will only grow larger imperfect crystals. Often several generations of seeding will be needed to produce single crystals. Multiply twinned crystals can be crushed and the fragments macroseeded until single crystals are obtained. Better crystals can sometimes be obtained by further purifying the protein sample to make it more homogeneous. Isoelectric focusing is especially useful. If you have a large amount of sample (>100 mg), you can use preparative isoelectric focusing. Smaller amounts of sample (<20 mg) can be chromatofocused on Pharmacia-LKB XX media. This method has proved successful in several cases. Be aware, though, that both these methods introduce ampholytes into the solution that can be difficult to remove. The last method for improving crystal quality, and often the best, is simply to look for more conducive conditions. For example, Chromatium vinosum cytochrome c' can be crystallized easily from ammonium sulfate, but these crystals are always so highly twinned that they are unusable, even for preliminary characterization. By searching for new conditions, it was found that PEG-4K at pH 7.5 produces usable crystals that can be grown to large dimensions and diffract to high resolution. This is very common for proteins.
If they crystallize under one condition, chances are they will crystallize under another condition in a different space group and in a different habit. If there were only one condition out of all possible ones, I doubt that very many proteins would ever crystallize.
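The serial microseed dilution described above can be planned with a little arithmetic. The sketch below assumes the seeds are evenly dispersed in each tube, that each tube dilutes the previous one by a fixed factor, and that the number of seeds delivered to a drop follows a Poisson distribution; the starting seed density is a hypothetical number, since it is never known in practice, and the function names are illustrative. It reports the expected seeds per drop for each tube, picks the first tube expected to deliver no more than a chosen target, and gives the chance that a drop from that tube receives no seed at all.

```python
"""Planning a serial microseed dilution (sketch; seed density is hypothetical)."""
import math


def expected_seeds(seeds_per_ul, fold, n_tubes, ul_per_drop):
    """Expected seeds delivered per drop from each tube of a serial dilution.

    Tube 1 holds seeds_per_ul / fold, tube 2 holds that / fold, and so on.
    """
    result = []
    density = seeds_per_ul
    for _ in range(n_tubes):
        density /= fold
        result.append(density * ul_per_drop)
    return result


def first_tube_at_most(expected, target):
    """Index (0-based) of the first tube expected to give <= target seeds, or None."""
    for i, n in enumerate(expected):
        if n <= target:
            return i
    return None


def prob_no_seed(mean_seeds):
    """Poisson probability that a drop receives zero seeds."""
    return math.exp(-mean_seeds)


if __name__ == "__main__":
    # Hypothetical: 10,000 seeds/ul in the crushed drop, 10-fold steps, 2 ul per drop.
    per_drop = expected_seeds(1e4, 10, 5, 2.0)
    i = first_tube_at_most(per_drop, 5)
    print([round(n, 2) for n in per_drop])
    print(f"use tube {i + 1}: ~{per_drop[i]:.1f} seeds/drop, "
          f"P(no seed) = {prob_no_seed(per_drop[i]):.2f}")
```

Because the true seed density is unknown, the useful output is the shape of the series: each tube lowers the expected count by the dilution factor, and once the mean falls to a few seeds per drop, a noticeable fraction of drops will receive none.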
1.3 CRYSTAL STORAGE AND HANDLING
Protein crystals can be stored for a few years and still diffract. Some precautions will help increase lifetime. Some proteins can be simply left in the drops in which they grew. Others, though, will grow small unaligned projections on their surfaces if kept in the original drop. These need to be transferred to an artificial mother liquor for storage. The artificial mother liquor must be found by experimentation. Usually, raising the precipitant a
few percent is all that is needed. Do not use mother liquor with precipitant at the growth level. The protein in the crystal is in equilibrium with the protein in solution, and if mother liquor at the growth conditions but without protein is substituted, the crystal will partially or wholly dissolve to reestablish equilibrium. Higher precipitant concentrations drive the equilibrium toward the crystal. For the same reason, do not store the crystal in a volume larger than necessary. Too high a precipitant concentration will result in cracked crystals, because the change in osmotic pressure will cause them to shrink. Change the reservoirs in vapor-equilibrium setups to prevent drying. Observe the crystal in the artificial mother liquor for several days before committing more crystals to it. Ideally, a crystal in a new artificial mother liquor should be examined by X-ray to confirm that no damage to the diffraction pattern has occurred. Keep the crystals in the dark. Light causes free-radical chain reactions in the solution, which will cross-link and eventually destroy the crystals. This is especially true of polyethylene glycol. Commercial PEG contains an antioxidant to retard light-induced reactions; it slows, but does not completely prevent, oxidation of PEG solutions. Solid PEG and PEG solutions must be stored in the dark at all times.
1.4 CRYSTAL SOAKING
To solve the phase problem, the most common method used is multiple isomorphous replacement. In this method one or more heavy atoms are introduced into the structure with the smallest possible change to the original structure. This gives phasing information through the pattern of intensity changes. A heavy atom must be used to produce changes large enough to be measured reliably. Only minimal changes to the structure (isomorphism) can be tolerated, because the primary assumption of the phasing equations is that the soaked crystal's scattering equals the unsoaked crystal's scattering plus that of the heavy atoms alone. For more details, see later chapters and the suggested readings. Heavy atoms or substrates are usually introduced into protein crystals by soaking them in an artificial mother liquor containing the reagent of choice. The compound is prepared in artificial mother liquor at about 10 times the desired final concentration, and then one-tenth of the total volume is layered onto the drop containing the target protein crystal. Diffusion occurs within several hours to saturate the crystal completely. With some heavy atoms, secondary reactions occur that can take several
days. Heavy-atom compounds are usually introduced at 0.1- to 1.0-mM concentration. A typical protein in a 10-µl drop requires roughly micromolar concentrations for equimolar ratios. Many compounds will not dissolve well in the crystallization solution. In these cases it may be beneficial to place small crystals of the compound directly in the drop. Also, many heavy-atom compounds take several hours to hydrate; if they do not completely dissolve at first, be patient. Gentle heating may speed dissolution. Soak time is more difficult to determine. If the soaking drop has several crystals, they can be mounted at different time intervals. Some highly reactive heavy-atom reagents may destroy the crystals and yet be useful if soaked for a short time. Other heavy-atom compounds undergo slow reactions that may produce a new compound that will bind. For instance, a platinum compound in an ammonium sulfate solution will eventually replace all of its ligands with ammonia. As a crystallographer you are not as concerned with exactly what binds to your protein, as long as something heavy binds at a few sites in an isomorphous manner. A good way to check that something is binding is to place the crystal in a capillary so that when you invert the capillary the crystal will slowly settle. When heavy atoms bind to the protein, they increase its density and cause it to settle faster. Similarly, if an artificial mother liquor can be concentrated enough to be slightly denser than the protein crystals, the crystals will float. When a sufficient number of heavy atoms bind, the crystal will sink. This property can be used to screen a large number of solutions quickly, and it has the advantage that small crystals can be observed under a microscope.
The change in osmotic pressure of the denser mother liquor may cause the unit cell of the crystal to shrink; if so, any changes found in the diffraction pattern may be due to this effect rather than to heavy-atom binding. In any case, it is a simple matter to resoak a fresh crystal in the usual mother liquor. What heavy atoms should you try? Table 1.2 presents a partial list of heavy-atom compounds in the order that I usually try them. (Whenever you meet another crystallographer, first you swap crystal-growing tales and then you always ask which heavy-atom compounds he or she has had particular success with.) Because most heavy-atom compounds are extremely toxic, extreme caution is in order. Some people experience respiratory distress and allergic reactions when exposed to these compounds. Therefore, always wear suitable (not latex) gloves and work in a well-ventilated area. For reproducibility it is best to use fresh solutions; old solutions may oxidize and/or dismutate with time. Solutions must be kept in the dark and preferably under argon. The bottles of reagents themselves should be stored in a well-ventilated area with the caps sealed with Parafilm.
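The 10x-stock recipe for soaking reduces to one line of arithmetic: if the added stock is to be one-tenth of the final total volume, the drop must receive one-ninth of its own volume. The sketch below assumes volumes are additive and that the drop initially contains none of the reagent; the 10-µl drop and 1 mM target are illustrative values within the ranges given in the text, and the function names are hypothetical.

```python
"""Heavy-atom soak arithmetic for the 10x-stock method (sketch)."""


def stock_volume_to_add(v_drop, fold=10.0):
    """Volume of a fold-x stock that makes up 1/fold of the final total volume.

    Solving v_add = (v_drop + v_add) / fold gives v_add = v_drop / (fold - 1).
    """
    return v_drop / (fold - 1.0)


def final_concentration(c_stock, v_add, v_drop):
    """Reagent concentration after mixing v_add of stock into a reagent-free drop."""
    return c_stock * v_add / (v_drop + v_add)


if __name__ == "__main__":
    v_drop = 10.0   # ul, illustrative
    target = 1.0    # mM, within the 0.1-1.0 mM range in the text
    v_add = stock_volume_to_add(v_drop)
    c = final_concentration(10.0 * target, v_add, v_drop)
    print(f"add {v_add:.2f} ul of {10 * target:.0f} mM stock -> {c:.2f} mM")
```

Layering rather than mixing means the reagent reaches this concentration only after diffusion, which is one reason soaks are given hours to days.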
TABLE 1.2 Useful Heavy-Atom Reagents and Conditions
Reagent                          Conditions
Platinum tetrachloride           1 mM, 24 h
Mercuric acetate                 1 mM, 2-3 days
Ethyl mercury thiosalicylate     1 mM, 2-3 days
Iridium hexachloride             1 mM, 2-3 days
Gadolinium sulfate               100 mM, 2-3 days
Samarium acetate                 100 mM, 2-3 days
Gold chloride                    0.1 mM, 1-2 days
Uranyl acetate                   1 mM, 2-3 days
Mercury chloride                 1 mM, 2-3 days
Ethyl mercury chloride           1 mM, 2-3 days
1.5 ANAEROBIC CRYSTALS
Many proteins lose activity if exposed to air, so they must be grown anaerobically. In other cases the protein must be kept anaerobic in order to reduce it to its active conformation. The easiest method is to use an anaerobic hood. Solutions are passed in and out through an airlock, and crystals can be set up and handled with conventional techniques. For large-scale work this is by far the best method, but not all of us have access to an anaerobic hood. Another method uses a glove bag. This is a plastic bag with gloves that you can put your hands in to manipulate samples. Everything that you are going to use must be inside the bag before you seal it, which can present some logistical problems. A very simple anaerobic apparatus invented by Art Robbins consists of a capillary filled with degassed solution into which a crystal is introduced (Fig. 1.9). One end of the capillary is sealed by melting and the other is sealed with a layer of diffusion pump oil. Dithionite crystals can be dropped into the oil layer, through which they will sink into the lower liquid. Any residual oxygen will be destroyed by the dithionite, and the oil layer prevents the entry of new oxygen. In our laboratory Cu-Zn superoxide dismutase crystals have been reduced in this manner and have stayed reduced for over
FIG. 1.9 Simple anaerobic apparatus. Degassed mother liquor is placed in a capillary and then a crystal is introduced. The crystal should be large enough that it will wedge itself in the tapered portion of the capillary as it sinks. Mineral oil is then layered over the mother liquor to form a seal. The top few millimeters of the mother liquor, which have been exposed to air, can be drawn off with a capillary inserted through the oil layer. Solid reductant, such as dithionite, is then placed on top of the oil and allowed to sink through the oil into the mother liquor. Overnight the dithionite will diffuse to the crystal and reduce it. Any remaining oxygen is destroyed by the dithionite. The data can then be collected by mounting the capillary on a goniometer head as is normally done. Crystals reduced in this manner have remained oxygen-free for over a year.
a year. The data are collected by mounting the capillary directly on a goniometer head. The size of the capillary in this case is chosen so that the crystal will wedge partway down. The extra solvent decreases the diffraction due to absorption, but it was still possible to get a 2.0-Å data set using a large crystal.
2 DATA COLLECTION TECHNIQUES
2.1 PREPARING CRYSTALS FOR DATA COLLECTION
Protein crystals must be kept wet or they will disorder. Since solvent forms a large portion of the crystal lattice, a large change from the crystallization conditions will cause the crystals either to dehydrate and crack or to melt. For room-temperature work, crystals are usually mounted in thin-walled glass capillaries.¹ The thin glass wall minimizes absorption of the scattered X-rays and also minimizes background from the glass. For protein work use glass capillaries. Quartz capillaries are stronger, but quartz scatters strongly in a sharp band around 3-Å resolution, whereas the glass scattering is diffuse. Solvent contributes to background, which is always bad, so as much solvent as possible must be removed without letting the crystal dry out. Alternatively, you may want to use the newer cryocrystallography techniques covered in Chapter 6. However, freezing can cause changes in the unit cell that make the crystals nonisomorphous, and a few proteins are not amenable to freezing. In these cases it may be necessary to use the methods in this chapter.

¹ Available from Charles Supper Company.
Crystal-Mounting Supplies
Before mounting a crystal, make sure you have all the supplies you need at hand:
• Capillaries. Thin-walled capillaries are needed in a variety of sizes. You will also want a large supply of previously made suckers (transfer pipettes) to pick from.
• Tweezers. Two pairs of tweezers are needed: a pair with straight ends and a curved pair for prying up coverslips.
• Scissors. A sharp pair of surgical scissors is needed for cutting capillaries, and another pair for cutting filter paper into strips thin enough to fit inside capillaries. Do not cut paper with the pair meant for cutting glass, or they will quickly dull; once dull, they shatter the glass capillaries rather than cut them.
• Capillary sealant. Dental wax and other low-temperature waxes are the traditional means of sealing capillaries. Recently, 5-min epoxy has become popular. The epoxy requires no heat, sets quickly, and forms an immediate vapor barrier even before it sets. The handiest kind is clear, comes in a dual-barreled syringe, and is quite fluid before it hardens. Avoid types that are thicker and more like clay before hardening.
• Plasticine. This is also known as nonhardening modeling clay and is available in toy stores. It is very useful for sticking capillaries to goniometers and for holding them in position while mounting crystals. When warmed by rolling with the fingers, Plasticine can be wrapped around thin capillaries without breaking them. An alternative is to use pins that are sold for use with goniometers by Supper and Huber. The glass capillary is inserted into a hole in the pin and held in place with wax or epoxy.
• Filter paper strips. Cut Whatman #1 filter paper into strips thin enough to fit into capillaries for drying. The ideal strip is about 50 mm long, tapering from about 1.5 mm at one end to a fine point. The strips tend to curl when cut and can be straightened by gently curving them in the opposite direction with your fingers. Thin paper points originally meant for dental work are available from Hampton Research. As supplied, they are too short to reach into capillaries; mount the fine size in the end of a #18 syringe needle and they will reach the crystal in most capillaries.
Mounting Crystals
There are several ways to mount a crystal in a capillary, but they all accomplish the same goal. The method used almost exclusively in our lab is as follows. A capillary at least twice the width of the crystal is used. It is shortened by breaking it with a pair of sharp tweezers so that it is about 4 cm
long. If it is not shortened it will be too long to fit onto most X-ray cameras. The broken end is sealed with either melted dental wax or 5-min epoxy. The large funnel-shaped end is left open; the crystal will be placed in the capillary through this end. A ring of wax or epoxy is placed where the funnel end narrows, to make a place where the capillary can be cut later. Without the ring, the capillary may shatter completely. A small ball of Plasticine is warmed in the fingers and gently wrapped around the capillary to serve as a mounting base. Then the capillary is put aside; use the Plasticine to stick it in a handy spot where it can be reached later. The crystal to be mounted is then selected, and a second capillary (the sucker), narrow enough to fit inside the first and into which the crystal can be sucked, is readied (Fig. 2.1). We use a piece of rubber tubing that fits over the sucker at one end, with the other end going into a mouthpiece. It takes a little practice to get the knack of the sucking operation. The liquid will tend to stick at first because of surface tension, and then it comes all in a rush, requiring a little back pressure. Instead of mouth suction, a syringe can be used to draw up the liquid. The syringe is harder to control, however, and it takes one of your too-few hands. For toxic solutions, such as heavy-atom soaks, always use a syringe. Before the crystal is sucked out of the drop, a small amount of reservoir solution is sucked up and placed in the bottom of the previously prepared capillary (Fig. 2.2). Often a thin piece of filter paper is then pushed down the capillary to hold the reservoir liquid and prevent it from moving. The crystal is then sucked up into the transfer capillary (Fig. 2.3). This frequently requires blowing liquid gently back and forth over the crystal to free it from the surface it grew on. More stubborn cases can be freed by very gently inserting the sharp point of a surgical blade between the crystal and the surface to pry it loose.
Once the crystal is floating freely, it can be sucked up. The sucker is removed from the solution and a little air is then drawn in. This air gap helps prevent the liquid from being drawn out by capillary action at the wrong time, which would wedge your crystal between the sucker and the mounting capillary. While watching through a dissecting microscope, guide the sucker into the capillary, gently expel the crystal, and quickly remove the sucker. Some mother liquors are easier to handle than others; some will insist on sweeping the crystal between the sucker and the capillary wall, catastrophically crushing it. This can be avoided by first placing a band of reservoir liquid in the capillary and inserting the end of the sucker into it; the crystal can then be gently blown out. This approach, however, leaves a large amount of solution to be removed later. In general, you want to blow out the crystal with as little solution as possible. The next step is to remove the solution around the crystal (Fig. 2.4). A very thin, fine capillary about 0.1-0.2 mm in diameter works best for removing large amounts of liquid. Use one small enough that the crystal will not fit
DATA COLLECTION TECHNIQUES
FIG. 2.1 Making crystal transfer pipettes. The transfer pipettes, or suckers, used for mounting crystals are made from 200-µl capillaries or any thin-walled piece of glass about 1/8 in. in diameter. The capillaries are useful because they come with tubing and mouthpieces. The glass is easily softened by holding it over a Bunsen burner flame while turning it slowly. When the glass softens, remove it from the flame and pull the ends apart. Hold it still for a moment to cool and then put it aside. Then draw out several more. Each drawn-out capillary is cut in the middle to form two pieces. The cut capillary is then bent by holding it briefly over a flame until the end droops. With this technique you should be able to produce a number of different sizes. The ends are usually tapered, and the proper bore can be obtained by cutting them off at the appropriate length.
inside. Start removing the liquid at the edges first. Many liquids with a high surface tension cannot be fully removed this way and require further removal with a strip of thin filter paper. This can be worked up next to the crystal, where it will slowly absorb all the free liquid. Leave a small amount of liquid between the crystal and the capillary to hold it in place. The capillary is then sealed with either dental wax or epoxy. Epoxy has the advantage of being
FIG. 2.2 Preparing the X-ray capillary for mounting. If at any time during mounting you want to seal the capillary temporarily, you can simply plug it with a softened piece of Plasticine. This gives you an opportunity to find something you forgot or to take a break. Sometimes it is necessary to allow viscous liquids time to bead up again before you can fully remove them.
cool; with dental wax there is some danger that the heat will damage the crystal. This risk can be minimized by laying a strip of wet tissue on the capillary over the crystal before the melted wax is applied. Another common mounting technique is to fill the capillary with liquid and float the crystal down into it. The crystal can be picked up in a minipipette and placed into the tube held vertically, or it can be sucked directly up into the capillary along with the mother liquor. Both methods require a large amount of liquid to be removed before the crystal is ready. However, some liquids are very difficult to dry completely, as they tend to stick to the glass, and an excessive amount of time and effort may be required
FIG. 2.3 Crystal mounting.
to dry the capillary. On the other hand, these methods are easier and may also be gentler on the crystal. A disadvantage is the need to make an artificial mother liquor with which to fill the capillary, and this artificial mother liquor sometimes damages the crystal when it is transferred. The gentlest technique of all is to grow the crystal in the capillary and then remove the growth solution for data collection. This has been necessary for some protein crystals with very high solvent contents.

FIG. 2.4 Drying crystals. (A) Remove large amounts of liquid by drawing it up into a drawn-out pipette by capillary action or by sucking. (B) Final drying is done with a thin piece of filter paper. (C) The crystal should retain a small amount of liquid to keep it wet and to help it adhere to the capillary walls.
Drying Crystals
How dry does the crystal need to be? This depends on the particular protein and therefore requires experimentation. Some crystals dissolve slowly if they are left too wet; others crack if they are too dry. If a low-temperature apparatus is to be used, temperature gradients may cause the liquid to distill along the capillary, either cracking or dissolving the crystal. To prevent such damage, use a short capillary with as little free liquid as possible. A piece of filter paper may be used to wick solution around the capillary and reequilibrate it. Polyethylene glycol solutions are amazingly tenacious: a layer remains bound to the glass and slowly beads up around the crystal, undoing your careful drying work. One remedy is to place the unsealed capillary in a sandwich box with a reservoir of crystallization liquid to keep the crystal wet while you wait about half an hour for the liquid residue in the capillary to draw up around the crystal and rewet it. This time, when you remove the liquid, the crystal will stay dry and you can seal it. The sandwich box is also handy if you need a break or think that the crystal is getting too dry; you can reequilibrate the crystal in the box before continuing.
Preventing Crystal Slippage
Crystals are held in place by the surface tension of the thin film of liquid between the crystal and the capillary wall. In most cases this is adequate, but sometimes crystals slip, either slowly or suddenly. Some methods of data collection are gentler than others, and with these the crystal is less likely to slip. Any slippage of a crystal during data collection is a problem: the crystal can leave the center of the X-ray beam, or it can rotate, changing the diffraction pattern. There are several ways to avoid this. First, dry the crystal thoroughly; excess liquid around the crystal encourages slipping. (See the preceding remarks on viscous liquids and drying crystals.) Slippage may also be caused by excess liquid that builds up around the crystal after data collection has begun. For instance, if you use a low-temperature device, there may be a temperature gradient along the capillary that causes water to distill from one end of the capillary to the other. This shifts the vapor equilibrium at your crystal and can make it wetter. In the worst cases a bead of liquid may form above the crystal, slip down onto it, and dissolve it. To avoid this, keep the capillary as short as possible and put a wicking material such as filter paper in the capillary to encourage reequilibration of the liquid. Second, the crystal can be held in place mechanically with fibers. This should be a last resort, as any material used to hold the crystal will add to the background scattering; pipe-cleaner fibers have been found useful for this purpose. A third method is to glue the crystal in place with a glue that dries in a thin film over the surface of the crystal and cements it into place; the glue and the method are described by Rayment.² Finally, freezing the crystals as described in Chapter 6 will prevent slippage. Also consider the shape of the capillary relative to the surface of the crystal you are mounting.
If the crystal has a flat face, mounting it inside a large-diameter capillary will provide better contact between the crystal and the glass (Fig. 2.5). Conversely, a small capillary may be better suited to a crystal with many facets that presents a more curved surface. In extreme cases of slippage, it is even possible to wedge crystals into the capillary where the glass tapers by floating them down capillaries filled with liquid.

²Rayment, I. (1985). In Methods in Enzymology, Vol. 114, pp. 136-140. Academic Press, San Diego.
FIG. 2.5 Choose the capillary size to fit the shape of the crystal.
2.2 OPTICAL ALIGNMENT

The next steps will be easier if the crystal is first aligned optically. This is accomplished using a special goniometer stand called an optical analyzer and the dissecting microscope. The crystal in the capillary is placed on a goniometer head that is, in turn, mounted on the optical analyzer. The first operation is to find the center of rotation of the analyzer with respect to the microscope reticules (Fig. 2.6). Rotate the analyzer to 0° and note the position of the crystal, then rotate it to 180° and note the position again. The center is the midpoint between these two positions. The crystal can be moved to the midpoint using the translation slide on the goniometer head. Another check at 0° and 180° is usually needed to fine-tune the centering. Repeat these steps at 90° and 270°. Note that the center is defined by the range of motion as the axis is rotated, not by any particular point in the microscope. The most common source of frustration in alignment is assuming that the crosshairs on a piece of equipment correspond to the center of rotation. Never make this assumption: the center is the position at which the crystal does not move when rotated. If the crystal has a definable axis, you may want to align it with the rotation axis. This is done by comparing the views at 0° and 180° and adjusting the arcs on the goniometer until the crystal axis appears in the same position in both views. This is then repeated for 90° and 270°. The third alignment that needs to be made is the position of the crystal faces relative to 0°. In general, crystal axes either are perpendicular to a face or pass through an edge (Fig. 2.7). If the goniometer head provides a z-rotation,

FIG. 2.6 Steps in centering a crystal on a camera using the crosshairs as guidelines. The view through the microscope is shown in three steps, A, B, and C. The rotation axis is horizontal and the direct beam passes through the crystal vertically. The microscope crosshairs in this example are not perfectly aligned with the rotation axis of the camera. (A) View at 0°. (B) View at 180°. The translation on the goniometer is moved so that the crystal is halfway between the two positions observed at 0° and 180°. (C) The final position after correction: the crystal will now be in the identical position at both 0° and 180°. Note that the crosshairs do not go through the center of the crystal. The center is defined not by the crosshairs but by the center of rotation. If the crosshairs and the center do not coincide, the crosshairs should be adjusted to facilitate future alignments. Never assume the crosshairs are centered unless this has been done recently, since high-power microscopes can easily become misaligned, especially if they are moved frequently.
FIG. 2.17 Resolution versus 2θ for a CuKα (1.54-Å) source.
FIG. 2.18 Precession photograph (15°). This precession photograph is of the h0l zone of E. coli endonuclease III. Note the mm symmetry, with a mirror on each axis. Also note that every odd spot is missing along the two major axes. The other two zones both show the same symmetry and systematic absences, identifying the space group as P2₁2₁2₁. Photo courtesy of Drs. Chefu Kuo and John Tainer, Scripps Research Institute.
2.4 Preliminary Characterization
FIG. 2.19 Rotation photograph. A 2° rotation photograph of aconitase taken at Stanford Synchrotron Radiation Laboratory on beamline 7.1. The rotation axis is horizontal. The horizontal band across the center of the film is due to a piece of Mylar that supports the beam stop. (Photo courtesy of Dr. C. David Stout, Scripps Research Institute.)
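film perpendicular to the rotation axis and at the maximum diffraction angle. The maximum rotation angle can be estimated with the formula

$$\Delta\phi_{\max} = \tan^{-1}\!\left(\frac{d_{\min}}{\text{cell edge}}\right) - \text{spot width},$$

where d_min is the resolution limit of the data set and the cell edge is the limiting cell dimension.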
For example, suppose that we want to collect a 2-Å data set on an orthorhombic crystal with unit cell dimensions a = 100, b = 75, c = 50 Å and, given the optics, the spot width is 0.3°. The crystal is mounted so that a, the longest axis, is along the rotation axis. When we are rotating with c in the plane of the film, the maximum permissible rotation is limited by b, so that tan⁻¹(2/75) − 0.3 = 1.2°, and when c is limiting, the maximum rotation angle is 2.0°. The maximum angle is thus 1.2°. To collect a full data set we need 90° + 2θmax. With CuKα radiation (1.54 Å), 2θ is 45° at 2 Å. So, to collect a full data set without overlaps, we need 135/1.2, or 113, films. Some saving can be made by using 2° for the part of the data set where c is limiting and switching to 1.2° when b is limiting.

FIG. 2.20 Rotation geometry projected down the rotation axis. The volumes swept out by two successive rotation photographs are marked 1 and 2.
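The worked example above can be checked numerically. The following Python sketch simply encodes the estimate Δφmax = tan⁻¹(dmin/cell edge) − spot width together with Bragg's law. The function names (`max_rotation_deg`, `two_theta_deg`) are hypothetical. Rounding the oscillation step down to the nearest 0.1° is an assumption made here to reproduce the 1.2° used in the text.

```python
import math

def max_rotation_deg(d_min, cell_edge, spot_width_deg):
    """Largest oscillation (degrees) that avoids overlaps for one limiting cell edge:
    tan^-1(d_min / cell_edge) minus the spot width."""
    return math.degrees(math.atan(d_min / cell_edge)) - spot_width_deg

def two_theta_deg(d, wavelength):
    """Bragg's law, lambda = 2 d sin(theta); returns 2-theta in degrees."""
    return 2.0 * math.degrees(math.asin(wavelength / (2.0 * d)))

# Worked example from the text: a = 100 A on the rotation axis, b = 75 A, c = 50 A,
# 2-A data, 0.3-degree spot width, CuKa radiation (1.54 A).
d_min, spot_width, wavelength = 2.0, 0.3, 1.54

limit_b = max_rotation_deg(d_min, 75.0, spot_width)   # b-limited range, about 1.23 deg
limit_c = max_rotation_deg(d_min, 50.0, spot_width)   # c-limited range, about 1.99 deg

# Assumption: round the smaller limit *down* to 0.1 deg so that no images overlap.
step = math.floor(min(limit_b, limit_c) * 10) / 10

total = 90.0 + two_theta_deg(d_min, wavelength)       # 90 + about 45 = about 135 deg
n_images = math.ceil(total / step)

print(f"b-limited: {limit_b:.2f} deg, c-limited: {limit_c:.2f} deg")
print(f"step {step} deg, total {total:.1f} deg, images needed: {n_images}")
```

Using a separate step for the c-limited part of the rotation, as suggested above, amounts to calling `max_rotation_deg` once per limiting edge and dividing each angular range by its own step.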
Blind Region
Near the rotation axis there is a region of reciprocal space that cannot be collected because the Lorentz correction⁹ is very large. The Lorentz correction accounts for the amount of time that a reflection spends in diffracting conditions. Near the rotation axis this time becomes very large, and in the limit a reflection lying directly on the rotation axis is always diffracting. To fill in these data, it is necessary to rotate about another axis in order to sweep out the region near the rotation axis. If you have the crystal mounted on a goniostat, this may mean simply moving chi or phi; otherwise the crystal will have to be remounted. In the absence of symmetry, the crystal will have to be rotated by at least 2θmax, and with mirror symmetry the angle will be θmax. Another strategy is to mount the crystal so that the nearest axis is 20-30° off the rotation axis. If you are using autoindexing, this is the ideal solution, as it maximizes the amount of unique data. Offsetting the crystal axes from the rotation axis is always desirable for maximizing unique data, but in the past crystals had to be nearly perfectly aligned or the software could not index, and therefore could not integrate, the film. For instance, consider the film in Fig. 2.19. The crystal is set so that its mirror symmetry lies along the vertical. This makes for a pretty film, but it also means that a partial reflection on the right half has a corresponding partial mate on the left. If the crystal were rotated so that the mirror was a few degrees off the vertical, the mirror-related reflections would be such that when one was partial the other would likely be whole. It may, however, be desirable to keep the mirror symmetry when collecting Bijvoet pairs in order to use an anomalous scattering signal. In this case the mirror-related reflections may be Bijvoet pairs, and collecting both at the same time with the same geometry may maximize the accuracy of the small difference between them. An excellent set of three articles on rotation photography, covering use with large cells, synchrotron radiation, and integration of intensities, can be found in Methods in Enzymology.¹⁰

⁹Blundell and Johnson, pp. 319-320.
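The blind region can also be described purely geometrically, which makes its size easy to estimate. This is a minimal sketch. It assumes a single rotation axis perpendicular to the X-ray beam, with reciprocal-space coordinates in Å⁻¹ measured from the rotation axis (ξ) and along it (ζ). The function name `in_blind_region` is hypothetical. A point is blind if the circle it sweeps during rotation never meets the Ewald sphere. At the edge of this region the point only grazes the sphere, which is where the Lorentz correction becomes very large.

```python
import math

def in_blind_region(xi, zeta, wavelength):
    """Return True if a reciprocal-lattice point inside the limiting sphere can never
    touch the Ewald sphere during rotation about one axis perpendicular to the beam.

    xi   -- distance of the point from the rotation axis (1/A)
    zeta -- coordinate of the point along the rotation axis (1/A)
    """
    k = 1.0 / wavelength          # Ewald-sphere radius; its centre lies k from the axis
    if abs(zeta) > k:
        return True               # the Ewald sphere never reaches this height
    r_cut = math.sqrt(k * k - zeta * zeta)   # radius of the sphere's cross-section at zeta
    # Rotation carries the point around a circle of radius xi centred on the axis.
    # That circle meets the cross-section (centred k from the axis) only if
    # k - r_cut <= xi <= k + r_cut; the region xi < k - r_cut is the blind cusp.
    return xi < k - r_cut

# 2-A reflections (|d*| = 0.5 1/A) with CuKa radiation:
print(in_blind_region(0.0, 0.5, 1.5418))  # on the rotation axis: True
print(in_blind_region(0.5, 0.0, 1.5418))  # in the equatorial plane: False
```

Rotating about a second axis, as the text recommends, moves such points off the cusp, so they can then be recorded.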
White-Radiation Laue Photography
In Laue photography, the crystal is held still during the exposure, and multiple wavelengths, or white radiation, are used rather than monochromatic radiation. This gives many more diffraction spots at once than monochromatic radiation does. Recently, with the availability of synchrotron light sources that give off a continuous spectrum of useful X-ray wavelengths, several groups have successfully used this technique to study reaction intermediates. A single Laue photograph with a 100-ms exposure can contain nearly an entire data set for high-symmetry crystals. Laue photography is very sensitive to the mosaicity of the crystal; crystals that are too mosaic to use with Laue photography can still be used with monochromatic radiation. A Laue photograph is shown in Fig. 2.21. Note that the pattern consists of many rings. The points where these rings intersect are called nodals. The Laue software uses the positions of these nodals to orient the crystal, searching through all possible orientations for the one that best predicts the observed nodal pattern.

¹⁰See Methods in Enzymology, Vol. 114: Harrison et al., p. 211; Rossmann, M. G., p. 237; Fourme, R., and Kahn, R., p. 281.

FIG. 2.21 White-radiation Laue photographs. (A) Laue photograph of photoactive yellow protein taken at beamline X26C, National Synchrotron Light Source, Brookhaven National Laboratory. Crystal size: 60 × 60 × 30 µm³. Exposure: 30 ms. The image plate was scanned on a Fuji BAS2000 scanner. The image plate was placed in a cassette with a rubber front that was then mounted on the X-ray camera by means of a mount similar to those used for film cassettes. (B) Close-up of the Laue pattern. The horizontal line across the middle of the image is displayed in cross section at the bottom of the screen. Because of the low background noise and high sensitivity of image plates, weak diffraction spots are well imaged. In the center, just below the horizontal line, is a nodal where several circles cross. Laue patterns are indexed by noting the positions of several nodals. The software then searches through all possible orientations for one that matches the list of nodals. Nodals are low-index reflections and are usually energy multiples, hence especially bright. (Photos courtesy of Dr. Keith Moffatt and Zhong Ren, University of Chicago.)
Space Group Determination
Space groups are determined by examining the diffraction pattern and noting any symmetry and systematic absences. You need a copy of the International Tables for Crystallography to look up the symmetry patterns and find the identity and number of the space group. A list of the seven crystal systems and their main features is given in Table 2.2. Proteins are asymmetric objects and occur only in the L-form, so they cannot be involved in symmetry elements requiring inversion centers, mirrors, or glide planes. This limits the possible space groups to 65 of the 230 mathematically possible space groups. The possible symmetries are therefore 2-, 3-, 4-, and 6-fold axes, the corresponding screw axes, and centering. We will consider each in turn.

[TABLE 2.2 The seven crystal systems and their main features]

2-Folds. Once the inversion center of the diffraction pattern (Friedel symmetry) is added, a 2-fold produces a mirror plane perpendicular to the 2-fold axis in the reciprocal-space pattern (see Fig. 2.22). A 2₁ screw axis can be distinguished by the absence of every other spot along the reciprocal axis parallel to the 2-fold, that is, perpendicular to the mirror. Even one exception to the screw-axis absences means the axis is a pure 2-fold. The 2-fold constrains the angles between the 2-fold and the other two axes to be 90°.

3-Folds. A 3-fold causes 3-fold symmetry in the reciprocal lattice except on the 0-layer, where the inversion center raises the symmetry to 6-fold. It is therefore usually necessary to take an upper-level precession photo to differentiate between a 3-fold and a 6-fold. There are two 3-fold screw axes, 3₁ and 3₂. They give identical absences along the 3-fold axis: only every third spot is present. They cannot be distinguished from each other at this point, and the choice must be left until later. In addition, the 3-fold symmetry gives rise to a hexagonal lattice. This lattice constrains the a and b axes to be equal and the angles between the 3-fold and the other two axes to be 90°. It also constrains the angle between the two non-3-fold axes to be 120° (60° in reciprocal space).

4-Folds. A 4-fold gives rise to 4-fold symmetry in the diffraction pattern in the plane perpendicular to the 4-fold axis. It also constrains two axes to be equal and all interaxial angles to be 90°. A mirror will be found in the plane passing through the origin perpendicular to the 4-fold (hk0).
There are two possible types of screw axis: 4₁ and 4₃, with only every fourth spot present, and 4₂, with every other spot present along the 4-fold axis.

6-Folds. A 6-fold gives rise to 6-fold symmetry on both the zero level and upper levels. In addition, a mirror is found in the plane passing through the origin perpendicular to the 6-fold axis (hk0). Both trigonal (nonrhombic) and hexagonal space groups have a hexagonal lattice; the symmetry of the intensities must be used to tell the two apart. The 6-fold axis itself can have three types of screw. With 6₁ and 6₅, only every sixth spot is present; with 6₂ and 6₄, only every third spot; and with 6₃, only every second spot along the 6-fold axis. Within two of these pairs, 6₁/6₅ and 6₂/6₄, the screw axes can be told apart only at a later stage.
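The axial absence rules listed above all follow from one condition. An nₘ screw axis translates by m/n of the cell edge at each step, so the 00l reflection survives only when m·l is a multiple of n. The Python sketch below tabulates the first twelve orders for every screw axis; the helper name `axial_reflection_present` is hypothetical. Enantiomorphic pairs such as 3₁/3₂ or 6₁/6₅ give identical lists, which is why they cannot be told apart from absences alone.

```python
def axial_reflection_present(n, m, l):
    """For an n_m screw axis along c, the 00l reflection is allowed only if m*l is a
    multiple of n (a pure rotation, m = 0, causes no absences)."""
    return (m * l) % n == 0

screw_axes = [(2, 1), (3, 1), (3, 2), (4, 1), (4, 2), (4, 3),
              (6, 1), (6, 2), (6, 3), (6, 4), (6, 5)]

for n, m in screw_axes:
    present = [l for l in range(1, 13) if axial_reflection_present(n, m, l)]
    print(f"{n}_{m}: 00l present for l = {present}")
```

Running the loop shows, for example, that 6₃ keeps every second 00l reflection while 6₂ and 6₄ keep every third.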
[Fig. 2.22 panels: 2-fold; 3-fold; 4-fold; 2-fold + center; 3-fold + center; 4-fold + center]
FIG. 2.22 The effect of adding an inversion center to a 2-, 3-, and 4-fold. On the left is the symmetry in real space, with a comma used to indicate an asymmetric object. A "+" and a solid symbol indicate an object above the plane of the page, and a "-" and an open symbol indicate an object below. Note in the case of the 2-fold that an inversion center forms a mirror perpendicular to the 2-fold. In projection there are two mirrors and, therefore, the 0 level will also show two mirrors, while an upper level will show only one mirror. An inversion center plus a 3-fold still has the same symmetry as before, although in projection it will appear to be a 6-fold. A 4-fold plus an inversion center adds a mirror in the plane of the paper. A 6-fold (not shown) is similar to a 4-fold.
Rhombic. Rhombic is a special case of trigonal and is characterized by having all three axes equal and all three angles equal. It is the hardest system to diagnose because of the difficulty of finding the zones and determining their relationships when the axes are not near 90°. In certain cases, an apparently C-centered monoclinic cell may really be R32, so this possibility is worth checking.
Centering. Centering can be detected by systematic absences throughout the diffraction pattern. Centering can cause confusion about the directions of the principal axes, so always use the symmetry to determine the lattice. For instance, in a C-centered lattice, spots with h + k odd are missing. At first inspection, the lattice appears to run along the diagonals of the cell. The symmetry on the zero level will show two mirrors that give the correct directions of the two axes. The five possible types of centering are A, B, C, F, and I. A, B, and C centering are identical except for the naming of the axes. The convention is to name the axes so that the cell is C centered: the ab face of the cell is centered, and reflections with h + k = 2n + 1 are missing. (Thus, A centering has the bc face centered and B centering has the ac face centered.) F is face centered, so that all three faces ab, ac, and bc are centered and reflections with mixed odd and even indices are missing. I is body centered, so that an extra lattice point is found at the body center of the cell. This would be easy to miss using precession photographs of 0 levels alone, although a picture of a diagonal zone might reveal it. Always assume that you may have higher symmetry than you think until proven otherwise. Never trust 90° angles unless they are confirmed by symmetry. Never use low-resolution photographs to decide the space group. Always keep an open mind about the space group until the structure has been solved and refined. To determine the correct space group, you need enough precession photos to establish the symmetry elements present, any systematic absences along the axes, and any centering. The photos are then compared with the diagrams in the International Tables. The tables are grouped according to the highest symmetry present (i.e., if you have a 6-fold, the space group is found in the hexagonal section).
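The centering conditions can be written as simple parity tests on the indices. This sketch uses a hypothetical helper, `allowed_by_centering`, and covers only the five centering types named above, with P included for comparison. Printing the hk0 zone for a C-centered cell shows a checkerboard of spots whose shortest rows run along the diagonals.

```python
def allowed_by_centering(h, k, l, lattice):
    """Return True if reflection hkl is permitted by the lattice centering.
    Python's % returns a non-negative result, so negative indices work too."""
    if lattice == "P":
        return True
    if lattice == "A":                      # bc face centered
        return (k + l) % 2 == 0
    if lattice == "B":                      # ac face centered
        return (h + l) % 2 == 0
    if lattice == "C":                      # ab face centered
        return (h + k) % 2 == 0
    if lattice == "I":                      # body centered
        return (h + k + l) % 2 == 0
    if lattice == "F":                      # all faces centered: h, k, l unmixed
        return (h + k) % 2 == 0 and (h + l) % 2 == 0
    raise ValueError(f"unknown lattice type: {lattice}")

# Print the hk0 zone of a C-centered lattice ("o" = present, "." = absent).
for k in range(3, -4, -1):
    print(" ".join("o" if allowed_by_centering(h, k, 0, "C") else "." for h in range(-3, 4)))
```

Changing `"C"` to `"I"` in the loop gives the same hk0 pattern. This illustrates why the zero levels alone are a weak basis for choosing between these lattices.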
You then search for alternative space groups and ask whether the precession photos you have are necessary and sufficient to eliminate all other possible space groups. This may mean taking an upper-level photo to distinguish between some possibilities, such as trigonal versus hexagonal (Fig. 2.23). Determine the dimensions of the unit cell by Bragg's law and compare the volume of the cell, in cubic angstroms, with the best estimate of the protein's molecular weight (MW) in daltons. The International Tables list Z, the number of asymmetric units in the unit cell, for each space group. Use the formula volume/(Z × MW) to calculate the cubic angstroms per dalton of the asymmetric unit.¹¹ The expected value of this number for protein molecules ranges from 1.7 to 3.0, with an average of about 2.3. If you have a number substantially smaller than 1.7, it is likely that something is wrong. Alternatively, there may be internal symmetry in the protein molecule that coincides with a crystallographic axis. For instance, hemoglobin is a tetramer, and in some crystals its dimer axis coincides with a crystallographic axis. The asymmetric unit then contains one-half of a tetramer instead of a full molecule. If the number is substantially larger than 3.0, there are two possibilities. One is that there is more than one molecule in the asymmetric unit, which is very common. The other is that the space group has higher symmetry than you have determined. Look for higher-symmetry space groups that contain the symmetry you have already determined. Then check whether there is a precession photograph you could take that would prove or disprove this possibility. For instance, R32 can be reindexed as monoclinic C2, and it can be difficult to spot the difference. If you are using XENGEN or MOSFLM to reduce the data, you can also try reindexing your data in different space groups and compare the R-merge values. There are also programs, commonly used by small-molecule crystallographers, that search for additional symmetry in a three-dimensional data set.

Finally, a common error is to mistake pseudosymmetry for true symmetry. An example of a crystal with pseudosymmetry is given in Fig. 2.24. Pseudosymmetry appears correct at lower resolution but breaks down at higher resolution. Most protein crystals show some pseudosymmetry in the range from infinity to 6 Å. Low-order reflections on an axis are often virtually extinct, leading one to conclude that there is a screw axis. Always confirm screw axes to 3 Å resolution or better. If you cannot confirm the screw axis to high resolution, bear in mind that the axis may not be a screw axis, and try both possibilities. Even a single reflection that violates the absence rule rules out a screw axis.

¹¹Matthews, B. W. (1968). J. Mol. Biol. 33, 491.
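Returning to the packing check described earlier in this section, the ratio volume/(Z × MW) is quick to compute once the cell is known. The sketch below is a minimal illustration. The cell volume uses the general triclinic formula. The `molecules_per_asu` parameter is an addition of this example, used to test more than one molecule per asymmetric unit. The molecular weight of 40,000 Da is a hypothetical value, not data from the text. The 1.7-3.0 range is the one quoted above.

```python
import math

def cell_volume(a, b, c, alpha=90.0, beta=90.0, gamma=90.0):
    """General (triclinic) unit-cell volume in A^3; angles in degrees."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg)

def matthews_coefficient(volume, z, mw, molecules_per_asu=1):
    """A^3 per dalton: volume / (Z * MW), counting every molecule in the asymmetric unit."""
    return volume / (z * mw * molecules_per_asu)

# Hypothetical example: a P2(1)2(1)2(1) crystal (Z = 4) with a = 100, b = 75, c = 50 A
# and an assumed molecular weight of 40,000 Da.
volume = cell_volume(100.0, 75.0, 50.0)
for n in (1, 2, 3):
    vm = matthews_coefficient(volume, 4, 40000.0, n)
    flag = "plausible" if 1.7 <= vm <= 3.0 else "outside 1.7-3.0"
    print(f"{n} molecule(s)/asymmetric unit: {vm:.2f} A^3/Da ({flag})")
```

The loop reflects the reasoning in the text. If one molecule gives a value well above 3.0, the next integer count is tested before higher symmetry is suspected.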
If a reflection that violates the screw-axis absences is weak, however, consider whether it may be an artifact (in particular, Kβ radiation can cause artifacts). Do not exclude the 2₁ screw axis until better evidence is found. One of the best confirmations of the space group is a good heavy-atom Patterson map. For example, consider the case of determining whether a 2-fold or a 2₁ screw axis is present along a. The Patterson map is calculated the same way for both possibilities. Even if you enter the incorrect possibility in the symmetry operators, both a 2-fold and a 2₁ degenerate to a mirror in Patterson space. For the 2-fold, the Harker vectors lie on the plane x = 0.0; for the 2₁, they lie on x = 0.5. It is important to plot out the entire Patterson map and look at all possible Harker sections. (For more information on Patterson maps, see the following.)

FIG. 2.23 Distinguishing 3- and 6-folds. (A) The 0-level photograph of photoactive yellow protein taken down the c axis shows a hexagonal net with 6-fold symmetry. Both a 3-fold and a 6-fold will show 6-fold symmetry in the 0-level. Thus an upper-level precession photograph, hk1 (B), was taken to distinguish the two possibilities. This photo also shows 6-fold symmetry, confirming the presence of a 6-fold axis. Another precession photograph (not shown) was taken of the 6-fold axis and showed every odd spot systematically missing. There are no mirrors, eliminating the possibility of the class 622. This is enough information to assign the space group as P6₃.

FIG. 2.24 Pseudosymmetry. This precession photograph of iron-binding protein shows true 4-fold symmetry. There are also pseudomirrors along the main axes and the main diagonals. Close examination shows these mirrors to be inexact. (Photo courtesy of Andy Arvai, Scripps Research Institute.)
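The Harker-plane argument can be illustrated by generating the symmetry mate of a single atom. This is a minimal sketch. It assumes one hypothetical heavy-atom site and considers only the 2-fold or 2₁ axis along a, ignoring all other symmetry operators; `harker_vector` is a hypothetical helper. The first coordinate of the resulting vector is 0 for a 2-fold and 1/2 for a 2₁. This is why the two cases put their Harker peaks on different sections.

```python
def harker_vector(atom, screw):
    """Vector from the symmetry mate to an atom at fractional (x, y, z) for a 2-fold
    (screw=False) or 2(1) screw axis (screw=True) along a, reduced into the cell."""
    x, y, z = atom
    mate = (x + 0.5 if screw else x, -y, -z)
    # Round away floating-point noise, then wrap into the range [0, 1).
    return tuple(round(p - q, 6) % 1.0 for p, q in zip(atom, mate))

atom = (0.13, 0.21, 0.37)            # hypothetical heavy-atom site
print("2-fold:", harker_vector(atom, screw=False))   # u = 0: Harker section x = 0
print("2(1):  ", harker_vector(atom, screw=True))    # u = 1/2: Harker section x = 0.5
```

The y and z components (2y, 2z) are identical in both cases. Only the section on which the peak appears distinguishes the two axes.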
Unit-Cell Determination
A single still photograph taken roughly along one axis can be used to determine the three dimensions of the unit cell. You should have a pattern of concentric circles around the beam center. Portions of the lattice will be visible, and these can be used along with Bragg's law to determine two of the dimensions:

$$d = \frac{\lambda}{2\sin\left[\tfrac{1}{2}\tan^{-1}(\Delta/F)\right]},$$

where Δ is the spacing between spots, F is the crystal-to-film distance, and λ is the wavelength of the X-rays (1.5418 Å for copper targets). It is best to measure a long row of spots and divide its length by the number of spaces to get a more accurate value. Also, the closer the row is to the center, the more accurate the approximation of using Bragg's law will be, because the spacing gets stretched the farther out from the center you go. The third dimension can be determined from the spacing of the concentric rings using the equation

$$d = \frac{n\lambda}{1 - \cos\left[\tan^{-1}(r/F)\right]},$$

where r is the radius of the nth circle. The circles need to be close to concentric about the beam stop (i.e., an axis aligned along the direct beam) for this equation to be accurate. The lengths determined this way are correct if you have an orthogonal cell. Otherwise they need to be corrected, because the photographs show d* rather than d directly. To do this, simply divide each measured length (for example, b) by the sine of the appropriate interaxial angle. More accurate distances can be determined from the undistorted lattices found on precession photographs. For this you will need photographs of at least two zones. Measure a number of spots in a row, divide by the number of spaces as above, and use the same equation derived from Bragg's law. Again, for nonorthogonal cells these distances will need to be corrected.
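The two still-photograph equations above translate directly into code. This is a sketch under stated assumptions: distances are in millimeters and the default wavelength is CuKα, 1.5418 Å. The measurements (F = 100 mm, a row spanning 15.4 mm over 10 spaces, a first circle of radius 25.4 mm) are hypothetical values chosen only for illustration. The function names are also hypothetical. The non-orthogonal correction simply divides by the sine of whichever interaxial angle applies, as described in the text.

```python
import math

def spacing_from_row(delta, film_distance, wavelength=1.5418):
    """d = lambda / (2 sin[(1/2) tan^-1(delta / F)]) for spot spacing delta (mm)."""
    two_theta = math.atan(delta / film_distance)
    return wavelength / (2.0 * math.sin(two_theta / 2.0))

def spacing_from_circle(r, film_distance, n, wavelength=1.5418):
    """d = n * lambda / (1 - cos[tan^-1(r / F)]) for the nth concentric circle of radius r (mm)."""
    return n * wavelength / (1.0 - math.cos(math.atan(r / film_distance)))

def correct_nonorthogonal(length, angle_deg):
    """Divide a measured length by the sine of the appropriate interaxial angle."""
    return length / math.sin(math.radians(angle_deg))

# Hypothetical measurements on a still photograph at F = 100 mm:
row_length, n_spaces = 15.4, 10          # a row of 11 spots spanning 15.4 mm
print(f"in-plane axis: {spacing_from_row(row_length / n_spaces, 100.0):.1f} A")
print(f"axis along beam (first circle, r = 25.4 mm): {spacing_from_circle(25.4, 100.0, 1):.1f} A")
```

Dividing the row length by the number of spaces before calling `spacing_from_row` mirrors the text's advice to average over a long row.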
Evaluation of Crystal Quality The number-one piece of information requested about protein crystals is the limit of observable diffraction. The desire to have this number be as high a resolution as possible (i.e., to have the smallest numerical value) has led to some rather creative definitions in which the single highest-resolution spot is used to report this value. It is better to report the resolution at which at least one-third of the possible reflections are still visible above background. Even this is pushing it, but resolution inflation is common and pervasive. Another factor to consider is the mosaicity of the crystal. Mosaicity is a measure of the order within a crystal. If a crystal has low mosaicity, the crystal is highly ordered and diffraction spots will be sharp. High-mosaicity crystals will have broader peaks because of lower crystalline order (Fig. 2.25). Since increased mosaicity means that a spot is in diffracting conditions over a larger range of angles, mosaicity may be recognized as broadened lunes on diffraction patterns. Or, if you are using a diffractometer or
DATA COLLECTION TECHNIQUES
FIG. 2.25 Mosaicity of crystals. Each block is composed of one to many unit cells. Top: This crystal has high order (right) and thus the diffraction of a single spot is sharper (left). Bottom: This crystal has lower order and broader diffraction spots.
area detector, the profile of a peak may be directly measured. While increased mosaicity in itself may not be a problem, it may indicate other problems. For instance, if you let the crystal dry too much when mounting it, the mosaicity will be increased. If the crystal is suffering from radiation damage (heating, drying, etc.), it is quite likely that the mosaicity will increase. If the crystals have high mosaicity, it may be worth trying to see whether a crystal can be mounted to give a diffraction pattern with less mosaicity. Mosaicity may also indicate twinning, where the crystal is actually made up of several crystals joined together. Unless the twinning can be accurately accounted for, it is not possible to use the amplitudes of a twinned crystal to determine the X-ray structure. Finally, increased mosaicity is usually accompanied by lower-resolution diffraction overall and a lower signal-to-noise ratio, since the counts are spread over a larger diffraction angle. In looking for twinning it is important to look for extra families of circles that cannot be accounted for by a single crystal in the beam (Fig. 2.26). A twinned crystal consists of two or more crystalline units joined together. Sometimes the joining is apparent in the morphology, but often the only way to tell is from the diffraction pattern. Still photographs or small-angle photos are best for this purpose. Be cautious in assigning twinning solely on the basis of split spots (Fig. 2.27). If the crystal is slightly misaligned about the center of the camera or the camera is misaligned, this can cause split spots because the
FIG. 2.26 Image of twinned crystal. An image of a crystal with a twinning defect. Two separate, unrelated sets of concentric circles are evident. This crystal cannot be used for data collection.
FIG. 2.27 Split-spot profile. A split-spot profile such as this may indicate a cracked crystal or twinning.
Ewald sphere is not centered on the camera center. It is then possible for a reflection to occur twice in close proximity and to produce split spots. It may be more fruitful to align the camera carefully than to throw the crystal away. On an area detector or diffractometer, twinning can be recognized by looking at the profiles of several spots in different areas of reciprocal space. For area detectors, 10-20 frames of 0.05° oscillation are taken, and the profile of the spots is plotted as a function of oscillation angle. This can be done using the frameview program from XtalView (see following). Diffractometers allow a continuous scan in one of several angles. The presence of split spots indicates a twinned or cracked crystal, which must not be used for data collection. Are the crystals big enough for data collection? This question depends on so many parameters that it is not possible to give a good general answer. It is always possible to grow a larger crystal, although it may take considerable experimentation and hard work. The size needed is determined by the quality of the diffraction pattern, not by the crystal's physical dimensions. Crystals with large unit cells have weaker diffraction patterns than do similar crystals with smaller cells, with the result that a larger crystal is needed. (Actually they diffract the same number of photons; in the larger cell these photons are spread out over more reflections.) In the end you have to decide which questions you wish to answer with your experiment. If you want an atomic-resolution structure of a mutant protein to look at small changes from the wild type, then clearly a 4-Å diffraction pattern is not enough. It may be worthwhile collecting data on a small crystal for now so that you can start working on structure solution at low resolution while you wait for larger crystals to grow. Avoid collecting data just because you can do it.
If you collect a 2-Å data set but all the spots beyond 3 Å are below the noise level, then it is really a 3-Å data set and will not give you any information beyond this.
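Two points from this section, that a larger cell spreads the same photons over more reflections and that resolution is best quoted where at least one-third of the possible reflections are visible, can be made concrete in a short Python sketch. It is not a standard program: the reflection count uses the fact that each reciprocal-lattice point occupies a volume 1/V inside a resolution sphere of radius 1/d_min, and "visible above background" is assumed here to mean I/σ(I) > 2, a threshold you may wish to change. Function and parameter names are hypothetical.

```python
import math


def reflections_within(cell_volume, d_min):
    """Approximate number of reciprocal-lattice points with d >= d_min.

    N ≈ (4π/3) V / d_min³. Symmetry and Friedel's law reduce the unique
    count by a constant factor but do not change how N scales with V.
    """
    return (4.0 / 3.0) * math.pi * cell_volume / d_min ** 3


def resolution_limit(shells, min_fraction=1.0 / 3.0, min_i_over_sigma=2.0):
    """Highest-resolution shell in which at least `min_fraction` of the
    possible reflections are visible above background.

    `shells` lists (d_min, n_possible, [I/σ(I) of measured reflections]),
    ordered from low to high resolution. "Visible" is taken to mean
    I/σ(I) > min_i_over_sigma, an assumed threshold.
    """
    limit = None
    for d_min, n_possible, i_over_sigma in shells:
        n_visible = sum(1 for x in i_over_sigma if x > min_i_over_sigma)
        if n_possible == 0 or n_visible / n_possible < min_fraction:
            break
        limit = d_min
    return limit


# Doubling every cell edge multiplies the number of reflections by 8.
small = reflections_within(50.0 ** 3, 2.0)
large = reflections_within(100.0 ** 3, 2.0)
print(round(large / small))  # prints 8
```

Counting against the number of *possible* reflections in each shell, rather than the number measured, is what keeps the quoted limit honest when the outer shells are sparsely recorded.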
2.5 HEAVY-ATOM DERIVATIVE SCANNING WITH FILM
The traditional method of scanning for heavy-atom derivatives is to use screened precession photos with a precession angle of 5° or higher. The method is inefficient: because most of the diffracted X-rays are blocked by the screen, precession photography needs a longer exposure to collect the same number of photons than other methods do. Shorter exposure times can be used if rotation photos of several degrees or low-angle screenless precession photos are used. In any case the object is to compare the intensities on the heavy-atom film with those on an equivalent "native" film and to look for intensity changes. The unit cell can be quickly checked by overlaying
equivalent rows on the native film. If the unit cell of the putative derivative has changed significantly (by more than about 0.5-1.0%), the derivative may not be usable. Deciding whether there are intensity changes can be difficult for the beginner, because real changes must be distinguished from differences in exposure time and in the overall rate of falloff of the pattern. The best way to convince yourself that the changes are real is to look for reversals: pairs of spots in which one spot is the stronger on one film but the weaker on the other (Figs. 2.28 and 2.29). A good heavy-atom derivative has obvious differences. Most photos will not show large differences but may show one or two. Remember, "One difference does not a derivative make." The differences should occur at all resolutions. Differences will be found in the lowest-resolution reflections, between infinity and 10 Å, from changes in solvent contrast caused by the heavy atom in the solvent, even if it does not bind to the protein. The differences of an isomorphous derivative will fall off slightly with resolution, whereas they will increase with resolution if the derivative is not isomorphous. However, unless the pattern of differences is obvious, it is probably better to decide these questions by collecting some data on the derivative and determining statistically, over a large number of reflections, how the size of the differences varies with resolution. If rotation photos are used, be careful not to compare spots that could be partial in one photograph but not in the other. Examine only reflections roughly perpendicular to the rotation axis and at least one row from the edge of the lunes. Reflections near the rotation axis are probably partial, and very small differences in crystal orientation can cause large intensity changes. Knowing this, it is possible to use rotation photos of about 5° to scan for derivatives for most unit cells.
Choose the largest angle you can without getting overlap below 3 Å. Align the crystal carefully with still photos 90° apart on the spindle. It is not necessary to align the crystal as precisely as for a precession photo, but be aware that a different pattern of partials can be confused with true intensity changes. In any case the worst that can happen is that you falsely identify a derivative and collect an extra data set. This is far better than missing a derivative altogether. The other common method of finding derivatives is to scan using the
FIG. 2.28 An intensity reversal between otherwise identical spots on two films. Note that in film 2 the upper spot is larger than the lower, whereas it is the opposite in film 1.
FIG. 2.29 Derivative and native films compared. Left: native iron-binding protein; right: the iron-binding protein soaked in iridium hexachloride. There are clear intensity changes, and many examples of reversals can be found. Also note that the pseudomirror symmetry between top and bottom has been clearly broken in the derivative.
area detector. As people gain familiarity with area detectors, this is becoming more common. In most laboratories with both detectors and cameras it is easier to get time on the camera, and you can usefully fill time by scanning for derivatives with film while waiting for the detector to become available. When using the area detector, collect enough frames to index and integrate a small amount of data. This is then merged with the native data, and the resulting statistics (see following) can be used to determine whether the crystal is derivatized. If it is not, it can be removed and another crystal tried with minimal waste of time. This is known as the "take-it-off" strategy. This method can be used to scan several crystals in a single day. It takes overnight to make a precession photo for the same purpose, and then one is usually comparing fewer unique spots, and the method is not quantitative.
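The three derivative checks discussed in this section (change in cell edges, reversals between otherwise comparable spots, and the size of the differences as a function of resolution) can be sketched in Python. This assumes intensities or amplitudes that are already integrated and matched by index; the helper names are hypothetical, and the single linear scale factor is a simplification of the scaling a real data-reduction package would do. The 0.5% warning level for the cell comes from the text above.

```python
def percent_cell_change(native_cell, derivative_cell):
    """Largest percentage change among the cell edges (a, b, c)."""
    return max(abs(d - n) / n * 100.0 for n, d in zip(native_cell, derivative_cell))


def count_reversals(pairs):
    """Count spot pairs whose relative intensity reverses between films.

    Each item is ((native_1, native_2), (deriv_1, deriv_2)) for the same two
    nearby reflections. A change in exposure scales both spots alike and
    cannot swap which one is stronger; a bound heavy atom can.
    """
    return sum(1 for (n1, n2), (d1, d2) in pairs if (n1 - n2) * (d1 - d2) < 0)


def fractional_difference_by_shell(native, derivative, shell_edges):
    """Σ|k·F_PH - F_P| / Σ F_P in resolution shells.

    native, derivative: dicts mapping (h, k, l) -> (F, d in Å)
    shell_edges: boundaries in Å from low to high resolution, e.g. [20, 6, 4, 3]
    One linear scale factor k puts the derivative on the native scale.
    """
    common = [hkl for hkl in native if hkl in derivative]
    k = sum(native[h][0] for h in common) / sum(derivative[h][0] for h in common)
    shells = []
    for d_low, d_high in zip(shell_edges, shell_edges[1:]):
        num = den = 0.0
        for hkl in common:
            f_p, d = native[hkl]
            if d_high <= d < d_low:
                num += abs(k * derivative[hkl][0] - f_p)
                den += f_p
        shells.append((d_low, d_high, num / den if den else None))
    return shells


print(percent_cell_change((100.0, 50.0, 80.0), (100.6, 50.0, 80.0)) > 0.5)
```

Looking at the per-shell values rather than one overall number is what lets you see whether the differences persist at all resolutions, or rise with resolution as they would for a nonisomorphous crystal.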
2.6 OVERALL DATA COLLECTION STRATEGY
Unique Data The essence of data collection strategy is to collect every unique reflection at least once. First you need to determine the unique volume of data for your space group. This is done by considering the symmetry of your space group and adding a center of symmetry. Thus the space groups P222, P2₁22, P2₁2₁2, P2₁2₁2₁, C222, and C222₁ all have mmm symmetry in reciprocal space, because a 2₁ screw axis behaves like a 2-fold axis in reciprocal space, and adding a center of symmetry to a 2-fold axis generates a mirror plane perpendicular to it. To determine the unique data, you can look up your space group in the International Tables and determine the reciprocal-space symmetry (also called the Patterson symmetry, because the Patterson function also adds a center of symmetry). In Table 2.3 the volume needed for each point group is listed. For instance, for orthorhombic
TABLE2.3 Unique Data for the Various Point Groups Crystal system Triclinic
Class 1
Data symmetry 1
Unique data
-h,h;
-k,k;
-h,h;
0, k;
0, h; Monoclinic (2-fold parallel to b) Orthorhombic Tetragonal (4-fold parallel to c)
2/m
Rhombohedral
Hexagonal (6-fold parallel to c) Cubic
-k,k;
l,-I
0, k; 0,1or
0, h;
0, k;
-l,l
mmm
0, h;
0, k;
0,1
4/m
0, h; 0, k; 0,1or 0, 1 and any 90 ~ about c
4/mmm
0, h; k - > h , k ; h - > k , h ; 0, k;
3
3
0, h; - k , 0; 0, lor 0, 1 and any 120 ~ about c
3 32 32
3
0, h; 0, h;
0, k;
O, h;
k >- - h/2, k - - h , 1
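For the orthorhombic space groups listed in the Unique Data discussion, all of which share Laue symmetry mmm, the unique data are the octant h, k, l ≥ 0. A minimal Python sketch of reducing measured indices to that octant and counting unique reflections follows; the function names are hypothetical. Because mmm includes the center of symmetry, Friedel mates are merged here, so this reduction is not appropriate when Bijvoet pairs must be kept separate.

```python
from itertools import product


def mmm_equivalents(h, k, l):
    """All indices related to (h, k, l) by the Laue symmetry mmm.

    mmm includes the center of symmetry, so Friedel mates are included too.
    """
    return {(sh * h, sk * k, sl * l) for sh, sk, sl in product((1, -1), repeat=3)}


def to_unique(h, k, l):
    """Map a reflection into the orthorhombic unique set h, k, l >= 0."""
    return abs(h), abs(k), abs(l)


def count_unique(reflections):
    """Number of unique mmm reflections among a list of measured (h, k, l)."""
    return len({to_unique(*hkl) for hkl in reflections})


print(len(mmm_equivalents(1, 2, 3)), len(mmm_equivalents(1, 0, 3)))  # 8 4
print(count_unique([(1, 2, 3), (-1, 2, -3), (1, -2, 3), (2, 1, 3)]))  # 2
```

Reflections with a zero index have fewer distinct equivalents, which is why (1, 0, 3) has only four.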
longest cell edge (Å)/8.0
The factor of 8 can be changed to 12 if mirrors are used. In practice, spots have some width, so the minimum center-to-center distance at which two reflections can still be resolved also depends on the optics and on the particular crystal. It is better to err on the safe side: trying to squeeze too much data onto the detector and overlapping adjacent spots will lower data quality. The easiest and best way to determine d is first to do a rough back-of-the-envelope calculation and then to put the crystal on the machine. Find an orientation with the closest spacing on the detector, collect a short data set with the longest axis on the face of the detector, and examine some of the frames (Figs. 2.35 and 2.36). The closest spots should be clearly separated. On the Hamlin detector the gap can be a single pixel, whereas on the Bruker (with XENGEN) detector there should be a separation of at least 3 pixels. If the spacing is too close, the detector must be moved back. Do not collect data with overlapping spots!
An area detector can be equipped with one of three types of goniostat: two-circle, three-circle, or four-circle. The two-circle goniostat is the most limited, consisting of a rotation axis and a swing movement for the detector. Some improvement can be made by mounting crystals so that the rotation axis is a diagonal of the unit cell. Even so, with such a setup it is very hard to collect a complete high-resolution data set from a single mount.
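The back-of-the-envelope check of spot separation recommended above can be sketched in Python. This assumes a flat detector perpendicular to the beam and looks only at neighbouring spots near the beam center, where the separation is roughly D·tan(2θ) for the first reflection of the longest cell edge; the spot diameter and pixel size are hypothetical inputs that you would read off your own frames. It is no substitute for examining real frames, because spots elsewhere on a rotation frame can lie closer together.

```python
import math


def spot_gap_pixels(cell_edge, distance_mm, spot_diameter_mm, pixel_mm,
                    wavelength=1.5418):
    """Approximate clear gap, in pixels, between neighbouring spots near
    the beam center.

    Near the center, neighbouring reflections along a reciprocal axis are
    separated by roughly the 2θ of the first reflection,
    2·asin(λ / 2a), which on a flat detector at distance D is a
    center-to-center separation of D·tan(2θ). Subtracting the spot
    diameter leaves the gap between spot edges.
    """
    two_theta = 2.0 * math.asin(wavelength / (2.0 * cell_edge))
    separation = distance_mm * math.tan(two_theta)
    return (separation - spot_diameter_mm) / pixel_mm


# Hypothetical numbers: 100-Å edge, 0.5-mm spots, 0.1-mm pixels.
for d in (80.0, 120.0, 170.0):
    print(d, round(spot_gap_pixels(100.0, d, 0.5, 0.1), 1))
```

Comparing the result with the 3-pixel (Bruker) or 1-pixel (Hamlin) criterion given above tells you roughly how far back the detector must go before you confirm it on real frames.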
FIG. 2.35 Frame from a San Diego Multiwire Systems (Hamlin) area detector. Frame from a multiwavelength data collection experiment at beam line I-5 at the Stanford Synchrotron Radiation Laboratories. Spots are well separated. The data collection geometry has been set up to allow Bijvoet pairs to be collected simultaneously on the left and right halves of the detector (the rotation axis is horizontal at this beam line) by taking advantage of the mirror symmetry of the sample's space group (P2₁2₁2₁). (Frame courtesy of Brian Crane, Scripps Research Institute.)
A three-circle goniometer has a rotation axis and a φ rotation mounted with χ fixed at 45°. A swing angle is also provided for the detector. Data are usually collected by rotating around the rotation axis as far as possible. The crystal can then be rotated around the φ axis to collect new data. Usually rotating φ by 90° will give the most new unique data. A four-circle goniostat allows the most control over data collection. It has all the movements of the three-circle, and χ can be adjusted a full 360°. Because of the sizes of the detector and the χ circle, however, these collide after ω has moved over a limited range, usually about 60°. To overcome this
2.9 AreaDetector Data Collection
FIG. 2.36 Area detector frame of data collected on a Bruker area detector. The frame is a 0.25° rotation of Fe-binding protein. The detector was located at 17 cm. Note that the spots (see lower right) are only just separated.
disadvantage, the crystal is ratcheted by advancing φ 60° and collecting another ω sweep. A useful recipe for data collection using a four-circle goniostat with a Bruker area detector and an orthorhombic crystal is given in Table 2.4. It collects the unique data in a minimum amount of time at 2-Å resolution at a crystal-to-detector distance of 12 cm. The crystal is mounted so that one axis is approximately along the capillary (i.e., at χ = 0°, ω and this axis will be coincident). This needs to be accurate only to about ±5°. Optically center the crystal on the goniostat. Move χ to 0, ω to 50, and set the swing to 22.5°.
TABLE 2.4 Example of Bruker Area Detector Data Collection for Orthorhombic Crystals
Run   Start   End   Swing   χ    φ    Oscillation (ω)   Number of frames
1     50      -10   22.5    15   0    0.25              240
2     50      -10   22.5    15   60   0.25              240
3     50      20    22.5    75   0    0.25              120
Now rotate φ and take still frames until a 0 layer (the 0 layer is the one that passes through the beam stop) is centered on the detector (Fig. 2.37), so that the outer edge of the circle is at the edge of the detector. Define this φ angle as 0°. As data are collected, the 0-layer circle will move from the center of the detector toward the beam stop. χ is then offset to maximize the amount of unique data; otherwise the data on the top and bottom of the detector will be related by mirror symmetry. Since not all the data can be collected in one run (we need 90° + 22.5°), another run is done with φ rotated. To fill in the data that were missed because of the limited height of the detector, a fill-in run of half the length at χ = 90° - 15°, or 75°, is used. If Bijvoet pairs are desired, then interleave runs at -χ and φ + 180°, leaving the other angles the same, for a total of six runs (Table 2.5).
FIG. 2.37 Starting position for orthorhombic data collection. (A) The crystal has been aligned so that one axis is vertical, coincident with the rotation axis, and the crystal is rotated until the 0 layer stretches from the beam stop to the edge of the detector. The 0 layer always intersects the beam stop. (B) χ has been rotated 15° to maximize the unique data and to minimize the effect of the blind region around the rotation axis.
TABLE 2.5 Example of Bruker Area Detector Data Collection for Orthorhombic Crystals
Run   Start   End   Swing   χ     φ     Oscillation (ω)   Number of frames
1     50      -10   22.5    15    0     0.25              240
2     50      -10   22.5    -15   180   0.25              240
3     50      -10   22.5    15    60    0.25              240
4     50      -10   22.5    -15   240   0.25              240
5     50      20    22.5    75    0     0.25              120
6     50      20    22.5    -75   180   0.25              120
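The run bookkeeping in Tables 2.4 and 2.5 can be checked with a short script. This sketch (the class and function names are hypothetical) computes the number of frames as the ω range divided by the oscillation width and generates the Bijvoet partner of each run by changing χ to -χ and adding 180° to φ, which reproduces the six runs of Table 2.5 in order.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Run:
    start: float        # ω at start of the sweep (degrees)
    end: float          # ω at end of the sweep (degrees)
    swing: float        # detector swing (degrees)
    chi: float
    phi: float
    osc: float = 0.25   # ω width of each frame (degrees)

    def frames(self):
        return round(abs(self.end - self.start) / self.osc)


TABLE_2_4 = [
    Run(50, -10, 22.5, 15, 0),
    Run(50, -10, 22.5, 15, 60),
    Run(50, 20, 22.5, 75, 0),   # fill-in run at χ = 90 - 15
]


def with_bijvoet_partners(runs):
    """Follow each run with a partner at -χ and φ + 180°, other angles unchanged."""
    out = []
    for run in runs:
        out.append(run)
        out.append(replace(run, chi=-run.chi, phi=(run.phi + 180) % 360))
    return out


for i, run in enumerate(with_bijvoet_partners(TABLE_2_4), start=1):
    print(i, run.start, run.end, run.swing, run.chi, run.phi, run.osc, run.frames())
```

Keeping each partner immediately after its run mirrors the interleaving described above, so that Bijvoet mates are measured close together in time.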
Increasing Signal-to-Noise Ratio Other than modifications to the optics and the beam stop, there are several easy ways to lower the background for marginal crystals. The first is to decrease the width of each frame so that each reflection takes about three frames to diffract completely. The background accumulates continuously as the oscillation angle changes, whereas the spot does not. Taken to the extreme, it is easy to see how this helps. If each frame is 1.0° wide and the spot diffracts over only 0.25° of this oscillation, then in the pixels containing the reflection, background will have accumulated for four times longer than the reflection counts. This greatly decreases the signal-to-noise ratio. In tests on our Bruker area detector we have found that collecting 0.1° frames instead of 0.25° frames doubled the I/σ(I) ratio for weak high-resolution reflections beyond 2.0 Å. A second method of reducing background is to pull the detector back. The background falls off as the inverse square of the distance from the crystal (ignoring air absorption for now). To a first approximation, the diffracted rays are parallel and do not decrease in intensity with distance. So doubling the distance from the crystal will decrease the background four times. Of course, it will also decrease the amount of data that can be collected in a single frame. If distances greater than about 15 cm are used, a helium path is necessary or the gains in background will be lost to air absorption. If you have many small crystals, or if your crystals are not radiation sensitive, an increase in signal can be had by pulling the detector back and collecting more crystal positions to make up for the fewer reflections per frame.
In tests at Scripps using a Bruker area detector, we found that I/σ increases by about 1% per centimeter with helium compared with air for distances greater than 10 cm. So at a distance of 20 cm, an increase of about 10% in I/σ is expected. Hamlin-style detectors require greater distances, so helium paths are a must.
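The effect of frame width and detector distance on counting statistics can be illustrated with a deliberately simple Python model. It assumes σ(I) = √(peak + background), as in the counting-statistics term under Error Estimation; that the spot starts at a frame boundary; that background in the spot's pixels accumulates in proportion to the total oscillation summed; and that background falls off as the inverse square of distance while the spot does not. Detector noise and air absorption are ignored, and all numbers are hypothetical.

```python
import math


def i_over_sigma(peak_counts, bg_per_degree, frame_width, spot_width,
                 distance=1.0, ref_distance=1.0):
    """Counting-statistics I/σ(I) for one reflection under a simple model.

    - The spot diffracts over `spot_width` degrees and is summed over the
      frames it touches, so background accumulates for
      ceil(spot_width / frame_width) * frame_width degrees.
    - `bg_per_degree` is the background in the spot's pixels at
      `ref_distance`; it scales as (ref_distance / distance)**2, while the
      spot counts do not change with distance.
    """
    n_frames = math.ceil(spot_width / frame_width)
    background = (bg_per_degree * n_frames * frame_width
                  * (ref_distance / distance) ** 2)
    return peak_counts / math.sqrt(peak_counts + background)


wide = i_over_sigma(100, 400, frame_width=1.0, spot_width=0.25)
fine = i_over_sigma(100, 400, frame_width=0.25, spot_width=0.25)
far = i_over_sigma(100, 400, frame_width=1.0, spot_width=0.25, distance=2.0)
print(round(wide, 2), round(fine, 2), round(far, 2))  # 4.47 7.07 7.07
```

In this toy case, narrowing the frames fourfold and doubling the distance each cut the background to one quarter, so they give the same gain; in practice the two can be combined, subject to the air-absorption and data-per-frame costs described above.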
2.10 IMAGE PLATE DATA COLLECTION
Image plates are relatively new but have the potential of becoming the data collection method of choice. They have a high spatial resolution of 100-150 μm, similar to film, and subtend a large angle so that more data are collected at once.¹⁸ Image plates can be used either as an alternative to film or as a replacement for the detector in an area detector. In the film mode they can be used in the same cassettes as X-ray film and scanned off-line up to several hours later. In the area detector mode they are automatically scanned after each exposure by apparatus built directly into the machine. The dynamic range of an image plate is much higher than that of film: it can reach 12 bits for image plates, whereas film is limited to 8 bits in practice.¹⁹ Image plates are exposed to X-rays as with any other detector; the X-ray photons cause a change in the plate coating that, when the plate is scanned with light of the proper wavelength, releases fluorescence that is detected by a photomultiplier. Image plates are read out by a laser beam on a scanner. The quality of this scanner largely determines the limits of the image plate. Building a high-quality scanner is technically difficult because of the mechanical precision needed and the high quality of the electronics needed to take full advantage of the image plate's capability. The photomultiplier must have low noise and use a high-quality analog-to-digital converter, and the scanning laser must be stable and precisely positioned. Image plates are sensitive over a wider range of X-ray wavelengths, which gives them higher counting efficiency at higher energies. This makes them the detector of choice for white-radiation Laue experiments that use very bright synchrotron light sources. Because only a few exposures are needed for Laue data sets, manual handling of the plates is not a great disadvantage.
For collecting data sets with monochromatic radiation, where hundreds of exposures are needed, an automated method of scanning the plates, such as the MAR Research scanner (Fig. 2.38), is a necessity.
¹⁸ Miyahara, J., et al. (1986). Nucl. Instrum. Methods A246, 572-578.
¹⁹ This is based upon practical experience and is not a theoretical limit in either case.
FIG. 2.38 Frame from a MAR Research image plate. The crystal was rotated 1° about the horizontal axis for a 5-min exposure using a rotating anode source. The edge of the image is at about 2.1-Å resolution.
Image plates are erased by exposure to white light. This means they can be handled in room light before they are exposed to X-rays. After exposure they must be protected from light. Cosmic rays and background radiation will slowly expose the plate, so plates need to be freshly erased before being used. Exposure to very bright X-rays such as the direct beam will cause a spot that takes a long time to erase and can even show up for many exposure-erasure cycles (months). When an image plate is used as a film replacement on a monochromatic X-ray setup, the data are collected using the rotation method as previously described. Software used for the analysis of film can be adapted easily
by removing the corrections for film sensitivity, because image plates are linear. Since image plates are becoming common as a replacement for film in other applications, such as autoradiography of gels, an image plate scanner may be available at your institution. These scanners are perfectly adequate for replacing film in preliminary characterizations of crystals.
2.11 SYNCHROTRON RADIATION LIGHT SOURCES
The term synchrotron light is misleading because the sources of synchrotron light are usually electron storage rings. Synchrotrons are very different machines that are never used directly as X-ray sources. Synchrotron radiation was first observed at synchrotrons, hence the name. It is inaccurate to say "We are collecting data using a synchrotron," but the term has become so common that "synchrotron" is now synonymous with synchrotron radiation light source in protein crystallographic jargon. An excellent book on synchrotron sources and crystallography is Helliwell's Macromolecular Crystallography with Synchrotron Radiation. There is room here only to touch on the subject and to point out areas of special interest.
Differences from Standard Sources Synchrotron radiation from a storage ring has a continuous spectrum over the range of interest to protein crystallographers and is very bright (Fig. 2.39). Even after tight monochromatization, where only a small fraction of the total energy is used, the sources are still up to two orders of magnitude brighter than the best rotating anodes. Tight monochromatization can mean lower backgrounds and decreased radiation damage for the same exposure. Furthermore, the optics at storage rings are usually far superior to anything used in laboratory sources, providing tightly collimated high-intensity beams. One reason for the better optics is that the source is located meters away instead of less than a meter away, making it effectively a parallel source. The combination of brighter sources and tighter optics makes synchrotron sources the best for very large unit cells, such as those of viruses, with cell edges from 300 to 1000 Å. In our experience with many different crystals we have always found an increase in signal-to-noise ratio at storage rings. The ability to tune the wavelength allows the use of more optimal energies. Wavelengths near 1.0 Å show very little absorption by the capillary and the solvent around the crystal, allowing wetter mounts while obtaining better signal-to-noise ratios. Corrections due to absorption are minimized. We have not found that harder radiation decreases lifetimes; in fact lifetimes are longer, since the absorption that causes free radical damage is more efficient at lower energies.
FIG. 2.39 Synchrotron radiation source. A storage ring has high-energy electrons held in orbit by bending magnets (A). As the electrons accelerate around the curve they emit synchrotron radiation (B). Because the beam is so intense, all experiments are done in shielded hutches that are interlocked so that personnel cannot be inside while the shutters are open. A wiggler (C) increases the brilliance of the X-rays by combining several beams from local excursions of the electron path.
Special Synchrotron Techniques The simultaneous availability of all wavelengths led to the development of white-radiation Laue photography. Exposure times for Laue photographs can be very short (10-ms exposures for a typical lysozyme crystal at the best sources), and yet one or two photos contain almost all the diffraction information. Furthermore, since most of the factors that need to be corrected for in reducing the data are a function of wavelength, especially absorption,
FIG. 2.40 How overlaps arise with white radiation. In white-radiation Laue photography a range of wavelengths is used simultaneously, and thus there are many Ewald spheres in diffracting conditions simultaneously. Two are shown here that differ in wavelength by a factor of 2. The resulting diffraction exits the crystal in the same direction and is recorded on the detector in the same spot, leading to an energy overlap.
the presence of the same reflection measured at different wavelengths in the same data set (Fig. 2.40) allows these parameters to be accurately accounted for by least-squares scaling. Moffat and co-workers have collected lysozyme data sets that compare favorably with data collected on a diffractometer.²⁰ The brightness of the source and high-quality optics make the storage ring an ideal place to collect data on very large unit cells like those found in viruses.
²⁰ B., and Moffat, K. (1987). In "Computational Aspects of Protein Crystal Analysis. Proceedings of the Daresbury Study Weekend, DL/SCI/R25" (Helliwell, J. R., Machin, P. A., and Papiz, M. Z., eds.), pp. 84-89. See also Helliwell, J. R., et al. (1989). J. Appl. Crystallogr. 22, 483-487.
Time-Resolved Data Collection The short exposure times possible at storage rings have allowed the collection of time-resolved protein crystallographic data. Reactions are initiated by laser flashing or in flow cells (for very slow reactions), and data are then collected by white-radiation Laue photography at appropriate time points. When an undulator is used to intensify the beam and the Laue patterns are recorded on storage phosphors, the exposure time can be as short as microseconds. One of the chief difficulties can be the very high protein concentration in a crystal. This makes it difficult to deliver enough substrate, and the optical density can be very high. If light is to be used to start a photoreaction, high absorption necessitates intense light sources and causes gradients across the crystal. It is better to illuminate off the absorbance peak, at a wavelength where the crystal is still transparent to the light, so that light can reach the entire crystal volume. These experiments are, therefore, technically demanding and must be done carefully to ensure that most of the crystal is synchronized; otherwise the time resolution will be lost. Reactions can also be started by diffusing in substrates using a flow-cell apparatus. In this case the reaction must be very slow, on the order of hours, or else the diffusion time will be greater than the reaction time and the reaction will not be synchronized across the crystal.
2.12 DATA REDUCTION
Integration of Intensity Integration of the intensity in a spot is a matter of separating the background counts from the reflection. Two methods are in general use:
1. Mask and count. In this method the region that is to be considered the spot is masked and the pixels within this region are summed (Fig. 2.41). The background is determined from the pixels adjacent to the spot, and this value is subtracted to give the final intensity. The method works well when spots are well above background. The pixels can be counts in the case of counters, as in diffractometers and area detectors, or optical densities in the case of film. For n_P pixels in the spot and n_B pixels in the background region,
$$I_{hkl} = \sum_{i=1}^{n_P} \mathrm{counts}_{P,i} - \sum_{j=1}^{n_B} \mathrm{counts}_{B,j}$$
which assumes n_P = n_B; otherwise the background sum must first be scaled by n_P/n_B.
2. Profile fitting. In profile fitting a curve is fit to the data and the area under the curve is taken to be the intensity (Fig. 2.42). The curve, or profile, can either be a geometric shape such as a Gaussian or it can be derived by averaging over the brighter spots. The advantage of the latter method is that bright reflections can be used to determine the profile, which is then applied to weak reflections. Different profiles are usually used depending upon the position of the spot on the detector. For example, the detector might be separated into a 4 x 4 array and a different profile used in each of the 16 areas. Then, to find the area of the spot, this curve is best-fit to the counts found in the area where the spot is predicted to be, and the area under the curve is then used to find the integrated intensity rather than the counts themselves.
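The two integration methods can be sketched side by side in Python. These are illustrative, unweighted versions with hypothetical names: `mask_and_count` scales the background sum by n_P/n_B so that it can be subtracted from a peak region of a different size, and `profile_fit` does an ordinary least-squares fit of a flat background plus a scaled reference profile. Production programs weight the fit by σ, use profiles that vary across the detector, and model sloped backgrounds (Fig. 2.42).

```python
def mask_and_count(image, peak_mask, background_mask):
    """Mask-and-count intensity from a 2-D box of pixel values.

    The mean background per pixel is subtracted from every peak pixel,
    i.e. sum(peak) - (n_P / n_B) * sum(background).
    """
    peak, background = [], []
    for row, p_row, b_row in zip(image, peak_mask, background_mask):
        for value, in_peak, in_bg in zip(row, p_row, b_row):
            if in_peak:
                peak.append(value)
            elif in_bg:
                background.append(value)
    return sum(peak) - len(peak) * sum(background) / len(background)


def profile_fit(counts, profile):
    """Least-squares fit of counts ≈ b + s·p for a known spot profile p.

    The profile (e.g. averaged from strong spots) is normalized to sum to 1,
    so the fitted scale s is the integrated intensity; b is a flat background.
    """
    total = sum(profile)
    p = [v / total for v in profile]
    n = len(p)
    mean_p = sum(p) / n
    mean_c = sum(counts) / n
    sxx = sum((x - mean_p) ** 2 for x in p)
    sxy = sum((x - mean_p) * (y - mean_c) for x, y in zip(p, counts))
    scale = sxy / sxx
    return scale, mean_c - scale * mean_p


image = [[10, 10, 10], [10, 60, 10], [10, 10, 10]]
peak = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
ring = [[1, 1, 1], [1, 0, 1], [1, 1, 1]]
print(mask_and_count(image, peak, ring))           # 50.0
print(profile_fit([25, 45, 85, 45, 25], [1, 2, 4, 2, 1]))  # scale ≈ 200, background ≈ 5
```

The profile fit uses every pixel in the box, weighted by the expected shape, which is why it does better than masking for weak spots whose edges are lost in the background.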
FIG. 2.41 Integration by masking in one and two dimensions.
Error Estimation An accurate estimation of the error is important. The error of a single reflection is termed its σ. Contributions to σ:
• Counting statistics: $\sigma = \sqrt{N_{\mathrm{peak}} + N_{\mathrm{background}}}$ (note the inclusion of background counts)
• Instability of the detector; usually a constant
• Profile fitting: deviation between the observed and ideal shapes
• Local variation in background
Other sources of errors are saturated pixels (photographic film is especially vulnerable to this), overlapped profiles, and errors in background models. In principle, multiple measurements of a reflection should be weighted by σ when they are merged. However, different data reduction packages determine different values of σ, so the data are probably better averaged without σ weighting; in my experience the σ values of some packages can differ by at least a factor of 2. Reflections are often rejected on the basis of the ratio of intensity to σ, I/σ(I).
FIG. 2.42 Profile fitting. Profile fitting can more accurately find the intensity of a peak, especially in the example on the right, where the background is sloped.
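The counting-statistics σ and the two ways of merging repeated measurements discussed above can be compared in a few lines of Python. This is a sketch with hypothetical function names; the unweighted branch estimates the error from the scatter of the measurements (the standard error of the mean), which is one reasonable choice when σ values from different packages are not comparable.

```python
import math


def counting_sigma(n_peak, n_background):
    """σ from counting statistics; background counts are included."""
    return math.sqrt(n_peak + n_background)


def merge(measurements, weighted=True):
    """Merge repeated measurements [(I, σ), ...] of one unique reflection.

    weighted=True uses 1/σ² weights. weighted=False takes a plain mean and
    estimates its error from the spread of the measurements, which does not
    depend on how a given package computed σ.
    """
    if weighted:
        weights = [1.0 / s ** 2 for _, s in measurements]
        total = sum(weights)
        mean = sum(w * i for w, (i, _) in zip(weights, measurements)) / total
        return mean, math.sqrt(1.0 / total)
    n = len(measurements)
    mean = sum(i for i, _ in measurements) / n
    if n == 1:
        return mean, measurements[0][1]
    spread = sum((i - mean) ** 2 for i, _ in measurements)
    return mean, math.sqrt(spread / (n * (n - 1)))


obs = [(100.0, 5.0), (200.0, 20.0)]
print(merge(obs), merge(obs, weighted=False))
```

With these two measurements the weighted mean is pulled close to the one with the smaller σ, while the plain mean falls halfway between them; if the σ values came from packages that estimate errors differently, that pull may not be justified.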
Polarization Correction The polarization correction arises because scattering efficiency depends on the scattering angle. For polarized sources, the scattering efficiency also depends on the orientation of the scattering plane relative to the polarization direction. Sources can be polarized by a monochromator, so this correction depends on the optics of the source used. For unpolarized radiation, $p = \tfrac{1}{2}\left[1 + \cos^2(2\theta)\right]$.
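A short sketch of the unpolarized-beam factor given above, applied as a divisor to a measured intensity. Polarized sources (monochromators, synchrotrons) need a different expression that depends on the optics, which is not covered here.

```python
import math


def polarization_factor(two_theta_deg):
    """p = (1 + cos^2 2theta) / 2 for an unpolarized incident beam."""
    c = math.cos(math.radians(two_theta_deg))
    return 0.5 * (1.0 + c * c)


def correct_for_polarization(intensity, two_theta_deg):
    """Divide a measured intensity by p to remove the polarization effect."""
    return intensity / polarization_factor(two_theta_deg)
```

At 2θ = 0 the factor is 1; at 2θ = 90° it falls to 1/2, so a reflection measured at 50 counts there corrects to 100.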
Lorentz Correction The Lorentz correction accounts for the rate at which a reflection passes through the Ewald sphere. Reflections near the rotation axis remain in diffracting conditions for a longer time. At some point this correction becomes so large that the reflections very close to the rotation axis are rejected.
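The text gives no formula, so the sketch below uses the standard rotation-method expression, derived from the speed at which a reciprocal-lattice point crosses the Ewald sphere: with unit vectors e (rotation axis), s0 (incident beam), and s (diffracted beam), L = 1/|e · (s × s0)|. For a reflection in the plane perpendicular to the axis this reduces to 1/sin 2θ, and it diverges for reflections near the axis. The rejection threshold `max_factor` is a hypothetical parameter.

```python
import math


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def unit(v):
    n = math.sqrt(dot(v, v))
    return tuple(x / n for x in v)


def lorentz_factor(axis, s0, s, max_factor=None):
    """Rotation-method Lorentz factor L = 1 / |e . (s x s0)|.

    axis, s0 and s are direction vectors of the rotation axis, incident beam
    and diffracted beam; they are normalised here.  Returns None when the
    reflection is so close to the rotation axis that L is infinite or exceeds
    max_factor, i.e. when it should be rejected.
    """
    e, u0, u = unit(axis), unit(s0), unit(s)
    zeta = abs(dot(e, cross(u, u0)))
    if zeta == 0.0:
        return None
    lf = 1.0 / zeta
    if max_factor is not None and lf > max_factor:
        return None
    return lf
```

With the beam along x and the rotation axis along z, a reflection diffracted at 2θ = 60° in the xy plane gives L = 1/sin 60° ≈ 1.155, while any diffracted beam lying in the plane of the beam and the axis returns None.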
Decay or Radiation Damage Prolonged irradiation of a sample induces radiation damage. Decay usually affects higher-resolution reflections faster. If there is a choice, the higher-resolution data should be collected first. Decay should be monitored by collecting a set of standard reflections. If the decay exceeds about 20%, data collection should be halted. Although a decay correction can partially account for decay, different reflections can decay at different rates so that a single decay parameter cannot restore the accuracy of the data set. Radiation damage can be reduced by lowering the temperature (see Chapter 6). This slows down the free radical chain reactions that are thought to induce radiation damage. Decay is a function of time and dose. However, it is not linear with dose, and brighter sources can collect more counts before the same
amount of decay sets in. This is a great advantage of synchrotron sources. Also, once a sample is irradiated the free radical chain reactions are initiated and will continue even after the beam has been off for some time. Irradiation affects samples at different rates, and some samples are very sensitive. The presence of a metal that absorbs X-rays more efficiently, such as iron, platinum, or mercury, can speed up decay.
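Monitoring decay with standard reflections, as recommended above, might look like the following sketch. It measures the overall loss as the drop in the summed intensity of the standards (one reasonable choice among several) and flags when it passes about 20%. The per-reflection breakdown illustrates why a single decay parameter cannot fully restore the data.

```python
def decay_fraction(initial, current):
    """Fractional loss of intensity summed over a set of standard reflections."""
    if len(initial) != len(current) or not initial:
        raise ValueError("need matching, non-empty lists of standards")
    return 1.0 - sum(current) / sum(initial)


def should_stop(initial, current, limit=0.20):
    """True once the standards have decayed by more than about 20%."""
    return decay_fraction(initial, current) > limit


def per_reflection_decay(initial, current):
    """Decay of each standard separately; a wide spread shows that one
    overall decay parameter cannot correct every reflection."""
    return [1.0 - c / i for i, c in zip(initial, current)]
```

For standards that start at 1000, 500, and 200 counts and fall to 900, 400, and 150, the overall decay is about 15%, but the individual reflections have decayed by 10%, 20%, and 25%.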
Absorption Absorption is probably the largest source of uncorrectable error in data sets. The path length of the diffracted X-rays through glass, crystal, solvent, and air determines the amount of absorption. This path length is different for each reflection. Unfortunately, there is no entirely accurate way to model this absorption. Two approaches are generally used. In the first, experimental measurements are made of the absorption in different directions through the sample, and each reflection is corrected by these factors. In the second method, a least-squares fit is made to the differences between symmetry-related reflections as a function of some parameter believed to be a function of absorption. The experimental correction is easily calculated in the case of a diffractometer. In the case of two-dimensional detectors, the second method is normally used. The overall error in a data set can be estimated by comparing symmetry-related reflections, which in the ideal case would be identical. The agreement is expressed as
$$R_{\text{sym}} = \frac{\sum_{hkl}\sum_{i}\left|I_i(hkl) - \langle I(hkl)\rangle\right|}{\sum_{hkl}\sum_{i} I_i(hkl)}$$
where $I_i(hkl)$ is the ith measurement of a reflection and its symmetry mates and $\langle I(hkl)\rangle$ is their mean.
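A minimal R_sym calculation following the formula above. It assumes the measurements have already been reduced to their unique hkl (the mapping of symmetry mates onto one index is not shown), and it skips reflections measured only once. Conventions differ on whether singletons belong in the denominator, so treat that as a choice.

```python
from collections import defaultdict


def r_sym(observations):
    """R_sym = sum |I_i - <I>| / sum I_i over all multiply measured reflections.

    observations: iterable of (hkl, intensity), where hkl is the unique index
    to which each symmetry-related measurement has already been reduced.
    """
    groups = defaultdict(list)
    for hkl, intensity in observations:
        groups[hkl].append(intensity)
    numerator = 0.0
    denominator = 0.0
    for values in groups.values():
        if len(values) < 2:
            continue  # a single measurement carries no agreement information
        mean = sum(values) / len(values)
        numerator += sum(abs(v - mean) for v in values)
        denominator += sum(values)
    return numerator / denominator if denominator else 0.0
```

For example, two measurements of 100 and 110 plus a perfectly reproduced pair of 50s give R_sym = 10/310 ≈ 0.032.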
3 COMPUTATIONAL TECHNIQUES
There are several different crystallographic software packages available, and it would be impossible to cover them all. The XtalView package is used for specific examples in this book. XtalView is a window-based, visually oriented package that is especially easy for novices to learn. Options and commands are shown as buttons, sliders, and menus. All options are visible, which makes them easy to spot and makes the package ideal for illustration in book form. You may not want to use XtalView; perhaps you already have a favorite package. In any case, most programs have similar options and features. For consistency, a single package is necessary for this book so that we can get right to explaining the methods and spend less time explaining the particular implementation. XtalView was written at the Research Institute of Scripps Clinic by the author. It runs under X-windows, which is available on most workstations (Figs. 3.1 and 3.2). At present it has been ported to Sun workstations (including the SparcStation series), Silicon Graphics, and DECstations running ULTRIX. DENZO, MOSFLM, and XPLOR are used as the primary examples for data collection and protein refinement, which XtalView does not include.
(A) [Screenshot: XtalView xtalmgr window; Project: examples; Crystal: cvccp; Command: xheavy ccp.aul.sol]
FIG. 3.1 XtalView xtalmgr. (A) The XtalView xtalmgr program is used to organize data and to start the individual applications. It has a graphical user interface using buttons, pulldown menus, and scrolling lists. Data are organized into Projects, which can be entered and edited using the field at the top of the window. The Crystal field is a keyword used to access the parameters for a specific crystal type, such as the unit-cell parameters and the space group symmetry operators. Other applications are selected from a pulldown menu (not shown) accessed from the Applications glyph. Selecting an application causes all files with the correct extension to be listed in one of the three file lists at the bottom. Files can be selected from these lists by clicking on them with the mouse. The command line is then built up using Add Args, and the application is started with Run Command. (B) The crystal editor is used to enter the unit-cell parameters, space group information, and any other relevant information. The space group symmetry operators for all space groups are kept in a table and can be accessed either by space-group number or by symbol as found in the International Tables for Crystallography, Vol. 1. The information entered into the editor is then available to all XtalView programs by simply entering the crystal keyword (cvccp in this example).
(B) [Screenshot: Crystal Editor entry for crystal cvccp, space group P2(1)2(1)2(1), with extra fields such as ncrsymm1 and ncrsymm2]
FIG. 3.1 (continued).
3.1 TERMINOLOGY
Reflection
A reflection is a single X-ray diffraction vector that is the combined scattering resulting from the individual scattering of all the electrons in the unit cell along a particular direction. It has a magnitude, |F|, that is referred to as F, a phase, α, and the Miller indices h, k, l. The diffraction vector for protein is called Fp, and the same diffraction vector with a heavy atom soaked in is FpH. The diffraction vector for the heavy atom alone is fh (the lowercase letters remind us that fh can never be directly measured but is always calculated). Two separate observations of the same reflection will be called F1 and F2.
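Treating each diffraction vector as a complex number makes the terminology concrete: |F| is the magnitude, α the phase, and the heavy-atom contribution simply adds, FpH = Fp + fh. The sketch assumes space group P1 and constant scattering factors (no resolution fall-off or B-factor). Only |Fp| and |FpH| are measurable; fh has to be calculated from the heavy-atom positions.

```python
import cmath
import math


def structure_factor(hkl, atoms):
    """F(hkl) = sum_j f_j exp(2 pi i (h x_j + k y_j + l z_j)) in P1.

    atoms: list of (f, (x, y, z)) with fractional coordinates; f is treated
    as a constant scattering factor (no fall-off with resolution).
    """
    h, k, l = hkl
    return sum(f * cmath.exp(2j * math.pi * (h * x + k * y + l * z))
               for f, (x, y, z) in atoms)


def amplitude_and_phase(F):
    """Return |F| and the phase alpha in degrees."""
    return abs(F), math.degrees(cmath.phase(F))


# FpH is the vector sum of the protein and heavy-atom contributions.
protein = [(6.0, (0.10, 0.20, 0.30)), (7.0, (0.40, 0.15, 0.90))]
heavy = [(80.0, (0.25, 0.33, 0.05))]
fp = structure_factor((2, -1, 3), protein)
fh = structure_factor((2, -1, 3), heavy)
fph = structure_factor((2, -1, 3), protein + heavy)
print(amplitude_and_phase(fp), amplitude_and_phase(fph))
```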
[Screenshot: Xfft window; output map file ccp.aul.map; Fourier coefficients (2Fo − Fc) exp(iφc) read from ccp.aul.phs]
FIG. 3.9 Xfft program interface. Xfft is used to calculate electron density maps by a fast Fourier transform method. The user interface illustrates the options available. The user selects the desired options by selecting buttons and entering numbers in the appropriate fields and then pushes the Calculate button. The options can be saved and later loaded using the Defaults menu button.
map. If these differences are caused by outliers, they can completely swamp the signal. Therefore, a filter has been added to xfft to detect obvious outliers and delete them. They are detected by rejecting any reflection where $|F_1 - F_2| > p\,(F_1 + F_2)/2$, that is, where the absolute value of the difference is greater than p times the average of the two reflections. The value of p is usually set to 100% for isomorphous differences and about 30% for anomalous difference Pattersons. If p is set greater than 200%, then no differences will be rejected. An example of the usefulness of this filter is shown in Fig. 3.10.
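The outlier test above is easy to state in code. This is a sketch of the criterion as described, not the xfft source; p is given as a fraction (1.0 = 100%). Because |F1 − F2| can never exceed F1 + F2, any p of 2.0 or more rejects nothing.

```python
def keep_difference(f1, f2, p=1.0):
    """Keep a pair unless |F1 - F2| > p * (F1 + F2) / 2.

    p is a fraction of the average amplitude: about 1.0 for isomorphous
    differences and about 0.3 for anomalous differences.
    """
    return abs(f1 - f2) <= p * (f1 + f2) / 2.0


def patterson_coefficients(pairs, p=1.0):
    """Squared differences (F1 - F2)**2 for the pairs that pass the filter."""
    return [(f1 - f2) ** 2 for f1, f2 in pairs if keep_difference(f1, f2, p)]
```

Because the coefficients are squared, a single rejected pair such as (100, 400) would otherwise contribute 90,000 to the map, nine hundred times the weight of a typical pair like (100, 110).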
3.4 The PattersonSynthesis
(A), (B) [Patterson map sections at x = 0.500]
FIG. 3.10 Effect of filtering outliers. (A) Gold-derived Patterson map without filtering of outliers. (B) The same data filtered so that if |Fp − FpH| is greater than 100% of (Fp + FpH)/2, the reflection is rejected. This filter rejects a handful of very large differences that were dominating the Patterson and making it uninterpretable. After removal of the outliers, the Patterson was interpretable, and this derivative turned out to have excellent phasing power.
There are $N^2$ peaks in a Patterson map, where N is the number of atoms in the unit cell. Of these, N are vectors from each atom to itself and fall on the origin. This leaves $N(N-1)$ non-origin peaks, and the number of unique peaks is $N(N-1)/Z$, where Z is the number of asymmetric units in the Patterson space group. For example, if we have 3 atoms in the asymmetric unit of an orthorhombic space group, then there are 12 atoms in the unit cell. There are 144 vectors, of which 132 are not at the origin. Since the Patterson has a Z of 8, there are 132/8 = 16.5 unique vectors. The fraction occurs because some peaks are on mirror planes and are shared by two adjacent asymmetric units.
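The peak-counting arithmetic in the example above can be written as a small helper (the function name is illustrative).

```python
def patterson_peak_counts(atoms_per_asu, n_asu_cell, patterson_z):
    """Count Patterson vectors for point atoms.

    Returns (total, non_origin, unique): total = N**2, non_origin = N*(N-1)
    and unique = N*(N-1)/Z, with N atoms in the unit cell and Z the number of
    asymmetric units of the Patterson space group.
    """
    n = atoms_per_asu * n_asu_cell
    total = n * n
    non_origin = n * (n - 1)
    return total, non_origin, non_origin / patterson_z


# Three atoms per asymmetric unit, four asymmetric units, Patterson Z = 8.
print(patterson_peak_counts(3, 4, 8))
```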
Harker Sections The peaks on a Patterson map result from all possible combinations of vectors between atoms in the unit cell, including symmetry-related atoms. The vectors between symmetry-related atoms are called Harker peaks
and fall onto Harker sections. For example, in the space group P2, for every atom at x, y, z there is a symmetry mate at −x, y, −z for which a vector will occur at (x, y, z) − (−x, y, −z) = (2x, 0, 2z) (Fig. 3.11). Thus, on the Harker section y = 0 peaks can be found at 2x, 2z for each atom in the asymmetric unit. However, while all Harker peaks fall on Harker sections, not all peaks on a Harker section are Harker peaks. If two atoms, not related by symmetry, happen to have the same y coordinate, they will produce a cross-vector on the section y = 0. The positions and relationships of Harker peaks (also called self-vectors) can be found by subtracting the possible combinations of symmetry operators for a given space group pairwise as shown in Table 3.2. The peaks for the example in Table 3.2, space group P2221, fall onto three Harker sections: x = 0, y = 0, and z = 1/2. The Harker sections can be found by noting places in the table where one of the coordinates is a constant. An examination of the space group symmetry will also reveal the Harker sections, although not the full algebraic relationship. A 2-fold axis will have a corresponding Harker section at 0 in the plane perpendicular to the axis, a 2-fold screw at 1/2, a 3-fold screw at 1/3 and 2/3, and so forth.
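The pairwise subtraction of symmetry operators can be automated. In this sketch each operator is a rotation matrix R and a translation t (x′ = Rx + t), held as exact fractions. Whenever two operators share a row of R, that coordinate of the difference vector does not depend on x, y, z and equals the difference of the translations modulo 1, which defines a Harker section. The operator list is the P2221 set used in Table 3.2; the helper names are illustrative.

```python
from fractions import Fraction as Fr
from itertools import permutations

# (R, t): rows of R are the coefficients of x, y, z; t is the translation.
P2221 = [
    (((1, 0, 0), (0, 1, 0), (0, 0, 1)), (Fr(0), Fr(0), Fr(0))),
    (((-1, 0, 0), (0, -1, 0), (0, 0, 1)), (Fr(0), Fr(0), Fr(1, 2))),
    (((-1, 0, 0), (0, 1, 0), (0, 0, -1)), (Fr(0), Fr(0), Fr(1, 2))),
    (((1, 0, 0), (0, -1, 0), (0, 0, -1)), (Fr(0), Fr(0), Fr(0))),
]


def harker_sections(ops):
    """Sections where a coordinate of A(x) - B(x) is constant for some pair."""
    sections = set()
    for (ra, ta), (rb, tb) in permutations(ops, 2):
        for axis, name in enumerate("xyz"):
            if ra[axis] == rb[axis]:
                sections.add((name, (ta[axis] - tb[axis]) % 1))
    return sorted(sections)


def harker_vectors(ops, xyz):
    """Self-vectors A(x) - B(x) for one site, reduced to 0 <= u < 1."""
    def apply(op):
        r, t = op
        return [sum(c * v for c, v in zip(row, xyz)) + ti for row, ti in zip(r, t)]
    return [tuple((p - q) % 1 for p, q in zip(apply(a), apply(b)))
            for a, b in permutations(ops, 2)]


print(harker_sections(P2221))  # sections x = 0, y = 0 and z = 1/2
```

Using `Fraction` keeps the translations exact, so a section such as z = 1/2 is recognized without floating-point tolerances.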
Solving Heavy-Atom Difference Pattersons Before you can use a heavy-atom derivative for phasing, the positions of the heavy atoms in the cell must be found. If no phasing information exists, the only map that can be made at this point is a difference Patterson map. The Patterson map must be solved for the x, y, z positions that produce this pattern of vectors. Note that the solution to the Patterson function is not unique; many different sets of positions can explain the same Patterson. The different solutions fall into two categories, origin shifts and opposite-hand
[Diagram: atom at x, y and its screw-related mate at −x, y + 1/2; Patterson self-vectors at (2x, 1/2) and (−2x, −1/2) about the origin 0, 0]
FIG. 3.11 Origin of Harker planes. Left: Two atoms related by a 2-fold screw axis along y that displaces the second copy of the atom by 1/2 along y relative to the first. Right: When the Patterson synthesis is constructed, the self-vectors fall on the planes y = ±1/2.
TABLE 3.2
Harker Vectors for Space Group P2221 (row operator minus column operator, reduced by the Patterson symmetry)
| | x, y, z | −x, −y, 1/2 + z | −x, y, 1/2 − z | x, −y, −z |
| x, y, z | 0, 0, 0 | 2x, 2y, 1/2 | 2x, 0, 1/2 − 2z | 0, 2y, 2z |
| −x, −y, 1/2 + z | 2x, 2y, 1/2 | 0, 0, 0 | 0, 2y, 2z | 2x, 0, 1/2 − 2z |
| −x, y, 1/2 − z | 2x, 0, 1/2 − 2z | 0, 2y, 2z | 0, 0, 0 | 2x, 2y, 1/2 |
| x, −y, −z | 0, 2y, 2z | 2x, 0, 1/2 − 2z | 2x, 2y, 1/2 | 0, 0, 0 |
choices. Fortunately, any self-consistent solution will give phases that will produce the same protein map. The choice of origin is arbitrary, and it will not matter which is chosen. The choice of hand is important because only a right-handed solution is correct. However, for isomorphous derivatives an incorrect choice of hand will produce a left-handed protein map that otherwise is identical to the right-handed map (this is not true for anomalous data). This can easily be remedied by inverting the signs of all of the heavy-atom positions and recalculating the phases. If two derivatives are solved from their Patterson maps, there is no way to know whether they have the same origin and hand. To solve this problem, difference Fouriers are used (see the following) to put both derivatives in the same framework. If this is not done, then the derivatives cannot be combined. There are two ways to solve Patterson maps. One involves visual inspection and the calculation of heavy-atom positions by hand. This is usually not as difficult as it might seem at first because symmetry produces many special relationships between Patterson peaks that make it easy to find the solution. The other method involves using the computer, which, when it works, is obviously the easier method. But there is no guarantee that the computer can find the correct solution. Before trying any method, evaluate the quality of the Patterson map to see if it is possible to solve it at all. Frequently, the first derivative has a Patterson that is not easily solved. It is sometimes more fruitful to keep looking for a derivative that produces an easily solved Patterson map that can be used to bootstrap the rest. In my experience, this simple strategy has almost always worked. Of course, if you have four protein molecules in the asymmetric unit, finding a single-site derivative is unlikely.
To improve your chances of solving a Patterson, either by hand or by computer, it helps enormously to make the best Patterson map you can. Because the differences are squared, Patterson maps are easily overwhelmed by a few large differences. Outlier differences should be filtered out as discussed earlier. A value of 100% is usually about right for this filter, since larger
differences are often incorrect. Setting the filter much below 100% deletes genuine differences and can lower the signal-to-noise ratio. If the derivative has a high merging R-value (above approximately 0.20), then a larger percentage may be needed. The other filter to be set is resolution. On most data collection systems, data at resolutions lower than about 25 Å are often clipped by the beam stop. If the beam stop is not perfectly round, the amount of clipping could be different between data sets. Also, very-low-resolution reflections will have strong differences that are due to the change in contrast from native to derivative mother liquor, either because the soaking mother liquor is higher in precipitant or because the dissolved heavy atoms in the soaking mother liquor interfere. Often, leaving out such data will improve the Patterson map. As resolution increases, data become noisier because they are weaker, and often become less than perfectly isomorphous. This can give higher-resolution Patterson maps lower contrast. As a first approximation, 4 to 5 Å is a good upper-resolution limit. The quality of the Patterson can be judged by looking at the peak-to-background ratio of the Patterson map. The background is taken to be the root-mean-square, or sigma (σ), of the entire map. Peak heights are then expressed as ratios of the peak height over the sigma of the map. If you use xcontur to contour your Patterson map, it automatically sets the first contour level to 1σ, the second to 2σ, and so on. Look first at the Harker sections for large peaks. If the Patterson is complex, then the Harker sections may be crowded and looking at general sections may be more fruitful. Try varying the resolution and outlier filter to produce the best contrast ratio. I cannot overemphasize the need to filter outliers and very-low-resolution data; otherwise interpretable Patterson maps may be overwhelmed by a few bad differences (see Fig. 3.10 for an example). The effect of changing the upper-resolution cutoff should be more subtle.
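Expressing peak heights in units of the map σ, and placing contours at 1σ, 2σ, and so on, can be sketched as follows. The map is taken as a flat list of grid values. σ is computed as the root-mean-square deviation about the mean; for a map calculated without the F(000) term the mean is zero, so this equals the plain root-mean-square. These helpers are illustrative and are not the xcontur implementation.

```python
import math


def map_sigma(density):
    """Root-mean-square deviation of the map about its mean (the map sigma)."""
    values = list(density)
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def peak_heights_in_sigma(density, peaks):
    """Express peak heights as multiples of the map sigma."""
    s = map_sigma(density)
    return [p / s for p in peaks]


def contour_levels(density, n_levels=5):
    """Contour levels at 1, 2, ..., n times sigma."""
    s = map_sigma(density)
    return [k * s for k in range(1, n_levels + 1)]
```

Comparing the σ-scaled height of the same Harker peak across maps made with different resolution limits and outlier filters gives a simple, scale-free measure of which choice gives the best contrast.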
At some point the contrast will be maximal, but the basic features of the Patterson map should not change with resolution. If they do, this is a bad sign, indicating that the differences are not due to the isomorphous addition of heavy atoms.
Classical Methods In addition to this explanation of how to solve Patterson maps by hand, examples are given in Chapter 5. In most cases the strategy is first to find a Harker peak or set of Harker peaks that gives a first site and then to use cross-peaks to find additional sites. Make a table of symmetry operators and find the positions of the Harker peaks as is illustrated in Table 3.2. Sometimes special relationships can be found between Harker peaks that make it easy to find Harker peaks that arise from the same site (see the Patterson examples in Chapter 5, Section 5.1). This helps in identifying non-Harker
peaks that happen to fall on a Harker section. Related sets of cross-peaks can also sometimes be found, and relationships between Harker peaks and cross-peaks can often be identified, making it easy to find related sets of peaks. These relationships can be found with a little simple algebra using the symmetry operators of the space group. For example, say you have a Patterson in space group P222. The three sections x = 0, y = 0, and z = 0 are Harker sections, and the peaks on these are from the vectors (0, 2y, 2z), (2x, 0, 2z), and (2x, 2y, 0), respectively. You can look for single-site solutions that match between different Harker sections. In some cases a single site will explain all the Patterson peaks, and there are no significant non-Harker peaks. Congratulations, you are done. If not, look for a cross-peak. You can find a second site by taking the position of the first site, A, and adding the cross-peak. However, remember that you do not know from which symmetry-related atom in the unit cell this peak arises. Also you do not know in which direction the vector goes: from atom A to B or from B to A. Therefore, you must try all of these possibilities. In this space group there are four symmetry-related atoms: A1, A2, A3, A4. Given a cross-peak X we need to try A1 − X, A2 − X, A3 − X, A4 − X and also A1 + X, A2 + X, A3 + X, A4 + X to find atom B. We can confirm atom B by finding the Harker peaks predicted by atom B and also the other cross-peaks generated by symmetry-related atoms at sites B and A. These vector calculations are most easily done by a computer. Xpatpred in XtalView can generate all the heavy-atom vectors given a list of sites. These can then be loaded into xcontur and compared against the Patterson map. Each of the possibilities can be tried and the results quickly scanned by looking at the agreement with the peaks in the Patterson map. You can maximize your chances of finding a match by starting with the highest Harker peaks and the highest cross-peak.
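Enumerating the eight candidates for atom B described above is mechanical. This sketch covers only P222, whose operators are pure sign changes with no translations; space groups with screw axes would also need the translation parts. Each candidate would still have to be checked against the Patterson map, for example by predicting its vectors; that step is not shown.

```python
# The four symmetry operators of P222 as sign patterns applied to (x, y, z).
P222_SIGNS = [(1, 1, 1), (-1, -1, 1), (-1, 1, -1), (1, -1, -1)]


def symmetry_mates(site, signs=P222_SIGNS):
    """Symmetry-related copies A1..A4 of a site, reduced to [0, 1)."""
    return [tuple((s * c) % 1.0 for s, c in zip(op, site)) for op in signs]


def second_site_candidates(site_a, cross_peak, signs=P222_SIGNS):
    """Candidate positions for atom B given atom A and one cross-peak X.

    The vector may come from any copy of A and may point either way, so try
    A_i - X and A_i + X for every symmetry mate A_i (eight candidates).
    """
    candidates = []
    for mate in symmetry_mates(site_a, signs):
        for sign in (-1, 1):
            candidates.append(tuple((m + sign * x) % 1.0
                                    for m, x in zip(mate, cross_peak)))
    return candidates
```

For atom A at (0.1, 0.2, 0.3) and a cross-peak at (0.25, 0.95, 0.1), one of the eight candidates is (0.35, 0.15, 0.4).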
Starting with the highest peaks may fail, though, because the highest peaks may be due to two or more peaks that happen to fall in the same place, and the extra height is fortuitous. Many examples of this can be seen in the Pattersons for the heavy atoms of C. vinosum cytochrome c', illustrated in Chapter 5. A noncrystallographic 2-fold causes many sites to be related, and the vectors fall in clumps. Also, it is possible that the cross-peak X is not between A and another atom, but between two other atoms. If you cannot solve a Patterson, put it aside and look for a derivative whose Patterson is solvable. Later you can easily solve the difficult Patterson by cross-Fourier (see the following) with the solved derivative phases, so all is not lost.
Computer Methods Several programs have been written that try to solve Patterson maps automatically. I know of three: HASSP, written by Tom Terwilliger; SHELXS, a commonly used small-molecule crystallography program; and XtalView/xhercules, a correlation search method I wrote. HASSP and SHELXS first look for single-site solutions that explain the Harker vectors and then look at cross-peaks and try to find pairs of positions with the best match to the density in the Patterson map. This is very similar to the classical method. One problem I have seen is that, since the programs do not always account adequately for the fact that a peak has already been used, overlapping solutions can be found. Both programs work well on clean Patterson maps. The XtalView program xhercules uses a different approach to automatic Patterson solution that works well but is very computer-intensive. A single atom is moved around the entire asymmetric unit on a grid, and at each position a correlation is calculated between the observed differences and the calculated heavy-atom amplitudes. (The use of a correlation function, rather than an R-factor, is important because the scale factor cannot be computed correctly and the correlation is independent of scale.) This atom is then placed at the position with the highest correlation (Fig. 3.12). A second atom is then moved about the asymmetric unit and the correlation calculated with the first atom held fixed. This atom is then fixed at its highest correlation. The relative occupancies are then refined by another correlation search. A third atom can then be searched for in the same manner, and so forth. Each correlation search takes a large amount of computer time (several hours on a typical workstation), which gets longer as more atoms are added. An intelligent choice of the asymmetric unit helps reduce time. Remember that the asymmetric unit is one-half the size in reciprocal space. The search grid spacing should be no coarser than one-fourth of the resolution limit, and preferably one-sixth. The resolution cutoff should be as low as 6 Å for large unit cells and as high as 4 Å for small unit cells. There must be a large ratio of differences to atom parameters (x, y, z), because the differences are only an approximation of the heavy-atom vectors. For most proteins at the resolutions suggested, this can be kept at about 50 to 1. Although the method can be used automatically, it is far better to check the results against the Patterson map. This can be done by writing out the solution, reading it into xpatpred, and displaying the output of predicted vectors on the Patterson map with xcontur. For best results, an idea of the relative occupancy of unfound sites is needed, although tests have shown that it is not critical. The relative occupancy can easily be estimated by inspection of the remaining density in the Patterson map. How successful is the method? It seems very robust. A six-site solution to a complex Patterson with many overlapping vectors was correctly found for a PtCl4 derivative of C. vinosum cytochrome c' in P212121 (three Harker planes at x = 0.5, y = 0.5, z = 0.5). The solution was found before any other derivative and was later confirmed independently by cross-phasing from another derivative whose Patterson was solvable by inspection. The platinum derivative evaded a manual solution because the largest peaks on the Harker sections turned out to be due to a mixture of cross-peaks and Harker peaks. Another successful approach to heavy-atom position determination entails direct methods. The programs used by small-molecule crystallographers, MULTAN and SHELXS, have served this purpose, using the derivative differences as input.
FIG. 3.12 Xhercules correlation map section of a single-site correlation search for a platinum derivative of photoactive yellow protein. The space group is P6₃, hexagonal with a 6₃ screw axis along the z direction. The single site is found at 0.25, 0.08, 0.0 (z is arbitrary because there is no orthogonal symmetry element). The second peak is related to the first by Patterson symmetry.
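The single-atom correlation search described for xhercules can be illustrated with a toy version. This is not the xhercules code. It assumes space group P2 (unique axis b) and a point heavy atom of unit weight, for which |fh| = 2|cos 2π(hx + lz)|; y is left out because the b axis is polar and y is arbitrary for a single site. The scan uses a linear correlation, which, as the text stresses, does not depend on the scale of the observed differences.

```python
import math


def pearson(a, b):
    """Linear correlation coefficient; independent of the scale of a or b."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def fh_p2(hkl, x, z):
    """|fh| for one unit-weight atom in P2 (unique axis b)."""
    h, _, l = hkl
    return 2.0 * abs(math.cos(2.0 * math.pi * (h * x + l * z)))


def single_site_search(reflections, observed_diffs, steps=40):
    """Grid search for the (x, z) whose calculated |fh| best correlates with
    the observed differences; returns (best correlation, (x, z))."""
    best = (-2.0, None)
    for i in range(steps):
        for j in range(steps):
            x, z = i / steps, j / steps
            calc = [fh_p2(hkl, x, z) for hkl in reflections]
            cc = pearson(calc, observed_diffs)
            if cc > best[0]:
                best = (cc, (x, z))
    return best


# Synthetic, noise-free "observed" differences on an arbitrary scale.
refl = [(h, 0, l) for h in range(-6, 7) for l in range(1, 7)]
obs = [3.7 * fh_p2(r, 0.2, 0.35) for r in refl]
print(single_site_search(refl, obs))
```

Several grid points reach a correlation of 1 here; they are the true site and its origin-shifted equivalents, which explain the data equally well. Further atoms would be added by repeating the scan with the first site held fixed.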
Anomalous Difference Pattersons If there are atoms in the structure that have an anomalous scattering component (i.e., have an absorption edge near the wavelength being used to collect data), then the differences between the Bijvoet pairs may be large enough for phasing. For proteins, the most likely naturally occurring anomalous scatterer is iron, and many of the heavy-atom compounds used, such as mercury, platinum, uranium, and the lanthanides, have significant anomalous scattering signals. Centric reflections do not have an anomalous component because the signal exactly cancels out in a centric projection. If you
have collected symmetry mates of centric reflections, they can be used to estimate the anomalous signal by comparing them to the acentric reflections. The difference between the centrics is the "noise," and the "signal" can be found using the formula $(\text{noise})^2 + (\text{signal})^2 = (\text{differences})^2$. In practical terms this formula tells us that the signal will be larger than the simple difference between the acentric and centric differences. The anomalous signal, in fact, can be detected even in noisy data because the pairs are usually collected from the same crystal close together in time, which eliminates many of the scaling problems. Thus, random noise is a less serious problem than systematic errors, such as those due to absorption and X-ray damage. The expected anomalous difference $\langle|F^+ - F^-|\rangle/\langle F\rangle$ can be estimated using the same equation as for the isomorphous case (see preceding) except that $f_H$ is replaced by $2f''$. Some expected anomalous differences are listed in Table 3.3 for atoms that give usable signals for CuKα radiation. If you can tune the wavelength, as at a synchrotron, the signal can be optimized, and edges for more elements become available. Before an anomalous difference Patterson can be calculated, the data should be nearly complete to the resolution you wish to use. This means
TABLE 3.3
Anomalous Scattering Signals for CuKα (1.54 Å) Radiation
| Element | Electrons | Percentage F_PH − F_P (30-kDa protein) | Δf″ | Percentage ΔF/F, 10-kDa protein | Percentage ΔF/F, 32-kDa protein | Percentage ΔF/F, 100-kDa protein |
| S | 16 | 5.1 | 0.6 | 0.46 | 0.26 | 0.14 |
| Fe | 26 | 8.3 | 3.2 | 2.46 | 1.38 | 0.77 |
| Pd | 46 | 15 | 3.9 | 3 | 1.68 | 0.94 |
| Ag | 47 | 15 | 4.3 | 3.31 | 1.85 | 1.03 |
| I | 53 | 17 | 6.8 | 5.24 | 2.92 | 1.63 |
| Sm | 62 | 20 | 12.3 | 9.47 | 5.29 | 2.95 |
| Gd | 64 | 20 | 11.9 | 9.16 | 5.12 | 2.86 |
| Pt | 78 | 25 | 6.9 | 5.31 | 2.97 | 1.66 |
| Au | 79 | 25 | 7.3 | 5.62 | 3.14 | 1.75 |
| Hg | 80 | 25 | 7.7 | 5.93 | 3.31 | 1.85 |
| Pb | 82 | 26 | 8.5 | 6.54 | 3.65 | 2.04 |
| U | 92 | 29 | 13.4 | 10.32 | 5.76 | 3.22 |
complete in terms of Bijvoet pairs, which means collecting twice as many data in the correct positions. An anomalous difference Patterson (also known as a Bijvoet Patterson) is made using the differences between the acentric reflections as the Patterson coefficients (Fig. 3.13). Care must be taken not to use the centrics. In XtalView the difference between centrics will be 0, so even if the centrics are left in, they will make no contribution to the differences. Interpret an anomalous difference Patterson exactly as you would an isomorphous difference Patterson. Because the centrics are left out, however, even a perfect set of anomalous differences will give series termination errors that can lead to small peaks not due to scatterers. These can be detected by making a calculated Patterson using the same reflection list as the observed Patterson and coefficients calculated from the heavy-atom positions (with XtalView you can use STFACT for this purpose). If a peak appears in the calculated Patterson that does not correspond to a vector between the atoms used for the calculation, it must be a series termination error. In fact, the lower the resolution, the higher the percentage of centric reflections, so very-low-resolution anomalous difference Pattersons have more noise due to series termination. One use of anomalous differences in heavy-atom work is to compare them to the isomorphous difference Patterson. The anomalous differences provide an independent measure of the heavy-atom positions, so comparing the two Pattersons gives extra confidence in determining the heavy-atom positions. A peak on both Pattersons is more likely to be correct. It is also possible to make a Patterson by combining the information from both sets of differences. Before doing this, it is worthwhile first to check that the anomalous Patterson actually has some signal. Adding noise to the isomorphous Patterson will not make it more interpretable. There are two methods used to combine the signals.
The simplest is to make a Patterson with the coefficients $\Delta F_{\text{iso}}^2 + \Delta F_{\text{ano}}^2$. The other uses FHLE coefficients,⁶ which give slightly higher peaks than the first. In either case, the improvement of the Patterson is better judged by noting the number of contours a peak of interest is above the root-mean-square density (or sigma, σ) of the Patterson map. If the main peaks do not increase, it is possible that the anomalous signal is too small to be of use. If the anomalous data are too incomplete to give a Patterson map alone, it is still possible to use combined coefficients to augment the isomorphous data. Again, use the criterion of peak height to judge the effectiveness. Another criterion is the flatness of areas without peaks. They should become cleaner if the Patterson has been improved.
⁶For a discussion and derivation of FHLE coefficients, see Blundell and Johnson, pp. 338-340.
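Two of the numerical steps above can be sketched together: estimating the anomalous signal from $(\text{noise})^2 + (\text{signal})^2 = (\text{differences})^2$, and forming the simplest combined coefficient $\Delta F_{\text{iso}}^2 + \Delta F_{\text{ano}}^2$. Here the rms Bijvoet difference of acentric reflections is taken as the total, and the rms difference between symmetry mates of centric reflections as the noise; using rms values is an assumption of this sketch. FHLE coefficients are not attempted.

```python
import math


def rms(values):
    values = list(values)
    return math.sqrt(sum(v * v for v in values) / len(values))


def anomalous_signal(acentric_diffs, centric_diffs):
    """Signal from (noise)^2 + (signal)^2 = (differences)^2.

    acentric_diffs: Bijvoet differences F+ - F- of acentric reflections.
    centric_diffs: differences between symmetry mates of centric reflections,
    which carry no anomalous signal and so measure the noise.
    """
    total = rms(acentric_diffs)
    noise = rms(centric_diffs)
    return math.sqrt(max(total * total - noise * noise, 0.0))


def combined_coefficient(delta_iso, delta_ano):
    """Simplest combined Patterson coefficient: dF_iso**2 + dF_ano**2."""
    return delta_iso ** 2 + delta_ano ** 2
```

With an rms acentric difference of 5 and an rms centric difference of 3, the estimated signal is 4, which is larger than the naive 5 − 3 = 2.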
FIG. 3.13 Sulfite reductase Bijvoet difference Patterson. The anomalous scattering of sulfite reductase is due to the presence of an Fe4S4 cluster and a heme iron. At low resolution the individual scatterers are not resolvable and form a single large site. The space group is P212121 and the coefficients are $(F^+ - F^-)^2$. (A) to (C) Harker sections x = 0.5, y = 0.5, and z = 0.5. Note that the peak on the section x = 0.5 overlaps onto the other Harker sections. At higher resolution the peaks are resolvable from the edges. (D) A three-dimensional stereo view of the Patterson, showing that there are just three peaks, not including the origin peak at 0, 0, 0.
(D) [Three-dimensional stereo view of the Bijvoet difference Patterson]
FIG. 3.13 (continued).
3.5 FOURIER TECHNIQUES

It is a happy day in any structure determination when Fourier maps, which are very straightforward to interpret, can be used instead of Pattersons. On the other hand, a Fourier requires phases, and the quality of the phases very much determines the quality of the resulting map. For Pattersons, the quality depends only on the accuracy of the amplitudes used. In fact, it has been shown that a Fourier map made with random amplitudes but correct phases is easily interpreted (Fig. 3.14), whereas the opposite case, correct amplitudes and random phases, is not (which is the root of all this phasing trouble). This does not mean that the amplitudes can be ignored when phases are available. In crystallography we work in the gray area between the two extremes of correct and random phases, and there the choice of coefficients (e.g., 2Fo - Fc) makes an important difference in the quality of the resulting map. Nor does it mean that the amplitudes need not be accurate: without accurate amplitudes there is no way to derive accurate phases.
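The phase-versus-amplitude experiment of Fig. 3.14 can be reproduced in one dimension. The sketch below builds a toy "structure" of point atoms, computes its structure factors, and then synthesizes maps with (a) random amplitudes but correct phases and (b) correct amplitudes but random phases, scoring each map by its correlation with the true map. The toy model, the grid size, the random seed, and the use of a correlation coefficient as the score are all assumptions for illustration; a real test would use a protein model and three-dimensional data.

```python
import cmath
import math
import random

def structure_factors(atoms, hmax):
    """1-D point-atom structure factors F(h) = sum_j f_j exp(2*pi*i*h*x_j), h = 1..hmax."""
    return {h: sum(f * cmath.exp(2j * math.pi * h * x) for x, f in atoms)
            for h in range(1, hmax + 1)}

def synthesis(amp, phase, n):
    """rho(x_i) = sum_h 2|F_h| cos(2*pi*h*x_i - alpha_h) on n grid points; F(000) omitted."""
    return [sum(2.0 * amp[h] * math.cos(2.0 * math.pi * h * i / n - phase[h]) for h in amp)
            for i in range(n)]

def correlation(a, b):
    """Linear correlation coefficient between two maps sampled on the same grid."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den

rng = random.Random(0)
atoms = [(rng.random(), 1.0) for _ in range(12)]   # toy model: 12 equal point atoms
F = structure_factors(atoms, hmax=30)
amp = {h: abs(v) for h, v in F.items()}
phase = {h: cmath.phase(v) for h, v in F.items()}
n = 128                                            # grid points, more than 2 * hmax

true_map = synthesis(amp, phase, n)
amp_max = max(amp.values())
random_amp = {h: rng.uniform(0.0, amp_max) for h in amp}
random_phase = {h: rng.uniform(-math.pi, math.pi) for h in amp}

cc_random_amplitudes = correlation(true_map, synthesis(random_amp, phase, n))
cc_random_phases = correlation(true_map, synthesis(amp, random_phase, n))
print(f"random amplitudes, correct phases: CC = {cc_random_amplitudes:.2f}")
print(f"correct amplitudes, random phases: CC = {cc_random_phases:.2f}")
```

One would expect the first correlation to be high and the second to be near zero, echoing Fig. 3.14; the exact values depend on the seed and the toy model.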
Types of Fourier
Fo Map
The classic Fourier synthesis combines the observed amplitudes with the most current phases, $F_o, \alpha_{calc}$, where $F_o$ is the observed diffraction amplitude. It is not sufficient for all crystallographic needs, and there are many other types. In particular, when the phases are calculated from a model, this type of map is subject to model bias.
Fc Map
This is the least useful map for crystallographic purposes: the calculated amplitudes are phased with the calculated phases, and you get back exactly what you put in. Still, it can be used for checking what a map should look like, especially at lower resolutions, and to check for series termination problems. For instance, even an Fc map in the resolution range 5-3 Å can be choppy and hard to interpret because of the missing low-resolution terms. An Fc map can also be used to check whether programs are working correctly: if the Fc map does not look like what you put in or lacks the correct symmetry, then there is an error somewhere.
Fo - Fc, or Difference, Map
The difference Fourier, $F_o - F_c, \alpha_{calc}$, where $\alpha_{calc}$ is the calculated phase, is very useful in terms of information content but may be hard to interpret. In this map there are peaks where density is not accounted for in the model used to calculate Fc and holes where the model has too much density. This map is especially useful for finding corrections to the current model: for example, looking for missing waters, finding movements in mutants, and spotting misfitted regions. Other types of difference map are often used. An isomorphous difference map with $F_{PH} - F_P, \alpha_{protein}$ gives the positions of heavy atoms. Another difference map is $F_{mutant} - F_{wild\text{-}type}$, which can be used for looking at mutant protein structures (if the mutant crystallizes with the same unit cell). Note that a difference Fourier, $F_{mutant} - F_{wild\text{-}type}, \alpha_{wild\text{-}type}$, is not the same as the difference between two Fouriers, $(F_{mutant}, \alpha_{mutant}) - (F_{wild\text{-}type}, \alpha_{wild\text{-}type})$, which is equivalent to the difference between two electron density maps. There are three basic patterns of density found in a difference map. A positive peak indicates electron density in the Fo terms that is not accounted for in the Fc terms. A negative peak indicates a position having less density in the Fo terms than in the Fc terms. These differences can arise from movement of atoms, changes in B-values, or a change in occupancy. The third pattern is positive density paired with negative density, which indicates a shift in position from the negative to the positive density. The final position of the atom or group may be difficult to determine from the difference density: its actual position is somewhat short of the positive peak, because the negative hole next to the positive peak distorts its shape.

FIG. 3.14 (A) Fourier synthesis with random amplitudes (and correct phases) and (B) the equivalent section of a Fourier synthesis with random phases (and correct amplitudes). The thick lines represent the model that was used to calculate the correct amplitudes and phases. Note that the map with random amplitudes is still interpretable, but the random-phase map is uninterpretable. This apparently means that phases are more important than amplitudes. However, since phases are not directly measurable and must be determined from the measured amplitudes, accurate amplitudes are necessary to determine the phases accurately.

2Fo - Fc Map
The 2Fo - Fc map is the sum of an Fo map and an Fo - Fc map (Fig. 3.15). It contains information from both the classic Fourier synthesis and a difference map and is easy to interpret because it looks like protein density. The quality of the 2Fo - Fc map depends on the quality of the phases, a fact that often seems forgotten in the literature, where such maps are often used as "proof" of the correctness of a structure. At R-values above 0.25 they are of dubious value, since they will look remarkably like whatever model was used to calculate the phases. In these cases a better map to use is the omit map, where the model in question is left out of the phase calculation (see below). Another problem with 2Fo - Fc maps is that they do not show the exact final position of the model where there are errors, although they can come quite close. The 2Fo - Fc map shows both where the model is and where the model should be. Usually the final position is somewhat farther away than the map shows, but the map is usually close enough to bring the model within the radius of convergence of a refinement program. Variations on the 2Fo - Fc map, such as 3Fo - 2Fc (or, up to an overall factor of 2, 0.5Fo + (Fo - Fc)) and 5Fo - 3Fc, are sometimes used and are claimed to alleviate the problem of phase bias.

FIG. 3.15 Fourier maps: the same section of map is shown using (A) Fo coefficients, (B) Fo - Fc, and (C) 2Fo - Fc. The Fo map does not contain any information that was not already in the model. The difference map shows some unexplained density in the solvent region that is taken to represent disordered solvent molecules. The 2Fo - Fc map is equivalent to adding the two maps together and shows the unexplained solvent density as well as the model density. This property makes it the most popular map type.

Sigma-A-Weighted (2mFo - DFc) Map
A 2Fo - Fc map weighted by two terms, m and D, derived from a sigma-A analysis of the data gives the coefficients $2mF_o - DF_c, \alpha_{calc}$, where m and D are between 0 and 1. In essence the idea is to weight the terms by taking into account the agreement between Fo and Fc. The analysis is done in thin shells of resolution, and the shells with lower agreement, or a higher R-factor, are downweighted. Since, in general, the R-factor rises with resolution, this has the effect of downweighting the higher-resolution terms, which have larger errors, and removing noise from the maps. It also tends to reduce phase bias, because phase bias is worse for high-R-factor data, as noted in the previous section. As the R-factor of the data set approaches 0.0, the maps become 2Fo - Fc maps in the limit. 2mFo - DFc maps are probably the best for general fitting. To a large extent they obviate the need to find the best resolution at which to make maps: if the high-resolution data have a high R-factor, they are automatically downweighted.

Omit Map
In the omit map the portion of the model to be examined is left out of the phase calculation altogether, and the rest of the model is used to phase this portion of the map (Fig. 3.16). This powerful feature of a Fourier is possible because all parts of the model contribute to every reflection. The chief difficulty in an omit map is using the correct scale factor to scale Fo to Fc. The sum of Fc will be smaller than that of Fo if a portion of the model is omitted. The correct procedure is to determine the scale before anything has been omitted and keep this scale factor after the model has been omitted. Only about 10% of the model should be left out at any given time, so it is necessary to make many omit maps to examine the entire structure. Even omit maps can have some residual phase bias, as will be explained in the section "Phase Bias."

FIG. 3.16 Omit map. (A) The highlighted and labeled model in the center of the helix was omitted from the phase calculation and then an $F_o, \alpha_{calc}$ map was made. Note that the density for the omitted atoms comes back because of their presence in the amplitude information. (B) The model with 25% of the residues omitted. The density for the omitted residues, although noisy, still comes back. The maps are at 1.8-Å resolution and contoured at 1σ.
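The scale-factor rule for omit maps is easy to get wrong, so here is a minimal sketch of it. It fits one overall least-squares scale, $k = \sum F_oF_c / \sum F_c^2$, against the full model and keeps that k after atoms are omitted; refitting against the weaker omitted-model Fc would inflate k. The single overall scale (no resolution-dependent or B-factor term) and the function names are assumptions for illustration.

```python
def ls_scale(f_obs, f_calc):
    """Least-squares scale k that minimizes sum((Fo - k*Fc)**2)."""
    return sum(o * c for o, c in zip(f_obs, f_calc)) / sum(c * c for c in f_calc)

def omit_map_fo(f_obs, f_calc_full):
    """Return Fo placed on the Fc scale, with k fitted to the FULL model.

    The same k is kept after atoms are omitted: refitting against the
    omitted model's weaker Fc would give a larger, wrong scale factor.
    """
    k = ls_scale(f_obs, f_calc_full)
    return [o / k for o in f_obs], k

# Toy numbers: the omitted model scatters 10% less than the full model.
f_obs = [20.0, 40.0, 60.0]
f_calc_full = [10.0, 20.0, 30.0]
f_calc_omit = [9.0, 18.0, 27.0]

fo_scaled, k = omit_map_fo(f_obs, f_calc_full)
k_wrong = ls_scale(f_obs, f_calc_omit)   # what refitting after omission would give
print(k, k_wrong)
```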
Figure-of-Merit Weighted Fo Map
A figure-of-merit weighted Fo map is used with MIR phases to compensate for the errors in those phases. The coefficients are Fo times the figure of merit. This map can be thought of as having each reflection weighted by the confidence in the correctness of its phase.
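The map types in this section differ only in the amplitude placed with each phase. As a compact summary, the sketch below returns the coefficient for one reflection for each map type discussed. A negative amplitude, as can occur in difference maps, is converted to a positive one with 180° added to the phase, which represents the same Fourier term. The function name and its keyword strings are hypothetical, and m and D are assumed to come from a separate sigma-A analysis.

```python
import math

def map_coefficient(kind, fo, fc, phi, m=1.0, D=1.0, fom=1.0):
    """Return (amplitude, phase in radians) for one reflection.

    kind: 'Fo', 'Fc', 'Fo-Fc', '2Fo-Fc', '2mFo-DFc', or 'fom*Fo'.
    phi is the calculated phase, or the best MIR phase for 'fom*Fo'.
    """
    if kind == "Fo":
        amp = fo
    elif kind == "Fc":
        amp = fc
    elif kind == "Fo-Fc":
        amp = fo - fc
    elif kind == "2Fo-Fc":
        amp = 2.0 * fo - fc               # an Fo map plus an Fo - Fc map
    elif kind == "2mFo-DFc":
        amp = 2.0 * m * fo - D * fc       # sigma-A weighted
    elif kind == "fom*Fo":
        amp = fom * fo                    # figure-of-merit weighted
    else:
        raise ValueError(f"unknown map type: {kind}")
    if amp < 0.0:
        # A negative amplitude equals a positive one with the phase shifted by pi.
        return -amp, (phi + math.pi) % (2.0 * math.pi)
    return amp, phi
```

For an isomorphous difference map, the same 'Fo-Fc' branch applies with F_PH in place of Fo, F_P in place of Fc, and the protein phase as phi.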
Fast Fourier Transform
The fast Fourier transform (FFT) is what its name implies: a quick Fourier synthesis algorithm. The FFT works by dividing the unit cell along the three principal directions into integer numbers of grid points, $n_x, n_y, n_z$. Depending on the particular FFT used, these must be products of small prime numbers. The FFT used in XtalView can use grids whose prime factors are 2, 3, 5, and 7, which gives a wide range of possible integer values. The coarser the grid, the faster the FFT can be calculated and the faster the resulting map can be displayed, but the grid cannot be too coarse: it must be at least twice the maximum h, k, l values in the input structure factors. This can be calculated from the formula $n_x = 2a/d_{min}$, where a is the cell edge and $d_{min}$ is the resolution limit of the input structure factors, both in angstroms. If the grid is less than this, the FFT will return incorrect values because the cell will be undersampled. At a sampling of two times, a map will be coarse; a better sampling is $3a/d_{min}$, which gives a smoother map that is easier to interpret. In cases where one cell edge is substantially longer than the other two, this edge can be sampled at two times and the others at three times. Such a map is hard to distinguish from one that is sampled finely in all three directions. Sampling on grids finer than three times will be slower and is usually not necessary. If no interactive inspection of the map is contemplated, oversampling is sometimes used to make smoother contour lines for a static picture.
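The grid rules above can be turned into a small helper. This sketch picks, for one cell edge, the smallest integer n that is at least `sampling * a / d_min` and whose only prime factors are 2, 3, 5, and 7 (the factors the text lists for XtalView's FFT). Real programs may impose further constraints, for example compatibility with space-group symmetry; those are not modeled here, and the function names are hypothetical.

```python
import math

def is_smooth(n, primes=(2, 3, 5, 7)):
    """True if n >= 1 has no prime factors other than those in `primes`."""
    if n < 1:
        return False
    for p in primes:
        while n % p == 0:
            n //= p
    return n == 1

def fft_grid(cell_edge, d_min, sampling=3.0):
    """Smallest 2,3,5,7-smooth grid count >= sampling * a / d_min (a, d_min in angstroms)."""
    if sampling < 2.0:
        raise ValueError("sampling below 2 undersamples the cell")
    n = math.ceil(sampling * cell_edge / d_min)
    while not is_smooth(n):
        n += 1
    return n

# A hypothetical orthorhombic cell at 2.0-A resolution, sampled at three times:
print([fft_grid(a, 2.0) for a in (49.2, 56.7, 98.8)])
# The long edge sampled at only two times, as the text suggests is acceptable:
print(fft_grid(98.8, 2.0, sampling=2.0))
```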
Solving Heavy Atoms with Fouriers
Once even a single derivative has been solved, the single isomorphous replacement (SIR) phases are usually of sufficient quality to solve the rest of the derivatives by difference Fourier using the coefficients $F_{PH} - F_P, \alpha_{SIR}$. With XtalView, a derivative difference Fourier, also called a cross-Fourier, is made by the following steps. First, xmerge is used to merge the derivative data with the native data, as previously described for difference Pattersons, to produce a file with the derivative scaled to the native. This file is then phased using xmergephs to merge the .fin file coefficients with the best available protein phases. The option to switch Fp and FpH should be checked so that the peaks are positive when the Fourier is made. This new phase file, with h, k, l, FpH, Fp, $\alpha_{protein}$, is then run through xfft with the Fo - Fc option, which in this case will make an FpH - Fp map. Find the biggest peaks on the map using xcontur. The resulting coordinates are the correct coordinates, on the same origin as the derivative(s) used to calculate the phases. The peaks obtained this way should be checked against the Patterson map of the derivative to be sure that the heavy-atom positions are valid. The most common spurious peaks are ghost peaks, those present at the positions of the heavy-atom model for the derivative used to produce the phases. Less obvious is the possibility of ghost peaks at the opposite-hand positions. This can happen if the solution is on a special position, if it is centrosymmetric, or if it is pseudocentrosymmetric. Particularly confusing is a centrosymmetric heavy-atom solution that gives rise to centric phases in an otherwise acentric space group.⁷ Maps made with these phases will contain both the left-handed and the right-handed solutions for the new derivative superimposed. The map may still be useful if this ambiguity can be resolved.
An easy way to check for this problem is to look at the distribution of phases for acentric reflections. If they fall on the cardinal points, 0°, 90°, 180°, and 270°, then the phases are centric; if they fall near but not exactly on them, the phases are pseudocentric.
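The phase-distribution check can be automated. The sketch below reports the fraction of acentric phases lying within a tolerance of 0°, 90°, 180°, or 270°. A fraction near 1 suggests centric phases; a high but lower fraction suggests pseudocentric phases. The 5° tolerance and the use of a single summary fraction are assumptions; for random phases and a 5° tolerance, about 4 × 10°/360°, or roughly 11%, would fall near a cardinal point by chance.

```python
def fraction_near_cardinal(phases_deg, tol_deg=5.0):
    """Fraction of phases within tol_deg of 0, 90, 180, or 270 degrees."""
    if not phases_deg:
        return 0.0
    near = 0
    for phi in phases_deg:
        offset = phi % 90.0                   # distance past the previous cardinal point
        if min(offset, 90.0 - offset) <= tol_deg:
            near += 1
    return near / len(phases_deg)

# Hypothetical acentric phases from a suspect SIR phase set:
print(fraction_near_cardinal([0.0, 180.0, -90.0, 45.0]))
```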
⁷A single site in a polar space group is centrosymmetric, as is a multisite solution where all the coordinates in the polar direction are the same.

3.6 ISOMORPHOUS REPLACEMENT PHASING

Heavy-Atom Refinement
When you have solved one or more derivative data sets for heavy-atom positions, your next step is to refine the positions and search for minor and/or missing sites. The goal of refinement is twofold: to improve the heavy-atom parameters and to get statistics that give information about the quality of the derivative and the highest resolution at which it can be used. Some guidelines about statistics will be given, but remember that they are only guidelines and that two heavy-atom derivatives with identical statistics can produce maps that differ in quality. To calculate the most accurate phases, the heavy-atom parameters are refined to improve them and to get the best estimates of the errors. An accurate estimate of the errors is essential for good multiple isomorphous replacement phasing, since the errors control the figures of merit and the relative weighting of each reflection.

In XtalView, heavy-atom refinement is done using xheavy (Fig. 3.17). Xheavy departs from the more traditional refinement programs by using a correlation search instead of least-squares refinement. The advantage is that it avoids local minima; the disadvantage is that it takes more computer time. With the faster computers available today, this is not a significant problem. The phases are calculated in two steps. First, each derivative is treated separately, and estimates of the errors and phases are made without protein phasing information. Second, these phases are combined into a protein phase estimate, which is used to make better estimates and produce a more accurate set of phases. The correlation search in xheavy is done by moving the atom in a coarse box over a large range and then in progressively finer boxes over a smaller range until no improvement is found. At each point the correlation is calculated between the observed and calculated differences. The correlation used is

$$C = \frac{\sum \Delta_{obs}\,\Delta_{calc}}{\sqrt{\sum \Delta_{obs}^2 \,\sum \Delta_{calc}^2}} \qquad (3.14)$$

where $\Delta_{obs}$ is the observed heavy-atom difference and $\Delta_{calc}$ is the calculated difference. This function has the advantage of being immune to the scale between the calculated and observed differences.⁸ Each atom is moved, and then the occupancy and B-value are refined by a second correlation search.
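Equation 3.14 and its insensitivity to scale are easy to check numerically. This is a minimal sketch of the correlation between observed and calculated heavy-atom differences, not xheavy's code; the coarse-to-fine search over atom positions that would call it repeatedly is not shown, and the sample values are made up.

```python
import math

def difference_correlation(delta_obs, delta_calc):
    """C = sum(Do*Dc) / sqrt(sum(Do^2) * sum(Dc^2)), as in Eq. 3.14."""
    num = sum(o * c for o, c in zip(delta_obs, delta_calc))
    den = math.sqrt(sum(o * o for o in delta_obs) * sum(c * c for c in delta_calc))
    return num / den if den > 0.0 else 0.0

obs = [12.0, -3.0, 7.5, 0.5, -8.0]
calc = [10.0, -2.0, 6.0, 1.5, -9.0]
c1 = difference_correlation(obs, calc)
c2 = difference_correlation(obs, [5.0 * x for x in calc])  # same model, different scale
print(c1, c2)   # the two agree to rounding error, since the scale factor cancels
```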
⁸The scale factor cancels out. Say we have two quantities F1 and F2 related by the scale factor s, such that ΣF1 = sΣF2; then the correlation is

$$\frac{\sum F_1\, sF_2}{\sqrt{\sum F_1^2 \,\sum (sF_2)^2}} = \frac{s\sum F_1F_2}{s\sqrt{\sum F_1^2 \,\sum F_2^2}}$$

and the scale factor s cancels out.

⁹Be sure to read Watenpaugh, K. D. (1985). "Overview of Isomorphous Replacement Phasing," in Methods in Enzymology, Vol. 115, pp. 3-15. Academic Press, San Diego. Also read Bella, J., and Rossmann, M. G. (1998). A general phasing algorithm for multiple MAD and MIR data. Acta Crystallogr. D54, 159-174.

Isomorphous Phasing⁹
Because of errors in the data and errors in the solutions, and because true isomorphism is probably rare, isomorphous phasing must be done in terms of probabilities. What we know are the amplitudes $|F_P|$ and $|F_{PH}|$, and the vector $\mathbf{F}_H$, which is calculated from the heavy-atom model. What we want to find is the best phase of the vector $\mathbf{F}_P$, given the errors in the data. The chief difficulty is estimating the errors in the data correctly. Blow and Crick assumed that all of the error lies in the magnitude of $F_{PH}$, from which follows the equation¹⁰

$$P(\alpha) = \exp\left(-\frac{\epsilon^2(\alpha)}{2E^2}\right) \qquad (3.15)$$
where E is the estimate of the error and $\epsilon(\alpha)$ is the lack-of-closure error at a given value of $\alpha$, the phase angle, given by the expression

$$\epsilon(\alpha) = |F_{PH}| - |\mathbf{F}_P + \mathbf{F}_H| \qquad (3.16)$$
which is the difference, at a given phase angle, between the measured $|F_{PH}|$ and the amplitude of the sum of the vectors $\mathbf{F}_H$ and $\mathbf{F}_P$. The error estimate E is given by the equation

$$E^2 = \left\langle \left( \big|\,|F_{PH}| - |F_P|\,\big| - |F_H| \right)^2 \right\rangle \qquad (3.17)$$
which is the difference between the calculated heavy-atom amplitude and the observed heavy-atom amplitude, averaged over the reflections. If the reflection is centric, it is possible to estimate the error by simply assuming that the observed difference $|F_{PH}| - |F_P|$ is the observed heavy-atom amplitude. For the acentric case, the difference $|F_{PH}| - |F_P|$ will be, on average, only about $1/\sqrt{2}$ times $|F_H|$ (see above). Thus, E can be estimated by

$$E^2 \approx \frac{1}{N}\left[\sum_{\text{centrics}} \left( \big|\,|F_{PH}| - |F_P|\,\big| - |F_H| \right)^2 + \sum_{\text{acentrics}} \left( \big|\,|F_{PH}| - |F_P|\,\big| - \frac{|F_H|}{\sqrt{2}} \right)^2 \right] \qquad (3.18)$$

In the general acentric case, there will be two maxima in Eq. 3.15 that are equally probable. The best phase is the weighted average of the two. In the centric case, the equations give one peak. To resolve the twofold ambiguity of acentric reflections and to overcome the errors, several derivatives are used. For multiple isomorphous replacement, we repeat this process for each derivative in turn and multiply the phase probabilities. To simplify this process, we can use the Hendrickson-Lattman coefficients A, B, C, D to store each phase probability:

$$P(\alpha) = \exp\left(A\cos\alpha + B\sin\alpha + C\cos 2\alpha + D\sin 2\alpha\right) \qquad (3.19)$$

¹⁰Blow, D. M., and Crick, F. H. C. (1959). Acta Crystallogr. 12, 794-802.
This equation allows us to reconstruct the probability from the four coefficients and is a more compact form to store. To multiply two probabilities together, the coefficients are simply added (i.e., $A_{MIR} = A_{DER1} + A_{DER2} + A_{DER3} + \cdots$). Equivalent equations for the Blow-Crick equations above have been derived that give nearly identical results.¹¹ The equations for isomorphous replacement phasing are

$$\begin{aligned}
A_{iso} &= \frac{-2\left(F_P^2 + F_H^2 - F_{PH}^2\right) F_P F_H\, a_H}{E^2} \\
B_{iso} &= \frac{-2\left(F_P^2 + F_H^2 - F_{PH}^2\right) F_P F_H\, b_H}{E^2} \\
C_{iso} &= \frac{-F_P^2 F_H^2 \left(a_H^2 - b_H^2\right)}{E^2} \\
D_{iso} &= \frac{-2F_P^2 F_H^2\, a_H b_H}{E^2}
\end{aligned} \qquad (3.20)$$
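where $a_H = \cos\alpha_H$ and $b_H = \sin\alpha_H$, with $\alpha_H$ the calculated heavy-atom phase. (In this form, E is the lack-of-closure error expressed in terms of intensities.) The best phase can then be found by the following procedure. The phase probability is calculated from the combined $A_{MIR}, B_{MIR}, C_{MIR}, D_{MIR}$ at every 15° using Eq. 3.19 to generate the phase probability distribution. The phase is then calculated by integrating the probability distribution:

$$x = \frac{\sum_{0}^{2\pi} P(\alpha)\cos\alpha}{\sum_{0}^{2\pi} P(\alpha)}, \qquad y = \frac{\sum_{0}^{2\pi} P(\alpha)\sin\alpha}{\sum_{0}^{2\pi} P(\alpha)}, \qquad \alpha_{best} = \tan^{-1}(y/x) \qquad (3.21)$$

with the quadrant of $\alpha_{best}$ taken from the signs of x and y, and the figure of merit is given by

$$m = \sqrt{x^2 + y^2} \qquad (3.22)$$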
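Equations 3.19 through 3.22 translate almost line for line into code. The sketch below computes isomorphous Hendrickson-Lattman coefficients for one derivative from Eq. 3.20, sums coefficients across derivatives, and evaluates Eq. 3.19 on a phase grid to obtain $\alpha_{best}$ and m. It is a sketch under stated assumptions, not xheavy's implementation: E is treated as an intensity-based error, as in Eq. 3.20; the default 5° grid is finer than the 15° mentioned above; and all function names are hypothetical.

```python
import math

def hl_isomorphous(fp, fph, fh, alpha_h, e):
    """Hendrickson-Lattman A, B, C, D for one isomorphous derivative (Eq. 3.20)."""
    a_h, b_h = math.cos(alpha_h), math.sin(alpha_h)
    k = -2.0 * (fp ** 2 + fh ** 2 - fph ** 2) * fp * fh / e ** 2
    A = k * a_h
    B = k * b_h
    C = -(fp * fh) ** 2 * (a_h ** 2 - b_h ** 2) / e ** 2
    D = -2.0 * (fp * fh) ** 2 * a_h * b_h / e ** 2
    return A, B, C, D

def combine(*coefficient_sets):
    """Multiplying phase probabilities is the same as adding HL coefficients."""
    return tuple(sum(c[i] for c in coefficient_sets) for i in range(4))

def best_phase_and_fom(A, B, C, D, step_deg=5.0):
    """Evaluate Eq. 3.19 on a grid; return (alpha_best in degrees, m) per Eqs. 3.21-3.22."""
    n = int(round(360.0 / step_deg))
    alphas = [2.0 * math.pi * i / n for i in range(n)]
    logp = [A * math.cos(a) + B * math.sin(a) + C * math.cos(2 * a) + D * math.sin(2 * a)
            for a in alphas]
    top = max(logp)                        # subtract the maximum to avoid overflow in exp
    p = [math.exp(v - top) for v in logp]
    total = sum(p)
    x = sum(pi * math.cos(a) for pi, a in zip(p, alphas)) / total
    y = sum(pi * math.sin(a) for pi, a in zip(p, alphas)) / total
    return math.degrees(math.atan2(y, x)) % 360.0, math.hypot(x, y)

# One hypothetical derivative with alpha_H = 0: the SIR distribution is bimodal
# (two phases symmetric about 0), so the best phase is near 0 and m is well below 1.
coeffs = hl_isomorphous(fp=10.0, fph=12.0, fh=3.0, alpha_h=0.0, e=2.0)
print(best_phase_and_fom(*combine(coeffs)))
```

A second derivative would be added by passing its coefficients to `combine` as well; this is how the twofold ambiguity of a single derivative is broken.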
The figure of merit, m, is a measure of the reliability of $\alpha_{best}$; formally, it is the probability-weighted mean cosine of the error in $\alpha_{best}$. It ranges from 0.0, where all phases are equally probable, to 1.0, where the phase is known with certainty. The average figure of merit over all reflections gives us an estimate of the accuracy of the protein phase set.

Xheavy makes two passes through the phasing process in order to get a better estimate of E for each derivative. The initial set of protein phases uses centric data, if available, to calculate the initial estimate of E, the error, which is calculated as a function of the size of Fp by fitting a curve to the data. The initial Hendrickson-Lattman coefficients A, B, C, D are calculated with this estimate. If there is more than one derivative, the coefficients of each derivative are summed and the protein phase and figure of merit are calculated. In the next pass, all the data are used to calculate E, using the initial estimate of the protein phases. The new coefficients with the updated E-values are again summed together, and a final protein phase and figure of merit are calculated. When calculating a map with isomorphous replacement data, it is important to calculate a map weighted by the figure of merit for each reflection. This is done by choosing the Fo*fom option in xfft. The effect of using figure-of-merit weighting is shown in Fig. 3.18. The average figure of merit for the entire map gives an idea of the quality of the map. Figures of merit from different programs are not directly comparable unless they use comparable methods of calculating E: the error estimate is used in the denominator of the phasing equation and directly affects the figures of merit calculated. Also, phase sets run through solvent-flattening procedures or density modification will have higher figures of merit, regardless of whether they are really improved. Nonetheless, there are some rules of thumb for figure-of-merit values and the corresponding map quality. If the figure of merit to 3.0-Å resolution is less than 0.5, the map will be noisy and very difficult to interpret. Around 0.6 the map starts to be interpretable, and above 0.75 the map is almost certainly interpretable. A map with a figure of merit above 0.8, without any modifications after the isomorphous replacement phase calculation, will be of excellent quality and a joy to interpret.

¹¹Hendrickson, W. A., and Lattman, E. E. (1970). Acta Crystallogr. B26, 136.
In my experience, the increase in figure of merit that occurs with solvent flattening is of little use in judging the final map quality. It may be used as a relative number to judge the effect of using different solvent-flattening parameters. In the end, the best way to judge the quality of the phase set is actually to examine the map (Fig. 3.19).

To search for more heavy-atom sites, use a difference map or a residual map. In the difference map the coefficients used are $F_{PH} - F_P, \alpha_{protein}$, which produces a map that shows the positions of all of the heavy atoms (see the previous discussion of difference Fouriers). Any peaks on this map that are not already in the heavy-atom solution can be added and refined. Since ghost peaks from other derivatives included in the phasing may show up, especially if the figure of merit is low, be careful not to add peaks that are in other derivatives used to calculate the phases. Whether a site is truly in common or just a ghost peak can be verified by examining the Patterson map. Add peaks in order of size until the relative occupancy falls below 0.1 to 0.25 times that of the highest site. Adding too many minor sites may only model errors in the data and make the phases worse, not better. A residual map is similar except that the coefficients are $(F_{PH} - F_P) - (F_{PH,calc} - F_P), \alpha_{protein}$, which removes the peaks already accounted for in the solution so that only new positions are shown. These can be treated in the same manner as for the difference map. As more derivatives are found and added, the resulting protein phases should improve (Fig. 3.20). With the improved protein phases, it is worthwhile repeating the difference Fouriers for all the derivatives to look for minor sites. Always make sure there is something in the Patterson map to justify the addition of a new site (especially check the cross-peaks). A site may not give all the peaks on the Patterson, but if it gives only one or none, the site is not justified. If your heavy-atom refinement program uses lack-of-closure error refinement, be wary of minor sites in one derivative that have the same position as a strong site in another derivative. Such a site is likely to refine to a good occupancy whether it is real or not. For this reason, lack-of-closure refinement is rarely done any more.

FIG. 3.17 Xheavy program. (A) Xheavy is used to refine heavy-atom derivatives and to calculate isomorphous replacement phases. A derivative, or solution, file contains the information for one or more derivatives. This information can be edited using the Derivative Edit window (B). Xheavy refines heavy-atom positions by maximizing the correlation of $F_H$ and $|F_P - F_{PH}|$, moving each atom in turn until no further improvement can be found. The relative occupancies are then refined using the same correlation. To calculate protein phases, the single isomorphous replacement phases are first calculated for each derivative in turn. These are then combined to give an initial estimate of the protein phase. This protein phase is then used to get a better estimate of the SIR phases, and finally the new SIR phases are combined to give the final protein phases. The two-pass protein phasing method was adapted from PHASIT, a program written by William Furey of the University of Pittsburgh.

FIG. 3.19 Low-resolution MIR map: 10-Å slab of a typical low-resolution MIR map at 5 Å. The clean solvent boundaries (dashed lines) indicate that this solution is on the right track.
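The residual-map coefficient described above can be computed per reflection once the heavy-atom model gives a calculated $\mathbf{F}_H$. In this sketch (function name hypothetical, phases in radians), $|F_{PH,calc}|$ is taken as $|\mathbf{F}_P + \mathbf{F}_H|$ with $\mathbf{F}_P$ placed on the current protein phase; sites already in the model then contribute little to the coefficient, so new sites stand out. Note that the expression reduces algebraically to $F_{PH} - F_{PH,calc}$.

```python
import cmath

def residual_coefficient(fp, fph, fh_vec, alpha_protein):
    """Return (signed amplitude, phase) for one residual-map term.

    fh_vec is the complex heavy-atom structure factor from the current model.
    (FPH - FP) - (FPH_calc - FP) simplifies to FPH - FPH_calc.
    """
    fp_vec = cmath.rect(fp, alpha_protein)     # F_P on the protein phase
    fph_calc = abs(fp_vec + fh_vec)            # |F_P + F_H| from the model
    amp = (fph - fp) - (fph_calc - fp)
    return amp, alpha_protein

# A reflection fully explained by the current sites gives a zero coefficient:
print(residual_coefficient(10.0, 13.0, complex(3.0, 0.0), 0.0))
```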
Heavy-Atom Phasing Statistics
If you look at only one number in the phasing statistics, look at the phasing power, $\langle F_H \rangle / \langle E \rangle$, the size of the heavy-atom amplitudes over the error. This statistic takes into account the two factors that determine heavy-atom phasing power: the size of the differences and the size of the errors. This number should be above 1.0 for the derivative to contribute anything to the phasing. Phasing powers above 2.0 are good, and the very best derivatives can sometimes reach 4.0. Since the sizes of the differences are fixed for a given data set, to increase the phasing power you must lower the errors by improving the solution or collecting better data. Otherwise, phasing power may be increased by longer soaking times and/or higher concentrations for a given derivative. More careful data collection and attention to scaling and merging may also lead to lower systematic errors. As resolution increases, the phasing power falls off, and this can be used as a guide to the maximum resolution at which the derivative can be used. $F_H$ decreases with resolution because the scattering factors of the heavy atoms fall off with resolution. The errors increase with poor measurements, scaling errors, and nonisomorphism. Often a derivative is nonisomorphous above a certain resolution but can be used for phasing below it. The program SHARP is particularly good at handling nonisomorphism. Another useful number is the centric R-value, for which the rule of thumb goes: above 0.70, the solution is wrong; for 0.60-0.69, the derivative may be useful, but look for improvements to the solution; for 0.50-0.59, the solution is definitely useful; below 0.50, the derivative is excellent. You should find that derivatives with high phasing power also have low centric R-values. There are lots of other possible phasing statistics, but these two have been the most used over the years.

FIG. 3.18 Effect of figure-of-merit weighting: MIR phased maps with no figure-of-merit weighting (A) and with figure-of-merit weighting (B). The weighted map is somewhat cleaner and easier to interpret, making it the map of choice for MIR phasing.
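Both statistics are simple to compute once per-reflection values are available. The sketch below takes phasing power as $\langle F_H \rangle / \langle E \rangle$, as in the text, using the magnitude of each reflection's lack-of-closure error for E. For the centric R-value it assumes one common definition (the Cullis R), $\sum \big|\,|F_{PH} - F_P| - F_H\big| / \sum |F_{PH} - F_P|$ over centric reflections, ignoring sign crossovers. Function names and sample values are hypothetical.

```python
def phasing_power(fh_calc, lack_of_closure):
    """<F_H> / <E>: mean heavy-atom amplitude over mean lack-of-closure error."""
    mean_fh = sum(fh_calc) / len(fh_calc)
    mean_e = sum(abs(e) for e in lack_of_closure) / len(lack_of_closure)
    return mean_fh / mean_e

def centric_r(fp, fph, fh_calc):
    """Cullis-type R for centric reflections: sum||FPH - FP| - FH| / sum|FPH - FP|.

    Sign crossovers (FPH = |FH - FP|) are ignored in this simplified form.
    """
    num = sum(abs(abs(b - a) - h) for a, b, h in zip(fp, fph, fh_calc))
    den = sum(abs(b - a) for a, b in zip(fp, fph))
    return num / den

print(phasing_power([20.0, 40.0], [10.0, -10.0]))
print(centric_r([100.0, 100.0], [110.0, 92.0], [10.0, 6.0]))
```

Computing both in resolution shells, rather than over all data, gives the resolution-dependent falloff the text suggests using to choose a derivative's high-resolution limit.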
Including Anomalous Scattering
There are two ways anomalous scattering can be included in the phasing. One is as an adjunct to the isomorphous phasing, by measuring $F_{PH}^+$ and $F_{PH}^-$. Many heavy-atom compounds have a useful anomalous signal (Table 3.3), and provisions for including the anomalous signal are included in most isomorphous phasing programs. The equations for this have been worked out in terms of $A_{ano}, B_{ano}, C_{ano}, D_{ano}$, and these terms are simply summed, as for an isomorphous derivative. Also, inclusion of the anomalous scattering can improve Patterson maps. Since the anomalous portion of the heavy-atom structure factor is at right angles to the real part, the Bijvoet difference $F_{PH}^+ - F_{PH}^-$ will be largest when $F_{PH} - F_P$ is smallest, so that the two are complementary in their phasing power. One major difference holds when you are using the anomalous scattering: the hand of the heavy-atom positions must be correct. Since there is no way to determine this a priori, both hands must be tried. With the correct hand the figure of merit should be slightly higher than with the other. The map can also be checked for clues from α-helices and β-turns (see the following), both of which are handed. Another strategy for determining the correct hand is to cross-phase a second derivative with SIRAS (single isomorphous replacement with anomalous scattering) phases calculated with both hands. In one map the peak height of the derivative should be higher than in the other map. All lines of evidence should point to the same answer. If there is a conflict (e.g., the map has left-handed helices even though the figure of merit is higher), be sure that $F_{PH}^+$ and $F_{PH}^-$ have not become switched somewhere. Switching can occur when you use a left-handed description of the machine you collected data on (i.e., assigning the rotation angle to the opposite direction) or when you apply an index transformation that switches the handedness, such as h → -h, without also switching $F_{PH}^+$ and $F_{PH}^-$. The other method is to use a native anomalous scatterer, that is, a scatterer in the native crystal such as the iron in a heme or an iron-sulfur cluster. There are some extra difficulties in this case. There is no anomalous signal in the centric data, so these reflections must be left out of the syntheses. As can be seen in Table 3.1, centric reflections can represent a sizable fraction of the data in the low-resolution ranges.

FIG. 3.20 Addition of heavy-atom derivatives. The same section of MIR map is shown, using one, two, three, and four derivatives to determine the phasing. (A) Notice that the single-derivative map is largely uninterpretable, with breaks in the main chain, and an effective resolution that is quite low. (B) As a second derivative is added, things improve dramatically. (C) With the third derivative, the isoleucine side chain on the left is becoming visible. (D) The fourth derivative has little effect, and further addition of derivatives of this quality will probably not improve the map much. Compare these maps with improved versions shown later (Fig. 3.29).
The centric isomorphous differences do not suffer from the approximations of the acentric ones, and their presence in Patterson maps, heavy-atom refinement, and error estimation in phasing makes a large contribution to the robustness of these methods. To get around the loss of the centric data, use native anomalous Pattersons and refinements with the top 30% or so of the data (excluding outliers). The theory behind this is that the larger differences are more likely to appear when $F_A$ is collinear with the protein phase and, thus, these differences are better approximations of $F_A$. In practice, the author has found in at least two cases that the Patterson maps were little different whether all or 30% of the reflections were included. Since the terms are squared in a Patterson, the smaller differences do not add much to the map and their absence makes little difference. If a derivative exists, a Bijvoet difference Fourier can be used to find the native anomalous scatterer (Fig. 3.21). This is the same as in the derivative case except that 90° is subtracted from the phase. If you are using xmergephs in XtalView, there is an option to do this built into the program.

[FIG. 3.21, panels (A) and (B): map sections at Y = 0.000]
FIG. 3.21 Bijvoet difference Fourier. Sulfite reductase has a large anomalous signal at 1.54-Å wavelength because of the presence of an Fe4S4 cluster and a siroheme iron. A Bijvoet difference Fourier using MIR phases from two derivatives is shown for both the right- and the left-handed choices. Solid contours are positive density, and dashed contours are negative. (A) The correct right-handed case: a large peak is found at the site of the Fe4S4 cluster. (B) The incorrect left-handed case, showing a large hole.

Refinement can be done using conventional heavy-atom refinement programs, since the Bijvoet difference is approximately proportional to $F_A$ when the 25-30% of reflections with the largest differences are used. This approximation is adequate for refining x, y, and z. A full refinement can be done with ANOLSQ (written by Wayne Hendrickson, Columbia University), SHARP, or xheavy. If the protein contains a cluster of known geometry, such as an Fe4S4 cluster, then a rigid-body refinement of a standard cluster¹² will give improved accuracy by reducing the number of refinable parameters. Finally, the native anomalous scatterers can be included in the phasing process. While native anomalous scattering has been sufficient to solve a protein without multiple-wavelength methods in only a few cases (see the following), such scatterers can contribute substantially to an isomorphous solution. Hendrickson-Lattman coefficients can be derived for the native anomalous case, as given in Eq. (3.23).

¹² An example of this can be found in McRee, D. E., Richardson, D. C., Richardson, J. S., and Siegel, L. M. (1986). J. Biol. Chem. 261, 10277-10281.
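$$
\begin{aligned}
A_{ano} &= \frac{2\,(F_P^{+} - F_P^{-})\,a_A}{E^2}, &\qquad B_{ano} &= \frac{2\,(F_P^{+} - F_P^{-})\,b_A}{E^2},\\
C_{ano} &= -\frac{a_A^2 - b_A^2}{E^2}, &\qquad D_{ano} &= -\frac{2\,a_A b_A}{E^2}
\end{aligned}
\tag{3.23}
$$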
where $a_A = F_A\cos\varphi_A$ and $b_A = F_A\sin\varphi_A$, and $F_A$, $\varphi_A$ is the calculated structure factor of the anomalous scatterer. Again, the model must be in the correct hand for the phases to be correct. This may be judged by examining maps in both hands and by comparing the figure of merit for both possibilities.
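The arithmetic of this section can be collected in a few functions: picking the largest ~30% of Bijvoet differences, forming Bijvoet difference Fourier coefficients with 90° subtracted from the protein phase, and turning Eq. (3.23) into a phase probability P(φ) ∝ exp(A cos φ + B sin φ + C cos 2φ + D sin 2φ) with its centroid phase and figure of merit. This is a minimal sketch under stated assumptions, not the XtalView implementation: the function names, the 4 × rms outlier cutoff, and the 360-point phase grid are all choices made here. Sign conventions for anomalous Hendrickson-Lattman coefficients differ between programs (in particular, whether φA already includes the 90° shift of the imaginary component), so check any such code against a case of known hand.

```python
import cmath
import math
from typing import List, Tuple

HL = Tuple[float, float, float, float]


def select_largest(diffs: List[float], fraction: float = 0.3,
                   outlier_rms: float = 4.0) -> List[int]:
    """Indices of the largest |dF| after dropping outliers (|dF| > outlier_rms * rms).

    Pass only acentric differences; the outlier rule is an assumption.
    """
    rms = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    kept = [i for i, d in enumerate(diffs) if abs(d) <= outlier_rms * rms]
    kept.sort(key=lambda i: abs(diffs[i]), reverse=True)
    return kept[:max(1, round(fraction * len(kept)))]


def bijvoet_fourier_coefficient(f_plus: float, f_minus: float,
                                phi_p_deg: float) -> complex:
    """Coefficient (F+ - F-) exp(i(phi_P - 90 deg)) for a Bijvoet difference Fourier."""
    return (f_plus - f_minus) * cmath.exp(1j * math.radians(phi_p_deg - 90.0))


def hl_native_anomalous(f_plus: float, f_minus: float, f_a: float,
                        phi_a_deg: float, e: float) -> HL:
    """A, B, C, D of Eq. (3.23), with a_A = F_A cos(phi_A) and b_A = F_A sin(phi_A)."""
    a = f_a * math.cos(math.radians(phi_a_deg))
    b = f_a * math.sin(math.radians(phi_a_deg))
    e2 = e * e
    d_ano = f_plus - f_minus
    return (2.0 * d_ano * a / e2, 2.0 * d_ano * b / e2,
            -(a * a - b * b) / e2, -2.0 * a * b / e2)


def best_phase_and_fom(hl: HL, n: int = 360) -> Tuple[float, float]:
    """Centroid phase (degrees) and figure of merit from the HL probability."""
    a, b, c, d = hl
    phis = [2.0 * math.pi * k / n for k in range(n)]
    logp = [a * math.cos(p) + b * math.sin(p) + c * math.cos(2 * p) + d * math.sin(2 * p)
            for p in phis]
    top = max(logp)                         # subtract the maximum to avoid overflow
    w = [math.exp(v - top) for v in logp]
    total = sum(w)
    x = sum(wi * math.cos(p) for wi, p in zip(w, phis)) / total
    y = sum(wi * math.sin(p) for wi, p in zip(w, phis)) / total
    return math.degrees(math.atan2(y, x)) % 360.0, math.hypot(x, y)
```

A figure of merit near 0 means the coefficients carry almost no phase information; a strongly unimodal distribution approaches 1.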
Choosing the Absolute Heavy-Atom Configuration
The heavy-atom configuration that you solve, by any method, is ambiguous. Two different configurations will account equally well for the Patterson maps and will refine equally well. These configurations are often called "hands." Since there is no way to know a priori which configuration is actually in the crystal, both possibilities must be tried. One configuration will produce a correct map and the other an incorrect map. You can switch between the two configurations by inverting the heavy-atom solution through the origin: take each coordinate (x, y, z) and replace it with its negative (−x, −y, −z). This works in all space groups; most groups have additional possibilities, but we will not cover them all. Rather, we discuss four of the cases that arise in choosing the absolute heavy-atom configuration.
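Inverting a heavy-atom solution is simple enough to automate so that both hands are always carried through phasing. A minimal sketch, assuming fractional coordinates; the function names are hypothetical. Wrapping into [0, 1) only applies a lattice translation and does not change the configuration.

```python
from typing import Dict, List, Tuple

Site = Tuple[float, float, float]


def invert_sites(sites: List[Site]) -> List[Site]:
    """Other hand of a heavy-atom solution: (x, y, z) -> (-x, -y, -z).

    Fractional coordinates are wrapped back into [0, 1).  In an enantiomorphic
    space group (e.g., P41/P43) the inverted set belongs with the enantiomorph.
    """
    return [tuple((-c) % 1.0 for c in site) for site in sites]


def both_hands(sites: List[Site]) -> Dict[str, List[Site]]:
    """The two configurations that must both be tried in phasing."""
    return {"original": list(sites), "inverted": invert_sites(sites)}
```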
MIR without Anomalous
In the case of MIR without anomalous scattering, the two possible configurations are indistinguishable except that one will produce a right-handed map and the other its left-handed mirror image. If there are helices in the map, it is easy to tell which is the correct map by looking at the handedness of the helical screw axis. Put the helix vertical on the screen. Curl the fingers of your right hand in the direction in which the helix rises; if your thumb points up, this is a right-handed, correct helix. If the thumb points down, this is a left-handed, incorrect helix. If you have only β sheet, you can tell by carefully examining the configuration of the amino acids. If the map is left-handed, all the amino acids will be the D- instead of the L-isomer. You can use the CORN method to make this determination. Pick a clear amino acid and turn the density such that the side chain (R) is at the top, the C=O is at the left, and the N is at the right (and thus spells CORN). If the hydrogen on the Cα atom points toward you, the amino acid is the L-isomer; if it points away, it is the D-isomer.
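Once atoms have been placed, the CORN check can be made numerically with the signed chiral volume at Cα. In the sketch below the sign convention (positive for an L residue with the atoms taken in the order N, CA, C, CB) was derived from the CORN arrangement just described, using an idealized tetrahedron built by the hypothetical helper `ideal_corn_residue`; verify the sign on a residue of known hand before relying on it.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def _sub(p: Vec, q: Vec) -> Vec:
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])


def _cross(u: Vec, v: Vec) -> Vec:
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def _dot(u: Vec, v: Vec) -> float:
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def chiral_volume(n: Vec, ca: Vec, c: Vec, cb: Vec) -> float:
    """Signed volume (N - CA) . [(C - CA) x (CB - CA)]."""
    return _dot(_sub(n, ca), _cross(_sub(c, ca), _sub(cb, ca)))


def amino_acid_hand(n: Vec, ca: Vec, c: Vec, cb: Vec) -> str:
    """'L' if the chiral volume is positive, 'D' if it is negative."""
    return "L" if chiral_volume(n, ca, c, cb) > 0 else "D"


def ideal_corn_residue() -> Tuple[Vec, Vec, Vec, Vec]:
    """Idealized L residue in the CORN view: H toward the viewer (+z),
    C=O at the left, R (CB) at the top, N at the right.  Returns N, CA, C, CB."""
    r = 1.0 / 3.0
    s = math.sqrt(2.0) / 3.0
    t = math.sqrt(2.0 / 3.0)
    return (t, -s, -r), (0.0, 0.0, 0.0), (-t, -s, -r), (0.0, 2.0 * s, -r)
```

Reflecting all four atoms through a mirror plane reverses the sign, which is exactly what happens to every residue in a left-handed map.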
MIR with Anomalous, Including Multiwavelength Anomalous Dispersion (MAD)
Unlike isomorphous data, where the wrong configuration is identical except for hand, with anomalous data one configuration produces correct phases and the other incorrect ones. If anomalous data are included, the two maps can be told apart by inspection. The correct map should have proteinlike features and clear solvent channels. The incorrect map will have noisy density and few, if any, clear solvent regions. It is also possible to tell by looking at the electron density histograms. The incorrect map will have a slightly broader histogram around 0 and consequently less area in the large positive ρ region. (Look ahead to Fig. 3.38 in Section 3.12 to see the effect of adding error to the phases.) If both maps look good, it probably means that the anomalous data are weak and the isomorphous phases are dominating. In this case you can use the criteria for the previous case, MIR without anomalous, and just look for right-handed helices to pick the correct map. If one of the maps is clearly correct and the other is nonsense, but the correct map has left-handed helices, then the Bijvoet pairs got flipped somewhere along the way [i.e., F+ is really F− and vice versa]. You can either correct all the data by reversing the Bijvoet pairs and recalculate the phases with inverted heavy-atom positions, or flip the phases by negating them all (i.e., reflect them about the real axis).
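Both numerical checks in this paragraph can be sketched briefly: the fraction of map points in the high positive tail of the density histogram (lower for the incorrect configuration), and the fix of flipping phases about the real axis. The 1.5σ cutoff and the function names are assumptions made here; compare the two hands with the same cutoff rather than reading meaning into absolute values. Reflecting P(φ) to P(−φ) changes the sign of the sine terms, so the Hendrickson-Lattman coefficients (A, B, C, D) become (A, −B, C, −D).

```python
import math
from typing import List, Tuple

HL = Tuple[float, float, float, float]


def high_density_fraction(rho: List[float], k: float = 1.5) -> float:
    """Fraction of map points more than k standard deviations above the mean."""
    mean = sum(rho) / len(rho)
    sd = math.sqrt(sum((r - mean) ** 2 for r in rho) / len(rho))
    return sum(1 for r in rho if r - mean > k * sd) / len(rho)


def flip_phase(phi_deg: float) -> float:
    """Reflect a phase about the real axis: phi -> -phi (mod 360)."""
    return (-phi_deg) % 360.0


def flip_hl(hl: HL) -> HL:
    """The same reflection applied to HL coefficients."""
    a, b, c, d = hl
    return (a, -b, c, -d)


def swap_bijvoet(f_plus: float, f_minus: float) -> Tuple[float, float]:
    """The alternative fix: exchange F+ and F- before rephasing."""
    return f_minus, f_plus
```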
SIR with Anomalous in a Polar Space Group with a Single Site
There is a special case, for polar space groups such as P21 or P61 only, when you have a single heavy-atom site. In this case, the ambiguity of the phases can be broken by including the anomalous data. These phases can then be used to cross-phase your other derivatives, and the resulting MIR phases will be in the correct configuration. If the map is left-handed, see the explanation for the previous case; the Bijvoet pairs must be flipped.
Enantiomorphic Space Groups
If you are in a space group that has an enantiomorph, such as P41, whose enantiomorph is P43, then you also have an ambiguity in the correct choice of enantiomorph. In protein crystallography, enantiomorphs occur when you have a 31, 41, 61, or 62 screw axis, in which case the enantiomorph will be the space group with a 32, 43, 65, or 64 screw axis, respectively. You must try both space groups and look at the maps. One of the space groups will be correct and the other incorrect. If you also have anomalous data, you must try both absolute heavy-atom configurations as well as both space groups, which gives four possibilities that must all be tried. If none of these produces a usable map and you think you have enough phasing power, you may have one of the following problems:
• Incorrect space group (look for other related space groups)
• Incorrect heavy-atom solution(s) (check against the Pattersons and cross-phase)
• Derivatives on different origins (recheck the cross-Fouriers)
• Poor or ambiguous derivatives (remove them)
• Twinning
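The bookkeeping of which combinations to try can be sketched as follows. The sketch assumes space-group symbols written with space-separated tokens (e.g., "P 41 21 2"), which is a convention chosen here, not a program's input format; only the chiral screw axes are swapped, so 21 and 63 are left alone.

```python
from itertools import product
from typing import Dict, List, Tuple

# Screw axes that exchange between the members of an enantiomorphic pair.
SWAP: Dict[str, str] = {"31": "32", "32": "31", "41": "43", "43": "41",
                        "61": "65", "65": "61", "62": "64", "64": "62"}


def enantiomorph(symbol: str) -> str:
    """Enantiomorph of a space-separated symbol, e.g. 'P 41 21 2' -> 'P 43 21 2'."""
    return " ".join(SWAP.get(token, token) for token in symbol.split())


def phasing_trials(symbol: str, anomalous: bool) -> List[Tuple[str, str]]:
    """(space group, heavy-atom hand) combinations that must each be tried."""
    groups = [symbol]
    other = enantiomorph(symbol)
    if other != symbol:
        groups.append(other)
    hands = ["original", "inverted"] if anomalous else ["original"]
    return list(product(groups, hands))
```

For P41212 with anomalous data this gives the four trials described above; without anomalous data, only the two space groups need to be compared.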
Fine-Tuning of Derivatives
Ideally, the computer should be able to refine all the derivatives to the best possible values and produce the best possible map. In reality, "fiddling" with the derivative parameters often leads to a better map. If you have a multiple-derivative solution, the first step is to see whether removing a poor derivative improves the map. To tell whether the map is better or worse, it helps if you can find a recognizable feature. A helix is usually the best for this, since the geometry of helices is tightly constrained. Another feature to look at is the solvent: as the phases improve, the contrast between protein and solvent should improve and the solvent should become flatter. Once you have decided on a section of map to look at, you can adjust the parameters, recalculate the map, and try to decide whether it is better. If you are using XtalView, you can set up all the windows you need and just keep re-executing them to view the results. If removing a derivative seems to improve the map, try lowering its resolution limit to see whether it is still usable at a lower resolution. Check for heavy-atom ghost peaks and holes (large peaks or holes at the position of one of the heavy atoms). These anomalies can be removed by adjusting the occupancy of the site up or down. If you used lack-of-closure heavy-atom refinement and refined the occupancies often, ghosts or holes are almost sure to be present. Keep an accurate record of the changes you make so that you can return to a good solution later. Be objective. Increasing the resolution of your map beyond the phasing power of the derivatives will not make it easier to fit.
Solvent flattening of very noisy data will only yield solvent-flattened noise. Beware of snake oil and wooden nickels.
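The ghost-and-hole check can be made systematic by reading the protein map at every heavy-atom site. A minimal sketch, assuming the map is a flat list on an nx × ny × nz grid indexed as (i·ny + j)·nz + k, that the nearest grid point is good enough, and that a 3σ threshold (σ computed over the whole map) flags a problem; these are choices made here, not XtalView behavior. The sketch only flags sites; adjusting the occupancy up or down until the feature disappears is still done by hand.

```python
import math
from typing import List, Tuple

Site = Tuple[float, float, float]


def site_density_check(rho: List[float], dims: Tuple[int, int, int],
                       sites: List[Site], threshold: float = 3.0) -> List[Tuple[str, float]]:
    """Label each heavy-atom site 'ghost', 'hole', or 'ok' from the map value there (in sigma)."""
    nx, ny, nz = dims
    mean = sum(rho) / len(rho)
    sd = math.sqrt(sum((r - mean) ** 2 for r in rho) / len(rho))
    report = []
    for x, y, z in sites:
        # Nearest grid point, wrapped into the cell.
        i, j, k = round(x * nx) % nx, round(y * ny) % ny, round(z * nz) % nz
        value = (rho[(i * ny + j) * nz + k] - mean) / sd
        if value > threshold:
            label = "ghost"
        elif value < -threshold:
            label = "hole"
        else:
            label = "ok"
        report.append((label, value))
    return report
```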
3.7 MOLECULAR REPLACEMENT

Many structures can be phased by using a homologous structure and molecular replacement.¹³ In this method, the homologous probe structure is fit into the unit cell of the unknown structure, and its phases are used as an initial guess of the unknown structure's phases. A six-dimensional search is required to find the best match of the probe's transform to the observed transform: three angles and three translations. Fortunately, it is possible to split this search into two three-dimensional searches: a rotation search followed by a translation search. As an example of the amount of time this saves, suppose that searching one dimension takes 10 s of computer time and that each added dimension multiplies the time by 10. If we split up the search, the total time is 10^3 + 10^3 = 2000 s, versus 10^6 = 1,000,000 s, or more than 11 days, for the full search. How similar does the probe need to be? This depends on the structural similarity of the two proteins rather than on their sequence identity. Since we do not know the structural similarity, we are forced to use the sequence identity as a guide. A (very) rough rule of thumb is that above 50% sequence identity a molecular replacement solution should be straightforward, since chances are that the two proteins are structurally very similar. Other factors are the α-helical content (more is better) and the shape of the probe molecule (an anisotropic shape is desirable). The largest problem with searches using low-homology probes is that once the solution is obtained, the phases will be poor estimates of the true phases, and there will be a high bias toward the probe structure, making it difficult to refine the correct structure. This bias is the main drawback of the molecular replacement method. In many cases, a probe that represents only a portion of the structure is available, for example, in an antibody-protein complex. The method is robust enough that the probe can be accurately positioned in many of these cases.
Molecular replacement is often thought of as an easier alternative to multiple isomorphous replacement. However, in practice I have seen rotation-translation solutions take as long as, or longer than, heavy-atom searches. The combination of both methods is more powerful than either alone. In the heavy-atom case, the phases are noisy but unbiased. In molecular replacement, the phases are heavily biased. A particularly successful strategy is to let a single derivative and a molecular replacement solution cross-check each other. The phases of the molecular replacement solution can be used to calculate a cross-Fourier with the derivative differences in order to find the heavy-atom solution. These single isomorphous replacement phases can then be combined with the molecular replacement phases to filter them and to remove the phase ambiguity, thus producing a map superior to what either method can produce alone.

¹³ An excellent discussion of molecular replacement, along with several examples, can be found in an article by E. Lattman (1985), "Use of the Rotation and Translation Functions," in Methods in Enzymology, Vol. 115, pp. 55-77, Academic Press, San Diego. The classic work on molecular replacement is a collection of articles edited by M. G. Rossmann (1972), The Molecular Replacement Method, Int. Sci. Ser. 13.
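Combining the two kinds of phase information is conveniently done with Hendrickson-Lattman coefficients, because multiplying phase probability distributions amounts to adding their coefficients. In the sketch below the molecular replacement phase is represented as a single-peaked distribution with a hypothetical weight (in practice the weight would come from an estimate of the model's accuracy), and the SIR phases are represented by a toy bimodal distribution with only a C term, which puts its two peaks at 0° and 180°. Function names are illustrative.

```python
import math
from typing import Tuple

HL = Tuple[float, float, float, float]


def unimodal_hl(phi_deg: float, weight: float) -> HL:
    """Single-peaked phase estimate (e.g., from MR) expressed as HL coefficients."""
    p = math.radians(phi_deg)
    return (weight * math.cos(p), weight * math.sin(p), 0.0, 0.0)


def combine(*sets: HL) -> HL:
    """Multiplying phase probabilities is the same as adding HL coefficients."""
    return tuple(sum(s[i] for s in sets) for i in range(4))


def centroid_phase(hl: HL, n: int = 720) -> Tuple[float, float]:
    """Centroid phase (degrees) and figure of merit on an n-point phase grid."""
    a, b, c, d = hl
    phis = [2.0 * math.pi * k / n for k in range(n)]
    logp = [a * math.cos(p) + b * math.sin(p) + c * math.cos(2 * p) + d * math.sin(2 * p)
            for p in phis]
    top = max(logp)                         # subtract the maximum to avoid overflow
    w = [math.exp(v - top) for v in logp]
    x = sum(wi * math.cos(p) for wi, p in zip(w, phis)) / sum(w)
    y = sum(wi * math.sin(p) for wi, p in zip(w, phis)) / sum(w)
    return math.degrees(math.atan2(y, x)) % 360.0, math.hypot(x, y)
```

On its own the symmetric bimodal SIR distribution has a figure of merit near zero. Adding a modest MR term at 50° selects the SIR peak near 0° and leaves a phase only slightly pulled toward the model, which is the filtering effect described above.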
Rotation Methods
Rotation searches are actually done in Patterson space. Consider the Patterson of a protein molecule packed loosely in a lattice. In general, the short vectors will be intramolecular vectors, and the longer ones will be intermolecular. In a rotation function we want to consider only intramolecular vectors. Since all vectors in a Patterson start at the origin, the vectors closest to the origin will, in general, be intramolecular. Of course, close lattice contacts will also produce short intermolecular vectors, but they should be in the minority. By judiciously choosing a maximum Patterson radius, we can improve our chances of finding a strong rotation hit. The second choice to be made is the resolution range to use in calculating the Patterson. Higher-resolution reflections (beyond about 3.5 Å) will differ markedly even between homologous structures, as they reflect the precise conformation of residues. Lower-resolution reflections reflect the grosser features of the structure, such as the relation of secondary structural elements. Very-low-resolution reflections, with spacings greater than about 10 Å, are heavily influenced by the crystal packing and the arrangement of solvent and protein, which depends more on the particular packing arrangement than on the structure of an individual protein molecule. Thus, the resolution range used for rotation searches is usually within 10-3.5 Å, with 8-4 Å being common. In practice, several ranges can be tried. The first step is to calculate structure factors for the probe structure in an artificial P1 cell. The cell should be about 30 Å larger than the probe in each direction so that there are no intermolecular vectors within the Patterson radius used for the rotation search. The probe is usually centered at the origin of this cell to simplify later steps. These calculated structure factors are then used in the rotation search.
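Setting up the artificial P1 cell and the resolution window can be sketched as below. The sketch assumes an orthogonal P1 box (all angles 90°), Cartesian probe coordinates in Å, and the margin and 10-3.5 Å window quoted in the text as defaults; the function names are hypothetical.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float, float]


def p1_box(coords: List[Vec], margin: float = 30.0) -> Tuple[Vec, List[Vec]]:
    """Orthogonal P1 cell `margin` A larger than the probe, with the probe centred on the origin."""
    lo = [min(c[i] for c in coords) for i in range(3)]
    hi = [max(c[i] for c in coords) for i in range(3)]
    centre = [(lo[i] + hi[i]) / 2.0 for i in range(3)]
    cell = tuple(float(hi[i] - lo[i] + margin) for i in range(3))
    centred = [tuple(c[i] - centre[i] for i in range(3)) for c in coords]
    return cell, centred


def d_spacing(hkl: Tuple[int, int, int], cell: Vec) -> float:
    """d for an orthogonal cell: 1/d^2 = (h/a)^2 + (k/b)^2 + (l/c)^2 (not for 000)."""
    h, k, l = hkl
    a, b, c = cell
    return 1.0 / math.sqrt((h / a) ** 2 + (k / b) ** 2 + (l / c) ** 2)


def in_range(hkl: Tuple[int, int, int], cell: Vec,
             d_low: float = 10.0, d_high: float = 3.5) -> bool:
    """True if the reflection lies inside the rotation-search resolution window."""
    return d_high <= d_spacing(hkl, cell) <= d_low
```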
While the search is not actually done computationally in this manner, it can be conceptualized as follows. A Eulerian angle system is used with three nested angles (Fig. 3.22). The range of the angles is chosen to cover the unique volume of the space group of the unknown structure. One angle is incremented by 5°, the other angles are moved through all their values in 5° steps, and at each point the match of the probe and observed Patterson functions is calculated and stored. The first angle is incremented 5° more and the process repeated. After all angles have been calculated, the list is sorted, grouped into peaks, and printed (Fig. 3.23). The peaks are usually reported in terms of their size relative to the root-mean-square peak size, or sigma. If the probe structure is a good match for the unknown structure and the proper resolution ranges have been chosen, then there will be a single large hit at several sigma. Often, however, a decreasing series of peaks is found at slightly differing sigma. In such cases, the correct peak may not be the first in the list, and the order of the list may be altered by changing the resolution ranges slightly. Since the correct peak is unknown, there is no a priori way to decide on the correct ranges. In these cases a new probe should be tried if one is available. If not, there is the option of continuing the process, going down the list of rotation hits to see whether a decision can be made based on the behavior of subsequent steps. The asymmetric unit of the rotation function depends on the symmetry of both the probe's Patterson and the Patterson of the unknown. This problem has been examined and reported on in detail,¹⁴ and Table 3.4 lists the more common cases when the probe is in space group P1 for each of the 10 possible Patterson space groups for proteins. It makes a difference which Patterson is rotated and which is held still; this can be decided by a careful reading of the rotation program's documentation, because both conventions (rotating the probe or rotating the unknown) are used. When comparing hits, keep in mind that in Eulerian space the angles (π + θ1, −θ2, π + θ3) describe the same rotation as (θ1, θ2, θ3). Two rotation hits that look different may in fact be the same once this identity is applied.

¹⁴ Rao, S. N., Jih, J., and Hartsuck, J. A. (1980). Acta Crystallogr. A36, 878-884.

[FIG. 3.22: diagram with axis label Z]
FIG. 3.22 Definition of Eulerian self-rotation angles.
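The conceptual search and the Eulerian identity can both be sketched in a few lines. In this sketch `score` is a hypothetical stand-in for the Patterson overlap function, the scan covers the full angular range rather than the unique volume of the rotation function, and peaks are not grouped. The Euler convention Rz(θ1) Ry(θ2) Rz(θ3) is an assumption; programs differ, so check yours before comparing angles.

```python
import math
from typing import Callable, List, Tuple

Angles = Tuple[float, float, float]


def rotation_scan(score: Callable[[float, float, float], float],
                  step: float = 5.0) -> List[Tuple[float, Angles]]:
    """Evaluate score on a (t1, t2, t3) grid; return hits sorted, in sigma units."""
    n13 = int(round(360.0 / step))
    n2 = int(round(180.0 / step))
    raw = []
    for i in range(n13):
        for j in range(n2 + 1):
            for k in range(n13):
                angles = (i * step, j * step, k * step)
                raw.append((score(*angles), angles))
    values = [v for v, _ in raw]
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values)) or 1.0
    raw.sort(key=lambda hit: hit[0], reverse=True)
    return [((v - mean) / sd, angles) for v, angles in raw]


def euler_matrix(t1: float, t2: float, t3: float) -> List[List[float]]:
    """R = Rz(t1) Ry(t2) Rz(t3), angles in degrees (assumed convention)."""
    def rz(t: float) -> List[List[float]]:
        c, s = math.cos(math.radians(t)), math.sin(math.radians(t))
        return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

    def ry(t: float) -> List[List[float]]:
        c, s = math.cos(math.radians(t)), math.sin(math.radians(t))
        return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

    def mul(p: List[List[float]], q: List[List[float]]) -> List[List[float]]:
        return [[sum(p[i][k] * q[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

    return mul(mul(rz(t1), ry(t2)), rz(t3))


def canonical(angles: Angles) -> Angles:
    """Use (180 + t1, -t2, 180 + t3) == (t1, t2, t3) to bring t2 into [0, 180]."""
    t1, t2, t3 = angles
    t2 = (t2 + 180.0) % 360.0 - 180.0       # t2 into [-180, 180)
    if t2 < 0.0:
        t1, t2, t3 = t1 + 180.0, -t2, t3 + 180.0
    return (t1 % 360.0, t2, t3 % 360.0)
```

Putting every hit through `canonical` before comparing lists makes duplicate solutions such as (210°, −40°, 230°) and (30°, 40°, 50°) obvious.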
FIG. 3.23 Self-rotation function example. Peaks on this section (κ = 180°) show positions of 2-fold symmetry in the diffraction pattern. The large peak in the center is due to the crystallographic 4-fold (which can be thought of as two 2-folds). The peaks around the outer edge are positions of noncrystallographic 2-folds.
Improving the Probe
It may be possible to improve the hit by systematically leaving out pieces of the probe and repeating the rotation search. For example, each successive block of three residues can be deleted in turn and the size of the rotation hit examined. The absolute sizes of the hits are compared, not the peak/σ ratio. If parts of the probe structure that contribute to the match, and are thus likely to be homologous, are removed, a lower hit will be found. If a portion is interfering, removing it will result in a larger hit (Fig. 3.24). A probe can then be built that leaves out the residues whose removal raised the hit, and thus the overall rotation hit can be improved. A similar procedure can be done by leaving off side chains beyond the Cβ atom (i.e., a polyalanine model), on the hypothesis that the main chains follow the same path but the side chains differ considerably.
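The deletion scan is easy to script around whatever rotation program you use. In this sketch `peak_height` is a hypothetical callable that would run the rotation search on a trimmed probe and return the absolute height of the correct peak; the windows are non-overlapping blocks of three residues, and the polyalanine truncation keeps N, CA, C, O, and CB. Atom records as dictionaries with a "name" key are an assumption for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def deletion_scan(residues: Sequence, peak_height: Callable[[Sequence], float],
                  width: int = 3) -> List[Tuple[int, float]]:
    """Change in absolute rotation-peak height when each block of residues is removed."""
    full = peak_height(residues)
    changes = []
    for start in range(0, len(residues), width):
        trimmed = list(residues[:start]) + list(residues[start + width:])
        changes.append((start, peak_height(trimmed) - full))
    return changes


def interfering_windows(changes: List[Tuple[int, float]]) -> List[int]:
    """Blocks whose removal raised the hit: candidates to leave out of the probe."""
    return [start for start, delta in changes if delta > 0]


MAIN_CHAIN_PLUS_CB = {"N", "CA", "C", "O", "CB"}


def polyalanine(atoms: List[Dict]) -> List[Dict]:
    """Truncate side chains beyond CB."""
    return [atom for atom in atoms if atom["name"] in MAIN_CHAIN_PLUS_CB]
```

In the test, a toy scorer that simply sums per-residue contributions stands in for the rotation function, so the block of three "interfering" residues stands out as the one whose removal raises the hit.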
COMPUTATIONAL TECHNIQUES
TABLE 3.4 Rotation Function
Unknown Patterson group | Probe rotated (θ1, θ2, θ3; θ+, θ2, θ−) | Probe held still (θ1, θ2, θ3; θ+, θ2, θ−)
[…] 1.3 Å), since absorption is minimized there, and this can provide a good reference wavelength for scaling out the errors due to X-ray absorption, as well as a good "native" data set for refinement. The data sets (i.e., .fin files) for the heavy-atom phasing program are thus:

PI native anomalous
PK native anomalous
RE native anomalous
PI-PK isomorphous (PI merged with PK)
PI-RE isomorphous (PI merged with RE)
PK-RE isomorphous (PK merged with RE)

In this notation, PK will have the largest f″ and so provides the largest anomalous signal; PI-RE will have the largest dispersive difference, Δf′. To check the usefulness of each of the six possible data sets, check the Bijvoet difference Patterson map for each wavelength and the isomorphous difference Patterson map for each wavelength pair. If the Patterson map is noisy and the peaks due to the heavy atom are small ([…] 1.5 Å), the refinement strategies can be improved to take advantage of the increased number of data available. By going to higher resolution, we can add many more parameters to the model, including thermal anisotropy, split side chains, and riding hydrogens. The validity of these has been well known in small-molecule crystallography for many years, where R-factors of 1% are obtainable. The program to be used here is SHELX-97, the latest version of SHELX, written by George Sheldrick.²⁷ SHELX has not been widely used for proteins in the past, partly because converting the large number of macromolecular coordinates and residue structures to small-molecule conventions was troublesome. However, the new version has features that make these conversions unnecessary or automatic. It also builds the geometry restraints needed for refining structures at less than atomic resolution. SHELX-97 has several advantages for ultra-high-resolution refinements:

1. The structure factor calculation is more accurate than in XPLOR or TNT. At resolutions beyond about 1.8 Å, XPLOR and TNT show significant errors that are due to the use of an FFT approximation, and by 1.0 Å these errors have become a significant portion of the R-factor.²⁸
2. Anisotropic B's can be used if there are enough data. Proteins typically exhibit considerable anisotropy, and modeling it increases the accuracy at the expense of more parameters.
3. SHELX can generate fixed or riding hydrogens. These hydrogens move with the atom to which they are bonded and make the structure factor calculation more accurate.
4. Partially occupied (split) structure is easily generated and refined. SHELX has free variables that allow correct refinement of the occupancies of the split parts.

²⁷ Sheldrick, G. M. (1998). "High Resolution Structure Refinement," in Crystallographic Computing 7 (K. Watenpaugh and P. E. Bourne, eds.). Oxford University Press, Oxford.
²⁸ In an FFT from model to structure factors, the model coordinates themselves are not transformed; instead, an approximation of the equivalent electron density is built on a grid and transformed. In theory, an accurate enough representation can be computed, but in current practice it usually introduces a small error.

[FIG. 3.33 (continued), two further panels: Psi versus Phi, both axes from -180 to 180]
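The exact calculation that an FFT approximates is a direct sum over atoms, F(h) = Σj fj exp(−Bj s²/4) exp(2πi h·xj) with s = 1/d. The sketch below writes that sum out. It is not SHELX code and makes simplifying assumptions: an orthogonal cell, a constant scattering factor f per atom (real programs use resolution-dependent form factors), and an isotropic B.

```python
import cmath
import math
from typing import List, Tuple

# x, y, z (fractional), scattering factor f (electrons), isotropic B (A^2)
Atom = Tuple[float, float, float, float, float]


def structure_factor(hkl: Tuple[int, int, int], atoms: List[Atom],
                     cell: Tuple[float, float, float]) -> complex:
    """Direct-summation structure factor for an orthogonal cell."""
    h, k, l = hkl
    a, b, c = cell
    s2 = (h / a) ** 2 + (k / b) ** 2 + (l / c) ** 2   # s^2 = 1/d^2
    total = 0j
    for x, y, z, f, bfac in atoms:
        thermal = math.exp(-bfac * s2 / 4.0)
        total += f * thermal * cmath.exp(2j * math.pi * (h * x + k * y + l * z))
    return total
```

The cost grows as the number of atoms times the number of reflections, which is one reason direct summation is slower than an FFT.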
3.11 Refinement of Coordinates
One of the chief drawbacks of SHELX is that it is considerably slower than the refinement programs mentioned above. However, with the speed of today's computers this is no longer a serious impediment. The quality of the density of very-high-resolution structures is greatly increased (Plate 7).

Features of SHELX
Use of Intensity Data
Considerable experimentation by Sheldrick and others has shown that it is best to refine directly against the original intensity data (I) rather than |F|, and to include all the data, even the negative observations. The argument for this is that the intensities, being real observations, have errors with an expected distribution that, for the weakest data, has a negative component: because of experimental error, some weak reflections are measured as less than the surrounding background and so have negative intensities. For example, consider a reflection whose intensity is exactly 0. Half the time it is measured it will be greater than 0, and half the time less than 0. If we threw out the negative measurements and then averaged, we would find that the reflection now has a positive intensity. Leaving out negative intensities raises the mean of the weak, highest-resolution data and thus affects the weighting scheme; this change to the weighting leads to slower convergence. Since the square root of a negative number cannot be taken, negative observations are removed from the data set when |F|'s are used, leading directly to this problem. As discussed later, the inclusion of negative intensities is also needed for correct uncertainty calculations.
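A short simulation makes the zero-intensity example concrete. It assumes Gaussian measurement errors (an assumption made here) and compares the mean of all simulated measurements of a reflection whose true intensity is 0 with the mean of only the non-negative ones, which is what survives when |F| = √I is required.

```python
import random
from typing import Tuple


def mean_with_and_without_negatives(n: int = 10000, true_intensity: float = 0.0,
                                    sigma: float = 1.0, seed: int = 1) -> Tuple[float, float]:
    """Mean of all simulated measurements vs. mean of the non-negative ones only."""
    rng = random.Random(seed)
    measured = [rng.gauss(true_intensity, sigma) for _ in range(n)]
    kept = [i for i in measured if i >= 0.0]   # what taking sqrt(I) would force
    return sum(measured) / len(measured), sum(kept) / len(kept)
```

With σ = 1 the full mean stays near 0, while the truncated mean is close to 0.8σ, a systematic bias that shifts the weak, high-resolution data upward.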
Riding Hydrogens
Riding hydrogens are hydrogens added to the model in a fixed geometry relative to the heavy atoms. They are not free to refine but move with the heavier atom to which they are attached; their chief effect is to make the structure factor calculation slightly more accurate by accounting for their small, but measurable, contribution to the scattering in the crystal. Indeed, if they are left out, the heavy atom will move slightly in the direction of the missing hydrogen to account for the missing density. Given accurate data, riding hydrogens become significant at around 1.5-Å resolution. An R-free cross-validation test can be used to verify that adding riding hydrogens is justified. Typically a 1% drop in R-free is found when hydrogens are added.
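The geometric idea of a riding hydrogen can be sketched for the simplest case, a tetrahedral carbon with three heavy neighbors such as Cα bonded to N, C, and Cβ: the H is placed opposite the sum of the unit vectors to the neighbors. This is an illustration, not SHELX's implementation, and the 0.98-Å C-H distance is an assumed, typical X-ray value. Because the position is recomputed from the parent and its neighbors, the hydrogen moves with them rather than refining independently.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def riding_h_tertiary(parent: Vec, n1: Vec, n2: Vec, n3: Vec,
                      bond: float = 0.98) -> Vec:
    """Place H on a tetrahedral atom that has three heavy-atom neighbors."""
    def unit(v: Vec) -> Vec:
        norm = math.sqrt(sum(c * c for c in v))
        return tuple(c / norm for c in v)

    # The sum of unit vectors toward the neighbors points away from the H.
    total = [0.0, 0.0, 0.0]
    for nb in (n1, n2, n3):
        u = unit(tuple(nb[i] - parent[i] for i in range(3)))
        for i in range(3):
            total[i] += u[i]
    direction = unit(tuple(-c for c in total))
    return tuple(parent[i] + bond * direction[i] for i in range(3))
```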
Anisotropic Thermal Parameters
At lower resolutions a single isotropic thermal factor, B, is used to represent the thermal vibrational motion of the atom, as well as static disorder within the crystal. A more complete model of this motion uses six parameters: three to describe the directions of the principal axes of the vibration and three to describe its magnitude along each axis. In reciprocal space, these motions are described by a symmetric matrix with terms Uij, which are the parameters actually refined. We have added a feature to the latest version of xfit to view these thermal parameters in a manner similar to the familiar program ORTEP, but in real time. Examples are shown in Fig. 3.34. The main obstacle to using anisotropic thermal parameters at lower resolution is the lack of sufficient data to justify adding five more parameters per atom. As can be seen in Table 3.5, increasing the resolution from 1.9 Å to 1.35 Å makes about three times as much data available. If anisotropic thermal parameters are added at 1.9 Å, the ratio of data to refined parameters becomes about 1 : 1 (10,741 : 9211), considerably less than the 2 : 1 minimum required for a well-conditioned least-squares minimization. The exact point at which this crossover occurs for a given protein crystal depends on the solvent content. If the solvent content is higher than 50%, there are fewer protein atoms in a unit cell of the same volume, and the crossover occurs at a lower resolution. For crystals with low solvent content, the 2 : 1 crossover happens at a higher resolution.
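The data-to-parameter arithmetic can be sketched as follows. The reflection estimate assumes complete data and counts reciprocal-lattice points inside the 1/d sphere, (4/3)πV/d³, divided by 2 for Friedel pairs and by n_asym, the number of asymmetric units in the cell (general positions, including lattice centering); it ignores centric zones and is only approximate. Parameters per atom are x, y, z plus either one isotropic B or six Uij. Function names are hypothetical.

```python
import math


def unique_reflections(cell_volume: float, d_min: float, n_asym: int) -> float:
    """Approximate unique reflections to d_min: (4/3) pi V / d^3 / (2 n_asym)."""
    return (4.0 / 3.0) * math.pi * cell_volume / d_min ** 3 / (2 * n_asym)


def rescale_reflections(n_ref: float, d_from: float, d_to: float) -> float:
    """Reflection count scales as 1/d^3."""
    return n_ref * (d_from / d_to) ** 3


def parameters(n_atoms: int, anisotropic: bool) -> int:
    """x, y, z plus 1 (isotropic B) or 6 (Uij) parameters per atom."""
    return n_atoms * (9 if anisotropic else 4)


def data_to_parameter_ratio(n_ref: float, n_atoms: int, anisotropic: bool) -> float:
    return n_ref / parameters(n_atoms, anisotropic)
```

Going from 1.9 Å to 1.35 Å multiplies the reflection count by (1.9/1.35)³ ≈ 2.8, consistent with the "about three times" above, while switching to anisotropic parameters multiplies the parameter count by 9/4.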
Solvent Model
SHELX includes a bulk solvent model based on work by Moews and Kretsinger²⁹ that is being used increasingly in all refinements. Most refinement protocols in the past excluded from the calculations the so-called solvent region (roughly, reflections between infinity and 7-5 Å). Using the bulk solvent correction, which has just two adjustable parameters (K, a scale factor, and B, a thermal parameter), it is possible to include all the data from infinity on up, with a substantial drop in R-value for the low-resolution data. This has been done in the refinement example in Table 3.5. A further advantage is that maps calculated without the solvent region can be subject to ripple as a result of series termination errors (see Sec. 3.12, Resolution Cutoffs), and the inclusion of the low-resolution terms removes that problem.
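A two-parameter bulk-solvent correction can be illustrated with a Babinet-type scale, assumed here for illustration: the calculated amplitude is multiplied by 1 − K exp(−B s²/4) with s = 1/d. The exact form used by SHELX, and in Moews and Kretsinger's work, may differ, and the K and B values in the test are illustrative only. The correction matters at low resolution and vanishes at high resolution.

```python
import math


def babinet_scale(d: float, k_sol: float, b_sol: float) -> float:
    """Multiplier 1 - K exp(-B s^2 / 4), with s = 1/d, applied to F_calc."""
    s2 = 1.0 / (d * d)
    return 1.0 - k_sol * math.exp(-b_sol * s2 / 4.0)
```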
Split Side Chains
In high-resolution structures, especially those from frozen crystals, discrete disorder of the protein can often be detected. For instance, a side chain on the surface may have two equally populated conformers. At any given moment half the side chains in the crystal will be in one orientation and half in the other. Both parts can be refined simultaneously in SHELX, and the relative occupancies of the two parts can be tied together and also refined. We have added a feature to xfit that simplifies splitting side chains and fitting the two halves. Depending on whether one side of the split or the root is clicked, xfit will fit half of the residue or the entire residue. Besides inspection of the electron density maps, split atoms can be found by examining the anisotropic thermal parameters. SHELX detects possibly split atoms and prints a list for further inspection.

²⁹ Moews, P. C., and Kretsinger, R. H. (1975). Refinement of the structure of carp muscle calcium-binding parvalbumin by model building and difference Fourier analysis. J. Mol. Biol. 91, 201-228.

FIG. 3.34 Stereo figures of the Fe3S4 cluster and the Sγ ligands at 1.35 Å, illustrating the density and the corresponding thermal ellipsoids. (A) 2Fo − Fc σA-weighted electron density map contoured at 5σ. (B) Thermal ellipsoids showing the 25% probability surface (i.e., an atom can be found within this surface 25% of the time) for the cluster and 50% probability lines for the major axes. Note how all the atoms move in similar directions. The long axis is along the crystallographic c axis and, since all atoms in the crystal show this same elongation, it is probably …
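Tying the two occupancies together means refining a single parameter p, with occupancy p for conformer A and 1 − p for conformer B; in SHELXL input this is commonly written with occupancies 21 and −21, meaning fv(2) and 1 − fv(2) (check the SHELXL documentation). The toy sketch below is not SHELX's algorithm, which refines against the intensities. It solves the one-parameter linear least-squares problem against observed density sampled at the atoms of each conformer, where `model_a` and `model_b` are hypothetical full-occupancy density values.

```python
from typing import Sequence


def tied_occupancy(rho_a: Sequence[float], model_a: Sequence[float],
                   rho_b: Sequence[float], model_b: Sequence[float]) -> float:
    """Least-squares p for conformer A (occupancy p) and B (occupancy 1 - p).

    Minimizes sum (rho_a - p*model_a)^2 + sum (rho_b - (1 - p)*model_b)^2,
    then clamps p to [0, 1].
    """
    saa = sum(m * m for m in model_a)
    sbb = sum(m * m for m in model_b)
    num = (sum(m * r for m, r in zip(model_a, rho_a))
           - sum(m * r for m, r in zip(model_b, rho_b)) + sbb)
    p = num / (saa + sbb)
    return min(1.0, max(0.0, p))
```

Tying saves one parameter per split group and keeps the total occupancy at 1 by construction.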
Cross-Validation
To check the validity of each step and of the overall refinement, the statistical parameter R-free is used. Five percent of the data are held in a separate pool and kept out of the least-squares refinement. Thus, if R-free decreases, it must be because the model has become better, whereas for the other 95% there is the danger that the R-value decreased because of overfitting of the free parameters of the model, leading to a false minimum. When parameters are added to the model, their validity can be checked by an expected drop in R-free. Another use is to find the best values of refinement parameters. For example, the sigma applied to bond lengths can be varied in a series of refinements. As this is done, the R-value slowly decreases as the restraint is loosened. This is expected, because loosening the restraint allows the minimizer to achieve a closer fit. However, R-free shows a shallow minimum: tightening the restraint beyond it increases R-free, and relaxing it gives no improvement and eventually increases R-free. Thus, R-free can be used to find the correct target sigmas for bond lengths, bond angles, and thermal parameters. In the sample refinement of Table 3.5, note how R-free drops 2% when the thermal model is changed to anisotropic, and 1% when riding hydrogens are added. As a test of R-free, we tried refining a model with anisotropic thermal parameters at a lower resolution, where the ratio of data to parameters was about 1 : 1. Although the R-value dropped 3%, R-free actually increased 1%, which justified our confidence that by following R-free we can avoid adding too many parameters too soon.
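The bookkeeping behind R-free is small enough to sketch. This sketch uses a random 5% test set and the conventional R = Σ|Fo − kFc| / ΣFo, with the scale k fitted by least squares on the working reflections only (an assumption made here; programs differ in how they scale). The function names are hypothetical.

```python
import random
from typing import List, Sequence, Set, Tuple


def choose_free_set(n_reflections: int, fraction: float = 0.05, seed: int = 0) -> Set[int]:
    """Random test set that is never used in refinement."""
    rng = random.Random(seed)
    return set(rng.sample(range(n_reflections), round(fraction * n_reflections)))


def r_factor(f_obs: Sequence[float], f_calc: Sequence[float],
             indices: Sequence[int], k: float = 1.0) -> float:
    """R = sum |Fo - k Fc| / sum Fo over the given reflections."""
    num = sum(abs(f_obs[i] - k * f_calc[i]) for i in indices)
    return num / sum(f_obs[i] for i in indices)


def r_work_and_free(f_obs: Sequence[float], f_calc: Sequence[float],
                    free: Set[int]) -> Tuple[float, float]:
    """R-work and R-free, with the scale fitted on the working set only."""
    work: List[int] = [i for i in range(len(f_obs)) if i not in free]
    k = sum(f_obs[i] * f_calc[i] for i in work) / sum(f_calc[i] ** 2 for i in work)
    return r_factor(f_obs, f_calc, work, k), r_factor(f_obs, f_calc, sorted(free), k)
```

Scanning a restraint sigma then amounts to refining at each value and keeping the one with the lowest R-free rather than the lowest R-work.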
Checking the Refinement Besides following the R-value and R-free and visually inspecting maps, the resulting structure can be checked by looking at the final polypeptide geometry. For this we use PROCHECK.30 SHELX does not restrain torsion angles, so these can be used in a manner analogous to R-free: because the torsions were not restrained to target values, their distribution is an independent check on the model. In the 7-Fe ferredoxin example of Table 3.5, we found that the main-chain and side-chain torsions fell well within the PROCHECK plots. In fact, we were delighted to find that the bond-length and bond-angle deviations were smaller than our restraint sigmas in SHELX and even tighter than the Engh and Huber geometry31 in PROCHECK, clear evidence that the restraints did not overly determine the geometry; rather, the geometry was determined by the extra resolution of the data.
30 Laskowski, R. A., MacArthur, M. W., Moss, D. S., and Thornton, J. M. (1993). PROCHECK: A program to check the stereochemical quality of protein structures. J. Appl. Crystallogr. 26, 283–291.
COMPUTATIONAL TECHNIQUES
Positional Uncertainty Analysis Positional uncertainty analysis involves calculating the standard deviations of the positional parameters or, as they are known in crystallographic jargon, their standard uncertainties. Traditional small-molecule methods of estimating positional uncertainties32 involve inversion of the normal matrix. (The standard uncertainty was known as the estimated standard deviation until the International Union of Crystallography recommended changing the terminology.) Calculating standard uncertainties is quite distinct from refinement, but, because it is usually done by the same software, the two processes are often confused. Traditional uncertainty analysis in proteins has consisted of little more than a Luzzati plot of R-value versus resolution (which, as Cruickshank33 points out, is a misuse of Luzzati's method) or a σA calculation.34 Either way, these methods lump the entire structure into one average uncertainty, although some parts of the structure will be much worse and some much better. For something as plastic as a protein, they are very poor methods of uncertainty analysis. A better method, proposed by Cruickshank, estimates uncertainties from the atom type and its B-value. Uncertainties derived from Cruickshank's equation have been shown to correlate well with uncertainties calculated by full-matrix inversion. The equation is:
31 Engh, R. A., and Huber, R. (1991). Accurate bond and angle parameters for X-ray protein structure refinement. Acta Crystallogr. A47, 392–400.
32 Schwarzenbach, D., Abrahams, S. C., Flack, H. D., Gonschorek, W., Hahn, T., Huml, K., Marsh, R. E., Prince, E., Robertson, B. E., Rollett, J. S., and Wilson, A. J. C. (1989). Statistical descriptors in crystallography. Report of the International Union of Crystallography Subcommittee on Statistical Descriptors. Acta Crystallogr. A45, 63–75; Schwarzenbach, D., Abrahams, S. C., Flack, H. D., and Wilson, A. J. C. (1995). Statistical descriptors in crystallography. II. Report of a Working Group on Expression of Uncertainty in Measurement. Acta Crystallogr. A51, 565–569.
33 Cruickshank, D. W. J. (1996). Protein precision re-examined: Luzzati plots do not estimate final errors. In "Macromolecular Refinement, Proceedings of the CCP4 Study Weekend, January 1996" (Dodson et al., eds.).
34 Read, R. J. (1986). Improved Fourier coefficients for maps using phases from partial structures with errors. Acta Crystallogr. A42, 140–149.
3.11 Refinement of Coordinates
$$\sigma(x_i) = k\left(\frac{\sum_{j=1}^{N_{\mathrm{at}}} Z_j^2}{Z_i^2\,(n_{\mathrm{observations}} - n_{\mathrm{parameters}})}\right)^{1/2}\frac{1 + 0.04B_i + 0.003B_i^2}{1 + 0.04B_w + 0.003B_w^2}\,C^{-1/3}\,d_{\min}\,R$$
where k is about 1.0, Z is the atomic number, $B_w$ is the Wilson B for the structure, $B_i$ is the B-value of the atom in question, C is the fractional completeness of the data to $d_{\min}$, and R is the crystallographic R-factor. Note that as the number of observations increases, the uncertainties decrease, and if the number of observations falls below the number of parameters, the equation becomes invalid. In this formula, the uncertainty for a given atom in a given structure depends mostly on the atom's B-value, $B_i$, and its scattering factor, $Z_i$, which together determine the atom's contribution to the total scattering. Atoms with low B-values have lower uncertainties in their positions, and atoms with higher atomic number, and thus more electrons, also have lower uncertainties. For a typical protein it becomes possible to estimate individual atomic positional uncertainties somewhere beyond (better than) 2.0-Å resolution, with the uncertainty strongly dependent on resolution.35 Cruickshank gives examples where a 2.0-Å resolution protein structure has an average coordinate uncertainty of 0.32 Å (Cruickshank, 1996; see note 33). At 1.6-Å resolution the average uncertainty drops to about 0.13 Å, and at 1.0-Å resolution the typical uncertainty is about
0.03 Å.
35 The uncertainties actually depend on the ratio of observations to refined parameters, but since the number of observed data increases with increasing resolution, for a given protein the statement holds. The resolution at which this ratio crosses 1.0 depends on each crystal's characteristics. Proteins with high solvent content will have fewer parameters to refine per observation, and proteins that have more than one molecule per asymmetric unit that can be averaged will have fewer free parameters to adjust.
[Plot (a): atom positional esds including solvent (Å, 0.00–0.25) versus equivalent B-value (0–40).]
FIG. 3.35 Plot of positional uncertainties versus thermal parameter for carbon, nitrogen, and oxygen atoms from a 1.35-Å resolution structure. The upper, black line is for carbon, the middle line is for nitrogen, and the lower line is for oxygen. Note that as the atomic number of an atom increases (carbon = 6, nitrogen = 7, oxygen = 8), and with it the atom's contribution to the total scattering, the positional uncertainty decreases. As B increases, so does the positional uncertainty.
Calculating Standard Uncertainties If the normal matrix of the least-squares minimization is inverted, the resulting matrix, after scaling, contains at each element ij the quantity $c_{ij}\sigma_i\sigma_j$, where $c_{ij}$ is the correlation between parameters i and j, and $\sigma_i$ and $\sigma_j$ are their standard deviations. On the diagonal of the matrix, where i = j, the correlation of a variable with itself is 1, so the element becomes $\sigma_i^2$; thus inverting the matrix gives us the standard deviation of any parameter. In crystallography these standard deviations are termed standard uncertainties. Each atomic coordinate and thermal parameter will thus have a standard uncertainty associated with it. The radial uncertainty of an atom at x, y, z
with uncertainties $\sigma_x$, $\sigma_y$, $\sigma_z$ will have a radial uncertainty of $\sqrt{\sigma_x^2 + \sigma_y^2 + \sigma_z^2}$. Similarly, the standard uncertainty of a bond can be calculated by noting that for two uncorrelated quantities that are subtracted, the variance of the result is $\sigma_{1-2}^2 = \sigma_1^2 + \sigma_2^2$. Thus if we calculate the length, l, of a bond, we can calculate the standard uncertainty of the bond, $\sigma_{\mathrm{bond}}$, from the positional uncertainties by the equation
$$\sigma_{\mathrm{bond}}^2 = (\sigma_{x_1}^2 + \sigma_{x_2}^2)\left(\frac{x_1 - x_2}{l}\right)^2 + (\sigma_{y_1}^2 + \sigma_{y_2}^2)\left(\frac{y_1 - y_2}{l}\right)^2 + (\sigma_{z_1}^2 + \sigma_{z_2}^2)\left(\frac{z_1 - z_2}{l}\right)^2$$
which is the sum of the positional uncertainties projected onto the bond, to account for the direction dependence of the bond. Similar considerations can be used to find the uncertainty of any quantity derived from the atomic coordinates. The foregoing treatment assumes that the atoms are uncorrelated. If the off-diagonal elements are nonzero, this indicates a correlation between parameters. The general equation is more complex and takes into account the off-diagonal correlations and the uncertainties in the unit-cell measurements. If two independent measurements of the same quantity are averaged, such as two bond lengths in a dimeric molecule, the standard uncertainty of the average is $\sigma_{\bar{x}} = \sqrt{\sigma_{x_1}^2 + \sigma_{x_2}^2}/2$. The general equation for n measurements is $\sigma_{\bar{x}} = \sqrt{\sum_{i=1}^{n}\sigma_{x_i}^2}/n$.
The standard uncertainties for proteins can be calculated using the program SHELX. First the structure must be refined to convergence, and then a cycle is done using the full matrix with zero Marquardt damping and all restraints switched off (including the restraints would give artificially low values). All reflection data are used; there are no deletions due to weakness and no exclusions by resolution, not even of the low-resolution data often excluded as the "solvent region." SHELX then calculates the standard uncertainties using the full covariance matrix and the estimated uncertainties in the unit cell. The standard uncertainties of the derived parameters, bond lengths and angles, are then calculated from the positional uncertainties. The relevant SHELX commands are
L.S. 1
DAMP 0 0
BOND
Remove the commands DFIX, DANG, FLAT, and other restraints, since these will artificially lower the standard uncertainties.
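The uncertainty formulas in this section can be checked by hand with a few lines of Python. This is a minimal sketch under stated assumptions: the Cruickshank-type estimate follows the equation given earlier in this section, read with the square root covering both the Z ratio and 1/(n_obs − n_params); the propagation functions assume uncorrelated atoms and ignore unit-cell errors, so they do not reproduce what SHELX computes from the full covariance matrix. All function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def cruickshank_sigma(z_i: float, b_i: float, z_all: Sequence[float],
                      n_obs: int, n_params: int, b_wilson: float,
                      completeness: float, d_min: float, r_factor: float,
                      k: float = 1.0) -> float:
    """Per-atom positional uncertainty from the Cruickshank-type expression."""
    if n_obs <= n_params:
        raise ValueError("invalid when observations do not exceed parameters")
    z_term = sum(z * z for z in z_all) / (z_i * z_i * (n_obs - n_params))
    b_term = ((1 + 0.04 * b_i + 0.003 * b_i ** 2)
              / (1 + 0.04 * b_wilson + 0.003 * b_wilson ** 2))
    return k * math.sqrt(z_term) * b_term * completeness ** (-1 / 3) * d_min * r_factor


def radial_sigma(sx: float, sy: float, sz: float) -> float:
    """Radial uncertainty sqrt(sx^2 + sy^2 + sz^2) of one atom."""
    return math.sqrt(sx * sx + sy * sy + sz * sz)


def bond_sigma(p1: Vec3, s1: Vec3, p2: Vec3, s2: Vec3) -> Tuple[float, float]:
    """Bond length and its standard uncertainty for two uncorrelated atoms.

    Each axis contributes (s1^2 + s2^2) weighted by the squared direction
    cosine of the bond along that axis.
    """
    d = [a - b for a, b in zip(p1, p2)]
    length = math.sqrt(sum(c * c for c in d))
    var = sum((sa * sa + sb * sb) * (c / length) ** 2 for sa, sb, c in zip(s1, s2, d))
    return length, math.sqrt(var)


def mean_sigma(sigmas: Sequence[float]) -> float:
    """Standard uncertainty of the mean of n independent measurements."""
    return math.sqrt(sum(s * s for s in sigmas)) / len(sigmas)
```

Note how the bond uncertainty depends only on the components of the positional uncertainties along the bond direction. For a bond lying along x, the y and z uncertainties drop out entirely.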
Block Diagonal Calculations For several reasons, only a few proteins thus far have been analyzed by matrix inversion for the uncertainty in atomic positions. The main reason has been that, until recently, most X-ray structures had not been determined to high enough resolution to give a ratio of data to parameters large enough for a meaningful calculation of standard uncertainties. Second, even when the resolution of the data was high enough, the full least-squares matrix was too large to fit in the memories of the available computers. For a given protein, the amount of memory required to hold the full least-squares matrix needed for the calculation of uncertainties is given by $[n(n+1)/2] \times 4$ bytes, where n is the number of parameters.36 For a typical small protein with 1000 atoms and 200 solvent molecules, the number of parameters, counting x, y, and z and six thermal parameters per atom, is 1200 atoms × 9 parameters/atom = 10,800 parameters, which would require 222 megabytes of memory plus an overhead of about 10 megabytes. For a medium-sized protein of 3000 atoms, the amount of memory for the matrix increases to 1.4 gigabytes! Because of the manner in which the memory is accessed during the analysis, it is not possible to set up the problem to use virtual memory efficiently.37 This means that unless the computer has enough main memory to hold the entire matrix, continuous swapping of memory pages causes the program to slow to about 1% of full speed, which also renders the computer useless to others on multiuser systems. Recently, computers with very large memories have become available, and the smaller proteins can now easily be done on a lab workstation with one gigabyte of memory. A useful approximation to the full-matrix calculation is a block-diagonal calculation, in which portions of the full matrix are extracted into smaller matrices along the diagonal. A good approximation is to use a block matrix retaining the three (x, y, z) positional parameters for all atoms without the thermal parameters, which requires $3^2/9^2$, or 1/9, as much memory. This gives a considerably smaller matrix of 154 megabytes for the 3000-atom example. Since the thermal parameters contain very little information about the positions of the atoms, this is a good approximation; our tests show only a 1% difference between calculations done with and without thermal parameters. The SHELX commands for a block-diagonal standard uncertainty analysis of the positional terms only are:
L.S. 1
BLOC 1
DAMP 0 0
BOND
The memory requirements can be further reduced by breaking up the positional parameters into overlapping blocks. For example, the protein could be cut into several pieces with an overlap of two residues. If we cut the 3000-atom example into three pieces with an overlap of 20 atoms, giving 1020-atom pieces (3060 parameters/block), the memory requirement for each block drops to 18 megabytes. The time to calculate the standard uncertainties is reduced similarly. The drawback is that the standard uncertainties will be underestimated because some parts of the matrix have been omitted. The amount by which the standard uncertainty is underestimated can, however, be estimated and compensated for. Figure 3.36 shows the percentage by which the error is underestimated when the positional parameters are divided into blocks. As shown in the figure, when 12 overlapping blocks are used, the standard uncertainty is underestimated by only 16%. For large proteins, this is an excellent approximation.
36 Since the matrix is symmetric about the diagonal, the calculation requires the diagonal plus the upper half of the matrix, which is n(n + 1)/2 matrix elements, and each element requires 4 bytes of memory.
37 To evaluate a single parameter, the column and the row containing the parameter need to be accessed. Thus if the matrix is arranged to put columns sequentially, rows will be far apart, and vice versa. If the program starts to swap, a very large number of pages are continuously moved in and out of memory and the CPU grinds to a near halt.
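The memory arithmetic above is easy to reproduce for your own protein. This sketch assumes 4-byte elements stored as the diagonal plus one triangle (footnote 36) and reports binary megabytes (2^20 bytes). It models an overlapping block simply as an equal share of the atoms plus a fixed overlap, which is a simplification of how one might actually cut a chain into pieces. The helper names are hypothetical.

```python
import math

BYTES_PER_ELEMENT = 4   # single-precision matrix elements, as in the text
MEGABYTE = 1024 ** 2    # binary megabyte


def matrix_bytes(n_params: int) -> int:
    """Bytes for the diagonal plus one triangle of an n x n symmetric matrix."""
    return n_params * (n_params + 1) // 2 * BYTES_PER_ELEMENT


def block_params(n_atoms: int, n_blocks: int, overlap_atoms: int,
                 params_per_atom: int = 3) -> int:
    """Parameters in one overlapping block (equal share of atoms plus overlap)."""
    atoms_per_block = math.ceil(n_atoms / n_blocks) + overlap_atoms
    return atoms_per_block * params_per_atom


if __name__ == "__main__":
    cases = {
        "1200 atoms, xyz + 6 thermal": 1200 * 9,
        "3000 atoms, xyz + 6 thermal": 3000 * 9,
        "3000 atoms, xyz only": 3000 * 3,
        "3000 atoms, 3 blocks, 20-atom overlap": block_params(3000, 3, 20),
    }
    for label, n in cases.items():
        print(f"{label}: {n} parameters, {matrix_bytes(n) / MEGABYTE:.1f} MB")
```

The four cases correspond to the 222-megabyte, 1.4-gigabyte, 154-megabyte, and 18-megabyte figures quoted in the text, to within rounding. Because the memory grows as n², halving the block size cuts the memory per block roughly by four.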
[Plot: underestimation of positional uncertainty (%, 0 to −20) versus no. of blocks (0–14).]
FIG. 3.36 Effect of block size on the underestimation of positional uncertainty parameters.
SHELX Refinement Strategy As a typical example of how to use SHELX in protein refinement, we'll assume that a data set has been collected on a crystal using synchrotron radiation and cryocrystallography, and that a very-high-resolution data set now exists for a protein that was previously solved at room temperature with 2.0-Å resolution data. This is a typical application of SHELX. The following example is for SHELX-97 or later versions. See Appendix B, Useful Web Sites, for the SHELX site that offers information about obtaining SHELX.
Start with a PDB format file with the coordinates to be refined and run SHELXPRO with the I option (.ins from PDB file), which creates an .ins file from the PDB coordinates. The program will prompt you for the unit cell and space group. Name this file something like prot.1.ins. Also prepare your data and convert it to SHELX .hkl format; you can do this with xprepfin. The data must be divided into a working set and a free set for R-free cross-validation. This is done with SHELXPRO and the V command, which marks the R-free reflections with a -1. For the example, this file will be prot.hkl. To turn on the R-free calculations, edit the .ins file and add the -1 flag to the CGLS command:
CGLS 20 -1
When SHELX runs, it looks for the reflection data in a file that has the same name as the .ins file but with the extension changed to .hkl. Rather than copying or moving the data file, we can create a soft link to it. Thus we enter:
ln -s prot.hkl prot.1.hkl
Now we can run SHELX with the command:
shelxl prot.1 >& prot.1.log &
This starts SHELX in the background and puts the output into a log file. We can follow the log file to monitor progress with the command
tail -f prot.1.log
which continuously monitors the file for new lines being added as SHELX runs. As the program runs, it creates four other output files: a CIF format reflection file with an .fcf extension, a longer log file with the .lst extension, a new instruction file with the extension .res (for restart file), and a new PDB format file with the extension .pdb. When SHELX stops, you can fit the output with xfit by entering the command
xfit prot.1.pdb prot.1.fcf
After fitting the model, write the output PDB file from xfit into prot.1.fit.pdb. With SHELXPRO you can update the .res file with the PDB file using the U command. Write the output from this into prot.2.ins. Make another soft link to prot.hkl and run the new .ins file. Repeat this process until further fitting seems unnecessary, adding or subtracting waters as needed (see Chapter 4, Sec. 4.4, Editing Waters).
If your data extend beyond 1.5-Å resolution, you can make the model anisotropic by adding the command
ANIS_* $C $N $O $S
to change the thermal model from isotropic to anisotropic. Run this refinement and look for a drop in R-free of about 2%. After another manual fitting step, again if at very high resolution, add riding hydrogens by removing the REM comments from the HFIX commands in the .ins file. This should give a drop in R-free of about 1%. Continue refinement, if necessary, to convergence. For the last cycle, remove the -1 flag from the CGLS command to refine the final model against all the data. The R-factor should be in the teens and R-free 5–10% greater than the R-factor.
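Two of the hand edits described above, toggling the -1 flag on CGLS and removing REM from the HFIX lines, are easy to script between cycles. This is a minimal sketch. It assumes the CGLS line has the simple form "CGLS n" or "CGLS n -1" (any further arguments are dropped) and that the commented hydrogens appear as lines beginning "REM HFIX". The example .ins content in the usage below is hypothetical.

```python
import re

HFIX_COMMENTED = re.compile(r"^REM\s+(HFIX\b.*)$", re.IGNORECASE)


def set_free_flag(ins_text: str, use_free_set: bool) -> str:
    """Rewrite the CGLS line with or without the -1 R-free flag.

    Only the cycle count is kept; other CGLS arguments are not handled.
    """
    lines = []
    for line in ins_text.splitlines():
        fields = line.split()
        if fields and fields[0].upper() == "CGLS" and len(fields) >= 2:
            line = f"CGLS {fields[1]} -1" if use_free_set else f"CGLS {fields[1]}"
        lines.append(line)
    return "\n".join(lines) + "\n"


def enable_riding_hydrogens(ins_text: str) -> str:
    """Remove the REM comment from every 'REM HFIX ...' line."""
    lines = []
    for line in ins_text.splitlines():
        match = HFIX_COMMENTED.match(line)
        lines.append(match.group(1) if match else line)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    ins = "TITL prot\nCGLS 20\nREM HFIX 43 N_*\nEND\n"
    print(enable_riding_hydrogens(set_free_flag(ins, True)))
```

For the final cycle against all the data, call set_free_flag(text, False). The ANIS command is not scripted here because where it belongs in the .ins file is not covered in this section.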
3.12 FITTING OF MAPS
Calculating Electron Density Maps
Resolution Cutoffs Normally, you can specify a minimum and a maximum resolution cutoff when calculating an electron density map (Fig. 3.37). Remember that the map is calculated by Fourier synthesis, so leaving out the higher-resolution terms causes series-termination errors. These errors show up as ripples in the electron density with periods close to the highest resolution used in the map. Leaving out low-resolution terms adds long-period ripples that make the map look "choppy." Low-resolution terms are often left out of refinements and thus often end up left out of map calculations as well. How much effect this has on the interpretability of the map depends on the high-resolution limit. For instance, a 4.0- to 3.0-Å map is hard to interpret even with perfect phases, while a 4.0- to 2.0-Å map is relatively straightforward. As a rule of thumb, a 3-Å map should include data from 10 to 3 Å.
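Series-termination ripples are easy to reproduce in one dimension. The sketch below is an illustration only, not part of XtalView. It rebuilds a single Gaussian "atom" (peak height 1, width 0.5 Å) in a hypothetical 20-Å cell from its Fourier terms, keeping only the terms whose spacing a/h lies between d_min and d_max.

```python
import math
from typing import List


def synthesize_1d(cell: float, width: float, d_min: float,
                  d_max: float = math.inf, n_points: int = 200) -> List[float]:
    """Fourier synthesis of one Gaussian atom (peak height 1) at the cell centre.

    Only terms h with d_min <= cell/h <= d_max are used; the F000 term is kept
    only when no low-resolution cutoff is applied.
    """
    x0 = cell / 2
    f000 = width * math.sqrt(2 * math.pi)  # integral of the Gaussian
    h_max = int(cell / d_min)
    density = []
    for i in range(n_points):
        x = cell * i / n_points
        total = 1.0 if math.isinf(d_max) else 0.0
        for h in range(1, h_max + 1):
            if cell / h > d_max:
                continue  # low-resolution term left out
            weight = math.exp(-2 * math.pi ** 2 * width ** 2 * h ** 2 / cell ** 2)
            total += 2 * weight * math.cos(2 * math.pi * h * (x - x0) / cell)
        density.append(f000 * total / cell)
    return density


if __name__ == "__main__":
    for d_min in (0.5, 2.0, 3.0):
        rho = synthesize_1d(20.0, 0.5, d_min)
        print(f"d_min {d_min} A: peak {max(rho):.3f}, deepest trough {min(rho):.3f}")
```

With data to 0.5 Å the synthesis reproduces the Gaussian almost exactly. Truncated at 3.0 Å, the peak broadens and drops, and negative troughs appear on either side of it: these are the ripples described above. Passing a finite d_max drops the low-resolution terms (and F000) as well.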
Nonorthogonal Coordinates The main problem with nonorthogonal coordinates is making certain that the map and the model are in the same coordinate frame, as discussed previously under Coordinate Systems (see Sec. 3.1). It is also common to discover, after a model has been fit, that the cell transformation was incorrectly specified for the refinement program; the only indication of this may be a high R-factor. The XtalView system allows the user to specify a 3 × 3 matrix for the Cartesian-to-fractional coordinate transformation (for fractional-to-Cartesian, the inverse matrix is computed), so that if another program is being used, the matrix from that program can be entered. In XtalView the matrix used is the same as in FRODO and XPLOR in the cases tested so far.
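To see what such a 3 × 3 matrix contains, here is a minimal sketch. It assumes the widely used convention in which x is parallel to a, y lies in the a–b plane, and z is parallel to c* (the convention behind PDB SCALEn records). Whether a particular program uses this convention is exactly what has to be checked, so treat it as an assumption rather than as XtalView's internal matrix.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def frac_to_cart_matrix(a: float, b: float, c: float,
                        alpha: float, beta: float, gamma: float) -> Matrix:
    """Fractional-to-Cartesian matrix: x along a, y in the a-b plane, z along c*."""
    ca, cb, cg = (math.cos(math.radians(t)) for t in (alpha, beta, gamma))
    sg = math.sin(math.radians(gamma))
    volume = a * b * c * math.sqrt(1 - ca * ca - cb * cb - cg * cg + 2 * ca * cb * cg)
    return [
        [a, b * cg, c * cb],
        [0.0, b * sg, c * (ca - cb * cg) / sg],
        [0.0, 0.0, volume / (a * b * sg)],
    ]


def invert_upper_triangular(m: Matrix) -> Matrix:
    """Inverse of a 3 x 3 upper-triangular matrix (Cartesian-to-fractional)."""
    (m00, m01, m02), (_, m11, m12), (_, _, m22) = m
    return [
        [1 / m00, -m01 / (m00 * m11), (m01 * m12 - m02 * m11) / (m00 * m11 * m22)],
        [0.0, 1 / m11, -m12 / (m11 * m22)],
        [0.0, 0.0, 1 / m22],
    ]


def apply(m: Matrix, v: Sequence[float]) -> List[float]:
    """Matrix-vector product for 3-vectors."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


if __name__ == "__main__":
    f2c = frac_to_cart_matrix(40.0, 50.0, 60.0, 90.0, 110.0, 90.0)
    c2f = invert_upper_triangular(f2c)
    print(apply(c2f, apply(f2c, [0.3, -0.2, 1.1])))
```

Two programs that disagree on the convention will place every atom differently even though each round-trips its own coordinates perfectly. Comparing the matrices each program produces for your cell is the quickest way to catch this.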
Map Boundaries Many programs, such as FRODO or O, can display only as much of the unit cell as is stored in the precalculated map file. This means that the user must decide beforehand on map boundaries that will cover an entire molecule; a mini-map can be used to determine them. In xfit there is no need for this, because the program knows that the density at a fractional coordinate of 1.1 is the same as at 0.1, and its maps always contain a full unit cell.
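The periodicity trick can be shown in a few lines. This is a minimal sketch, not xfit's code. It assumes a map stored as a nested list covering exactly one unit cell and uses a nearest-grid-point lookup, whereas real programs interpolate. Python's modulo operator wraps negative indices too, so any fractional coordinate maps back into the stored cell.

```python
import math
from typing import List, Sequence

Grid = List[List[List[float]]]


def density_at(grid: Grid, frac: Sequence[float]) -> float:
    """Nearest-grid-point density for any fractional coordinate.

    The grid covers exactly one unit cell, so indices wrap with the modulo
    operator: 1.1 and 0.1 (or -0.9) land on the same grid point.
    """
    dims = (len(grid), len(grid[0]), len(grid[0][0]))
    i, j, k = (math.floor(f * n + 0.5) % n for f, n in zip(frac, dims))
    return grid[i][j][k]
```

Because any coordinate can be wrapped this way, no map boundaries ever need to be chosen in advance.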
FIG. 3.37 Effect of different resolution cutoffs. The maps use the same data but differ in the resolution limits used. Thick lines are the model used to calculate the phases; thin lines represent the electron density contoured at 1σ. (A) 37-5.0, (B) 37-4.5, (C) 37-4.0, (D) 37-3.7, (E) 37-3.3, (F) 37-3.0, (G) 37-2.0. The turns of the helix become apparent at 3.7, and the carbonyl bulges are apparent at 3.0. However, these maps are made with refined phases, and starting maps will appear to have lower resolution because of phase errors. Map (H), 5.0-3.7, shows the deleterious effect of truncating the low-resolution data too severely. Compare the densities in (H) and (D).
Combined Phase Coefficients Phase bias is a serious problem in the early stages of fitting and refinement. One way to avoid phase bias is to use only the MIR phases to calculate the maps; this guarantees that the maps are unbiased with respect to the model being fit. However, MIR phases are noisy and usually limited in resolution. Several methods have been developed for combining calculated model phases with experimental phases so that information from both can be used. This approach can be used to increase the resolution, to allow the use of partial models, and to help reduce model bias. All methods rely on the difference between $F_o$ and $F_c$ to weight the calculated phase contribution. Combined phase coefficients allow the low-resolution MIR phases, which are usually accurate, to be combined with calculated phases at higher resolutions. The MIR phases alone should be used in the resolution range infinity to 5.0 Å, where model phases are inaccurate. At other resolutions both phases are used in a weighted manner. The phase combination program puts out a combined figure of merit that is used to weight the map, as in the MIR case. The combined maps are smoother, especially at resolutions between 3 and 2.5 Å, than maps made from an incomplete model for phasing. The most common phase combination procedure is Bricogne's adaptation of Sim's weighting scheme.38 Two phase probability distributions are multiplied together by the following procedure. The phase probability for the MIR phases has been previously stored as the four Hendrickson–Lattman coefficients, $A_{\mathrm{MIR}}$, $B_{\mathrm{MIR}}$, $C_{\mathrm{MIR}}$, $D_{\mathrm{MIR}}$ (see note 11). The phase probabilities for the calculated phases are calculated from the equation:
$$P_c(\phi) = \exp\left[\frac{2|F_{\mathrm{obs}}||F_c|}{\langle F_o^2 - F_c^2\rangle}\cos(\phi - \phi_c)\right] \qquad (3.35)$$
where $\langle F_o^2 - F_c^2\rangle$ is the mean difference between the observed and calculated intensities, calculated in bins of resolution. It can be seen that when the R-factor is large, that is, when $F_o$ and $F_c$ do not agree, the contribution from the calculated phase is lowered because the denominator will be large. As the R-factor decreases, the calculated phase contribution increases. This probability is expressed in terms of A and B, where A is the cosine part and B is the sine part of the phase at the maximum value of the probability distribution, and $\sqrt{A^2 + B^2} = mF_o$, where m is the figure of merit. These are then added to $A_{\mathrm{MIR}}$ and $B_{\mathrm{MIR}}$ with the relative weights set by $W_{\mathrm{MIR}}$ and $W_{\mathrm{calc}}$:
$$A = W_{\mathrm{MIR}} \times A_{\mathrm{MIR}} + W_{\mathrm{calc}} \times A_{\mathrm{calc}} \qquad (3.36)$$
with a similar equation for B. The new phase is found by evaluating
$$P_j(\phi) = \exp\left(A\cos\phi + B\sin\phi + C\cos 2\phi + D\sin 2\phi\right) \qquad (3.37)$$
which gives a new most-probable phase and figure of merit using Eqs. 3.20 and 3.21. The success of this process can partially be judged by an increase in the figure of merit. The weights should be adjusted so that the new phases are, on average, between the MIR and the calculated phases: low-resolution phases will be closer to the MIR phases, and higher-resolution phases will be closer to the calculated phases. The swing point, beyond which more of the calculated phase than of the MIR phase is included, should be located between 4.0 and 3.5 Å, based on past experience. This seems to give maps that contain new information from the filtering effect of the calculated phases without being so overwhelmed by them that no information from the MIR is left.
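The combination in Eqs. 3.35–3.37 can be sketched for a single reflection. In this sketch, the calculated phase contributes only A and B (its C and D are zero). Combining C and D with the same weights as A and B is our assumption. The phase probability is evaluated on a 1° grid to obtain the most probable phase, the centroid phase, and the figure of merit (the length of the mean unit phase vector). This is an illustration under those assumptions, not necessarily how any particular program implements Eqs. 3.20 and 3.21. The function names are hypothetical.

```python
import math
from typing import Tuple

HL = Tuple[float, float, float, float]  # Hendrickson-Lattman A, B, C, D


def sim_coefficients(f_obs: float, f_calc: float, phase_calc_deg: float,
                     mean_sq_diff: float) -> HL:
    """A, B for a calculated phase from the Sim-type term of Eq. 3.35 (C = D = 0).

    mean_sq_diff is the resolution-bin value of <Fo^2 - Fc^2>.
    """
    x = 2.0 * f_obs * f_calc / mean_sq_diff
    phi = math.radians(phase_calc_deg)
    return (x * math.cos(phi), x * math.sin(phi), 0.0, 0.0)


def combine(hl_mir: HL, hl_calc: HL, w_mir: float = 1.0, w_calc: float = 1.0) -> HL:
    """Weighted sum of coefficients, as in Eq. 3.36 (applied to all four terms)."""
    return tuple(w_mir * m + w_calc * c for m, c in zip(hl_mir, hl_calc))


def phase_and_fom(hl: HL, n_steps: int = 360) -> Tuple[float, float, float]:
    """Most probable phase, centroid phase (degrees) and figure of merit.

    P(phi) of Eq. 3.37 is sampled on a regular grid; the largest exponent is
    subtracted before exponentiating to avoid overflow.
    """
    a, b, c, d = hl
    phis = [2 * math.pi * k / n_steps for k in range(n_steps)]
    expo = [a * math.cos(p) + b * math.sin(p) + c * math.cos(2 * p) + d * math.sin(2 * p)
            for p in phis]
    top = max(expo)
    probs = [math.exp(e - top) for e in expo]
    total = sum(probs)
    mean_cos = sum(w * math.cos(p) for w, p in zip(probs, phis)) / total
    mean_sin = sum(w * math.sin(p) for w, p in zip(probs, phis)) / total
    best = math.degrees(phis[probs.index(max(probs))])
    centroid = math.degrees(math.atan2(mean_sin, mean_cos)) % 360.0
    return best, centroid, math.hypot(mean_cos, mean_sin)
```

Raising w_calc relative to w_mir pulls the combined phase toward the calculated phase. This is the knob that would be adjusted per resolution bin to place the swing point between 4.0 and 3.5 Å.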
Evaluating Map Quality An experienced crystallographer can usually assess the quality of a map quickly by inspection. It is especially important for new crystallographers to try to learn this technique, so that a lot of time is not wasted trying to fit poor maps.
38 Bricogne, G. (1976). Acta Crystallogr. A32, 832.
Judging Electron Density First look at the map on a large scale, where both protein and solvent should be visible. There should be a strong contrast between the solvent and the protein. If the map is contoured at 1σ, there should be few long connected regions in the solvent region (look at several sections). The protein region of the map should have connected densities that are cleanly separated, and the heights of these ridges should be consistent over the protein region. Excessive "peakiness" is a bad sign. It might help at this point to consider that a protein is a long polypeptide composed mostly of carbon, nitrogen, and oxygen, which all have about the same electron density; the exception is a small number of sulfur atoms. Any error in the phases will cause some volumes of the map to have too little density and other volumes to have correspondingly too much. With a practiced eye, you can quickly discern this. Another feature to look for in judging the quality of the map is the contrast between protein and solvent regions. For this purpose, a slab of electron density over a large area is needed (Fig. 3.38). There should be a clear difference in level between the protein and solvent. The solvent should comprise a few large areas of low-level peaks rarely rising above 1–2 times the root-mean-square value (σ) of the map.
Electron Density Histograms The previous statements can be restated more precisely. In a correctly phased map, the distribution of the densities will be that of a protein molecule, which is independent of its fold or space group and has a characteristic histogram. In fact, the histogram can be used to compute the probable amount of phase error by comparing the histogram of an unknown with those of correctly phased maps of known structures (Fig. 3.39). The histogram depends on the resolution range used and also on the percentage of solvent in the crystal. XtalView comes with a program, xedh (Fig. 3.40), that computes the histogram of a map and can be used to compare it with known histograms. This is easily enough done, except that, of course, there is a large gray area where a map may be interpretable in spite of the phase errors. However, histogram comparison gives an objective method of estimating phase error that, with experience, will give a guide to the "interpretability" of a set of phases. The histogram method requires no model, and it does not matter how the phases were derived.
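Here is a minimal sketch of a histogram comparison in the spirit described above. It assumes the map is available as a flat list of grid values. Densities are normalized to mean 0 and rms 1 so that maps on different scales can be compared. The mismatch measure (half the summed absolute difference, 0 for identical histograms and 1 for disjoint ones) is our own choice for illustration, not the statistic xedh uses. For a meaningful comparison, the reference histogram must come from a correctly phased map with the same resolution range and similar solvent content.

```python
import math
from typing import List, Sequence


def density_histogram(values: Sequence[float], n_bins: int = 32,
                      lo: float = -3.0, hi: float = 5.0) -> List[float]:
    """Fraction of grid points per bin, in units of the map's rms (sigma).

    Values outside [lo, hi) are counted in the first or last bin.
    """
    n = len(values)
    mean = sum(values) / n
    rms = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        k = math.floor(((v - mean) / rms - lo) / width)
        counts[min(max(k, 0), n_bins - 1)] += 1
    return [c / n for c in counts]


def histogram_mismatch(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Half the summed absolute difference between two normalized histograms."""
    return 0.5 * sum(abs(a - b) for a, b in zip(h1, h2))
```

A well-phased protein map has a characteristic skewed shape, with a long tail of high density from the protein atoms. Phase error tends to make the distribution more symmetric, which increases the mismatch against a correctly phased reference.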
3.12 Fittingof Maps
225
out.fin
infile=infile.tmp
outfile=outfile.tmp
# this complains about an inappropriate operation, but it works
cat > $infile
# output of mtz2various is saved in mtz2various.log
mtz2various hklin $infile hklout $outfile \
    > mtz2various.log << eof
labin I(+)=I(+) SIGI(+)=SIGI(+) I(-)=I(-) SIGI(-)=SIGI(-)
OUTPUT USER '(3I5,4F12.2)'
END
eof
# place outfile on stdout
cat $outfile
rm $infile $outfile
Using this as a guide, you can easily convert any other CCP4 file by changing the labin line to the appropriate labels for your file. Of course, you must have the CCP4 executables installed on your computer and in your path for this to work. If you are just going to run it every once in a while, you can make a simpler script and load the output into xprepfin using the xtalmgr or on the command line. Here's a similar script that leaves the output in for_xprepfin.fin:
#!/bin/csh -f
mtz2various hklin my_file.mtz hklout for_xprepfin.fin
1.5), and especially if the B's are anisotropic, the program cannot do an accurate structure factor calculation because it uses an isotropic reverse FFT algorithm. However, if you are at the point of making your B's anisotropic and are still wondering where side chains are located, you may want to rethink your fitting strategy!
XtalView TUTORIALS
FIG. 4.38 The xfit SFCalc window is used to calculate structure factors from the model and can be used for making omit maps. Use the Shake option to reduce phase bias.
4.4 A Typical Manual Fitting Session with Xfit
Finding Geometry Errors The Error window (Fig. 4.39) is used to find geometry errors in the model. Bring up the Error window and click on Analyze Active Model. The geometry is analyzed and the results are put into the error list so you can click through it. The phi-psi plot also pops up; residues with bad phi-psi angles are marked on it (and put in the error list). As you click on each error or click on Go To Next Error, the model is moved to center the residue with the error. When you click on Fit Error Res, the residue is activated so you can repair it. Use Delete Error Res to send the residue to oblivion.
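The phi-psi plot is built from backbone torsion angles: φ(i) is the C(i−1)–N(i)–CA(i)–C(i) dihedral and ψ(i) is the N(i)–CA(i)–C(i)–N(i+1) dihedral. Below is a minimal sketch of the standard dihedral calculation (IUPAC sign convention, result in (−180°, 180°]). Which regions of the plot xfit flags as bad is not described here, so no Ramachandran boundaries are included.

```python
import math
from typing import List, Sequence


def _sub(p: Sequence[float], q: Sequence[float]) -> List[float]:
    return [a - b for a, b in zip(p, q)]


def _dot(p: Sequence[float], q: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(p, q))


def _cross(p: Sequence[float], q: Sequence[float]) -> List[float]:
    return [p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0]]


def dihedral(p1: Sequence[float], p2: Sequence[float],
             p3: Sequence[float], p4: Sequence[float]) -> float:
    """Torsion angle p1-p2-p3-p4 in degrees, positive for clockwise rotation."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = _dot(_cross(n1, n2), b2) / math.sqrt(_dot(b2, b2))
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```

Using atan2 rather than an arccosine of the normalized dot product keeps the sign of the angle, which the phi-psi plot needs.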
FIG. 4.39 The Geometry Errors window is used to analyze the geometry of the model. Click on an error to go to it.
Editing Waters Two shortcut keys have been added specifically to make editing the water list faster, thus encouraging users to actually look at the waters. Note that in the majority of cases, editing the water list and removing bad waters will cause your R-free to drop even if there is a very small increase or no change in the overall R-factor. A bad water is one for which there is no density, the density is not shaped like a water, or the B-value is higher than, say, 80.0. For high-B-value waters with sensible density, you can try halving the occupancy.
• Shift + W. Adds a water where the cursor is and at the center of the slab. The new water is in fitting mode, and its position can be adjusted using the middle mouse button. To quickly position the water, leave the Refine window up in a corner and, after you have added the water, click on the Translation button to real-space-refine the water into the center of the peak. Use a thin slab (5 Å) to get the out-of-screen direction close enough for the translation to pull it in. Use the semicolon key to end the water fitting, or go on to the next one and use the semicolon key at the end of the water-adding sequence.
• Shift + D. Deletes the last picked water or residue. Click on a water and then issue this command. If it's a water, the program deletes it. If it's a residue, the program asks for confirmation of the delete. You can undelete a water by going to the Model window, finding the deleted water, and deleting it again to toggle the DEL flag.
• Automated water addition. First make a difference map. The resolution should be better than 2.3 Å, and the map should show clear water peaks. Bring up the Waters... popup (Fig. 4.40) and set the minimum density to the lowest you will accept for a water. Press Add Water and wait. The new waters are added to the Error popup, not because they are errors, but because this gives you a quick way to navigate through the list of added waters to see if you agree with them. Be sure to save the model.
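The B-value criterion for bad waters is easy to apply to a coordinate file before you sit down at the graphics. This minimal sketch assumes standard fixed-column PDB records and waters named HOH or WAT (naming varies between programs, so adjust WATER_NAMES). It checks only B-values. Whether the density is absent or the wrong shape still has to be judged on the map.

```python
from typing import Iterable, List

WATER_NAMES = ("HOH", "WAT")  # assumption: adjust to your file's residue name


def high_b_waters(pdb_lines: Iterable[str], b_cutoff: float = 80.0) -> List[str]:
    """Residue numbers of waters whose B-value exceeds b_cutoff.

    Uses fixed PDB columns (1-based): residue name 18-20, residue number
    23-26, B-value 61-66.
    """
    flagged = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")) and line[17:20] in WATER_NAMES:
            if float(line[60:66]) > b_cutoff:
                flagged.append(line[22:26].strip())
    return flagged


def halve_occupancy(line: str) -> str:
    """Return the record with its occupancy (columns 55-60) halved."""
    occupancy = float(line[54:60])
    return line[:54] + f"{occupancy / 2:6.2f}" + line[60:]
```

The flagged residue numbers give you a list to step through in xfit. For high-B waters that do sit in sensible density, halve_occupancy applies the halving suggested above.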
4.5 INTERFACING TO OTHER PROGRAMS
This section gives specific information for interfacing XtalView to popular software.
Molscript. The xfit script commands Rotation and Translation can be used to set the viewpoint in Molscript. The translation is the inverse of Molscript's, but other than this simple change in sign, the commands can be cut and pasted into a Molscript control file. When a good view is found in xfit, use View/Make Script/Save and edit the resulting file to get the rotation and translation commands. Paste these into the Molscript file, reverse the translation, and run Molscript.
FIG. 4.40 Add water molecules with this window. Waters can also be renamed to match the nearest atom in another structure.
SHELXL. Using SHELX with XtalView is straightforward. Xprepfin writes out an .hkl file and can be used to convert from various input formats. You can then mark this file for R-free tests using SHELXPRO. For the first refinement cycle, use SHELXPRO with the I option to create the .ins file. After refinement, read the resulting PDB file and the .fcf file directly into xfit. After fitting, use SHELXPRO to update the .res file with the fitted .pdb file to make the next .ins file. SHELXPRO has an XtalView command (X) to write a .phs file, but this is not needed with versions of XtalView after 3.1. Xfit reads and understands the ANISOU cards and thus keeps the anisotropic U's correct between cycles. DENZO. Xprepfin can prepare input from the scalepack output .SCA file by choosing Other as the input file type and DENZO I's from the Other menu. CCP4. The preferred way to use XtalView with CCP4 is to use phase files as input rather than map files. This is faster, saves disk space, and allows you to make omit maps on the fly. Starting with XtalView 4.0, xfit can read the structure factors directly from .mtz files as long as you use the standard CCP4 labels for FO, FCALC, and PHIB.
However, there is a CCP4 map converter, kindly provided by John Irwin, available from CCMS; send e-mail to ccms-help@sdsc.edu and ask. You can also convert CCP4 map files by FFTing them into structure factors and reading these into xfit. You can convert .mtz files with xprepfin or with the CCP4 program mtz2various. Several XtalView programs can write .mtz files, including xprepfin, xheavy, and xfit.
PHASES. Xheavy writes an input file for PHASES that will get you most of the way to running PHASES; look at the menu on the menu button for saving a phase file. In this way you can use xheavy for heavy-atom location and refinement and then switch to PHASES. There is not much difference in the actual phases produced.
XPLOR. Xprepfin can be used to generate an input file of XPLOR Fobs data. If your native data have Bijvoet pairs, be sure to use the Average F1 and F2 option in xprepfin. You can switch the segment ID with the chain ID in xfit by using the options on the Files... window. Since the chain ID field in a PDB file is only one character, the first character of the seg ID is used. This gets around the fact that XPLOR loses the chain ID.
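Outside xfit, the same chain ID/segment ID swap is a simple column operation on fixed-format PDB records. A minimal sketch, assuming standard PDB columns (chain ID in column 22, segment ID in columns 73-76); the function names and the SAMPLE record are hypothetical, and this is not how xfit implements its Files... options.

```python
"""Copy chain IDs to segment IDs and back in fixed-column PDB records (sketch)."""

COORD_RECORDS = ("ATOM", "HETATM")


def chain_to_segid(line):
    """Write the one-character chain ID (column 22) into the segment ID field (73-76)."""
    if not line.startswith(COORD_RECORDS):
        return line
    padded = line.rstrip("\n").ljust(80)  # make sure columns 73-76 exist
    return padded[:72] + padded[21].ljust(4) + padded[76:]


def segid_to_chain(line):
    """Restore the chain ID from the first character of the segment ID."""
    if not line.startswith(COORD_RECORDS):
        return line
    padded = line.rstrip("\n").ljust(80)
    segid = padded[72:76].strip()
    chain = segid[0] if segid else " "
    return padded[:21] + chain + padded[22:]


# Example ATOM record with chain A, laid out in standard PDB columns.
SAMPLE = ("ATOM  {:5d} {:<4s} {:3s} {:1s}{:4d}    {:8.3f}{:8.3f}{:8.3f}"
          "{:6.2f}{:6.2f}").format(1, " N", "MET", "A", 1,
                                   11.104, 6.134, -6.504, 1.0, 0.0)

if __name__ == "__main__":
    xplor_line = chain_to_segid(SAMPLE)
    print(repr(xplor_line[72:76]))  # segment ID now holds "A"
    lost = xplor_line[:21] + " " + xplor_line[22:]  # simulate a program dropping the chain ID
    print(segid_to_chain(lost)[21])  # chain ID restored from the segment ID
```

Both functions pad the record to 80 columns so that the segment ID field always exists, which is why the returned lines may carry trailing spaces.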
XPLOR, TNT, PROLSQ, and Other Refinement Programs
To make maps, first prepare your native data as an empty phase file with xprepfin by reading it in and setting the Fake Phs output option. This creates a file with 0.0 for the phase. Now run xfit, loading the empty phase file and your latest refined PDB file. The FFT window will pop up, but don't click the Apply button. Instead, just set the map type you want (e.g., 2mFo - DFc); then go to the SfCalc window and choose the Calculate All and Scale button. Xfit will calculate Fc and the phase, scale Fo to Fc (to put it on an absolute scale), and then FFT your density. If you want to look at an Fo - Fc map as a second map, just reload the phases with the File window and repeat the procedure, setting the FFT type to Fo - Fc.
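The map types chosen in the FFT window correspond to simple Fourier coefficients combined with the calculated phase. The sketch below shows, for acentric reflections only, how 2mFo - DFc and Fo - Fc amplitudes are formed once Fo has been brought onto the Fc scale. It is an illustration, not xfit's code: the single overall scale factor (sum of Fc over sum of Fo) and the function names are assumptions, m and D are taken as already known for each reflection, and centric reflections are conventionally weighted differently.

```python
"""Form 2mFo-DFc and Fo-Fc map coefficients (illustrative sketch, acentric only)."""


def overall_scale(fo, fc):
    """Single scale factor k so that k*Fo is on the Fc (absolute) scale (assumed form)."""
    return sum(fc) / sum(fo)


def map_coefficients(fo, fc, phic, m, d, kind="2mFo-DFc"):
    """Return (amplitude, phase in degrees) for each reflection.

    fo is assumed to be scaled to fc already; phic is the calculated phase.
    """
    coeffs = []
    for f_obs, f_calc, phase, fom, d_val in zip(fo, fc, phic, m, d):
        if kind == "2mFo-DFc":
            amp = 2.0 * fom * f_obs - d_val * f_calc
        elif kind == "Fo-Fc":
            amp = f_obs - f_calc
        else:
            raise ValueError("unknown map type: " + kind)
        if amp < 0.0:
            # A negative amplitude equals a positive one with the phase shifted by 180 degrees.
            amp, phase = -amp, (phase + 180.0) % 360.0
        coeffs.append((amp, phase))
    return coeffs


if __name__ == "__main__":
    fo_raw = [50.0, 25.0]
    fc = [90.0, 60.0]
    k = overall_scale(fo_raw, fc)
    fo = [k * f for f in fo_raw]
    print(map_coefficients(fo, fc, [30.0, 200.0], [0.8, 0.5], [0.9, 0.9]))
```

Folding negative amplitudes into a 180 degree phase shift keeps every coefficient in the (amplitude, phase) form that an FFT program expects.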
5 PROTEIN CRYSTALLOGRAPHY COOKBOOK
5.1 MULTIPLE ISOMORPHOUS REPLACEMENT
Multiple isomorphous replacement (MIR) is the oldest method of phasing proteins and is still very successful. The basic method has changed little since myoglobin was solved, although the detailed implementation has. The basic cycle of MIR phasing is diagrammed in Fig. 5.1.
1. Soak the crystals in heavy-atom solutions to scan for possible derivatives. Crystals that survive the soaking are tested to see if they still diffract. Those that do are scanned to see if they produce any changes in X-ray intensity.
2. If intensity changes are observed, a data set is collected. The first data should be collected quickly: if possible, a nearly complete data set to at least 5 Å should be collected within 24 h, or even faster. Many heavy-atom derivatives are unstable in the X-ray beam, and either the crystals quickly degrade or they change with time. Often the best data are from the first run; even though the crystal still diffracts strongly, later data are found to have significantly lower phasing power. Continue collecting data if the crystal has not degraded. Frozen crystals are more stable (see Chapter 6), but freezing can also change the unit cell, leading to significant nonisomorphism.
FIG. 5.1 Heavy-atom phasing scheme. [Flowchart: soak crystal in heavy-atom solution; collect partial data set; evaluate differences (above 10%? if not, toss); finish data collection; merge and scale with native; evaluate statistics (isomorphous? if not, toss); if phases are available, cross-Fourier, or else make a Patterson; if the Patterson is solvable, refine and compute SIR phases; cross-Fourier other derivatives to solve them and put them on the same origin; refine solutions and compute MIR phases; map interpretable? if not, repeat the cycle.]
3. Evaluate the heavy-atom statistics. By looking at the statistics of intensity changes it is possible to tell whether the crystal is likely to be a good derivative. The overall percentage difference should be above 10-12%. While you may solve and refine a derivative with weaker changes, it will have little phasing power; try resoaking to see if you can raise the percentage difference with longer soaks and/or higher soaking concentrations. The centric data should have larger differences than the acentric zones. The root-mean-square magnitude of the differences should fall off with resolution in roughly the same proportion as the scattering factor for the heavy atom (including the temperature factor). If the differences do not fall off, this indicates noise or nonisomorphism.
4. Solve for the positions of heavy atoms. It is necessary to solve at least one derivative's Patterson map. This may not be the Patterson map of the first derivative you find. After solving one Patterson map, you can solve the others by cross-phasing with the single isomorphous replacement (SIR) phases of the first. This also puts all the heavy atoms on the same origin.
5. Refine each heavy-atom solution. Look for additional sites. Be conservative about adding sites at this point.
6. With two or more derivatives you can co-refine the derivatives to improve the phases (if the derivatives share common sites, this should be done cautiously). This gives you the first set of protein phases.
7. Make difference Fourier maps of each derivative, preferably leaving this derivative out of the protein phase calculation to reduce bias. Look for new heavy-atom sites and confirm old ones.
8. Refine the updated solutions and co-refine to produce better protein phases.
9. Reiterate if necessary.
10. Calculate the protein electron density map and evaluate its quality. If the map is difficult to interpret, go back to step 1 and look for more derivatives or work to improve the ones that you already have. Be objective: you will not divine the structure from poor protein phases without considerable luck.
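The statistics in step 3 reduce to a few sums over matched native (FP) and derivative (FPH) amplitudes. A minimal sketch, assuming the two data sets are already merged and scaled and that each reflection carries its d-spacing; the function names and shell boundaries are illustrative, and real scaling programs also report centric and acentric zones separately.

```python
"""Isomorphous difference statistics for a heavy-atom derivative (sketch)."""
import math


def percent_difference(fp, fph):
    """Overall percentage difference: 100 * sum|FPH - FP| / sum FP."""
    return 100.0 * sum(abs(b - a) for a, b in zip(fp, fph)) / sum(fp)


def rms_difference_by_shell(fp, fph, d_spacing, edges):
    """RMS(FPH - FP) in resolution shells.

    edges are d-spacings in decreasing order; shell i holds reflections with
    edges[i] >= d > edges[i + 1]. Returns (d_max, d_min, rms, count) tuples.
    """
    shells = []
    for d_max, d_min in zip(edges, edges[1:]):
        diffs = [b - a for a, b, d in zip(fp, fph, d_spacing) if d_max >= d > d_min]
        rms = math.sqrt(sum(x * x for x in diffs) / len(diffs)) if diffs else 0.0
        shells.append((d_max, d_min, rms, len(diffs)))
    return shells


if __name__ == "__main__":
    fp = [100.0, 100.0, 100.0, 100.0]
    fph = [110.0, 90.0, 105.0, 95.0]
    d = [6.0, 4.0, 2.5, 2.2]
    print("percent difference:", percent_difference(fp, fph))
    for shell in rms_difference_by_shell(fp, fph, d, [10.0, 3.0, 2.0]):
        print(shell)
```

In this toy data set the RMS difference drops from the low- to the high-resolution shell, which is the fall-off the text describes; a shell-by-shell RMS that stays flat or rises would point to noise or nonisomorphism.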
Example 1: Patterson from Endonuclease III
Escherichia coli endonuclease III (Table 5.1) was solved at the Scripps Research Institute by MIR techniques.¹ The solution of a single-site derivative Patterson map is presented here. Endonuclease III crystals were soaked in thiomersal, an organomercurial, at 1 mM for 2 days. Data were collected on an area detector and then merged with the native data. The isomorphous difference Patterson is shown in Fig. 5.2. The symmetry of the Patterson is mmm, with orthogonal mirrors at 0 and 1/2 in all three directions. In space group P2₁2₁2₁, there are three Harker planes arising from the three 2₁ screw axes (Table 5.1), at x = 1/2, y = 1/2, and z = 1/2. A heavy-atom site will give rise to three unique self-vectors on these Harker sections (plus all the peaks related by Patterson symmetry). Note in Table 5.1 that if there is a vector at 2x on one section, it will be at 1/2 - 2x on the other. Thus, if we line up the Harker sections so that the common axes run in opposite directions and 0.0 is opposite 0.5, then the self-vectors on the two sections will line up (Fig. 5.2). On the Harker sections for this derivative there are just three
¹ Kuo, C. F., McRee, D. E., Fisher, C. L., O'Handley, S. F., Cunningham, R. P., and Tainer, J. A. (1992). Science 258, 434-440.
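The Harker-section bookkeeping in the endonuclease III example can be checked numerically. The sketch below is an illustration, not XtalView code, and its names are hypothetical. It generates the self-vectors r - Rr for a trial site in P2₁2₁2₁ from the four general positions of that space group, and it folds each coordinate into 0-1/2 using the mmm Patterson symmetry (mirrors at 0 and 1/2). Comparing the predicted peaks with those on the three Harker sections gives a quick test of a proposed site. In the output, the first coordinate is 2x on the v = 1/2 section and 1/2 - 2x on the w = 1/2 section.

```python
"""Predict Harker self-vectors for a heavy-atom site in P2(1)2(1)2(1) (sketch)."""

# General positions of P2(1)2(1)2(1): (sign of x, y, z), translation.
P212121_OPS = [
    ((1, 1, 1), (0.0, 0.0, 0.0)),      # x, y, z
    ((-1, -1, 1), (0.5, 0.0, 0.5)),    # -x+1/2, -y, z+1/2
    ((-1, 1, -1), (0.0, 0.5, 0.5)),    # -x, y+1/2, -z+1/2
    ((1, -1, -1), (0.5, 0.5, 0.0)),    # x+1/2, -y+1/2, -z
]


def fold_mmm(c):
    """Fold a Patterson coordinate into 0..1/2 using mirrors at 0 and 1/2."""
    c %= 1.0
    return round(min(c, 1.0 - c), 4)


def harker_vectors(x, y, z):
    """Self-vectors between the site and its three symmetry mates."""
    site = (x, y, z)
    vectors = []
    for signs, shift in P212121_OPS[1:]:
        mate = [s * c + t for s, c, t in zip(signs, site, shift)]
        vectors.append(tuple(fold_mmm(a - b) for a, b in zip(site, mate)))
    return vectors


if __name__ == "__main__":
    for u, v, w in harker_vectors(0.10, 0.20, 0.30):
        print(f"u = {u:.3f}  v = {v:.3f}  w = {w:.3f}")
```

For the trial site (0.10, 0.20, 0.30), the three vectors land on w = 1/2, v = 1/2, and u = 1/2, respectively. Running the inverse search, which tries many trial sites and scores each against the observed peaks, is what automated Patterson-search programs do.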
outputfile
Note that the formatting statement is the same as that used in the C language. In general, awk expressions follow C syntax. However, an awk program is much simpler to write and is interpreted rather than compiled.
Program 9 Awk can be used to do simple calculations. For instance, to make a 5Fo - 3Fc map from a phase file with records of h, k, l, Fo, Fc, phi, we can rewrite each record as h, k, l, 5Fo - 3Fc, Fc, phi:
    awk '{$4 = 5*$4 - 3*$5; print}' inputfile > outputfile
Program 10 Another use of awk is to make decisions with if statements. In this example we want to remove all data that are below 3σ from a file with records h, k, l, Fo, σ(Fo):
    awk '{if ($4 > 3.0 * $5) print}' inputfile > outputfile
Program 11 This awk script can be used to find the minimum and maximum of a stream of coordinates x, y, z. Save the script into a file called minmax and make it executable (chmod +x minmax):
    #!/bin/sh
    awk 'BEGIN {xmin = ymin = zmin = 99999; xmax = ymax = zmax = -99999}
         {if ($1 < xmin) xmin = $1; if ($1 > xmax) xmax = $1;
          if ($2 < ymin) ymin = $2; if ($2 > ymax) ymax = $2;
          if ($3 < zmin) zmin = $3; if ($3 > zmax) zmax = $3}
         END {print xmin, ymin, zmin, xmax, ymax, zmax}' "$@"
Now, if you need to find the range of coordinates in a .pdb file where x, y, z are fields 6, 7, 8 (fields 7, 8, 9 if the records carry a chain ID), you can use another awk command to print x, y, z and pipe this into minmax:
    awk '/^ATOM/{print $6, $7, $8}' inputfile | minmax
The /^ATOM/ pattern causes awk to ignore all records that do not begin with ATOM. In a .pdb file this rejects the header and other nonatom records, which carry no coordinates; note that HETATM records (waters, prosthetic groups) are skipped as well.
Program 12 Awk can call mathematical functions such as sqrt(), as in this script to find the distance between two atoms input as x1, y1, z1, x2, y2, z2:
    awk '{xd = $1 - $4; yd = $2 - $5; zd = $3 - $6;
          print sqrt(xd * xd + yd * yd + zd * zd)}' "$@"
Program 13 XPLOR requires that you split up your protein into separate files before you can enter the coordinates. This can be easily done with awk. In this example, the protein is split into three files containing the protein, the heme prosthetic group, and the water:
    awk '/SER|THR|ALA|CYS|ASP|GLU|PHE|GLY|HIS|ILE|LYS|LEU|MET|ASN|PRO|GLN|ARG|VAL|TRP|TYR/{print}' "$1" > protein.pdb
    awk '/HEM/{print}' "$1" > hetatom.pdb
    awk '/HOH/{print}' "$1" > water.pdb
APPENDIX B Useful Web Sites
Practical Protein Crystallography II Web Site, The Scripps Research Institute
http://ppcII.scripps.edu
This site has example data, updates, and other information. It will be updated periodically, so look here for changes to information in the book. If any of the URLs in this appendix go out of date, you can look here for updated ones.
XtalView online manual
http://www.sdsc.edu/CCMS/Packages/XTALVIEW/XV1TOC.html
Software Web Sites
CCP4
http://www.dl.ac.uk/CCP/CCP4/main.html
The CCP4 program suite is an integrated set of programs for protein crystallography that includes a large number of very useful tools.
DENZO
http://www.hkl-xray.com/
The HKL suite is a package of programs intended for the analysis of X-ray diffraction data collected from single crystals.
DPS
http://bilbo.bio.purdue.edu/~viruswww/Rossmann_home/rstest.html
The Data Processing Suite (DPS) will be a complete package for data processing of crystallographic area detector data.
MAIN
http://omega.omrf.ouhsc.edu/doc/main/index.html
MAIN is an interactively driven computer program dealing with computational parts of macromolecular crystallography.
MIDAS
http://www.cgl.ucsf.edu/midasplus.html
MidasPlus is an advanced molecular modeling system developed by the Computer Graphics Laboratory (CGL) at the University of California, San Francisco.
Molscript
http://www.avatar.se/molscript/
MolScript is a program for displaying molecular structures, such as proteins, in both schematic and detailed three-dimensional representations.
MOSFLM
http://wserv1.dl.ac.uk/SRS/PX/jwc_external/abs_mosflm_suite.html
The MOSFLM suite of programs is designed to facilitate the processing of monochromatic X-ray diffraction rotation data.
O
http://imsb.au.dk/~mok/o/
O is a general-purpose macromolecular modeling environment. The program is aimed at scientists with a need to model, build, and display macromolecules.
PHASES
http://www.imsb.au.dk/~mok/phases/phases.html
Bill Furey's phasing and density modification package.
PROCHECK
http://www.biochem.ucl.ac.uk/~roman/procheck/procheck.html
Checks the stereochemical quality of a protein structure, producing a number of PostScript plots analyzing its overall and residue-by-residue geometry.
PROTEIN
http://www.biochem.mpg.de/PROTEIN/
The PROTEIN program system is an integrated collection of crystallographic programs designed for the structure analysis of macromolecules.
Raster3D
http://brie.bmsc.washington.edu/raster3d/
Raster3D is a set of tools for generating high-quality raster images of proteins or other molecules.
REPLACE
http://como.bio.columbia.edu/tong/Public/Replace/replace.html
A suite of programs for molecular replacement calculations, REPLACE currently consists of two major programs: GLRF for rotation function calculations and TF for translation function calculations.
RIBBONS
http://www.cmc.uab.edu/ribbons/
A program for drawing publication-quality pictures of protein structures as a smooth ribbon with space-filling and ball-and-stick representations, dot and triangular surfaces, density map contours, and text.
Shake and Bake
http://www.hwi.buffalo.edu:80/SnB/
SnB is a computer program based on Shake-and-Bake, a direct-methods procedure for determining crystal structures.
SHARP and Buster
http://Lagrange.mrc-lmb.cam.ac.uk/
Programs for heavy-atom phasing and improving phases based on Bricogne's algorithms.
SHELX
http://linux.uni-ac.gwdg.de/SHELX/
Source of SHELX-97 software. The site contains tips and FAQs for running and using SHELX, as well as instructions for installation.
SOLVE
http://www.solve.lanl.gov/
Automated crystallographic structure solution for MIR and MAD.
wARP and ARP
http://den.nki.nl/~perrakis/arp.html
Automated map fitting and model building.
WHATIF
http://swift.embl-heidelberg.de/whatif/
WHATIF is a versatile protein structure analysis program that can be used for mutant prediction, structure verification, molecular graphics, etc.
XPLOR, CNS
http://atb.csb.yale.edu/
XPLOR is a widely used refinement program. CNS (Crystallography and NMR System) is a package for phasing by MAD and MIR and for refining structures with torsion-angle refinement and maximum-likelihood targets.
XTAL
http://www-structure.bio.purdue.edu/~kvz/
The Purdue University XTAL Programs Library (PUXTAL) was developed as part of the macromolecular structure research effort. Since the 1960s, a series of crystallographic computing techniques has been developed at Purdue, and many of the Xtal programs have been extensively used in laboratories around the world.
XTALVIEW
http://www.scripps.edu/pub/dem-web/toc.html
XtalView is the software featured in this book; it is available for free download to academics and nonprofits.
Databases
Metalloprotein Database http://metallo.scripps.edu
Protein Data Bank http://www.rcsb.org
The Prosthetic Groups and Metal Ions in Protein Active Sites Database http://bmbsgi11.leeds.ac.uk/bmbknd/promise/
Synchrotrons
• Advanced Light Source (ALS) http://www-als.lbl.gov/
• Advanced Photon Source http://epics.aps.anl.gov/welcome.html
• Cornell High Energy Synchrotron Source (CHESS) http://www.tn.cornell.edu/
• Daresbury Synchrotron Light Source http://www.dl.ac.uk/SRS/index.html
• European Synchrotron Radiation Facility (ESRF) http://www.esrf.fr/
• Hamburger SynchrotronstrahlungsLABoratorium HASYLAB (Hamburg) http://www-hasylab.desy.de/
• LURE http://www.lure.u-psud.fr/
• National Synchrotron Light Source (NSLS) http://www.nsls.bnl.gov/
• Photon Factory, Japan http://www.dl.ac.uk/SRS/index.html
• Stanford Synchrotron Radiation Laboratory (SSRL) http://ssrl.slac.stanford.edu/
Useful Information
mmCIF
http://ndbserver.rutgers.edu/NDB/mmcif/
Macromolecular Crystallographic Information File home page, with CIF tools, a description, an HTML dictionary, and the data definition language.
Kevin Cowtan's Book of Fourier
http://www.yorvic.york.ac.uk/~cowtan/fourier/fourier.html
Learn about the properties of Fourier transforms.
Heavy-Atom Information
• Bart Hazes' heavy-atom info page http://mycroft.mmid.ualberta.ca:8080/bart/derivatives/main.html
• Enrico Stura's heavy-atom page http://bmbsgi13.leeds.ac.uk/wwwprg/stura/heavy.html
• HEAVY-ATOM DATABANK http://bonsai.lif.icnet.uk/bmm/had/heavyatom.html
Crystallization
http://www.hamptonresearch.com/
Hampton Research Company home page, with crystallization resources and information.
http://bmbsgi13.leeds.ac.uk/wwwpgr/stura/cryst.html
Enrico Stura's crystallization techniques page.
X-Ray Anomalous Scattering
http://www.bmsc.washington.edu/scatter
This page has tables of f′ and f″ in a convenient periodic table format. Invaluable information for MAD experiments.
X-Ray Equipment Vendors
• Bruker Analytical X-ray Systems http://www.bruker-axs.com/index.html
• Charles Supper Company http://charles-supper.com/
• Molecular Structure Corporation http://www.msc.com/
• Oxford Cryosystems http://www.OxfordCryosystems.co.uk/
• Polycrystal Book Services http://www.dnaco.net/~polybook/
Crystallographic Associations
• American Crystallographic Association http://nexus.hwi.buffalo.edu/ACA/
• British Crystallography Association http://gordon.cryst.bbk.ac.uk/BCA/index.html
• Crystallography World Wide http://www.lmcp.jussieu.fr/cww-top/crystal.index.html
• International Union of Crystallography http://www.iucr.ac.uk/welcome.html
• SInCris Information Server for Crystallography http://www.lmcp.jussieu.fr/sincris/
• World Database of Crystallographers http://www.iucr.ac.uk/iucr-top/wdc/index.html
INDEX
Absolute configuration, 289 Absorbance, measuring, 2 Absorption, 90 correction, 76 Aconitase, 380 Adding, substrates, 264 water, 264 Additives, 435ff, see also specific chemical Alcohols, 435 Alignment of crystals in loops, 421 microscope, 31, 32 optical, of crystal, 31 Amicon, 4, see also Centricon; Microcon Amino acid handedness, 159 stereochemistry, 226 Ammonium sulfate, 339 precipitation, 4 AmoRe, 113 CCP4, 113 Anaerobic apparatus, 21 crystal, 20 handling, 415 Analysis, mutants, 379 Anderson, D., 70 Anisotropic
B-value, definition of, 100 displacement parameter, 100 scaling, 117 thermal parameters, 205 ANOLSQ, 157 Anomalous difference Pattersons, 133 signal of, 133 Anomalous scatterer, 67 locating, 188 Anomalous scattering, 115,154 phasing with, 182, 189 Antimicrobial agents, 2 Area detector, 370, 66, 69 data collection, 76 ARP, 458 Arrows, for comparisons, 370 Artificial mother liquor, 342 Arvai, A. S., 60 ASCII files, 103 ASTRO, 67 Asymmetric unit, 57, 99 Atom field, PDB, 453 Auto fit, Xfit menu, 298 strategy, 299 Autoindex, 51, 70, 74 Automated solvent flattening, 176
Awk, script, 453 splitting files, 454
B-value, 196 definition of, 100 water, 264 Background, 43, 61 decreasing, 81 Bases, magnetic, 418, 425-427 Batch method, crystallization, 11 Beam stop, 37 Beinert, H., 380 Bending wire, 422-423 Berthou, J., 171 Best phase, 150 Bijvoet data, 67 difference, 154 Fourier, 157 Patterson map, 133, 284, 392 pair, 116, 119, 185, 188, 389, 51, 76 expected differences, 185 Binary files, 103 Birefringence, 33 Blind region, 50 zone, 69 Block diagonal, 213 Blow, D. M., 147 Blow-Crick equations, 149 Blundell and Johnson, 34, 50 Bragg equation, 95 Bragg's Law, 60 Bricogne, G., 223 Brilliance, 38 Bruker, 77, 359 detector, 77, 81 Brunger, A. T., 193 Buffers, 2 Buster, 458
C-centering, 57 C6A mutant, 371 CCD detector, 188 CCMS, 271,275 CCP4, 104, 111,202 AmoRe, 113 crystallographic programs, 111
density modification, 112 DM, 112, 399 f2mtz, 400 FFT, 113 labels, 104 map calculation, 113 mLPHARE, 113, 193 model validation, 114 molecular replacement, 113 MTZ format, 104 mtz2various, 280 PDB files, 113 PROCHECK, 114 reflection files, 112 REFMAC, 113 SFALL, 113 SFCHECK, 114 SFTOOLS, 114 SIGMAA, 113 Solomon, 112 tips, 114 web site, 455 XtalView data, 290, 327 CNS, 459 CORN, 159 CRC handbook, 435, 442 CRYSTAL environment variable, 104 Calculation, of Pattersons, 127 corrections, film, 40 Canes, cryo-, 429, 431, 432, 433, 434, see also Cryocane Capillaries, 23, 418, 441 crystal-mounting, 24 quartz, 23 sealant, 24 source, 23 Cartesian coordinates, 218, 97 Center of symmetry, 125 Centering, 57 Centric phases, 145 R, 154 reflections, 70 Centrosymmetry, 145, 359 Chain direction, 359 tracing, 232 Characterization, crystals, 39 Chasing the train, 360
Check reflections, 74 Chillers, 410 Choice of, wavelength, 185 Chromatium vinosum cytochrome c′, Patterson maps, 342 cis-aconitase, 382 Cocrystallization, 380, 440 cold stream, 416ff, see also Cryocooler Collimator, 36 Collodion, 2 Combined-phase coefficients, 218 Compounds, heavy atom, 19 Computer code, 445 portability of, 103 files, 102 Concentration, samples, 4 Conformations, multiple, 259 Connolly, M. L., 269, 379 Control, of temperature, 7 Cooling, of crystals (versus freezing crystals), 412 Coordinate analysis, 265 refinement, 193 systems, definition of, 96 Copper, X-ray sources, 35 Correct hand, 172, 359 Correlation coefficient, 285 function, 168 search, 146, 169, 194 Counting, of statistics, 88 Cowtan, K., 399, 460 Cracked crystal, 64 Crane, B., 78 Crevices, and water, 264 Crick, F. H. C., 147 Cross peaks, 344 Cross-fourier, 145, 344 Cross-linking crystals, 415 Cross-phasing, derivatives, 344 Cross-validation, 199, 209 Cruikshank, D., 210 Cryocane, 429, 431, 432, 433, 434 Cryocooler, 416ff Cryocrystallography, advantages of, 410ff equipment, 416 list of, 428ff Cryogenic safety, 415
Cryoprotectant, 410,411-415,423,435ff testing, 435 ff Cryoprotection, 411-415 Cryostat, 410, 416, see also Dewar Cryostream, 417 Cryosystem, 416ff Cryotongs, 427ff, 430 Cryotweezers, 428,430, see also Cryotongs Cryovials, 429ff Crystal aligning, 40 characterization, 39 determining if protein, 40 evaluating quality, 61 exposure time, 40, 64 grid screen, 8 growing, 7 large, 14 light and, 18, 33 in loops, 421 mounting, 7 multiple forms, 381 offsetting, 68 optically, 31 preparing, 23 quality of, 34 radiation damage, 37 resolution limit, 61 size needed, 64 slippage of, 30 twinned, 61 file, 104 growth, 339 mounting, 23, 24 avoiding crushing, 24 capillaries, 24, 30 drying, 24 illustrated, 28 filter paper, 24 illustrated, 26 loops, 422-424 loosening crystals, 24 resting, 29 simple rules for, 421,423-424 supplies, 23 quality, 61 sample purity, 16 storage, 18 trials, grid screen, 8 Crystalline precipitant, 4
Crystallization, X-ray quality, 14 batch method, 11 equilibrium, 18 by first precipitating, 13 impurities, 1 incomplete factorial, 11 new conditions, 18 nucleation, 14 varying conditions, 14 web sites, 461 Crystallographic associations, web sites, 461 equations, 445 .cshrc, 104 CuA subunit, 388 CuK, X-ray sources, 36, 35 Cunningham, R. P., 331 Cysteine, Oxidation of, 379 Cytochrome c oxidase, 388 peroxidase, 370 D235E mutant, 370 Cytochrome c', 339, 360
d-spacing, definition of, 95 D235E mutant, cytochrome c peroxidase, 370 DENZO, 91 web site, 456 and XtalView, 327 DPS, 456 Data collection, 23,370 area detector, 76 diffractometer, 76 image-plate, 82 MAD, 389 strategy, 67, 79 filtering of, 116 indexing, 70 merging, 117 re-indexing, 72 scaling, 117 reduction, 87, 115 to parameter ratio, 214 Defrosting dewars, 434 Deicing tongs (hair dryer), 428 Density modification, 112, 151,176 unexplained, 264 Dental wax, 24
Derivatives, absolute configuration, 159 cross-phasing, 344 fine-tuning, 161 Desalting, 2 Detector distance, determining, 77 Dewars, 411, 415-417, 427-428, 430ff, 435 Diagonal zone, 57 Dialysis, 2 Difference Fourier, 138, 145, 264, 284 maps, 138, 370 mutant, 375 Diffractometer, 64, 67 data collection, 76 geometry, 76 four-circle, 74 Diffuse scattering, figure 6.7 Diffusion, of cryoprotectants, 440 heavy atoms, 19 Dihedral angles, side chains, 227 Dill, K. A., 266 Diluting seeds, 16 Dipping method, 440, see also Vapor equilibration method Directory, 102 Disorder, 264 Disulfides, 379 Dithionite capillaries, 21 DM, 112, 399
E, error estimate, 147 Electron density gradient, 370 evaluating, 224 histograms, 178, 224 map, fitting of, 217 skeletonizing, 232 Electrostatic terms, 196 Ellipsoid, anisotropic, 100 Enantiomorphic space groups, 160, 290 Endonuclease III, 331 Engh, R., 210 Epoxy, 24 Equilibration, of crystals, 413, 435ff, 439-440 Equipment, cryocrystallography, 416 Error estimate, E, 147 estimation, 88
R-factor, and, 198 evaluating, 197 Estimation, of standard deviation, 210 Eulerian angles, 163 identity operator, 165 Evaluation of crystal quality, 61 of errors, 197 maps, 223 Evaporation, from loops, 435 EXAFS, 186 scan, example, 391 Example, molecular replacement, 384 Expected differences, Bijvoet pairs, 185 Exposure time, 43, 51
FA coefficients, 191 Failing, see Trying Fast Fourier transform, 144 Fc, Fourier, 138 map, 138 Fe-S cluster, 382 Ferredoxin, 210 FFT, 113, 144 approximations of, 204 CCP4, 113 grid of, 144 FHLE coefficients, 135 Fiber loops, 419ff, see also Loops Fibers, 418-419, see also Mohair fibers; Nylon fibers; Rayon fibers; Silk fibers Fiducial mark, 74 Figure-of-merit, 144, 150, 151 File format .df, 106 .fin, 106 .hkt, 216 .map, 106 .phs, 106 .sol, 106 history file, 106 mmCIF file, 106 other formats, 106 PDB, 216 postscript, 106 SHELX, 106 TNT, 106 Xfit sequence, 302
names, 102 systems, 102 Film, 39, 68, 69 calculating corrections, 40 determining exposure, 43 measuring error, 43 developing, 74 marking, 39 reducing background, 43 scanning, 74 small-molecule, 40 X-ray, 73 Filter paper, 24 crystal-mounting, 24 Filtering, of data, 116, 127 Fine-tuning, of maps, 359 Fisher, C. L., 339 Fitting of electron density map, 217 general, 243 main chain, 250 and noise, 243 and phase bias, 250 and resolution, 243 side chain, 259 Fo, Fourier, 137 map, 137 Fo-Fc, fourier, 138, 264 map, 138 2Fo-Fc map, 264 Fourier, 140 map, 140 omit map, 263 2mFo-DFc, Fourier, 142 map, 142 Focusing mirrors, 37 X-ray sources, 37 FORTRAN, 103, 104 Four-circle, diffractometer, 74 Four-fold (4-fold), 55 Fourier 2Fo-DFc, 142 Bijvoet difference, 157 difference, 138, 145, 284 Fc, 138 Fo, 137 Fo-Fc, 138, 264 of heavy atoms, 145 Kevin Cowtan's book of, 460
Fourier (continued) techniques, 137 transform, program, 448,450 Fractional coordinates, 218, 97 Free radicals, 410, 413-414 Freeze thawing, 5 Freezing conditions, testing of, 436ff crystals, 427, 434ff of samples, 4 Friedel pair, 67 Friedel's Law, breakdown of, 116 FRODO, 218 Frost, method, for removal, from crystals, 430 Full-matrix, 210, 214 Furey, W., 176, 360
GRINCH, 233 Getzoff, E. D., 371 Ghost peaks, 145,151,161,344 Glutaraldehyde, for crosslinking crystals, 415 Glycerol, 412,435,437, caption for plate 7, flowchart 6.7 Goniometer, 31, 39, 40, 416,418,423,425ff Goniostat, 74, 78 Gradient electron density, 370 vectors, 376 Grid FFT, 144 screen, 8 Growing, crystals, 7
Hallewell, R. A., 371 Hamlin, detector, 77, 82 Hamlin, R., 70 Hampton Research, 13, 24 web site, 461 Hand choice of, 172, 359 Hanging drop, 8, 10 Harker peaks, 127 sections, 127, 133, 169, 331, 342, 344 vector, 59 Harris, M., 70 HASSP, 131
Heat-shield, 428, 429 Heavy atom, 64, 359 absolute configuration, 159, 289 area detector scanning, 64, 66 statistics, 66 strategy, 66 compounds, 19 soaks, 19 trials, 19 derivative, 65 finding more sites, 151 fine-tuning, 161 Fourier of, 145 handedness, 159, 289 merging, 281 Pattersons of, 128 phase calculation, 285 phasing statistics, 153 refinement, 285 safety, 20 scanning for, 76 storage, 20 suggestions, 20 toxicity, 20 sites, refining, 351 soaks, 341 statistics, 120 web sites, 460 Helices, recognizing, 237 Helium, 416, 417 box, 36 path, 81 Helix, 248 Helliwell, J. R., 36, 84, 185 Hemostat, 420, 422, 425, 428, 430, 431, 435, 436 Hendrickson and Konnert, 193 Hendrickson, W., 147, 157, 185, 223 Hendrickson-Lattman coefficients, 147, 158, 172, 223 Histogram electron-density, 224 modification, 178 History file, 106 Homologous structure, 161 Howard, A., 77 Humidity, 429 Hydrogen bonding, 266 riding, 205
Ikr, 115 Ice on crystals, 422, 434 properties and theory, 411-412 formation, common strategies to avoid, 429, 434, 435ff, see also Testing cryoprotectants Ideal geometry, 195 Image plate, 69 data collection, 82 dynamic range, 82 erasing, 83 MAR Research, 82 Incomplete factorial, crystallization, 11 Increasing brilliance, 38 Inhibitors, 380 Initial crystal trials, 10 Insulin, 1 Integration, of intensities, 87 Intensity change, 65 integration of, 87 International Tables for Crystallography, 53, 57, 67, 99, 266, 448 International Union of Crystallography, 210 INTREF, 168 Iron-binding protein, 60, 66 Isocitrate-aconitase, 380 Isomorphism, 65 Isomorphous phasing, 146 replacement, 116, 128, 145 signal, 120
Jolles, P., 171
Large crystals, 14 Lattice packing, 265 Lattman, E. E., 147, 223 Lauble, H., 380 Laue photography, 51 white-radiation, 82 Lepock, J. R., 371 Lifchitz, A., 171 Light, and crystals, 18, 33 Limiting axis, 49 Local minimum, 195, 379 scaling, 117, 119, 188 symmetry, 171 Location, of anomalous scatterer, 188 Locking hemostats, 420, 422, 425, 428, 430, 431, 435, 436, see also Hemostats .login, 104 Loops discussion of, 419 making of, 420ff Lorentz correction, 50, 89 Lunes, 45 Luzzati plot, 210
Macroseeding, 15, 339, 381 MAD, 182, 193 data collection, 389 example of, 388 as MIR, 190 phasing, 388 equations, 189 Magnetic bases, 425ff MAIN, 456 Main-chain fitting, 250 Map bounds, choosing, 218 Map, 2Fo-Fc, 140 2mFo-DFc, 142 difference, 138 evaluating, 223 Fc, 138 fine-tuning, 359 Fo, 137 Fo-Fc, 138 omit, 142 sigmaA weighted, 142 MAR Research image plate, 82 Marker residues, 243
130 K, 410-411, see also Vitreous point Kennedy, M. C., 380 Kevin Cowtan's book of Fourier, 460 Kling, 412, see also Rayon fibers Kraulis, P. J., 365 Kretsinger, R., 206 Kuo, C. F., 331
Labeling, samples, 5 Laboratory, X-ray sources, 34
Mask and count, 87 Matrices, 100 Matthews coefficient, 57 Max-flux, mirrors, 39 McRee, D. E., 331, 371 Measuring, absorbance, 2 Melting temperature, 379 Merging data, 117 Metalloprotein database, 459 Microcon, 2, 4 Microscope, 20 alignment, 31-32 bases for, 7 dissecting, 24 fiber-optic lights, 7 source for, 7 Microseeding, 16 MIDAS, 456 Miller indices, 97 Min-max, program, 453 Mini-map, 218, 232 Minor sites, 151, 344 locating, 344 MIR, 223 map, 242, 261, 359, 360, 361 phases, 218 phasing, 329 Mirrors, 37 focusing, 37 increasing brilliance, 38 max-flux, 39 monochromating, 37 osmic, 39 Misindexing, 119 Missing data, 116, 370 MKT, 370 MLPHARE, 113, 193 mmCIF, 104, 106 dictionary, 110 examples, 108 syntax, 108 web site, 460 Model validation, 114 Moews, P., 206 Moffatt, K., 53 Mohair fibers, 419 Molecular dynamics, refinement, 376 packing, 170 replacement, 113, 161 example, 380, 384
solution verification, 387 steps in, 384 MOLSCRIPT, 326, 365, 456 MOLTAN, 133 Monochromator, 36, 37 Monoclinic, 70 Mosaic spread, 43, 410, 413, 435, figure 6.8, see also Mosaicity Mosaicity, 410, 413, 435, figure 6.8, 61 MOSFLM, 91, 456 Most probable phase, 150 Mother Liquor, 18 Mounting crystals, 7, figure 6.2, 422, 423-424, see also Crystal, mounting MRK residue, Xfit, 297 MS, 269, 379 MTZ format, CCP4, 104 Multiwavelength anomalous dispersion, 388 data collection, 188 phasing, 182 Multiple conformations, 259 isomorphous replacement, 145, 329 MAD phasing, 193 Multiple data set, scaling, 119 Mutant screening for solubility, 8 studies, 370
N-terminus, finding, 359 Needles, 24 crystal, loops for, 421,423 for making loops, 420,422 New conditions, crystallization, 18 Nichrome wire, 429 Nickel filter, 36 Nickel foil, 36 Nielson, C., 70 Nodal, 51 Noncrystallographic symmetry, 171,175,359 Nonorthogonal cells, 61 coordinates, 187 Normal matrix, 211 Novel conformations, 257 Nucleation crystallization, 14 of ice, 412 Nylon, loops, 419
O, web site, 457 O'Handley, S. F., 331 Omit map, 142, 261, 371, 377 Optical alignment, 31 analyzer, 31 removal, 287 space-group, 99 ORTEP, 206 Orthorhombic, 67 Oscillation method, 74 photography, 43 Osmic, 39 mirrors, 39 Osmolality, 441ff Osmolarity, 419, 441ff Osmotic shock, 441, 442 Other formats, XtalView, 106, 278 Oxidation, of cysteine, 379
P6, 68 PDB atom field, 453 file, program, 451 filter, program, 446 formats, 216 PEG, 339 4K, 18 20K, 4 and light, 18 PHASES, 176, 193, 328, 360, 457 and XtalView, 328 PROCHECK, 114, 209, 457 and CCP4, 114 PROLSQ, 193, 376 and XtalView, 328 PROTEIN, 457 Parge, H. E., 385 Partial spot, 66 Patterson coordinates, 97 map, 59, 125, 127, 151, 191, 344, 359 anomalous difference, 133 Bijvoet, 392 difference, 133, 284 Chromatium vinosum cytochrome c', 342 heavy atoms, 128 solving, 128, 282, 331
radius, 163 space, 163 symmetry, 125 synthesis, 124 Patterson-correlation, refinement, 385 Pauling, L., 359 PDBfit, 172 Pentamers, 257 Peptide bonds, 227 Percent solvent calculation, 177 pH, 414-415, 443 Phase bias, 142, 218, 261 and waters, 264 error examples, 225 file, 145 probabilities, 150 power, 153 Phi-psi angles, 227 plot, 198 and errors, 200 examples of, 200 Phillips, W. C., 36 Rhodospirillum molischianum cytochrome c', 360 Photoactive yellow protein, 334 Pins, 421ff, 424, 425-426, 427, 429, 430 Plasticine, 24 Platinum wire, 424ff Polar space groups, 68 Polarization corrections, 89 of light, 33 Polyalanine, 243 Portability considerations, 103 Positional uncertainties, 210 PostScript, 106 Precession camera, 39, 40 corrections for, 42 screens for, 42 photo, 44, 48, 57, 58, 60, 61, 66 Precipitant, 8, 440 ammonium sulfate, 4 concentration by, 4 Preliminary characterization, 39 Preparation, of crystals, 23 Presta, L. G., 266 Profile fitting, 87
Program, Fourier transform, 448, 450 PDB file, 451 filter, 446 awk, 452 if statement, 453 coordinate transform, 445 electron density calculation, 450 filter, 445 hydrogen removal, 446 minmax, 453 reformatting, 452 resolution calculation, 446 simple calculation, 452 splitting files, 454 structure-factor calculation, 448 Protein conformations, 248 refinement, 195, 263 sample, 1 solubility, 8 Protein Data Bank, 451 Pseudocentric, 359 phases, 145 Pseudosymmetry, 59
Quality, of crystals, 34 Quartz, capillaries, 23
R crystallographic, 98 R-factor, 98, 198, 218, 261, 264, 385, 361, 365 and errors, 198 search, 169, 194 R-free, 199, 209 R-merge, 342 R-symm, 115, 385, 76 equation for, 90 Radiation damage, 61, 76, 89, 410, see also Free radicals Radicals, free, 410, 413-414 Ramachandran plot, 198 Random amplitudes, 137 phases, 137 R-factor, 98 Raster3D, 457 and Xfit, 319
Rayment, I., 30 Rayon fibers, 412, 413 Real-space refinement, 243 Recrystallization, 5, 6 Redford, S. M., 371 Redissolving crystals, 6 Refinement checking, 209 coordinates, 193 cycles compared, 366 heavy atom, 285 sites, 351 molecular dynamics, 376 mutants, 376 rigid-body, 361 Patterson-correlation, 385 SHELX strategy, 215 simulated annealing, 376 software, 193 strategies, 196 very high resolution, 202 and waters, 264 weighting, 196 Reflection definition of, 93 sigma, 67 REFMAC, CCP4, 113 Ren, Z., 53, 339 REPLACE, 457 Residual map, 151 Resolution bin scaling, 117 definition of, 95 limits, 217 Retrieval, of crystals, 427, 432ff Reversal, 65 Rhombic, 56 RIBBONS, 458 Ribbon diagram, 365, 384 Richard's box, 232 Richards, E. M., 232 Richards, F. M., 267 Richardson, D. C., 232 Richardson, J. S., 232 Ridgelines, 232 Riding hydrogens, 205 Right-handed helices, 359 Rigid groups, 194 Rigid-body refinement, 194, 361 Ring planarity, 232 stereochemistry, 232
Rings, ice, plate 6.8, solvent, plate 6.8 Rini, J. M., 168 Robbins, A. H., 21, 380 Rose, G. D., 266 Rotating anode, X-ray sources, 34 Rotation axis, 51 camera, 68 function, 385 geometry, 50 image, 45 matrix, 100 method, 74, 162 photography, 43, 49, 66 search, 162, 385 choosing resolution, 163 refining, 165 RSPACE, 67 RTD, 429
Safety cryogenic, 415 heavy atoms, 20 Sample concentration, 2, 4 freezing of, 2, 4 handling, 2 labeling, 5 logging, 2 protein, 1 storage of, 4 Sandwich box, 29 Scale factor, 117, 263 anisotropic, 117 local, 119, 188 multiple data set, 119 Scaling data, 117 resolution-bin, 117 Scanning, heavy atom, 64 Schwarzenbach, D., 210 Scissors, 24 Screening cryoprotectant conditions, 435ff Screens for, precession camera, 42 Screw axis, 59 Script, awk, 453 Sealant, capillaries, 24 Sealed tube, X-ray sources, 34
Searching, for cryoprotectants, 435ff Secondary structure, recognizing, 237 Seeding, 14 diluting seeds, 15 illustration of, 17 macroseeding, 14 microseeding, 14 serial dilution, 16 streak seeding, 15 to reduce twinning, 16 Selenium-methionine, 182 Self-rotation function, 175 example, 164 Self-vectors, 127, 344 Sequence, identification, 242 matching, 242 Series-termination effects, 176 errors, 217 SFALL, 113 SFCalc Window, Xfit, 322 SFCHECK, 114 SFTOOLS, 114 Shake-and-bake, 191, 458 Shaking coordinates, 263 SHARP, 193, 458 Sheet, 248 recognizing, 237 Sheldrick, G., 203, 205 SHELX, 203, 213, 215, 327, 408 features, 205 file format, 106 strategy, refinement, 215 web site, 458 and XtalView, 327 SHELXPRO, 216, 327 SHELXS, 131, 133, 191 Shipping dewars ('dry' dewars), 429 Side-chains, identifying, 245 Siemens (Bruker) area detector, 339, 385 Sigma, 115 estimating, 88 reflection, 67 SIGMAA, 113, 210 weighted map, 142 weighting, 142 Signal, of anomalous difference, 133 Signal-to-noise, 81, 84 Silk fibers, 419 Sim weighting, 223 Simulated annealing, refinement, 376
SIR phases, 344 SIRAS, 157, 193 Sites, in common, 151 Sitting drops, 10 Six-fold (6-fold), 55, 58 Skeletonizing, electron density map, 232 Soaking crystals, 19, 413, 435ff, 439-440, see also Equilibrating crystals Soaks, heavy atom, 19 SOD, 379 Software, refinement, 193 Solomon, 112 Solubility, grid screen, 8 Solutions, of heavy atoms, 19 SOLVE, 193 web site, 458 Solvent contrast, 224 flattening, 176, 359 mask, 360 model, 206 rings, plate 6.8 accessible surfaces, 266 Solving heavy atoms with Fourier, 145 Patterson map, 128 Source, capillaries, 23 Space group definition of, 99 mistakes in, 59 origin, 99 specificity, 103 table, 54 determination, 53 enantiomorphic, 160, 290 symmetry, 67 Split side chains, 206 Split spot, 63, 64 Spot profile, 63, 64 Standard uncertainty, averaging, 213 calculating, 211 definition, 210 memory requirements, 214 Stanford Synchrotron Radiation Laboratories, 78 STAR, 106 Statistical analysis, 115 Statistics heavy atoms, 66, 120 mutant, 374 Stereochemical restraints, 195
Stereochemistry, 197 amino acid, 226 and fitting, 225 peptide bond, 227 phi-psi angles, 227 ring planarity, 232 side chains, 227 Stfact, 135 Stickel, D. F., 266 Still photo, 39, 63 Storage of crystals, frozen, 427ff, 430ff heavy atoms, 20 rings, 84, see also Synchrotron of samples, 4 Stout, C. D., 380 Strategy, data collection, 67 Streak seeding, 16 Subdirectory, 102 Substrate adding, 264 example, 380 Suckers, 25 Suggestions, heavy atoms, 20 Superoxide dismutase, 21, 371 yeast, 385 Supplies, crystal-mounting, 23 Symmetry determination, 57, 59 noncrystallographic, 175 elements, 53 operators, 171 of Pattersons, 125 Synchrotron, 409, 414, 427 list, 459 radiation, 84 X-ray sources, 36 Systematic absence, 53, 57
TNT, 193, 203, 328 file format, 106 and XtalView, 328 Tainer, J. A., 331, 371, 385 Temperature, controlling, 7 Ten Eyck, L., 193 Terwilliger, T., 131, 193 Testing, of cryoprotectants, 435ff freezing conditions, 435ff Textpanes, 274
The Scripps Research Institute, 455 Thermistor, 429 Thermocouple, 429 Three-fold (3-fold), 55, 58 Time-resolved data collection, 86 Tongs, 427ff, 430, see also Cryotongs Torsioning, 259 Toxicity, heavy atom, 20 Tracing the chain, 360 Transfer pipette, 25 Translation function, 168 search, 162, 169, 386 Trials, heavy atom, 19 Tronrud, D., 193 Tulinsky, A., 176 Turbulence, 422, 429 Turns, 248 recognizing, 241 Tweezers, 23, 420, 428, 436 Twinning, 63 crystals, 61 image of, 63 Two-fold (2-fold), 55, 59
U, definition of, 100 Uij, 206 Ultrapurification, 5 Unique data, 116, 67 volume, 68, 70 Unit-cell determination, 59 UNIX, 104
Van der Waals packing, 266 radii, 264 Vapor equilibration method, 440, see also Dipping method Varying conditions, crystallization, 14 Very high resolution refinement, 202 Vitreous, water (ice), 411-412, 414, 438, 442
Wang, B. C., 176, 360 WARP, 458 Water, adding, 264 properties of, 411 when to add, 197
Web sites, 455 Weighting, refinement, 196 Weis, W., 190 WHATIF, 458 White-radiation, Laue, 51, 82, 85 Wild-type structure, 370 Wilson, B., 211 Wire, 420, 422, 424ff bending tool, 423 Wyckoff, H. C., 76
X-ray film, 73 fluorescence, 172, 186 quality, crystallization, 14 sources
collimation, 36 copper, 35 CuK, 35, 36 film, 39 focusing, 37 increasing brilliance, 38 laboratory, 34 monochromation, 36 nickel-filtered, 36 reducing noise, 37 rotating anode, 34 sealed tube, 34 spectrum, 51 synchrotron radiation, 36 vendors, web sites, 461 XAS scan, 190, 388-389 examples, 398 Xcontur, 133, 284, 343, 344, 353 Xedh, 224, 230 XENGEN, 59, 91, 370 Xenon, 415 Xfft, 126, 127, 151, 284, 344, 353, 359 Xfit, 209, 216, 218, 231, 263, 267, 290, 359, 361, 370 addition, of, constraints, 317 prosthetic groups, 315 assignment, of sequence, 302 atom stack, 292 Auto Fit menu, 298 strategy, 299 automated fitting, 297 automatic waters, 326
Xfit (continued) calculating structure factors, 323 center on atom, 292 chi angles, 313 combination, of phases, 307 contouring, of maps, 305 control window, 319 current map, 297 de novo model, 296, 299 density modification, 308 editing waters, 325 error window, 325 fit-while-refine, 318 fitting operations, 293 a residue, 312 fixing main chain, 296 focus residue, 297 fragments, 295 geometry errors, 325 refinement, 316 go to N-terminus, 311 residue, 295 hiding, of maps, 311 identification, of sequence, 302 improvement, of phases, 308 ligands, 295, 315 loading maps, 305 maps, and phases, 295 model window, 320 and Molscript, 326 mouse, 290 moving, along the chain, 311 MRK residue, 297 omit maps, 323 ordering fragments, 300 and other programs, 326 poly-Ala model, 301 popping chain, 311 postscript plots, 319 prosthetic groups, 315 and Raster3D, 319 real space refinement, 315 reversing chain, 300 SFCalc window, 322 saving model, 315 phases, 310 sequence file, 302
shake option, 323 shortcut menu, 311 showing maps, 311 solvent flattening, 308 spline maps, 295 splitting side chain, 321 starting new model, 296, 299 torsions, 313 tracing the main chain, 297 typical session, 303 use, of mouse to fit, 293 phase files, 305 viewing, of thermal parameters, 318 Xheavy, 148, 150, 328, 343 Xhercules, 132, 191, 282 Xmerge, 118, 281, 370 Xmergephs, 145, 157, 344, 375 Xpatpred, 133, 284, 343 XPLOR, 91, 165, 193, 196, 203, 218, 263, 328, 361, 376, 385, 405 splitting files, 454 web site, 459 XtalView and, 328 Xprepfin, 277, 328 XRSPACE, 67 illustration of, 71 XTAL, 459 XtalView, 91, 99, 106, 144, 145, 161, 271 CCP4 and, 327 crystal file, 104 data and CCP4, 290 example, 280 DENZO and, 327 downloading, 272 exporting data, 290 file formats, 104 help, 272 history file, 106 installation, 271 manual, 275 other formats, 278, 279 and PHASES, 328 and PROLSQ, 328 preparation, of data, 277 requesting, 272 and SHELX, 327 saving messages, 274 and TNT, 328 textpanes, 274 tutorials, 271
and XPLOR, 328 web site, 459 Xtalmgr, 104, 275, 92 XTALVIEWHOME, 272 Xuong, N. H., 70 XView toolkit, 272 widgets, 293
Yeast, superoxide dismutase, 385 Yeates, T. O., 168
Z number, 57 Zone, 40
PLATE 1. Extended F map. (A) shows an Fsoaked-Fnative difference map after soaking in a ligand to displace a loop in cytochrome c peroxidase. Density for both conformations of the loop is present, but the density for the new, flipped-out loop position is weak, which makes the new position very difficult to fit. To isolate this density, an extended F map is made with the coefficients Fnative + (Fsoaked-Fnative)/(estimated partial occupancy). A range of estimated partial occupancies is tried until the map at the old position is essentially flat. This map (B) shows only the new position and is easier to fit. To refine the partial occupancies, SHELX was used with the PART command and the relative occupancies tied to a free variable.
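The extended-map recipe in Plate 1 can be sketched in a few lines. This is a minimal illustration, not the XtalView or SHELX implementation, and it rests on stated assumptions: the soaked and native amplitudes are already on a common scale and listed for the same reflections in the same order, and every coefficient takes its phase from the native model (the caption does not say which phases were used). The function name `extended_coefficients` is hypothetical.

```python
import cmath
import math


def extended_coefficients(f_native, f_soaked, phases_deg, occupancy):
    """Hypothetical helper: extended-map coefficients for one trial occupancy.

    Amplitude, following the Plate 1 recipe:
        F_ext = F_native + (F_soaked - F_native) / occupancy
    Phases are taken from the native model (an assumption of this sketch).
    Inputs are parallel lists for the same, identically scaled reflections.
    """
    if not 0.0 < occupancy <= 1.0:
        raise ValueError("occupancy must be in the range (0, 1]")
    coeffs = []
    for fn, fs, phi in zip(f_native, f_soaked, phases_deg):
        amp = fn + (fs - fn) / occupancy
        # cmath.rect accepts a negative modulus; that equals |amp| with the
        # phase shifted by 180 degrees, which is what a map coefficient needs.
        coeffs.append(cmath.rect(amp, math.radians(phi)))
    return coeffs


if __name__ == "__main__":
    f_nat = [100.0, 80.0, 60.0]
    f_soak = [90.0, 88.0, 60.0]
    phi = [0.0, 90.0, 180.0]
    # Scan trial occupancies, mirroring the trial-and-error search in the caption.
    for occ in (0.25, 0.5, 1.0):
        coeffs = extended_coefficients(f_nat, f_soak, phi, occ)
        print(occ, [round(abs(c), 1) for c in coeffs])
```

Each set of coefficients would then be Fourier-transformed to a map, and the trial occupancy at which density at the old loop position flattens is the one kept. At an occupancy of 1.0 the amplitudes reduce to the soaked amplitudes, and lowering the occupancy amplifies the soaked-minus-native difference (and its noise) proportionally.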
PLATE 2. Examples of typical electron density at 2.7 Å resolution in a region of β-sheet. (A) MIR map; (B) 2Fo-Fc map with final model phases. Note that in (A) many of the details, such as the carbonyl bulges, are obscured by noise. When fitting such details, one must rely on correct peptide geometry and fit to the larger features.
PLATE 3. Bijvoet difference Fourier at 4.0 Å showing sulfur density superimposed on the final model. It is always worth collecting Bijvoet pairs for your native data to locate the sulfur atoms in your protein. In this map of a 53 kDa protein with a single heme, about two-thirds of the sulfurs also have clear density in the Bijvoet difference Fourier. In the figure the heme Fe is in the upper left (labeled HEM 500 FE), and two sulfur peaks can be seen in the lower part of the figure on the methionine Sδ atoms (labeled MET 408 SD and MET 400 SD).
PLATE 4. Closeup of the density around an imidazole bound in a cavity in CCP (cytochrome c peroxidase) at 2.1 Å resolution. The heme ligand, a histidine, was removed and an imidazole molecule was soaked into its place.
PLATE 5. 1.6 Å density around the Fe-binding site in ferric binding protein. The tyrosine rings are just beginning to show holes in their density.
PLATE 6. Why go to very high resolution? In this example, showing the heme ligand in cytochrome c peroxidase at 1.4 Å resolution, the peptide was found to be significantly bent. The bend pulls the heme ligand in such a way as to keep the Fe below the heme plane, which tends to keep the Fe five-coordinate and active in the resting state of the enzyme. This feature, hypothesized in 1979 by Joan Valentine, was missed in the 1.8 Å resolution structure.
PLATE 8. Example of using thermal ellipsoids to find bad parts of the structure. (A) The density in the N-terminal region of the map. (B) The same area with thermal ellipsoids drawn on the model. Note how the thermal ellipsoids become large and jumbled in the region with no density.
PLATE 9. Example of thermal ellipsoids for a side chain extending into solvent (A), with density weakening toward its end (B). Notice how the ellipsoids gradually get larger along the length of the side chain from top to bottom. At the top end is a water molecule (in red). Notice that it is roughly the same size as the end of the lysine, indicating a similar thermal parameter.
PLATE 10. (A) Well-defined density for a leucine side chain at 1.4 Å resolution, with thermal ellipsoids. (B) Density for another leucine with a split end.
PLATE 11. Xfit canvas. In the upper left is the gnomon showing the direction of the axes. In the center is a white cross indicating the center of rotation. Across the bottom of the canvas is a green ruler with the units in Angstroms. Across the very bottom is a status bar showing the last atom picked and its properties.
PLATE 14. (Chapter 6) Picture of a typical cryo setup. The crystal is mounted at the end of a pin on the goniometer, in the lower center. The cold stream of nitrogen gas is blown out of the nozzle coming diagonally down from the upper right. The X-ray collimator is shown on the left. The microscope used to align the crystal is in the back. On the right is the X-ray detector.
PLATE 15.
Figure 6-7: The Effects of Varying Cryoprotectant Concentration
a) 100% water / 0% glycerol: large ice diffraction peaks
b) 95% water / 5% glycerol: ice peaks broadening
c) 90% water / 10% glycerol: the beginnings of ice rings
d) 80% water / 20% glycerol: strong ice rings
e) 70% water / 30% glycerol: ice rings begin to broaden
f) 65% water / 35% glycerol: no ice rings, but a strong band of scattering
g) 60% water / 40% glycerol: no ice rings, and the band of diffraction is broad and diffuse
Note: Cryoprotectant solutions with high PEG concentrations will show diffraction rings from the PEG at a resolution similar to that of the ice rings. If these appear, try reducing the concentration of high molecular weight PEG by substituting a mixture of lower molecular weight PEGs. Figure modified from Figure 1 of Garman and Mitchell, J. Appl. Cryst. (1996) 29, 584-587, used with permission.