PRACTICAL PROTEIN CRYSTALLOGRAPHY SECOND EDITION
DUNCAN E. McREE
Department of Molecular Biology
The Scripps Research Institute
La Jolla, California

With contributions by Peter R. David
Department of Structural Biology
Stanford University Medical Center
Stanford, California
ACADEMIC PRESS An Imprint of Elsevier San Diego London Boston New York Sydney Tokyo Toronto
This book is printed on acid-free paper. Copyright © 1999, 1993 by ACADEMIC PRESS. All Rights Reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher. Permissions may be sought directly from Elsevier's Science and Technology Rights Department in Oxford, UK. Phone: (44) 1865 843830, Fax: (44) 1865 853333, e-mail:
[email protected]. You may also complete your request on-line via the Elsevier homepage: http://www.elsevier.com by selecting "Customer Support" and then "Obtaining Permissions".
Academic Press An Imprint of Elsevier 525 B Street, Suite 1900, San Diego, California 92101-4495, USA http://www.apnet.com
Academic Press 24-28 Oval Road, London NW1 7DX, UK http://www.hbuk.co.uk/ap/
Library of Congress Catalog Card Number: 99-61577
ISBN-13: 978-0-12-486052-0
ISBN-10: 0-12-486052-4
PRINTED IN THE UNITED STATES OF AMERICA
05 06 07 08 09 BB 9 8 7 6 5 4 3
CONTENTS

Foreword to the Second Edition
Preface to the Second Edition
Acknowledgments

1 LABORATORY TECHNIQUES
1.1 Preparing Protein Samples
  History and Purification
  Exchanging Buffers
  Concentrating Samples
  Storage of Samples
  Ultrapurification
1.2 Protein Crystal Growth
  Protein Solubility Grid Screen
  Initial Trials
  Growth of X-Ray Quality Crystals
1.3 Crystal Storage and Handling
1.4 Crystal Soaking
1.5 Anaerobic Crystals

2 DATA COLLECTION TECHNIQUES
2.1 Preparing Crystals for Data Collection
  Crystal-Mounting Supplies
  Mounting Crystals
  Drying Crystals
  Preventing Crystal Slippage
2.2 Optical Alignment
2.3 X-Ray Sources
  Nickel Foil Filtering
  Filtering by Monochromators
  Focusing with Mirrors
  Increasing Brilliance with Mirrors
2.4 Preliminary Characterization
  Precession Photography
  Rotation Photography
  Blind Region
  White-Radiation Laue Photography
  Space Group Determination
  Unit-Cell Determination
  Evaluation of Crystal Quality
2.5 Heavy-Atom Derivative Scanning with Film
2.6 Overall Data Collection Strategy
  Unique Data
  Bijvoet Data
  Indexing of Data
2.7 Overview of Older Film Techniques
2.8 Four-Circle Diffractometer Data Collection
2.9 Area Detector Data Collection
  Increasing Signal-to-Noise Ratio
2.10 Image Plate Data Collection
2.11 Synchrotron Radiation Light Sources
  Differences from Standard Sources
  Special Synchrotron Techniques
  Time-Resolved Data Collection
2.12 Data Reduction
  Integration of Intensity
  Error Estimation
  Polarization Correction
  Lorentz Correction
  Decay or Radiation Damage
  Absorption

3 COMPUTATIONAL TECHNIQUES
3.1 Terminology
  Reflection
  Resolution
  Coordinate Systems
  R-Factor
  Space Groups and Symmetries
  Matrices for Rotations and Translation
  B-Value
  Anisotropic B-Values
3.2 Basic Computer Techniques
  File Systems
  Portability Considerations
  Setting Up Your Environment
  File Formats and mmCIF
  CCP4 Crystallographic Programs
3.3 Data Reduction and Statistical Analysis
  Evaluation of Data
  Filtering of Data
  Merging and Scaling Data
  Heavy-Atom Statistics
3.4 The Patterson Synthesis
  Patterson Symmetry
  Calculating Pattersons
  Harker Sections
  Solving Heavy-Atom Difference Pattersons
3.5 Fourier Techniques
  Types of Fourier
  Solving Heavy Atoms with Fouriers
3.6 Isomorphous Replacement Phasing
  Heavy-Atom Refinement
  Isomorphous Phasing
  Heavy-Atom Phasing Statistics
  Including Anomalous Scattering
  Fine-Tuning of Derivatives
3.7 Molecular Replacement
  Rotation Methods
  Translation Methods
3.8 Noncrystallographic Symmetry
  Self-Rotation Function
3.9 Density Modification
  Solvent Flattening
  Histogram Modification
3.10 Multiwavelength Data with Anomalous Scattering
  Choice of Wavelengths
  Collection of Data
  Location of Anomalous Scatterers
  Phasing of Data
3.11 Refinement of Coordinates
  Available Software
  Rigid-Body Refinement
  R-Factor or Correlation Search for Rigid Groups
  Protein Refinement
  Evaluating Errors
  Very-High-Resolution Refinement with SHELX-97
  Block Diagonal Calculations
  SHELX Refinement Strategy
3.12 Fitting of Maps
  Calculating Electron Density Maps
  Evaluating Map Quality
  Fitting and Stereochemistry
  Chain Tracing
  General Fitting
  Phase Bias
  Adding Waters and Substrates
3.13 Analysis of Coordinates
  Lattice Packing
  Hydrogen Bonding
  Solvent-Accessible Surfaces

4 XtalView TUTORIALS
4.1 Installation
4.2 Obtaining Help
4.3 XView
  XView Widgets
  The xtalmgr
  Preparing Data
  Merging Heavy-Atom Data
  Patterson Solutions
  Bijvoet Difference Pattersons
  Difference Fouriers
  Heavy-Atom Refinement and Phase Calculations
  Absolute Configuration (Hand)
  Enantiomorphic Space Groups
  Exporting Data
  Xfit
  Atom Stack
  Fitting with the Mouse
  Additional Xfit Functions
  Semiautomated Fitting
4.4 A Typical Manual Fitting Session with Xfit
  Loading the Map
  Contouring the Map
  Combining Phases
  Improving Phases
  Saving Phases
  Some Additional Xfit Features
  Fitting a Residue
  Raster 3D
  Model Window
  SfCalc Window
  Finding Geometry Errors
  Editing Waters
4.5 Interfacing to Other Programs
  XPLOR, TNT, PROLSQ, and Other Refinement Programs

5 PROTEIN CRYSTALLOGRAPHY COOKBOOK
5.1 Multiple Isomorphous Replacement
  Example 1: Patterson from Endonuclease III
  Example 2: Single-Site Patterson from Photoactive Yellow Protein
  Example 3: Two-Site Patterson from Photoactive Yellow Protein
  Example 4: Complete Solution of Chromatium vinosum Cytochrome c'
5.2 Mutant Studies
  Example 1: MKT D235E Mutant
  Example 2: SOD C6A Mutant
5.3 Substrate-Analog Example
  Example: Isocitrate-Aconitase
5.4 Molecular Replacement
  Example: Yeast Copper-Zinc Superoxide Dismutase
5.5 Multiwavelength Anomalous Dispersion (MAD) Phasing
  Example: CuA Subunit from T. thermophilus Cytochrome c Oxidase
  XAS Scan
  Data Collection
  Patterson Maps

6 CRYOCRYSTALLOGRAPHY: BASIC THEORY AND METHODS (Peter R. David)
6.1 Overview
6.2 Theory
6.3 Room Temperature versus Low-Temperature Crystallography
6.4 Cryogenic Safety
6.5 Equipment
  Crystal Supports
  Making Loops
6.6 Crystal Loop Mounting Techniques
  The Use of Platinum Wire
  Attaching Pins to Goniometer Heads
6.7 Crystal Storage Overview
  Lists of Materials
  Additional Practical Considerations
  Manipulating and Storing Frozen Crystals
  Storing Crystals for Later Use
6.8 Crystal Removal and Storage
  Freezing Away from the Cold Stream
  Selecting a Cryoprotectant
  Testing Cryoprotectant Solutions
  Setup
  Method
  Rationale
  Cryoprotectant Optimization
6.9 Crystal Handling
  Osmotic Effects

APPENDIX A Crystallographic Equations in Computer Code
APPENDIX B Useful Web Sites
  Practical Protein Crystallography II Web Site
  Software Web Sites
  Databases
  Synchrotrons
  Useful Information
  Heavy-Atom Information
  Crystallization
  X-Ray Anomalous Scattering
  X-Ray Equipment Vendors
  Crystallographic Associations
Index
FOREWORD TO THE SECOND EDITION
The five years since the first edition of this remarkably useful book have been marked by a number of significant advances in protein crystallography. Synchrotron radiation and cryocrystallography now routinely give data of much higher resolution and superior quality than the average datasets available in the past. The use of multiple wavelength anomalous dispersion and improved methods of locating heavy atoms has made high-quality experimentally determined phases much more common. Serious progress has been made on the direct phasing problem, to the extent that structures containing some metal atoms and data to better than 1.2 Å resolution have a realistic chance of being solved with one dataset. Triclinic lysozyme, which contains 1000 atoms (all light atoms), has been solved ab initio by two methods. The increasing occurrence of atomic resolution diffraction data for protein molecules is changing the methods used for refinement. It is now becoming evident that the mathematical and physical model that describes the diffracting contents of a protein crystal is essentially the same as that which has stood the test of time for small molecules.

The focus of this book, and one of the keys to its popularity, is the word "Practical" in the title. Duncan McRee has an excellent ability to strip a problem to its essentials and then cast those essentials into computer programs that can be used by people who are not experts, without sacrificing the power and features required by people who are experts. His program XtalView, introduced with the first edition of this book, has rendered the arcane comprehensible. It has been demonstrated time and again that it is not sufficient for computer programs to be comprehensive and correct. If the
task is complex, the user interface must be well designed or the program becomes a barrier instead of an aid. XtalView provides a clear, interactive, modular interface to the complex tasks of macromolecular crystallography. It thus gets the apparatus out of the way of the science and becomes a powerful tool both for the beginner learning the techniques and for the expert seeking higher productivity.

The technical advances in macromolecular crystallography, particularly the advent of higher and higher resolution, improved phasing methods, and improved methods for highlighting problem areas of a structure, have required supporting changes in the software. In most cases these make the underlying science more discernible. XtalView has been enhanced to support atomic resolution structures, to extend the phasing methods to include MAD (multiwavelength anomalous dispersion) phasing, and to greatly improve the ease with which models of new structures can be built. XtalView's comprehensive analysis tools have also proven useful beyond the crystallographic community. The program is increasingly popular with molecular modelers.

XtalView, which is described in this book, is distributed through the Computational Center for Macromolecular Structure (CCMS) at the San Diego Supercomputer Center. Readers can obtain copies of the program through the CCMS web page at http://www.sdsc.edu/CCMS. Support services are available by e-mail at
[email protected]. XtalView provides a modern user interface and runs on very inexpensive computer systems, including PC platforms running Linux. Because XtalView is free to academic users, the total cost of the crystallographic workstation is that of a personal computer with a reasonably powerful graphics card and a good-quality monitor. With XtalView, every postdoctoral and graduate student in a laboratory can have his or her own graphical workstation, with all of the corresponding increases in productivity. This point cannot be made too strongly. The productivity we are talking about here is the removal of barriers to the interactive exploration of ideas. I have great faith in the creativity of students. I consider it my great good fortune to be able to participate in the development and distribution of a tool that is capable of changing a whole field of science.

Lynn F. Ten Eyck
San Diego Supercomputer Center
University of California, San Diego
PREFACE TO THE SECOND EDITION
This book is a practical handbook for anyone who wants to solve a structure by protein crystallography. It should prove useful both to new protein crystallographers and to old hands. The topics covered in this book are well-tested, robust methods commonly used in our laboratory and in others. The well-informed crystallographer will note, however, that many techniques and methods are not mentioned or are mentioned very briefly. These have been omitted because of space, less common use, difficulty of application, the need for special expertise, or perhaps oversight on the author's part. The exclusion of a method should not be taken in any way as a disapproval; one book cannot possibly include everything.

For the second edition, a new chapter on cryocrystallography, written by Peter R. David, has been added. Other topics that have been added or expanded are very high-resolution refinements, MAD phasing, a tutorial section on XtalView, and material on CCP4 software and mmCIF (macromolecular crystallographic information file), as well as minor changes throughout the book.

When I wrote the first edition of this book in 1993, there were a few hundred entries in the Protein Databank; in 1999, the number is rapidly approaching ten thousand. I hope that the first edition had some small part in this explosion of protein structures solved by crystallography. Now we are looking forward to the day when the structures of most of the proteins in an entire genome will be solved. If you are reading this book, then you want to be part of this coming revolution in structural biology.
One exciting development is the dramatic drop in the cost of the computers needed for solving structures. An IBM PC running LINUX is more than adequate for all of the tasks of solving a protein structure and can be purchased for a minimal cost. Often an existing Windows computer can be converted to a dual boot Windows/LINUX machine and structures solved with freely available software.

The book is divided into six chapters: (1) Laboratory Techniques, (2) Data Collection Techniques, (3) Computational Techniques, (4) XtalView Tutorial, (5) Protein Crystallography Cookbook, and (6) Cryocrystallography, by Peter David. There is no need to read the book in any order; chapters that use information from other chapters will reference them when necessary. In fact, the best use of the book may be to read the cookbook first and then refer to the other chapters as needed. Chapters 4 and 6 are new to the second edition, and there is new material in all of the chapters reflecting the advances since the first edition. In particular, freezing techniques, better detectors, and the wide availability of synchrotron beamlines have increased the resolution of the average protein structure considerably. To cover this, new material on very high-resolution structure refinement and analysis has been added.

In Chapter 1, only cursory information on crystal growth is presented because several excellent texts on crystal growth already exist. A fair amount of attention is given to protein sample handling, because proper attention to the sample can often mean the difference between a successful protein and a failure. Proteins are delicate materials that demand special handling and are very difficult to purify in large quantities. Protein crystals are also very delicate and require special handling techniques different from those of small molecule crystals.

Chapter 2 bridges the gap between the laboratory and the computer.
Special emphasis is placed on the newer techniques using area detectors and synchrotron sources. Because of the high cost of these area detectors, the user will probably have access to only one. Experience has shown that the user will then come to regard the available detector as best and will defend it vehemently against all others. Emphasis is placed less on a specific system and more on general techniques relevant to all area detectors. Chapter 3 provides information about using computers and file systems and how theory translates into actual methods. Rules of thumb are provided throughout to serve as a guide. Like all rules, these are made to be broken and should not be taken too literally. The variety of software used by different groups is enormous and no book could hope to cover even a small portion of it. General information that should be applicable to most techniques is given. Although this book can never substitute for the individual manuals
of each program, it does give guidelines that will allow the reader to make intelligent choices among the program options. To allow discussion of specifics, the XtalView system and CCP4 are used. Information on obtaining XtalView, CCP4, and other crystallographic software can be found at this book's web site at http://ppcII.scripps.edu. Chapter 4 is a tutorial for using XtalView to perform common crystallographic tasks and covers everything from making Patterson maps to automated map fitting. Chapter 5 contains examples drawn from the experience of the author and his colleagues, providing some examples of protein structures solved by various methods. These examples can be used as guides for the user's own projects and to give a feel for how to apply the varied methods. Real numbers are given as a basis for interpreting the user's own data. By following the examples in multiple isomorphous replacement, users can, with luck and perseverance, solve their own structures. For the second edition, new material on a MAD phasing example has been added. Chapter 6 covers the now standard method of collecting data on frozen crystals, including the apparatus needed and many practical pointers. Appendix A contains formulas commonly used in protein crystallography, but with a twist: the formulas are coded in both FORTRAN77 and C. These two languages easily account for 99% of all protein crystallographic software. This will be of great aid to users in writing their own software and in understanding other software. Also for those of us who understand a computer language better than we do math, this appendix explains the formulas. One goal of this book is to provide enough information for the computer neophyte to write a simple program to reformat or filter data. Unfortunately, because of the incredible variety of software available and the consequent large variety of file formats, this is a necessary skill. 
Appendix A provides information for writing programs that will continue to be useful with different operating systems and for other projects.
Suggested Reading

Crystal Growth:
McPherson, A. (1998). Crystallization of Biological Macromolecules. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York.

Crystallography:
Stout, G. H., and Jensen, L. H. (1989). X-Ray Structure Determination: A Practical Guide, 2nd Ed. Wiley, New York.
Ladd, M. S. B., and Palmer, R. A. (1985). Structure Determination by X-Ray Crystallography, 2nd Ed. Plenum, New York.
McKie, D., and McKie, C. (1986). Essentials of Crystallography. Blackwell Scientific, Oxford.

Protein Crystallography:
Blundell, T. L., and Johnson, L. N. (1976). Protein Crystallography. Academic Press, San Diego.
Wyckoff, H., ed. (1985). Diffraction Methods for Biological Macromolecules, Methods in Enzymology, Vols. 114 and 115. Academic Press, San Diego.
Carter, C. W., Jr., and Sweet, R. M., eds. (1997). Macromolecular Crystallography A and B, Methods in Enzymology, Vols. 276 and 277. Academic Press, San Diego.
Drenth, J. (1994). Principles of Protein X-Ray Crystallography. Springer-Verlag, New York.
ACKNOWLEDGMENTS
My thanks go to a number of people for their assistance in preparing the manuscript for this book. In particular, I thank Emelyn Eldredge for giving me the opportunity and impetus to do a second edition. I thank David Stout, Yolaine Stout, Michele McTigue, and Pamela Williams for critically reading the manuscript and making many helpful suggestions. Any errors in the book are, of course, my responsibility.

I also thank the large number of patient people in the Structural Biology Group at the Scripps Research Institute who have beta-tested XtalView for me. I hope this book will answer some of their questions. In particular, I thank David Goodin, John Tainer, Elizabeth Getzoff, Ian Wilson, Jack Johnson, Ethan Merritt, Chris Bruns, Cliff Mol, Jean-Luc Pellequer, Marieke Thayer, Yi Cao, Sheri Wilcox, Rabi Musah, Gerard Jensen, Melissa Fitzgerald, G. Sridhar Prasad, Brian Crane, Andy Arvai, John Irwin, Alex Shah, Gary Gippert, Arno Pahler, Nicole Kresge, Jacek Nowakowski, Robin Rosenfeld, John Blankenship, Nathalie Jourdan, Ward Smith, Paul Swepston, Phil Bourne, John Badger, and John Rose for their numerous comments and suggestions over the years. I thank Mark Israel and the folks at CCMS for several years of dedicated support of XtalView users while always remaining cheerful. I especially thank Mike Pique for keeping me up to date on programming and computer graphics.

I have worked with many people over the years who have provided the intellectual stimulus, knowledge, and help that made this book possible. David Richardson was my Ph.D. advisor and started me on the right path. Jane Richardson has been a major inspiration over the years. Wayne Hendrickson took me under his wing and taught me much about anomalous
scattering, phasing, refinement, scaling, and critical thinking. Bi-Cheng Wang and Bill Furey taught me all about phase modification; Fred Brooks taught me the virtues of a good user interface and the true powers of computers. John Tainer and Elizabeth Getzoff completed my education in protein crystallography while I worked as a postdoctorate in their lab. Lynn Ten Eyck taught me all about the mathematics of crystallography and provided many good ideas. George Sheldrick has been an inspiration to keep programming despite the academic consequences, and his encouragement has been invaluable.

XtalView distribution through the Center for Macromolecular Structure (www.sdsc.edu/CCMS) at the San Diego Supercomputer Center is funded by Grants BIR 93-31436 and 96-16114 from the National Science Foundation.

Finally, I thank my wife, Janice Yuwiler, for her dedication and support to her sometimes trying husband, and her father, Art Yuwiler, for always believing in me. And last, but not least, I want to thank my three children, Alisa, Kevin, and Alex (one more than the first edition!) for putting up with me while I spent so many evenings late at work.

Peter R. David, the author of Chapter 6, thanks Duncan McRee, Elspeth Garman for critically reading the chapter, Roger Kornberg and Michael Levitt for their support and encouragement on difficult problems, Kerstin Leuther for many things, especially for asking why, and finally, Mike Blum for starting him out on crystallography.
1 LABORATORY TECHNIQUES

1.1 PREPARING PROTEIN SAMPLES

A protein sample must be properly prepared before it can be used in a crystal growth experiment. There are many ways that this can be done to accomplish the same goal: to put the protein at a high concentration in a defined buffer solution. Methods of preparing a protein sample that have worked well in our laboratory are outlined here, but if you know of a quicker, easier method then by all means use it.
History and Purification

Since it is not uncommon for one batch of protein to crystallize while the next will not, it is vital to keep a history of each sample and to track each batch separately. Your records may provide the only clue to the differences between samples that produce good crystals and samples that are unusable. For example, there have been several cases where the presence of a trace metal is needed for crystallization. The most famous case is insulin. It seemed that the only insulin that would crystallize was that purified from material collected in a galvanized bucket. It was eventually discovered that zinc was required, and later it was added directly to the crystallization mix. In our lab
any sample received is logged into a notebook with a copy of any letters or material sent with the sample. Samples should be shipped to you on dry ice and kept frozen at -70 °C until they are ready for use. Ask the person sending the sample to aliquot the protein into several tubes and to quick-freeze each one. This way you can thaw one aliquot at a time without having to repeatedly freeze-thaw the entire sample, which can damage many proteins. Keep a small portion of the sample apart and save it for future comparison with samples that do not crystallize or crystallize differently. Always keep protein samples at 4 °C in an ice bucket to prevent denaturation and to retard bacterial growth. Perform all sample manipulations at 4 °C either in a cold room or on ice. When the samples are finally set up they can be brought to room temperature. Proteins are usually stabilized by the presence of the precipitants used in crystallization, and agents are added to retard microbial and fungal growth. A common antimicrobial agent is 0.02% sodium azide. Other broad-spectrum cocktails sold for use with tissue culture are quite effective.
Exchanging Buffers

If the desired buffer of the sample is not already known, the sample should be placed in a weak buffer near neutrality. A good choice is 50 mM Tris-HCl at pH 7.5 with 0.02% sodium azide. Some proteins will not be stable at low ionic strength, so a small portion should be tried first to see if a precipitate forms. Be sure to wait several hours before deciding if the sample is stable. Observe the sample in a clear glass vial and hold it near a bright light to detect any cloudiness in the sample.

There are two methods for exchanging the buffer solution of the protein sample: dialysis and use of a desalting column. The desalting column is the faster method, and disposable desalting columns, such as the PD-10 column from Pharmacia-LKB, make it very convenient. A single pass through the column will remove 85-90% of the original salt in the sample and, if this is not enough, two passes can be used. Every time the sample is passed over the column it will be diluted about 50%. Unless the sample is very concentrated to begin with, it may be necessary to concentrate the sample after desalting.

Dialysis on small volumes is best carried out in finger-shaped dialysis membrane or with a Microcon (Fig. 1.1). Always soak the membrane first in the buffer to remove the storage solution. A minimum of two changes of dialysis buffer 12 h apart is recommended. The dialysis buffer should be stirred and should be at least 100 times the sample volume. Use membranes with a molecular weight cutoff less than half the sample molecular weight;
otherwise you risk losing a substantial portion of the sample. When removing the sample, wash the inside of the membrane with a small amount of buffer to recover the sample completely.

FIG. 1.1 Dialyzing samples to exchange buffers. (Figure labels: dialysis membrane, flask, solution.)
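The rules of thumb for buffer exchange can be turned into a quick estimate before committing sample. The following Python sketch is an illustration only, not a procedure from this book: the function names are hypothetical, "diluted about 50%" is read here as a halving of the protein concentration per pass, and each dialysis change is assumed to reach full equilibrium, which real dialysis only approximates.

```python
def desalting_result(protein_conc, n_passes, salt_removed=0.85, conc_retained=0.5):
    """Estimate the outcome of n_passes through a desalting column.

    salt_removed:  fraction of salt removed per pass (0.85-0.90 in the text).
    conc_retained: fraction of protein concentration kept per pass
                   (assumption: "diluted about 50%" read as a halving).
    Returns (fraction of original salt remaining, protein concentration).
    """
    salt_left = (1.0 - salt_removed) ** n_passes
    return salt_left, protein_conc * conc_retained ** n_passes


def dialysis_carryover(sample_volume, buffer_volume, n_changes):
    """Fraction of the original buffer solutes left after n_changes,
    assuming every change reaches full equilibrium (an idealization)."""
    per_change = sample_volume / (sample_volume + buffer_volume)
    return per_change ** n_changes


def cutoff_is_safe(membrane_cutoff, protein_mw):
    """Rule of thumb from the text: cutoff below half the protein molecular weight."""
    return membrane_cutoff < 0.5 * protein_mw


if __name__ == "__main__":
    salt_left, conc = desalting_result(protein_conc=20.0, n_passes=2)
    print(f"two desalting passes: {salt_left:.1%} salt left, {conc:.1f} mg/ml protein")
    carry = dialysis_carryover(sample_volume=1.0, buffer_volume=100.0, n_changes=2)
    print(f"two dialysis changes at 100x volume: {carry:.1e} of original solutes left")
    print("10 kDa cutoff safe for a 25 kDa protein?", cutoff_is_safe(10_000, 25_000))
```

Under these assumptions, a second desalting pass buys a further drop in salt at the price of another halving of the protein concentration, which is why a concentration step often follows.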
Concentrating Samples

First measure the starting concentration of the sample. The simplest way is to measure the absorbance of a 50-fold dilution of the sample at a wavelength of 280 nm and assume the concentration in mg/ml is simply the absorbance times 50. While this method is not very accurate, it is reproducible, and the sample should be pure enough to warrant the assumption that all absorption is due to the protein.¹ Always use buffer as a blank and check the buffer versus distilled water to be sure it does not have significant absorption. Some buffers have a significant amount of absorption at 280 nm, which can greatly reduce the accuracy of different absorbance measurements. The diluted absorbance must be below 1.0 or the measurement will be inaccurate. Concentrate the sample to 10-20 mg/ml. If you have enough sample, it is better to concentrate to 30 mg/ml, wash the concentrator with one-half the sample volume, and then add the wash to the sample to make a final concentration of 20 mg/ml.

¹For further information on determining protein concentration, see Scopes, Robert K., and Canter, Charles R., eds. (1994). Protein Purification, Principles and Practice, 3rd Ed. Springer Verlag, New York.

A Centricon is one of the best ways to concentrate the sample. An Amicon will also work well. Another method is to dialyze against polyethylene glycol 20,000 (PEG-20K) using a finger-shaped dialysis membrane. The advantages of this method are that it can be combined with dialysis and that the same membrane used to dialyze the sample can be transferred directly to the PEG-20K for concentrating. The dialysis tube can be put directly onto solid PEG-20K. The water in the sample will be quickly removed, so check the sample often. However, be aware that PEG is often contaminated with salts and/or metals and this may or may not be desirable. In many cases, though, such contamination has actually contributed to crystallization. You may want to keep a sample of the particular batch of PEG you use. If, in the future, a new batch causes problems, you can analyze the differences.

Another method often used is precipitation with ammonium sulfate. If an ammonium sulfate step has already been used in the purification procedure, this may be an easy way to achieve a high concentration. You will want to use a high level of ammonium sulfate to ensure that the entire sample is precipitated. The ammonium sulfate should be added slowly to the solution while it is kept cold. Let the solution sit for at least 30 min after all ammonium sulfate has dissolved. Spin down the precipitate and remove the supernatant. The pellet can be redissolved in a small amount of buffer. However, since the pellet will contain some salt, a dialysis step will be needed before the protein is ready.

It is not uncommon for a protein to precipitate at high concentrations. If this happens while you are concentrating, add buffer back slowly until all the sample has dissolved.
Raising the level of salt by using a more concentrated buffer or by adding sodium chloride can often help stabilize protein solutions. If a precipitate forms, examine it carefully to make sure it is not crystalline. Amorphous precipitates are cloudy and have a matte appearance. Crystalline precipitates are often shiny and, if from a colored protein, are brightly colored with little cloudy appearance. Two proteins have been crystallized in our lab accidentally during concentration. One was found in an ammonium sulfate precipitation step and the other during concentration on an Amicon to lower the salt concentration.
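The two pieces of arithmetic in this section, the absorbance-based concentration estimate and the concentrate-then-wash step, are easy to check in a few lines. The Python sketch below is illustrative only: the function names are hypothetical, and the conversion assumes, as the rough rule in the text does, that an absorbance of 1.0 at 280 nm corresponds to about 1 mg/ml of protein, which varies from protein to protein.

```python
def concentration_from_a280(diluted_absorbance, dilution_factor=50.0):
    """Rough protein concentration (mg/ml) from the A280 of a diluted sample.

    Assumes 1 absorbance unit is about 1 mg/ml (the rule of thumb in the text)
    and that the reading was blanked against buffer.
    """
    if not 0.0 <= diluted_absorbance < 1.0:
        # The text warns that readings of 1.0 or more are inaccurate.
        raise ValueError("diluted absorbance must be below 1.0; dilute further")
    return diluted_absorbance * dilution_factor


def concentration_after_wash(conc, sample_volume, wash_volume):
    """Concentration after adding a protein-free wash to the sample."""
    return conc * sample_volume / (sample_volume + wash_volume)


if __name__ == "__main__":
    # A 50-fold dilution reading 0.4 suggests roughly 20 mg/ml.
    print(concentration_from_a280(0.4))
    # Concentrate to 30 mg/ml, then add a wash of half the sample volume.
    print(concentration_after_wash(30.0, sample_volume=1.0, wash_volume=0.5))
```

The second function shows why 30 mg/ml is the target before washing: adding half a volume of wash dilutes by a factor of 1.5, which brings the sample to the desired 20 mg/ml, neglecting whatever protein the wash recovers.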
Storage of Samples

The entire sample will not be used all at once, and the remaining protein solution should be aliquoted and stored frozen at -70 °C. Divide the sample into 100- to 200-µl aliquots in freezer-proof tubes (not glass, which becomes brittle and shatters at low temperatures) and quick-freeze the tubes in an acetone-dry ice or liquid nitrogen bath (Fig. 1.2). Label each tube with the date, a code to identify the sample and the particular batch of the sample, and your initials. Cover the label with transparent tape to prevent the ink from rubbing off when you handle the frozen tubes later. Place the tube in a cardboard box and store in the freezer. It will harm protein samples to be freeze-thawed; although often they may withstand several cycles of freeze-thawing, it is best not to find out the hard way. Thaw the samples in an ice bucket or the cold room when they are to be used. If some sample is left over and it will be used the next day, it can usually be stored at 4 °C overnight.

FIG. 1.2 Storage of samples. The procedure involves (1) aliquoting samples into several tubes, then (2) quick-freezing each sample in an acetone-dry ice bath and storing at -70 °C.
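For those who keep their sample records on a computer as well as in a notebook, the aliquoting and labeling scheme above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the label layout, and the sample code "PYP", batch "B3", and initials "DEM" are all hypothetical, and only the 100- to 200-µl range and the label contents (date, sample code, batch, initials) come from the text.

```python
from datetime import date


def plan_aliquots(total_volume_ul, aliquot_volume_ul=150.0):
    """Split a sample into equal aliquots of 100-200 microliters.

    Returns (number of full aliquots, leftover volume in microliters).
    """
    if not 100.0 <= aliquot_volume_ul <= 200.0:
        raise ValueError("aliquot volume should be 100-200 microliters")
    n_full = int(total_volume_ul // aliquot_volume_ul)
    leftover = total_volume_ul - n_full * aliquot_volume_ul
    return n_full, leftover


def tube_label(frozen_on, sample_code, batch, initials):
    """Build a label carrying the date, sample code, batch, and initials.

    The layout (ISO date, code-batch, initials) is an arbitrary choice.
    """
    return f"{frozen_on.isoformat()} {sample_code}-{batch} {initials}"


if __name__ == "__main__":
    n_full, leftover = plan_aliquots(1000.0, aliquot_volume_ul=150.0)
    print(f"{n_full} aliquots, {leftover:.0f} microliters left over")
    print(tube_label(date(1999, 1, 15), "PYP", "B3", "DEM"))
```

Any leftover smaller than one aliquot is a natural candidate for the small reference portion that is set aside for later comparison between batches.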
Ultrapurification
While it is beyond the scope of this handbook to cover purification techniques, the crystallographer has one special technique, usually not tried by others, to further improve the sample: recrystallization (Fig. 1.3). We will assume that you have succeeded in finding conditions that will grow small crystals but are having trouble growing larger ones. It may be worthwhile to recrystallize the sample to improve its purity. A large sample of the protein can be set up in the crystallization mixture and seeded with a crushed crystal. After crystals have grown, you may wish to add slightly more salt to push more protein into the crystalline state. Gently centrifuge down the crystals, or allow them to settle by gravity, and remove the supernatant. To wash the crystals, resuspend them in a mother liquor about 10% higher in precipitant (to avoid redissolving them), and then remove the supernatant. Resuspend the crystals in distilled water to dissolve them. If a large amount of precipitate is present with the crystals, this method will not remove it unless the precipitate settles at a different rate from the crystals. In these cases, the crystals can be resuspended in 2 ml of artificial mother liquor in a petri dish, and individual crystals can then be picked up with a capillary and manually separated from the precipitate. For the crystals to redissolve well, they should be freshly grown. Old crystals that still diffract well often will not redissolve even in distilled water because the surface of the crystal has become cross-linked. This is especially true of crystals grown from polyethylene glycol.
FIG. 1.3 Redissolving crystals.
1.2 PROTEIN CRYSTAL GROWTH
Several excellent texts have been published on methods for growing protein crystals (see Suggested Reading in the preface), and I will not repeat this material here except briefly, to add some of our own experience. Like fine wine, protein crystals are best grown in a temperature-controlled environment. Most cold rooms have a defrost cycle that makes them especially poor places to grow crystals. Investing in an air conditioner for a small room to keep it a few degrees colder than the rest of the laboratory is the best way to keep a large area at a constant temperature for crystal growth. To ensure that the room is tightly regulated, get a unit with a capacity larger than needed. Another alternative is to use a temperature-controlled incubator. However, a room is best because you will need to examine your setups periodically at a microscope. In a room everything can be kept at the same temperature.
Invest in a good dissecting stereomicroscope and remove the lightbulb in the base. Substitute a fiber-optic light source so that the base does not heat up and dissolve your crystals as you observe them. Even with the fiber-optic source, be careful not to put the setup down near the light source itself, which gets hot during operation. Have a Plexiglas base built over the dissecting scope base (Fig. 1.4) to provide a large surface on which to place setups so that they do not fall off the edges during examination. This will also provide a base to steady your hands during delicate mounting procedures.
FIG. 1.4 Modified dissecting scope (labeled parts: fiber-optic light source, Plexiglas cover). A Plexiglas base is put over the scope to make a larger area; it provides a place to rest your hands during mounting operations and prevents hanging-drop plates from tipping over the edge. A fiber-optic light source is used instead of the built-in light to prevent the base from heating and damaging crystals.
Protein Solubility Grid Screen
Before embarking on trials of a protein, we routinely screen it for solubility with a grid screen invented by Enrico Stura.2 The grid consists of a number of common precipitants over a wide range of precipitant concentrations as well as a wide range of pH values (Fig. 1.5). To make the grid screen, we make up 100 ml of stock solution of the highest concentration of each precipitant (row D) in the buffer listed and store the solutions in lighttight bottles (wrap with aluminum foil or keep in a dark place). Make up a 4 × 6 Linbro plate3 with 1 ml in each well of the solutions as shown in Fig. 1.5, diluting the stock solution with buffer as appropriate toward the top of the grid. The top edges of the wells are liberally coated with vacuum grease to make a seal with a 22-mm siliconized circular glass coverslip to be added in the next step. Make a hanging drop of protein solution over each well by mixing 5 µl of protein solution (10-20 mg/ml) with 5 µl of well solution in the center of a coverslip; then quickly invert the coverslip with forceps, place it over the well, and press it into the grease seal. Make sure the slip is sealed completely with grease by looking for air gaps in the seal. After preparing all 24 wells, place the tray in a dark place with a constant temperature. We use large incubators4 set at 17-22°C for crystal trials in our lab, although a well-air-conditioned room can also be used.
Check the plates for precipitation soon after setting up and again the next day, by observing the drop on the coverslip with a dissecting microscope. What you are looking for are precipitants for which relatively clear drops turn cloudy with precipitate as the precipitant concentration increases. The midpoint of this transition is where you want to start crystal trials. You want to avoid precipitants where every drop is fully precipitated; these precipitants may specifically interact with the protein.
You also want to avoid precipitants that never precipitate the protein. If you are lucky you may find a well with crystals. Armed with this solubility information, you can make intelligent choices of precipitant concentrations to start with. Unstable proteins will form precipitate in every condition regardless of precipitant or concentration. In this case, you need to find conditions to stabilize the protein for at least 24 h or you stand little chance of growing crystals. Possible stabilizers are lower temperature, metal ions, cofactors, non-ionic detergents, glycerol, and ligands.
If a number of possible variants of a protein are candidates for crystal trials, such as different species or mutant constructs, the grid screen can be used to select the best candidates for further trials. Look for proteins that exhibit sharp transitions from clear to cloudy drops. We have used the grid to screen for solubility of a membrane-attached protein that was engineered to be soluble, screening various constructs to test hypotheses about how the protein was attached to the membrane. In our initial attempts all the drops were cloudy, and after many rounds of mutagenesis we found a mutant that showed a clear transition from soluble to precipitate in high salts. Prior constructs precipitated within 24 h in all the conditions and never produced crystals. The soluble mutant eventually produced large crystals that we subsequently used to solve the structure.

Column        1          2          3          4            5         6
Precipitant   PEG 600    PEG 4K     PEG 10K    (NH4)2SO4    PO4       Citrate
Row A         15%        10%        7.5%       0.75 M       0.8 M     0.75 M
Row B         24%        15%        12.5%      1.0 M        1.32 M    1.0 M
Row C         33%        20%        17.5%      1.5 M        1.6 M     1.2 M
Row D         42%        25%        22.5%      2.0 M        2.0 M     1.5 M
Buffers: (1) 0.2 M imidazole malate, pH 5.5; (2) 0.2 M imidazole malate, pH 7.0; (3) 0.2 M imidazole malate, pH 8.5; (4) 0.15 M sodium citrate, pH 5.5; (5) NaH2PO4/K2HPO4, pH 7.0; (6) 10 mM sodium borate, pH 8.5.
FIG. 1.5 Enrico Stura's grid screen.

2 Stura, E. A., Satterthwait, A. C., Calvo, J. C., Kaslow, D. C., and Wilson, I. A. (1994). Reverse screening. Acta Crystallogr. D50, 448-455.
3 These plates and other crystallization supplies as well as excellent material on crystallization can be obtained from Hampton Research, Irvine, California, http://www.hamptonresearch.com. Another source is Emerald Biosciences.
4 These incubators need to be of the type that can both cool and heat to keep a constant temperature so close to room temperature. Heat-only incubators designed for bacterial growth at 37°C are not suitable.
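Making up the plate amounts to diluting each row D stock with its own buffer. The sketch below tabulates the volumes for 1-ml wells using the concentrations in Fig. 1.5. The names GRID and well_recipe are illustrative, the rows are assumed to run from the lowest concentration (A) to the stock (D), and volumes are rounded to the nearest microliter.

```python
# Concentrations transcribed from Fig. 1.5, listed from row A to row D.
# The last entry of each list is the row D stock solution.
GRID = {
    "PEG 600 (%)": [15, 24, 33, 42],
    "PEG 4K (%)": [10, 15, 20, 25],
    "PEG 10K (%)": [7.5, 12.5, 17.5, 22.5],
    "(NH4)2SO4 (M)": [0.75, 1.0, 1.5, 2.0],
    "PO4 (M)": [0.8, 1.32, 1.6, 2.0],
    "Citrate (M)": [0.75, 1.0, 1.2, 1.5],
}
WELL_VOLUME_UL = 1000  # 1 ml per well


def well_recipe(target, stock, well_volume_ul=WELL_VOLUME_UL):
    """Return (stock_ul, buffer_ul) that dilute the stock to the target."""
    stock_ul = round(well_volume_ul * target / stock)
    return stock_ul, well_volume_ul - stock_ul


recipes = {
    name: [well_recipe(conc, concs[-1]) for conc in concs]
    for name, concs in GRID.items()
}

for name, rows in recipes.items():
    for row_label, (stock_ul, buffer_ul) in zip("ABCD", rows):
        print(f"{name:14s} row {row_label}: {stock_ul:4d} µl stock + {buffer_ul:4d} µl buffer")
```

Because each stock is made up in the buffer used to dilute it, the buffer concentration stays the same down every column.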
Initial Trials
In all aspects of protein crystallography except initial crystal trials, the more past experience you have the better. Beginner's luck is definitely a factor in finding conditions for crystallizing a protein the first time. This is partly because beginners are more willing to try new conditions and will often do naive things to the sample, thus finding novel conditions for crystal growth. This is also because no one can predict the proper conditions for crystallizing a new protein. There are conditions that are more successful than others, but to use these exclusively means that you will never grow crystals of proteins that are not amenable to these conditions. So fiddle away to your heart's content. What is needed is to observe carefully what happens to your sample under different conditions and to note the results carefully. The least experienced part-time student can outperform the most expensive crystallization robot because he or she has far more powerful sensing faculties and reasoning abilities. Leave the "shotgun" setups to the robots. Having said all this, I present in Table 1.1 a recipe to use for initial trials.

TABLE 1.1 Conditions for Initial Trials
Precipitant                    Concentration range    Additives
Polyethylene glycol 4000       10-40% w/v             0.1 M Tris, pH 7.5
Polyethylene glycol 8000       10-30% w/v             0.2 M ammonium acetate
Ammonium sulfate, pH 7.0       50-80% saturation
Ammonium sulfate, pH 5.5       50-80% saturation
Potassium phosphate, pH 7.5    0.5-2.5 M
2-Methyl-2,4-pentanediol       15-60%
Low ionic strength             Dialysis
Sodium citrate                 0.5-2.5 M
Additive listed against the last three rows (the row it belongs to is ambiguous in this copy): 50-200 mM potassium phosphate, pH 7.8

The most commonly used methods for initial crystal trials are the hanging-drop and sitting-drop (Fig. 1.6) vapor-diffusion methods. The batch method can actually save much protein if done properly. In the hanging-drop method many different drops are set up. Most of these will never crystallize. It is hoped that just the right conditions will be hit upon in a few of the drops. A method that I have used successfully for many years is to place a small amount of protein in a 1/4-dram shell vial with a tightly fitting lid (caplug). Small aliquots of precipitant are added slowly. After each addition, the shell vial is tapped to mix the samples, then held up to a bright light. When the protein reaches its precipitation point, it will start to scatter light as the proteins form large aggregates that cause a faint "opalescence." Slowly make additions to the sample, waiting several minutes before each addition to avoid overshooting the correct conditions. If two vials are used, they can be leapfrogged so that when one reaches saturation, the other will be just below. For example, set up the first vial with 20 µl of protein plus 2 µl of precipitant and the second with 20 µl of protein plus 4 µl of precipitant. Then add 4 µl to the first and then 4 µl to the second, and so forth, so that one of the vials is always 2 µl ahead of the other. When opalescence is achieved in one vial, put both away overnight to be observed the next day. If the precipitation point is overshot, a small amount of water may be added to clear the precipitate and the precipitation point can be approached again more slowly. This method is less wasteful because it allows a finer searching of conditions in just two shell vials, substituting for the large number of hanging drops needed to do as fine a scan. It also encourages more careful observation of the samples. Finally, some proteins do not fare well during the evaporation that occurs in hanging drops.
FIG. 1.6 Crystal setup using ACA plates. A cross section through a single well is shown. (A) The lips of the wells are greased where the coverslips will later be placed. High-vacuum grease can be used alone or mixed with about 20% silicone oil. The addition of oil makes the grease less viscous so that it flows more easily. (B) The lower coverslip is pressed into place. Make sure there are no gaps in the grease for air to leak through. (C) Place the reservoir solution in the well. (D) Put the protein solution onto the lower coverslip. (E) Carefully layer the precipitant (often some of the reservoir solution) onto the protein. (F) Mix the two layers together quickly by drawing up and down with an Eppendorf pipette. (G) Put on the upper coverslip to seal the well completely. Again check that there are no gaps in the grease. Wait several hours to several weeks for crystals to appear.
It is impractical, if not impossible, to systematically scan every possible precipitant that has been used for growing protein crystals. Therefore, another approach is an incomplete factorial experiment.5 A small subset of all possible conditions is scanned in a limited number of experiments by combining a subset of solutions. These drops are scanned for crystals or promising precipitates. If anything is found, then a finer scan can be done to find better growth conditions. A particularly successful version of this method was developed by Jancarik and Kim6 and has been optimized to 50 conditions combining a large number of precipitants and conditions. A kit is available from Hampton Research that contains all 50 solutions premixed, so all one has to do is set up 3-5 µl of protein sample with each of the solutions. This method recommends that you first dialyze the protein against distilled water to allow better control over pH and other conditions. Try this on a small sample first. Many proteins will not tolerate distilled water and will precipitate (or sometimes crystallize). Use as low a concentration of buffer as you can. Phosphate buffers will give phosphate crystals in several of the drops that contain divalent cations. We have used this method with some success. While you may not get usable crystals on the first trial, you may get some good leads. Some of the drops may stay clear for a couple of weeks. You can raise the precipitant concentration in the drop by adding saturated ammonium sulfate to the reservoir (but not to the drop).
This will cause the drop to dry up somewhat. More ammonium sulfate can be added until the drop either precipitates or crystallizes.
Another crystallization method not often tried approaches the crystallization point from the other end by first precipitating the protein and then slowly adding water until the critical point is reached. Often microcrystals are formed when the protein is precipitated. As the precipitant is lowered, protein is redissolved and crystals large enough to see may grow out of the precipitate using these microcrystals as growth centers. Also, if the excess solution is removed from the precipitate, the result is a high concentration of the protein, which may force crystals. This can be done on a micro basis using a variation of the hanging drop. For example, to use this method with ammonium sulfate, mix the drop with 10 µl of protein sample and 3 µl of saturated ammonium sulfate and set this over a reservoir of saturated ammonium sulfate. The drop should dry slowly and the protein precipitate, giving a final protein concentration about three times higher than at the start. Every few hours add some water to the well to lower the ammonium sulfate concentration. Keep careful track of the amount added. If the drop starts to clear, slow the addition down to once a day and add water very slowly. I have grown a large number of crystals using this method. Although they are rarely suitable for diffraction, they can be used as seeds to grow better crystals. This method allows searching a large number of conditions with a small amount of sample.
Never give up on a setup unless it is completely dried; it may take several months for crystals to appear. Proteins that are not stable in buffer are often stabilized by high precipitant concentrations. Also, the presence of precipitate in a setup does not preclude crystallization. Often a crystal will grow from the precipitate. Nucleation is a rare event and may require a very long time to occur if you are near, but not right on, the correct crystallization conditions.
5 Carter, C. W., Jr., and Carter, C. W. (1979). Protein crystallization using incomplete factorial experiments. J. Biol. Chem. 254, 12,219-12,223.
6 Jancarik, J., and Kim, S.-H. (1991). Sparse matrix sampling: A screening method for crystallization of proteins. J. Appl. Crystallogr. 24, 409-411.
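Two pieces of volume bookkeeping from this section, the two-vial leapfrog additions and the drying of a drop set over saturated ammonium sulfate, can be sketched as follows. The function names are illustrative. The drop estimate rests on simplifying assumptions: only water leaves the drop, volumes are additive, the added ammonium sulfate is saturated, and the reservoir is large enough that its concentration does not change.

```python
def leapfrog_schedule(start_ul=(2.0, 4.0), step_ul=4.0, rounds=3):
    """Cumulative precipitant (µl) in vials 1 and 2 after each addition."""
    totals = list(start_ul)
    history = [tuple(totals)]
    for _ in range(rounds):
        for vial in (0, 1):  # add to the first vial, then to the second
            totals[vial] += step_ul
            history.append(tuple(totals))
    return history


def precipitant_fraction(precipitant_ul, protein_ul=20.0):
    """Volume fraction of precipitant solution in a vial."""
    return precipitant_ul / (protein_ul + precipitant_ul)


def equilibrated_drop(protein_ul, saturated_ul, reservoir_fraction=1.0):
    """Estimate the drop after it has dried to match the reservoir.

    reservoir_fraction is the reservoir concentration as a fraction of
    saturation. Returns (final drop volume in µl, fold concentration of
    the protein relative to the original protein sample).
    """
    final_ul = saturated_ul / reservoir_fraction
    return final_ul, protein_ul / final_ul


for first_ul, second_ul in leapfrog_schedule():
    print(f"vial 1: {first_ul:4.1f} µl ({precipitant_fraction(first_ul):.2f})   "
          f"vial 2: {second_ul:4.1f} µl ({precipitant_fraction(second_ul):.2f})")

final_ul, fold = equilibrated_drop(10.0, 3.0)
print(f"drop dries to about {final_ul:.1f} µl; protein about {fold:.1f}-fold concentrated")
```

Under these assumptions the 10 µl + 3 µl drop shrinks to roughly the volume of ammonium sulfate solution added, which is consistent with the roughly threefold concentration quoted in the text.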
Growth of X-Ray Quality Crystals
The elation that you experience following the appearance of the first crystals of a new protein can be short-lived. It is often discovered that the first crystals grown are of insufficient quality to use for data collection. A long series of experiments may be needed before large, single crystals can be obtained. The first step to try is a very fine scan of conditions nearest those used initially to find the optimal conditions for growing only a few large crystals in a single setup. Vary the precipitant concentration, the protein concentration, the pH, and the temperature. You may also want to try varying the buffer used and its concentration. Using different types of setup will vary the equilibration rate, which can often lead to improved growth. What are needed are conditions where nucleation is rare and crystal growth is not too rapid. Do not look at your setups too frequently (once a day is enough), since disturbing them can result in the formation of extra nuclei. Leave fine-scan plates alone for a week before disturbing them. Since nucleation is a stochastic process, preparing a large number of identical setups will often yield a few drops that produce nice crystals by chance. This is most useful only if you have sample to waste.
If nucleation is unreliable, then seeding is often the answer. Two methods of seeding are used: microseeding and macroseeding. In microseeding, small seeds, obtained by crushing a crystal or taken from the large numbers usually present in old setups, are introduced into a fresh drop of preequilibrated protein. Seeds will usually grow in conditions where nucleation will not occur. An extreme example of this is photoactive yellow protein, where seeds will grow in ammonium sulfate solutions at 71% of saturation but nucleation will not occur at concentrations less than about 100%. The microseeds are diluted until only a few will be introduced. This usually requires serial dilutions and can be very difficult to control. Another method is to place a very small amount of solution from a drop in which a crystal has been crushed at one point of a fresh drop without any mixing. A mass of crystals will grow at this point, but often a few seeds will diffuse to another part of the drop where a large crystal may develop. The author has found that 30-100 µl sitting drops are good for this technique.
The steps involved in microseeding are illustrated in Fig. 1.7A. The first step in microseeding is to establish the proper growth conditions. Drops with precipitating agent of increasing concentration are set up and preequilibrated overnight. Then a crystal is crushed with a needle so that the entire drop will fill with microscopic seeds. A whisker or eyelash glued to a rod is then dragged through the solution to pick up a small amount of the liquid containing microcrystals. The whisker is then streaked or dipped into the preequilibrated drops. After several hours or days, crystals should grow in the drops with sufficiently high precipitant concentration. To prevent unwanted nucleation, it is desirable to use the lowest concentration that will sustain crystal growth. When proper growth conditions are established, several drops are then preequilibrated to this concentration. A crystal is then crushed as before and a few microliters of the mother liquor in the drop is then pipetted into the first of a series of test tubes with stabilizing solution and mixed well.
These are then serially diluted about 10- to 20-fold so that each successive tube contains fewer microcrystals. A few microliters of each tube is then put into the preequilibrated growth drops and after several days examined for growth (Fig. 1.7B). Each drop should contain progressively fewer crystals. The goal is to find a dilution that will provide just a few crystals per drop. If the microcrystals are stable enough, it may be possible to seed many drops from this same tube to grow many large crystals.
FIG. 1.7 Microseeding techniques. (A) Find the correct concentration: (1) crush crystal; (2) wet whisker and streak drops of increasing concentration.
In macroseeding a single seed is washed and placed in a fresh, preequilibrated drop (Fig. 1.8). The seeds need to be well washed in 2 ml of artificial mother liquor in a plastic petri dish. The dish is gently swirled to dilute any microseeds. The seed is then transferred with a minimum amount of solution to another dish with a precipitant concentration (found by experiment) in which crystals slowly dissolve. This produces a fresh growth surface on the seed and dissolves any microcrystals. The crystal is then transferred with a small amount of solution and placed in a fresh setup. Microseeds can be broken off by mechanical disturbances. Because protein crystals are soft and fragile, a gentle technique is necessary for this method to work. Let the crystal fall into the fresh drop and settle of its own accord. Do not disturb the crystal after placing it in the growth drop. Often the less-dense dissolving solution will layer on top of the drop and mix only slowly, allowing any microseeds to dissolve.
Several problems can be found with this technique, and it will not work in every case. Sometimes the crystal may be so fragile that a trail of microcrystals is left in its wake. Often the seed will not grow uniformly; instead, spikes form on the seed surface. Other times the new growth will not align perfectly with the seed, causing a split diffraction pattern. In this case try using a smaller seed that will not contribute substantially to the overall scattering. Often the dislocation between the old and new crystal can be seen. It may be possible to expose only a region of the crystal away from the old seed. The seed should be freshly grown and well formed. If imperfect seeds are used, then you will only grow larger imperfect crystals. Often several generations of seeding will be needed to produce single crystals. Multiply twinned crystals can be crushed and fragments macroseeded until single crystals are obtained.
FIG. 1.8 Macroseeding method of growing larger crystals. First, two solutions are prepared in small petri dishes: a storage solution (usually a few percent higher in precipitant than the growth concentration) in which the crystals are stable for a long time and an etching solution (usually a few percent lower in precipitant than the growth concentration) in which the crystals will slowly dissolve over several minutes. About 2 ml of each is needed, and the dishes should be kept covered to prevent evaporation. (1) Using a thin capillary, draw up several small but well-formed seed crystals. (2) Transfer these crystals into the petri dish with the storage solution. (3) Gently rock and swirl the storage solution petri dish to disperse the seeds throughout the dish. This dilutes microseeds and separates the crystals from each other and from any precipitate that might have been transferred with them. (4) Pick up a single seed from the storage solution and transfer it to the etching solution, bringing with it as little of the storage solution as possible. (5) While observing the crystal through a microscope, let it sit and occasionally rock the dish gently. (6) The corners of the crystal should start to round and the faces may etch, leaving scars and pits. (7) Pick up the crystal with as little solution as possible and gently transfer it to a fresh drop of protein preequilibrated to the growth conditions overnight. Often the crystal will fall out of the transfer capillary of its own accord so that no solution need be added to the drop. (8) Over several hours or days the crystal should grow larger.
Better crystals can sometimes be obtained by further purifying the protein sample to make it more homogeneous. Isoelectric focusing is especially useful. If you have a large amount of sample (>100 mg), then you can use preparative isoelectric focusing. Smaller amounts of sample (<20 mg) can be chromatofocused on Pharmacia-LKB XX media. This method has proved successful in several cases. Be aware, though, that both these methods introduce ampholytes into the solution that can be difficult to remove.
The last method for improving crystal quality, and often the best, is simply to look for more conducive conditions. For example, Chromatium vinosum cytochrome c' can be crystallized easily from ammonium sulfate, but these crystals are always so highly twinned that they are unusable, even for preliminary characterizations. By searching for new conditions, it was found that PEG-4K at pH 7.5 produces usable crystals that can be grown to large dimensions and will diffract to high resolution. This is very common for proteins.
If they crystallize under one condition, chances are they will crystallize under another condition in a different space group and in a different habit. If there were only one condition out of all possible ones, I doubt that very many proteins would ever crystallize.
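The serial dilution used for microseeding earlier in this section can be sketched numerically. This is only an illustration: the starting seed density is a made-up figure (in practice it is unknown, which is why the right tube is found by trial), and the function names are hypothetical.

```python
def serial_dilution(seeds_per_ul, fold, tubes):
    """Seed density (seeds per µl) in each successive tube of the series."""
    densities = []
    for _ in range(tubes):
        seeds_per_ul = seeds_per_ul / fold
        densities.append(seeds_per_ul)
    return densities


def first_tube_at_or_below(densities, aliquot_ul, max_seeds):
    """1-based index of the first tube whose aliquot carries at most max_seeds."""
    for index, density in enumerate(densities, start=1):
        if density * aliquot_ul <= max_seeds:
            return index
    return None  # the series was not diluted far enough


# Hypothetical crushed-crystal suspension, diluted 10-fold six times.
densities = serial_dilution(1_000_000, 10, 6)
tube = first_tube_at_or_below(densities, aliquot_ul=2.0, max_seeds=5)
print(densities)
print("first tube giving at most 5 seeds per 2-µl aliquot:", tube)
```

Actual seed counts fluctuate from aliquot to aliquot, so the expected number is only a guide to which tubes are worth setting up.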
1.3 CRYSTAL STORAGE AND HANDLING
Protein crystals can be stored for a few years and still diffract. Some precautions will help increase lifetime. Some proteins can be simply left in the drops in which they grew. Others, though, will grow small unaligned projections on their surfaces if kept in the original drop. These need to be transferred to an artificial mother liquor for storage. The artificial mother liquor must be found by experimentation. Usually, raising the precipitant a few percent is all that is needed. Do not use mother liquor with precipitant at the growth level. The protein in the crystal is in equilibrium with the protein in solution, and if mother liquor at the growth conditions without protein is substituted, the crystal will partially or wholly dissolve to reestablish equilibrium. Higher precipitant concentrations drive the equilibrium toward the crystal. For the same reason, do not store the crystal in a volume larger than necessary. Too high a precipitant concentration will result in cracked crystals because the change in osmotic pressure will cause them to shrink. Change the reservoirs in vapor-equilibrium setups to prevent drying. Observe the crystal in the artificial mother liquor for several days before committing more crystals to it. Ideally, a crystal in a new artificial mother liquor should be examined by X-ray to confirm that no damage to the diffraction pattern has occurred.
Keep the crystals in the dark. Light causes free radical chain reactions in the solution which will cross-link and eventually destroy the crystals. This is especially true of polyethylene glycol. Commercial PEG contains an antioxidant to retard polymerization caused by light that will slow, but not completely prevent, oxidation of PEG solutions. Solid PEG and PEG solutions must be stored in the dark at all times.
1.4 CRYSTAL SOAKING
To solve the phase problem, the most common method used is multiple isomorphous replacement. In this method one or more heavy atoms are introduced into the structure with the least possible change to the original structure. This gives phasing information by the pattern of intensity changes. A heavy atom must be used to produce changes large enough to be reliably measured. Isomorphism, that is, only minimal change to the structure, is necessary because the primary assumption of the phasing equations is that the soaked crystal's diffraction pattern is equal to the unsoaked crystal's diffraction pattern plus that of the heavy atoms alone. For more details, see later chapters and the suggested readings.
Crystals into which heavy atoms or substrates are to be introduced are usually soaked in an artificial mother liquor containing the reagent of choice. The compound is prepared in an artificial mother liquor solution at about 10 times the desired final concentration, and then one-tenth of the total volume is layered onto the drop containing the target protein crystal. Diffusion occurs within several hours to saturate the crystal completely. With some heavy atoms, secondary reactions often occur that can take several days. Heavy-atom compounds are usually introduced at 0.1- to 1.0-mM concentration. A typical protein in a 10-µl drop requires roughly micromolar concentrations for equimolar ratios. Many compounds will not dissolve well in the crystallization solution. In these cases it may be beneficial to place small crystals of the compound directly in the drop. Also, many heavy-atom compounds will take several hours to hydrate. If they do not completely dissolve at first, be patient. Gentle heating may speed dissolution.
Soak time is more difficult to determine. If the soaking drop has several crystals, they can be mounted at different time intervals. Some heavy-atom reagents that are highly reactive may destroy the crystals and yet be useful if soaked for a short time. Other heavy-atom compounds undergo slow reactions that may produce a new compound that will bind. For instance, a platinum compound in an ammonium sulfate solution will eventually replace all of its ligands with ammonia. As a crystallographer you are not as concerned with exactly what binds to your protein as long as something heavy binds at a few sites in an isomorphous manner.
A good way to check that something is binding is to place the crystal in a capillary so that when you invert the capillary the crystal will slowly settle. When heavy atoms bind to the protein, they will increase its density and cause it to settle faster. Similarly, if an artificial mother liquor can be sufficiently concentrated so that it has a slightly higher density than the protein crystals, the crystals will float. When a sufficient number of heavy atoms bind, the crystal will sink. This property can be used as a way to screen a large number of solutions quickly and has the advantage that small crystals can be observed underneath a microscope.
The change in osmotic pressure of the increased-density mother liquor may cause the unit cell of the crystal to shrink and, if so, any changes found in the diffraction pattern may be due to this effect rather than heavy-atom binding. In any case, it is a simple matter to resoak a fresh crystal in the usual mother liquor.
What heavy atoms should you try? Table 1.2 presents a partial list of heavy-atom compounds in the order that I usually try them. (Whenever you meet another crystallographer, first you swap crystal-growing tales and then you always ask what heavy-atom compounds he or she has had particular success with.) Because most of the heavy-atom compounds are extremely toxic, extreme caution is in order. Some people experience respiratory distress and allergic reactions when exposed to these compounds. Therefore, always wear suitable (not latex) gloves and work in a well-ventilated area. For reproducibility it is best to use fresh solutions; old solutions may oxidize and/or dismutate with time. Solutions must be kept in the dark and preferably under argon. The bottles of reagents themselves should be stored in a well-ventilated area with the caps sealed with Parafilm.
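The soaking dilution described at the start of this section (a stock at about 10 times the final concentration, added as one-tenth of the total volume) works out as in the sketch below. The function name is illustrative, and the calculation assumes that the drop initially contains no reagent and that the crystal's own volume is negligible.

```python
def soak_recipe(drop_ul, final_mM, stock_factor=10.0):
    """Stock concentration and volume to layer onto an existing drop.

    The stock is stock_factor times the desired final concentration, and
    the volume added makes up 1/stock_factor of the total volume.
    """
    stock_mM = final_mM * stock_factor
    add_ul = drop_ul / (stock_factor - 1.0)
    total_ul = drop_ul + add_ul
    reagent_nmol = stock_mM * add_ul  # mM x µl = nmol
    return stock_mM, add_ul, total_ul, reagent_nmol


stock_mM, add_ul, total_ul, reagent_nmol = soak_recipe(drop_ul=10.0, final_mM=1.0)
print(f"layer {add_ul:.2f} µl of {stock_mM:.0f} mM stock onto the 10-µl drop")
print(f"total {total_ul:.2f} µl containing {reagent_nmol:.1f} nmol of reagent")
print(f"final concentration {reagent_nmol / total_ul:.2f} mM")
```

For a 10-µl drop this means adding a little over 1 µl of stock, so in practice the volume is often rounded to whatever can be pipetted reliably.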
TABLE 1.2 Useful Heavy-Atom Reagents and Conditions

Reagent                        Conditions
Platinum tetrachloride         1 mM, 24 h
Mercuric acetate               1 mM, 2-3 days
Ethyl mercury thiosalicylate   1 mM, 2-3 days
Iridium hexachloride           1 mM, 2-3 days
Gadolinium sulfate             100 mM, 2-3 days
Samarium acetate               100 mM, 2-3 days
Gold chloride                  0.1 mM, 1-2 days
Uranyl acetate                 1 mM, 2-3 days
Mercury chloride               1 mM, 2-3 days
Ethyl mercury chloride         1 mM, 2-3 days
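The remark that equimolar ratios correspond to roughly micromolar concentrations can be checked with a short calculation. The following is only a sketch under stated assumptions: the crystal is taken to hold one protein molecule per (2.3 Å³ × MW) of crystal volume, using a typical volume per dalton for protein crystals, and the function names and example numbers (a 0.3-mm cube, a 30,000-dalton protein) are hypothetical illustrations, not values from the text.

```python
AVOGADRO = 6.02214076e23       # molecules per mole
CUBIC_ANGSTROM_PER_MM3 = 1e21  # (1e7 angstroms per mm) cubed


def protein_moles_in_crystal(crystal_volume_mm3, mw_daltons, volume_per_dalton=2.3):
    """Estimate moles of protein in a crystal.

    volume_per_dalton is the assumed crystal volume (cubic angstroms)
    occupied per dalton of protein; 2.3 is a typical value, not a measurement.
    """
    volume_a3 = crystal_volume_mm3 * CUBIC_ANGSTROM_PER_MM3
    molecules = volume_a3 / (volume_per_dalton * mw_daltons)
    return molecules / AVOGADRO


def equimolar_concentration_mM(crystal_volume_mm3, mw_daltons, drop_volume_ul,
                               volume_per_dalton=2.3):
    """Heavy-atom concentration (mM) in the drop that gives one heavy atom
    per protein molecule in the crystal."""
    moles = protein_moles_in_crystal(crystal_volume_mm3, mw_daltons, volume_per_dalton)
    drop_litres = drop_volume_ul * 1e-6
    return 1e3 * moles / drop_litres


if __name__ == "__main__":
    # Hypothetical example: 0.3-mm cubic crystal, 30-kDa protein, 10-µl drop.
    conc = equimolar_concentration_mM(0.3 ** 3, 30000.0, 10.0)
    print(f"equimolar heavy-atom concentration: {conc * 1000:.0f} micromolar")
    print(f"molar excess in a 1 mM soak: {1.0 / conc:.0f}-fold")
```

Under these assumptions the equimolar concentration comes out at some tens of micromolar, so a 1 mM soak supplies a comfortable molar excess over the protein in a single crystal.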
1.5 ANAEROBIC CRYSTALS
Many proteins lose activity if exposed to air and so they must be grown anaerobically. In other cases the protein must be kept anaerobic in order to reduce it to its active conformation. The easiest method is to use an anaerobic hood. Solutions are passed in and out through an airlock and crystals can be set up and handled with conventional techniques. For large-scale work this is by far the best method, but not all of us have access to an anaerobic hood. Another method uses a glove bag. This is a plastic bag with gloves that you can put your hands in to manipulate samples. Everything that you are going to use must be inside the bag before you seal it. This can present some logistical problems. A very simple anaerobic apparatus invented by Art Robbins consists simply of a capillary filled with degassed solution into which you float a crystal (Fig. 1.9). One end of the capillary is sealed by melting and the other is sealed with a layer of diffusion pump oil. Dithionite crystals can be dropped into the oil layer, through which they will sink into the lower liquid. Any residual oxygen will be destroyed by the dithionite, and the oil layer prevents the entry of new oxygen. In our laboratory Cu-Zn superoxide dismutase crystals have been reduced in this manner and have stayed reduced for over
FIG. 1.9 Simple anaerobic apparatus. Degassed mother liquor is placed in a capillary and then a crystal is introduced. The crystal should be large enough so that it will wedge itself in the tapered portion of the capillary as it sinks. Mineral oil is then layered over the mother liquor to form a seal. The top few millimeters of the mother liquor, which have been exposed to air, can be drawn off with a capillary inserted through the oil layer. Solid reductant, such as dithionite, is then placed on top of the oil and allowed to sink through the oil into the mother liquor. Overnight the dithionite will diffuse to the crystal and reduce it. Excess oxygen is destroyed by the dithionite. The data can then be collected by mounting the capillary on a goniometer head as is normally done. Crystals reduced in this manner have remained oxygen-free for over a year.
a year. The data are collected by mounting the capillary directly on a goniometer head. The size of the capillary in this case is chosen so that the crystal will wedge partway down. The extra solvent decreases the diffraction due to absorption, but it was still possible to get a 2.0-Å data set using a large crystal.
2 DATA COLLECTION TECHNIQUES

2.1 PREPARING CRYSTALS FOR DATA COLLECTION

Protein crystals must be kept wet or they will disorder. Since solvent forms a large portion of the crystal lattice, a large change from the crystallization conditions will cause the crystals either to dehydrate and crack or to melt. For room temperature work, crystals are usually mounted in thin-walled glass capillaries.¹ The thin glass wall minimizes absorption of the scattered X-rays and also minimizes background from the glass. For protein work use the glass capillaries. Quartz capillaries are stronger, but the quartz scatters strongly around 3-Å resolution in a sharper band where the glass scattering is diffuse. Solvent contributes to background, which is always bad, and so as much solvent as possible must be removed without letting the crystal dry up. Alternatively, you may want to use the newer cryocrystallography techniques covered in Chapter 6. However, freezing can cause changes in the unit cell that make the crystals nonisomorphous, and a few proteins are not amenable to freezing. In these cases it may be necessary to use the methods in this chapter.

¹Available from Charles Supper Company.
Crystal-Mounting Supplies
Before mounting a crystal, make sure you have all the supplies you need at hand:
• Capillaries. Thin-walled capillaries are needed in a variety of sizes. You will want to have a large supply of suckers previously made to pick from.
• Tweezers. Two pairs of tweezers are needed: a pair with straight ends and a curved pair for prying up coverslips.
• Scissors. A sharp pair of surgical scissors is needed for cutting capillaries and another pair for cutting filter paper in thin strips small enough to fit inside capillaries. Do not cut paper with the pair meant for cutting glass or they will quickly dull, and once dull they shatter rather than cut the glass capillaries.
• Capillary sealant. Dental wax and other types of low-temperature wax are the traditional means of sealing capillaries. Recently, 5-min epoxy has become popular. The epoxy requires no heat, sets quickly, and forms an immediate vapor barrier even before it sets. The handiest kind is clear, comes in a dual-barreled syringe, and is quite fluid before it hardens. Avoid types that are thicker and more like clay before hardening.
• Plasticine. This is also known as nonhardening modeling clay and is available in toy stores. It is very useful for sticking capillaries to goniometers and for holding them in position while mounting crystals. When warmed by rolling with the fingers, Plasticine can be wrapped around thin capillaries without breaking them. An alternative is to use pins that are sold for use with goniometers by Supper and Huber. The glass capillary is inserted into a hole in the pin and held in place with wax or epoxy.
• Filter paper strips. Cut Whatman #1 filter paper into strips thin enough to fit into capillaries for drying. The ideal strip is about 50 mm long, tapering from about 1.5 mm at one end to a fine point. The strips tend to curl when cut and can be straightened by gently curving in the opposite direction with fingers. Thin paper points originally meant for dental work are available from Hampton Research. As these come, they are too short to reach into capillaries. Mount the fine size into the end of a #18 syringe needle and then they will reach the crystal in most capillaries.
Mounting Crystals
There are several ways to mount a crystal in a capillary but they all accomplish the same goal. The method used almost exclusively in our lab is as follows. A capillary at least twice the width of a crystal is used. It is shortened by breaking with a pair of sharp tweezers so that it is about 4 cm
long. If it is not shortened it will be too long to fit onto most X-ray cameras. The broken end is sealed with either melted dental wax or 5-min epoxy. The large funnel-shaped end is left open; the crystal will be placed in the capillary through this end. A ring of wax or epoxy is placed where the funnel end narrows to make a place where the capillary can be cut later. Without the ring, the capillary may shatter completely. A small ball of Plasticine is warmed in the fingers and gently wrapped around the capillary to serve as a mounting base. Then the capillary is put aside; use the Plasticine to stick it in a handy spot where it can be reached later. The crystal to be mounted is selected and another capillary that will fit inside the first into which the crystal can be sucked is readied (Fig. 2.1). We use a piece of rubber tubing that fits over the capillary at one end with the other end going into a mouthpiece. It takes a little practice to get the knack of the sucking operation. The liquid will tend to stick at first because of surface tension, and then it comes all in a rush, requiring a little back pressure. Instead of mouth pressure, a syringe can be used to suck up the liquid. It is harder to control the syringe, however, and it takes one of your too-few hands. For toxic solutions, such as heavy-atom soaks, always use a syringe. Before the crystal is sucked out of the drop, a small amount of reservoir solution is sucked up and placed in the bottom of the previously prepared capillary (Fig. 2.2). Often a thin piece of filter paper is then pushed down the capillary to hold the reservoir liquid and to prevent it from moving. The crystal is then sucked up into the transfer capillary (Fig. 2.3). This frequently requires blowing liquid gently back and forth over the crystal to free it from the surface it grew on. More stubborn cases can be removed by very gently inserting the sharp point of a surgical blade between the crystal and the surface to pry it loose. 
Once the crystal is freely floating, it can be sucked up. The sucker is removed from solution and then a little air space is drawn in. This helps prevent the liquid from being drawn out by capillary action at the wrong time, wedging your crystal between the sucker and the mounting capillary. The sucker is then guided into the capillary, while observing through a dissecting microscope. The crystal is then gently expelled into the capillary and the sucker quickly removed. Some mother liquors are easier to handle than others and some will insist on sweeping the crystal between the sucker and the capillary wall, catastrophically crushing the crystal. This result can be avoided by first placing a band of reservoir liquid in the capillary into which the end of the sucker is inserted; then the crystal can be gently blown out. This leaves a large amount of solution to be removed later. In general, you want to blow out the crystal with as little solution as possible. The next step is to remove the solution around the crystal (Fig. 2.4). A very thin, fine capillary about 0.1-0.2 mm in diameter works the best for removing large amounts of liquid. Use one small enough that the crystal will not fit
FIG. 2.1 Making crystal transfer pipettes. The transfer pipettes or suckers used for mounting crystals are made from 200-µl capillaries or any thin-walled piece of glass about 1/8 in. in diameter. The capillaries are useful because they come with tubing and mouthpieces. The glass is easily softened by holding over a Bunsen burner flame while turning slowly. When the glass softens, it is removed from the flame and the ends pulled apart. Hold it still for a moment to cool and then put it aside. Then draw out several more. Each drawn-out capillary is cut in the middle to form two pieces. The cut capillary is then bent by holding briefly over a flame until the end droops. With this technique you should be able to produce a number of different sizes. The ends are usually tapered and the proper bore can be obtained by cutting them off at the appropriate length.
inside. Start removing the liquid at the edges first. Many liquids with a high surface tension cannot be fully removed this way and require further removal with a strip of thin filter paper. This can be worked up next to the crystal, where it will slowly absorb all the free liquid. Leave a small amount of liquid between the crystal and the capillary to hold it in place. The capillary is then sealed with either dental wax or epoxy. Epoxy has the advantage of being
FIG. 2.2 Preparing X-ray capillary for mounting. If at any time during mounting you want to temporarily seal the capillary, you can simply plug it with a softened piece of Plasticine. This gives you an opportunity to find something you forgot or to take a break. Sometimes it is necessary to allow viscous liquids time to bead up again before you can fully remove them.
cool; there is some danger with dental wax that heating will hurt the crystal. This can be minimized by laying a strip of wet tissue on the capillary over the crystal before the melted wax is applied. Another common mounting technique is to fill the capillary with liquid and float the crystal down into it. The crystal can be picked up in a minipipette and placed into the tube held vertically. Or the crystal can be sucked directly up into the capillary along with the mother liquor. Both methods require a large amount of liquid to be removed before the crystal is ready. However, some liquids are very difficult to dry completely as they tend to stick to the glass, and an excessive amount of time and effort may be required
FIG. 2.3 Crystal mounting.
to dry the capillary. These methods are easier and may also be gentler on the crystal. A disadvantage of these methods is the necessity of making an artificial mother liquor with which to fill the capillary. This artificial mother liquor sometimes damages the crystal when it is transferred. The gentlest technique of all is to grow the crystal in the capillary and then remove the growth solution for data collection. This has been necessary for some protein crystals with very high solvent contents.

FIG. 2.4 Drying crystals. (A) Remove large amounts of liquid by drawing up into a drawn-out pipette by capillary action or by sucking. (B) Final drying is done with a thin piece of filter paper. (C) The crystal should have a small amount of liquid to keep it wet and to help it adhere to the capillary walls.
Drying Crystals
How dry does the crystal need to be? This depends upon the particular protein, and therefore requires experimentation. If some crystals are left too wet, they will dissolve slowly. If too dry, some crystals may crack. If a low-temperature apparatus is to be used, then temperature gradients may cause the liquid to distill around the capillary, either cracking or dissolving the crystal. To prevent such damage, use a short capillary with as little free liquid as possible. A piece of filter paper may be used to wick solution around the capillary and reequilibrate it. Polyethylene glycol solutions are amazingly tenacious, and a layer that slowly beads up around the crystal will remain bound to the glass, destroying your careful drying work. One remedy for this is to place the unsealed capillary in a sandwich box with a reservoir of crystallization liquid to keep the crystal wet while you wait about half an hour for the liquid residue in the capillary to draw up around the crystal and rewet it. This time when you remove the liquid the crystal will stay dry and you can seal it. The sandwich
box is also handy to have to give yourself a break if you think that the crystal is getting too dry. You can reequilibrate the crystal in the box before continuing.
Preventing Crystal Slippage
Crystals are held in place by the surface tension of the thin film of liquid between the crystal and the capillary wall. In most cases this is adequate, but sometimes the crystals will slip slowly or suddenly. Some methods of data collection are gentler than others and the crystal is less likely to slip. Any slippage of a crystal during data collection is a problem. The crystal can leave the center of the X-ray beam or it can rotate, changing the pattern of the diffraction. There are several ways to avoid this situation. First, dry the crystal thoroughly; excess liquid around the crystal encourages slipping. (See the preceding remark on viscous liquids and drying crystals.) The slippage may be due to excess liquid that builds up around the crystal after data collection has begun. For instance, if you use a low-temperature device there may be a temperature gradient along the capillary that causes water to distill from one end of the capillary to the other. This changes the vapor equilibration point at your crystal and can cause it to get wetter. In the worst cases a bead of liquid may form above the crystal and slip down onto it and dissolve it. To avoid this, keep the capillary as short as possible and put a wicking material such as filter paper in the capillary to encourage reequilibration of the liquid. Second, the crystal may be held in place mechanically with fibers. This should be a last resort, as material used to hold the crystal in place will add to the background scattering. Pipe-cleaner fibers have been found to be useful for this purpose. A third method is to glue the crystal in place with a glue that dries in a thin film over the surface of the crystal and cements it into place. The glue and the method used are described by Rayment.² Finally, freezing the crystals as described in Chapter 6 will prevent crystal slippage. Also consider the shape of the capillary relative to the surface of the crystal you are mounting. If the crystal has a flat face, then mounting inside a large-diameter capillary will provide a better contact between the crystal and the glass (Fig. 2.5). Conversely, a small capillary may be better suited to a crystal with many facets that presents a more curved surface. In fact, by floating crystals down capillaries filled with liquid, it is possible in extreme cases of slippage to wedge the crystals into the capillary where the glass tapers.

²Rayment, I. (1985). In Methods in Enzymology, Vol. 114, pp. 136-140. Academic Press, San Diego.
FIG. 2.5 Choose the capillary size to fit the shape of the crystal.
2.2 OPTICAL ALIGNMENT

The next steps will be made easier if the crystal is first aligned optically. This is accomplished using a special goniometer stand called an optical analyzer and the dissecting scope. The crystal in the capillary is placed on a goniometer head that is, in turn, mounted on the optical analyzer. The first operation is to find the center of rotation of the analyzer with respect to the microscope reticules (Fig. 2.6). Rotate the analyzer to 0° and note the position of the crystal, then rotate it to 180° and note the position. The center is the midpoint between these two positions. The crystal can be translated by using the slide on the goniometer head to move it to the midpoint. Another check of 0° and 180° is usually needed to fine-tune the centering. Repeat these steps for 90° and 270°. Note that the center is defined by the range of motion as the axis is rotated and not by any particular point in the microscope. The most common source of frustration in alignment is to assume that the crosshairs on a piece of equipment correspond to the center of rotation. Never make this assumption: the center is that position at which the crystal does not move when rotated. If the crystal has a definable axis, you may want to align it with the rotation axis. This is done by comparing the views at 0° and 180° and adjusting the arcs on the goniometer until the crystal axis in both views is in the same position. This is then repeated for 90° and 270°. The third alignment that needs to be made is the position of the crystal faces relative to 0°. In general, crystal axes either are perpendicular to a face or they pass through an edge (Fig. 2.7). If the goniometer head provides a z-rotation,

FIG. 2.6 Steps in centering a crystal on a camera using the crosshairs as guidelines. The view through the microscope is shown in three steps A, B, and C. The rotation axis is horizontal and the direct beam passes through the crystal vertically. The microscope crosshairs in this example are not perfectly aligned with the rotation axis of the camera. (A) View at 0°. (B) View at 180°. The translation on the goniometer is moved so that the crystal is halfway between the two positions observed at 0° and 180°. (C) The final position after correction: the crystal will now be in the identical position at both 0° and 180°. Note that the crosshairs do not go through the center of the crystal. The center is not defined by the crosshairs but by the center of rotation. If the crosshairs and the center do not coincide, the crosshairs should be adjusted to facilitate future alignments. Never assume the crosshairs are centered unless it has been done recently, since high-power microscopes can become misaligned easily, especially if they are frequently moved.
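The midpoint rule for centering can be written out as a small calculation. This is a minimal sketch, assuming the crystal position is read off the microscope reticule in arbitrary units along the direction perpendicular to the rotation axis; the function name `centering_shift` and the example readings are hypothetical, not part of any instrument software.

```python
def centering_shift(pos_at_0, pos_at_180):
    """Locate the rotation center from two reticule readings.

    pos_at_0 and pos_at_180 are the crystal positions seen at the 0-degree
    and 180-degree settings, in the same arbitrary reticule units.

    Returns (center, shift): center is the reticule reading of the rotation
    axis, and shift is how far to translate the goniometer slide while still
    viewing at 180 degrees so that the crystal sits on the axis.
    """
    center = (pos_at_0 + pos_at_180) / 2.0
    shift = center - pos_at_180
    return center, shift


if __name__ == "__main__":
    # Hypothetical readings: the crosshair (reading 0.0) is NOT on the axis.
    center, shift = centering_shift(0.5, -0.25)
    print(f"rotation center at reticule reading {center}")
    print(f"move the slide by {shift} reticule units, then recheck 0 and 180")
    # The same two readings are then taken at 90 and 270 degrees
    # to center the perpendicular slide.
```

Note that the crosshair position never enters the calculation, which is the point made in the text: the center is defined by the two readings, not by the reticule.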
[Table fragment: space groups listed by crystal system and class, with columns for lattice restrictions and angle restrictions. Legible entries include Hexagonal (6-fold parallel to c), classes 6 and 622, a = b; and Cubic, classes 23 and 432, a = b = c, with space groups P23, F23, I23, P2₁3, and I2₁3. The remaining entries are not recoverable.]
2.4 Preliminary Characterization
are asymmetric objects and occur only in the L-form, they cannot be involved in symmetry elements requiring inversion centers, mirrors, or glide planes. This limits the possible space groups to 65 out of the 230 mathematically possible space groups and leaves 2-, 3-, 4-, and 6-fold axes along with the corresponding screw axes, and centering, as the possible symmetries. We will consider each in turn.

2-Folds. A 2-fold causes the presence of a mirror plane perpendicular to the 2-fold axis in the reciprocal space pattern after the addition of the inversion center (see Fig. 2.22). A 2₁ can be distinguished by the absence of every other spot on the axis that lies in the plane of the mirror. The presence of even one exception to the screw axis absences means the axis is a 2-fold. The 2-fold constrains the angles between the 2-fold and the other two axes to be 90°.

3-Folds. A 3-fold causes 3-fold symmetry in the reciprocal space pattern except on the 0-layer, where the inversion center raises the symmetry to a 6-fold. Thus it is usually necessary to take an upper-level precession photo to differentiate between a 3-fold and a 6-fold. There are two 3-fold screw axes, the 3₁ and the 3₂. They give identical absences along the 3-fold axis: only every third spot is present. They cannot be distinguished from each other at this point, and the choice must be left until later. In addition, the 3-fold symmetry gives rise to a hexagonal lattice that constrains the a and b axes to be identical and the angles between axes to be 90° between the 3-fold and the other two axes, and to be 120° (60° in reciprocal space) between the two non-3-fold axes.

4-Folds. A 4-fold gives rise to 4-fold symmetry in the diffraction pattern in the plane perpendicular to the 4-fold axis. It also constrains two axes to be identical to each other and all axes to be at 90°. A mirror will be found at the plane passing through the origin and perpendicular to the 4-fold (hk0). There are two possible types of screw axis: 4₁ and 4₃, with only every fourth spot present; and 4₂, with every other spot present on the 4-fold axis.

6-Folds. A 6-fold gives rise to 6-fold symmetry both on the zero level and on upper levels. In addition, a mirror is found at the plane passing through the origin and perpendicular to the 6-fold axis (hk0). Both trigonal (nonrhombohedral) and hexagonal space groups have a hexagonal lattice; the symmetry of the intensities must be used to tell the two apart. The 6-fold axis itself can have three different types of screw: 6₁ and 6₅, with every sixth spot only; 6₂ and 6₄, with every third spot only; and 6₃, with every second spot only on the 6-fold axis. Two pairs of the screw axes, 6₁ and 6₅, and 6₂ and 6₄, can be told apart only at a later stage.
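The axial absence rules can be collected into one small routine. This is only a sketch, assuming the standard reflection condition that an n-fold screw axis with translation m/n of the cell edge leaves axial reflections only at indices that are multiples of n/gcd(n, m); the function name `screw_axis_candidates` and its input format are hypothetical. It returns the most restrictive axis types consistent with the spots actually observed, so a single genuine exception is enough to rule a screw axis out, and enantiomorphic pairs such as 6₁ and 6₅ are reported together because the absences cannot separate them.

```python
from functools import reduce
from math import gcd


def screw_axis_candidates(order, present_indices):
    """List axis types consistent with the observed axial reflections.

    order: 2, 3, 4, or 6 (the rotational order of the axis).
    present_indices: indices of the reflections observed on that axis,
        for example the l values of the 00l spots that are clearly present.
    Returns names such as ["6_1", "6_5"]; a pure rotation axis is "6".
    """
    if order not in (2, 3, 4, 6):
        raise ValueError("order must be 2, 3, 4, or 6")
    observed = [abs(i) for i in present_indices if i != 0]
    if not observed:
        raise ValueError("need at least one nonzero axial reflection")

    # Largest spacing (dividing the axis order) that every observed spot obeys.
    period = gcd(order, reduce(gcd, observed))

    candidates = []
    for m in range(order):
        # gcd(order, 0) == order, so m == 0 (pure rotation) has period 1.
        if order // gcd(order, m) == period:
            candidates.append(str(order) if m == 0 else f"{order}_{m}")
    return candidates


if __name__ == "__main__":
    print(screw_axis_candidates(6, [2, 4, 6, 8]))   # every second spot present
    print(screw_axis_candidates(6, [3, 6, 9]))      # every third spot present
    print(screw_axis_candidates(2, [1, 2, 3, 4]))   # no absences
```

Weak or missing low-order reflections can mimic absences, so the list of present indices should come from data extending to reasonably high resolution.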
FIG. 2.22 The effect of adding an inversion center to a 2-, 3-, and 4-fold. On the left is the symmetry in real space with a comma used to indicate an asymmetric object. A " + " and a solid symbol indicate an object above the plane of the page and a " - " and an open symbol indicate an object below. Note in the case of the 2-fold that an inversion center forms a mirror perpendicular to the 2-fold. In projection there are two mirrors and, therefore, the 0 level will also show two mirrors, while an upper level will show only one mirror. An inversion center plus a 3-fold still has the same symmetry as before, although in projection it will appear to be a 6-fold. A 4-fold plus an inversion center adds a mirror in the plane of the paper. A 6-fold (not shown) is similar to a 4-fold.
Rhombohedral. Rhombohedral is a special case of trigonal and is characterized by having all three axes equal and all three angles equal. It is the hardest system to diagnose because of the difficulty in finding the zones and determining their relationships when the axes are not near 90°. In certain cases, C-centered monoclinic may really be R32, so this may be worth checking out.
Centering. Centering can be detected by systematic absences throughout the diffraction pattern. Centering can cause confusion about the direction of the principal axes. Always use the symmetry to determine the lattice. For instance, in a C-centered lattice, spots with h + k = odd are missing. At first inspection, the lattice appears to be running on the diagonals of the cell. Symmetry on the zero level will show the presence of two mirrors that give the correct direction of the two axes. The five types of centering possible are A, B, C, F, and I. A, B, and C centering are identical except for the naming of the axes. The convention is to name the axes such that the cell is C centered, so that h + k = 2n + 1 reflections are missing; that is, the ab face of the crystal is centered. (Thus, A centering has the bc face centered and B centering has the ac face centered.) F is face-centered so that all three faces ab, ac, and bc are missing the odd reflections. I is body-centered so that an extra lattice point is found at the body center of the lattice. This would be easy to miss by precession photography of 0 levels alone, although a picture of a diagonal zone might reveal it.

Always assume that you may have higher symmetry than you have found until proven otherwise. Never trust 90° angles unless confirmed by symmetry. Never use low-resolution photographs to decide the space group. Always keep an open mind about the space group until the structure has been solved and refined. To determine the correct space group it is necessary to take enough precession photos to determine the symmetry elements present, any systematic absences along the axes, and any centering. The photos are then compared with the diagrams in the International Tables. The tables are grouped according to the highest symmetry present (i.e., if you have a 6-fold, then the space group is found in the hexagonal section).
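The centering absences described above can be tabulated in a few lines. This is an illustrative sketch using the standard reflection conditions for each lattice type; the function name `is_centering_absent` is hypothetical.

```python
def is_centering_absent(centering, h, k, l):
    """Return True if reflection (h, k, l) is systematically absent
    for the given lattice centering ("P", "A", "B", "C", "I", or "F")."""
    if centering == "P":
        return False                      # primitive: no centering absences
    if centering == "A":
        return (k + l) % 2 != 0           # bc face centered
    if centering == "B":
        return (h + l) % 2 != 0           # ac face centered
    if centering == "C":
        return (h + k) % 2 != 0           # ab face centered
    if centering == "I":
        return (h + k + l) % 2 != 0       # body centered
    if centering == "F":
        # all three faces centered: h, k, l must be all even or all odd
        return ((h + k) % 2 != 0) or ((k + l) % 2 != 0) or ((h + l) % 2 != 0)
    raise ValueError(f"unknown centering type: {centering!r}")


if __name__ == "__main__":
    # Count absences on the hk0 zero level for C and I centering.
    zone = [(h, k, 0) for h in range(-3, 4) for k in range(-3, 4)]
    for lattice in ("C", "I"):
        missing = sum(is_centering_absent(lattice, *hkl) for hkl in zone)
        print(lattice, "centering:", missing, "of", len(zone), "hk0 spots absent")
```

On the hk0 level the C and I conditions both reduce to h + k odd, which is why a zero-level photograph alone cannot separate them and an upper-level or diagonal zone is needed.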
You then search for alternative space groups and ask if the precession photos you have are necessary and sufficient to eliminate all other possible space groups. This may mean taking an upper-level photo to determine the difference between some possibilities such as trigonal versus hexagonal (Fig. 2.23). Determine the size of the unit cell by Bragg's law and compare the volume of the cell in cubic angstroms with the best estimate of the protein's molecular weight (MW) in daltons. Listed in the International Tables for each space group is the Z number, or the number of asymmetric units in the unit cell. Use this formula to calculate the cubic angstroms per dalton of the asymmetric unit: volume/(Z × MW).¹¹ The expected value for this number for protein molecules ranges from 1.7 to 3.0, with the average being about 2.3. If you have a number substantially smaller than 1.7, it is likely that something is wrong or, perhaps, there is internal symmetry in the protein molecule that corresponds with a crystallographic axis. For instance, spot hemoglobin is a tetramer and has a dimer axis that coincides with a crystallographic axis so that the asymmetric unit contains one-half of a tetramer instead of a full molecule. If the number is substantially larger than 3.0, then there are two possibilities. One is that there is more than one molecule in the asymmetric unit, which is very common. The other possibility is that the space group has higher symmetry than you have determined. Try looking for higher-symmetry space groups that contain the symmetry you have already determined to see if there is a precession photograph that you could take that would prove or disprove this possibility. For instance, R32 can be reindexed as monoclinic C2 and it can be difficult to spot the difference. If you are using XENGEN or MOSFLM to reduce the data, it is also possible to try reindexing your data in different space groups and to look at the R-merge values. There are also programs that are commonly used by small-molecule crystallographers to search for additional symmetries in a three-dimensional data set.

¹¹Matthews, B. W. (1968). J. Mol. Biol. 33, 491.

Finally, a common error is to mistake pseudosymmetry for true symmetry. An example of a crystal with pseudosymmetry is given in Fig. 2.24. Pseudosymmetry appears correct at lower resolutions but breaks down at higher resolutions. Most protein crystals show some pseudosymmetry in the range of infinity to 6 Å. It is not unusual to have low-order reflections on the axis that are virtually extinct, which can lead one to the false conclusion that there is a screw axis. Always confirm screw axes to at least 3 Å or better resolution. If you cannot confirm the screw axis to high resolution, then bear in mind that the axis may not be a screw axis and try both possibilities. The presence of even a single reflection that breaks the symmetry rules out the presence of a screw axis.
If that reflection is weak, however, it may be worthwhile considering that it is an artifact (in particular, Kβ radiation can cause artifacts), and do not exclude the 2-fold screw until better evidence is found. One of the best confirmations of the space group is a good heavy-atom Patterson. For example, consider the case of determining whether a 2-fold or a 2₁ screw is present on the a axis. The Patterson map for both possibilities is calculated the same way. Even if you enter the incorrect possibility in the symmetry operators, both a 2-fold and a 2₁ degenerate to a mirror in Patterson space. In the case of the 2-fold, the Harker vectors will be at the plane x = 0.0, and for the 2₁ the Harker vectors will be on x = 0.5. It is important to plot out
FIG. 2.23 Distinguishing 3- and 6-folds. (A) The 0-level photograph of photoactive yellow protein taken down the c axis shows a hexagonal net with 6-fold symmetry. Both a 3-fold and a 6-fold will show 6-fold symmetry in the 0-level. Thus an upper-level precession photograph hkl (B) was taken to distinguish the two possibilities. This photo also shows 6-fold symmetry, confirming the presence of a 6-fold axis. Another precession photograph (not shown) was taken of the 6-fold axis and showed every odd spot systematically missing. There are no mirrors, eliminating the possibility of the class 622. This is enough information to assign the space group as P6₃.
FIG. 2.24 Pseudosymmetry. This precession photograph of iron-binding protein shows true 4-fold symmetry. There are also pseudomirrors along the main axes and the main diagonals. Close examination shows these mirrors to be inexact. (Photo courtesy of Andy Arvai, Scripps Research Institute.)
the entire Patterson map and look at all possible Harker sections. (For more information on Patterson maps, see the following.)
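The cell-volume check described earlier in this section, volume/(Z × MW) compared against the expected range of 1.7 to 3.0 Å³ per dalton, is easy to script. The sketch below is only an illustration: the function names are hypothetical, the general triclinic volume formula is standard, and the example cell and molecular weight are made-up numbers.

```python
import math


def unit_cell_volume(a, b, c, alpha=90.0, beta=90.0, gamma=90.0):
    """Unit-cell volume in cubic angstroms (lengths in angstroms, angles in degrees)."""
    ca = math.cos(math.radians(alpha))
    cb = math.cos(math.radians(beta))
    cg = math.cos(math.radians(gamma))
    return a * b * c * math.sqrt(1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg)


def volume_per_dalton(cell_volume, z, mw_daltons):
    """Cubic angstroms per dalton, assuming ONE molecule per asymmetric unit."""
    return cell_volume / (z * mw_daltons)


def interpret(vm, low=1.7, high=3.0):
    """Classify the value against the range quoted in the text."""
    if vm < low:
        return "too small: check the cell, or the molecule may sit on a symmetry axis"
    if vm > high:
        return "too large: more than one molecule per asymmetric unit, or higher symmetry"
    return "within the usual range for protein crystals"


if __name__ == "__main__":
    # Hypothetical orthorhombic cell with Z = 4 and a 22,000-dalton protein.
    volume = unit_cell_volume(50.0, 60.0, 70.0)
    vm = volume_per_dalton(volume, z=4, mw_daltons=22000.0)
    print(f"cell volume {volume:.0f} cubic angstroms, {vm:.2f} per dalton")
    print(interpret(vm))
```

When the value comes out too large, dividing it by 2, 3, and so on shows how many molecules per asymmetric unit would bring it back into the expected range.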
Unit-Cell Determination A single still roughly the three concentric circles visible and these directions.
p h o t o g r a p h taken along one axis can be used to determine directions of the unit cell. You should have a pattern of around the beam center. Portions of the lattice will be can be used along with Bragg's law to determine two
$$ d = \frac{\lambda}{2\sin\left[\tfrac{1}{2}\tan^{-1}(\Delta/F)\right]} $$
where Δ is the spacing between spots, F is the crystal-to-film distance, and λ is the wavelength of the X-rays (1.5418 Å for copper targets). It is best to measure a long row and divide the length by the number of spaces to get a more accurate determination. Also, the closer in to the center the row is, the more accurate the approximation of using Bragg's law will be, because the spacing gets stretched the farther out from the center you are. The third direction can be determined from the spacing of the concentric rings using the equation:
$$ d = \frac{n\lambda}{1 - \cos\left[\tan^{-1}(r/F)\right]}, $$
where r is the radius of the nth circle. The circles need to be close to concentric about the beam stop (i.e., an axis aligned along the direct beam) for this equation to be accurate. The direction determined is correct if you have an orthogonal cell. Otherwise the lengths need to be corrected for the fact that the photographs show d* instead of d directly. To do this simply divide the distance by the sine of the appropriate angle. For example, the correct value of b is b/(sin~). More accurate distances can be determined from the undistorted lattices found on precession photographs. For this you will need photographs of at least two zones. Measure a number of spots in a row and divide by the number of spaces as above and use the same equation derived from Bragg's law. Again, for nonorthogonal cells these distances will need to be corrected.
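The two measurements described above reduce to a few lines of arithmetic. The following is a minimal sketch, assuming the spot spacing, ring radius, and crystal-to-film distance are all in the same length unit and the wavelength is in Å; the function names are illustrative and do not come from any crystallographic package.

```python
import math

CU_K_ALPHA = 1.5418  # wavelength in Å for a copper target, as quoted in the text


def spacing_from_row(delta, film_distance, wavelength=CU_K_ALPHA):
    """Cell spacing (Å) from the spot-to-spot distance along a lattice row."""
    two_theta = math.atan(delta / film_distance)
    return wavelength / (2.0 * math.sin(two_theta / 2.0))


def spacing_from_ring(n, radius, film_distance, wavelength=CU_K_ALPHA):
    """Cell spacing (Å) along the beam from the radius of the nth circle."""
    cone_angle = math.atan(radius / film_distance)
    return n * wavelength / (1.0 - math.cos(cone_angle))


def correct_for_angle(spacing, angle_deg):
    """Nonorthogonal cells: divide by the sine of the appropriate cell angle."""
    return spacing / math.sin(math.radians(angle_deg))
```

For example, `spacing_from_row(total_length / number_of_spaces, F)` applies the advice to measure a long row and divide by the number of spaces before converting to a cell length.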
Evaluation of Crystal Quality
The number-one piece of information requested about protein crystals is the limit of observable diffraction. The desire to have this number be as high a resolution as possible (i.e., to have the smallest numerical value) has led to some rather creative definitions where the single highest-resolution spot is used to report this value. It is better to report the resolution where at least one-third of the possible reflections are still visible above background. Even this is pushing it, but resolution inflation is common and pervasive. Another factor to consider is the mosaicity of the crystal. Mosaicity is a measure of the order within a crystal. If a crystal has low mosaicity, the crystal is highly ordered and diffraction spots will be sharp. High-mosaicity crystals will have broader peaks because of lower crystalline order (Fig. 2.25). Since increased mosaicity means that a spot is in diffracting conditions for a larger range of angles, mosaicity may be recognized as broadened lunes on diffraction patterns. Or, if you are using a diffractometer or
FIG. 2.25 Mosaicity of crystals. Each block is composed of one to many unit cells. Top: This crystal has high order (right) and thus the diffraction of a single spot is sharper (left). Bottom: This crystal has lower order and broader diffraction spots.
area detector, the profile of a peak may be directly measured. While increased mosaicity in itself may not be a problem, it may indicate other problems. For instance, if when mounting the crystal you let it dry too much, the mosaicity will be increased. If the crystal is suffering from radiation damage (heating, drying, etc.), it is quite likely that the mosaicity will increase. If the crystals have high mosaicity, it may be worth trying to see if a crystal can be mounted to give a diffraction pattern with less mosaicity. Mosaicity may also indicate twinning, where the crystal is actually made up of several crystals joined together. Unless the twinning can be accurately accounted for, it is not possible to use the amplitudes of a twinned crystal to determine the X-ray structure. Finally, increased mosaicity is usually accompanied by lower-resolution diffraction overall and lower signal-to-noise ratio, since the counts are spread over a larger diffraction angle.
In looking for twinning it is important to look for extra families of circles that cannot be accounted for by a single crystal in the beam (Fig. 2.26). A twinned crystal is two or more crystalline units joined together. Sometimes the joining is apparent in the morphology, but often the only way to tell is from the diffraction pattern. Still photographs or small-angle photos are best for this purpose. Be cautious in assigning twinning due solely to split spots (Fig. 2.27). If the crystal is slightly misaligned about the center of the camera or the camera is misaligned, this can cause split spots because the
FIG. 2.26 Image of twinned crystal. An image of a crystal with a twinning defect. Two separate, unrelated sets of concentric circles are evident. This crystal cannot be used for data collection.
FIG. 2.27 Split-spot profile. A split-spot profile such as this may indicate a cracked crystal or twinning.
Ewald sphere is not centered on the camera center. It is possible then for a reflection to occur twice in near proximity and to produce split spots. It may be more fruitful to align the camera carefully rather than to throw the crystal away. On the area detector or diffractometer, twinning can be recognized from looking at the profile of several spots in different areas of reciprocal space. For area detectors, 10-20 frames of 0.05° in oscillation angle are taken and the profile of spots as a function of oscillation angle is plotted. This can be done using the frameview program from XtalView (see following). Diffractometers allow a continuous scan in one of several angles. The presence of split spots indicates a twinned or cracked crystal, which must not be used for data collection.
Are the crystals big enough for data collection? This is a question with so many parameters that it is not possible to give a good answer. It is always possible to grow a larger crystal, although it may take considerable experimentation and hard work. The size needed is determined by the quality of the diffraction pattern, not its physical dimensions. Crystals with large unit cells have weaker diffraction patterns than do similar crystals with smaller cells, with the result that a larger crystal is needed. (Actually they diffract the same number of photons; in the larger cell these photons are spread out over more reflections.) In the end you have to decide which questions you wish to answer with your experiment. If you want an atomic-resolution structure of a mutant protein to look at small changes from the wild type, then clearly a 4-Å diffraction pattern is not enough. It may be worthwhile collecting data on a small crystal for now so that you can start working on structure solution at low resolution while you wait for larger crystals to grow. Avoid collecting data just because you can do it.
If you collect a 2-Å data set but all the spots beyond 3 Å are below the noise level, then it is really a 3-Å data set and will not give you any information beyond this.
2.5 HEAVY-ATOM DERIVATIVE SCANNING WITH FILM
The traditional method of scanning for heavy-atom derivatives is to use screened precession photos with a precession angle of 5° or higher. The method is inefficient in that it takes a longer exposure to collect the same number of photons by precession photography than it does by other methods because most of the diffracted X-rays are blocked by the screen. Shorter exposure times can be used if several-degree-rotation photos or low-angle screenless precession photos are used. In any case the object is to compare the intensities of the heavy-atom film with an equivalent "native" film and to look for intensity changes. The unit cell can be quickly checked by overlaying
equivalent rows on the native film. If the unit cell of the putative derivative changed significantly (>0.5-1.0%), then the derivative may not be usable. Deciding if there are intensity changes can be difficult for the beginner because it is necessary to differentiate between different exposure times and differences in the rate of falloff for the entire pattern. The best way to convince yourself that the changes are real is to look for reversals where the intensity is greater in one photo and another pair where the intensity differences are reversed (Figs. 2.28 and 2.29). A good heavy-atom derivative has obvious differences. Most photos will not have large differences but may show one or two differences. Remember, "One difference does not a derivative make." The differences should occur at all resolutions. Differences will be found in the lowest-resolution reflections between infinity and 10 Å from differences in solvent contrast because of the presence of the heavy atom in the solvent, even if there is no binding to the protein. The differences of an isomorphous derivative will fall off slightly with resolution and will increase with resolution if the derivative is not isomorphous. However, unless the pattern of differences is obvious, it is probably better to decide these questions by collecting some data on the derivative and determining the size of the differences with resolution statistically on a large number of reflections. If rotation photos are used, be careful that you do not compare spots that could be partial in one photograph but not in the other. Examine only reflections roughly perpendicular to the rotation axis and at least one row from the edge of the lunes. Reflections near the rotation axis are probably partial, and very small differences in crystal orientation can cause large intensity changes. Knowing this, it is possible to use rotation photos of about 5° to scan for derivatives for most unit cells.
Choose the largest angle you can without getting overlap below 3 Å. Align the crystal carefully with still photos 90° apart on the spindle. It is not necessary to align the crystal as precisely as for a precession photo, but be aware that a different pattern of partials can be confused with true intensity changes. In any case the worst that can happen is that you falsely identify a derivative and collect an extra data set. This is far better than missing a derivative altogether. The other common method of finding derivatives is to scan using the
FIG. 2.28 An intensity reversal between otherwise identical spots on two films. Note that in film 2 the upper spot is larger than the lower, whereas it is the opposite in film 1.
FIG. 2.29 Derivative and native films compared. Left: native iron-binding protein; right: the iron-binding protein soaked in iridium hexachloride. There are clear intensity changes, and many examples of reversals can be found. Also note that the pseudomirror symmetry between top and bottom has been clearly broken in the derivative.
area detector. As people are gaining familiarity with area detectors, this is becoming more common. In most laboratories with both detectors and cameras it is easier to get time on the camera, and you can usefully fill time by scanning for derivatives with film while waiting for the detector to become available. In using the area detector, collect enough frames to index and integrate a small amount of data. This is then merged with the native data, and the resulting statistics (see following) can be used to determine if the crystal is derivatized. If it is not, it can be removed and another crystal tried with a minimal waste of time. This is known as the "take-it-off" strategy. This method can be used to scan several crystals in a single day. It takes overnight to make a precession photo for the same purpose, and then one is usually comparing fewer total unique spots and the method is not quantitative.
2.6 OVERALL DATA COLLECTION STRATEGY
Unique Data
The essence of data collection strategy is to collect every unique reflection at least once. First you need to determine the unique volume of data for your space group. This is done by considering the symmetry of your space group and including an additional center of symmetry. Thus the space groups P222, P2₁22, P2₁2₁2, P2₁2₁2₁, C222, and C222₁ all have mmm symmetry in reciprocal space because both a 2-fold and a 2₁ screw degenerate to a mirror plane when a center of symmetry is added. To determine the unique data, you can look up your space group in the International Tables and determine the reciprocal space symmetry (also called the Patterson symmetry because the Patterson function also adds a center of symmetry). In Table 2.3 the volume needed for each space group is listed. For instance, for orthorhombic
TABLE 2.3 Unique Data for the Various Point Groups

Crystal system                       Class   Data symmetry   Unique data
Triclinic                            1       1̄               -h,h; -k,k; 0,l or -h,h; 0,k; -l,l or 0,h; -k,k; -l,l
Monoclinic (2-fold parallel to b)    2       2/m             -h,h; 0,k; 0,l or 0,h; 0,k; -l,l
Orthorhombic                         222     mmm             0,h; 0,k; 0,l
Tetragonal (4-fold parallel to c)    4       4/m             0,h; 0,k; 0,l or 0,l and any 90° about c
                                     422     4/mmm           0,h; k ≥ h,k; 0,l or h ≥ k,h; 0,k; 0,l
Rhombohedral                         3       3̄               0,h; -k,0; 0,l or 0,l and any 120° about c
Hexagonal (6-fold parallel to c)
Cubic
3 32 32
3
0, h; 0, h;
0, k;
O, h;
k >- - h/2, k - - h , 1
d = longest cell edge (Å) / 8.0
The factor of 8 can be changed to 12 if mirrors are used. In practice, spots have some width, so that the center-to-center distance that two reflections can have and still be resolved is also dependent upon the optics and the particular crystal. In practice, it is better to err on the safe side. Trying to squeeze too much data onto the detector and overlapping adjacent spots will lower data quality. The easiest and best way to determine the d is to first do a rough back-of-the-envelope calculation and then put the crystal on the machine. Find an orientation with the closest spacing on the detector, collect a short data set with the longest axis on the face of the detector, and examine some of the frames (Figs. 2.35 and 2.36). The closest spots should be clearly separated. On the Hamlin detector this can be a single pixel, while on the Bruker (with XENGEN) detector there should be a separation of at least 3 pixels. If the spacing is too close, the detector must be moved back. Do not collect data with overlapping spots!
An area detector can be equipped with three types of goniostat: two-circle, three-circle, or four-circle. The two-circle is most limited, consisting of a rotation axis and a swing movement for the detector. Some improvement can be made by mounting crystals so that the rotation axis is a diagonal of the unit cell. With such a setup it is very hard to collect a single data set at high resolution from a single mount.
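The back-of-the-envelope distance estimate mentioned above (longest cell edge divided by 8, or by 12 with mirrors) is easy to script. This is only a sketch: the function name is hypothetical, and it assumes the cell edge is given in Å and the result is read as a crystal-to-detector distance in centimeters, which is how d is quoted elsewhere in this chapter.

```python
def rough_detector_distance(longest_cell_edge, mirrors=False):
    """Starting estimate of the crystal-to-detector distance d.

    longest_cell_edge -- longest unit-cell edge in Å
    mirrors           -- True if focusing mirrors are used (factor 12 instead of 8)
    Returns d, assumed here to be in centimeters.
    """
    factor = 12.0 if mirrors else 8.0
    return longest_cell_edge / factor


d_plain = rough_detector_distance(96.0)                 # 12.0
d_mirrors = rough_detector_distance(96.0, mirrors=True)  # 8.0
```

The number is only a starting point; the test frames described above decide whether the detector must be moved back.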
FIG. 2.35 Frame from San Diego Multiwire Systems (Hamlin) area detector. Frame from a multiwavelength data collection experiment at beam line I-5 at the Stanford Synchrotron Radiation Laboratories. Spots are well separated. Data collection geometry has been set up to allow Bijvoet pairs to be collected simultaneously on the left and right halves of the detector (rotation axis is horizontal at this beam line) by taking advantage of the mirror symmetry of the sample's space group (P2₁2₁2₁). (Frame courtesy of Brian Crane, Scripps Research Institute.)
A three-circle goniometer has a rotation axis and a φ rotation mounted with χ fixed at 45°. A swing angle is also provided for the detector. Data are usually collected by rotating around the rotation axis for as far as possible. The crystal can be rotated around the φ axis to collect new data. Usually rotating φ by 90° will give the most new unique data. A four-circle goniostat will allow the most control over data collection. It has all the movements of the three-circle, and χ can be adjusted a full 360°. Because of the sizes of the detector and the χ circle, however, these collide after ω has moved over a limited range, usually about 60°. To overcome this
FIG. 2.36 Area detector frame of data collected on a Bruker area detector. The frame is 0.25° rotation of Fe-binding protein. The detector was located at 17 cm. Note that the spots (see lower right) are just being separated.
disadvantage, the crystal is ratcheted by advancing φ 60° and another ω sweep collected. A useful recipe for data collection using a four-circle goniostat with a Bruker area detector and an orthorhombic crystal is given in Table 2.4. It collects the unique data in a minimum amount of time at 2-Å resolution at a crystal-to-detector distance of 12 cm. The crystal is mounted so that one axis is approximately along the capillary (i.e., at χ = 0°, ω and this axis will be coincident). This needs to be accurate only to about ±5°. Optically center the crystal on the goniostat. Move χ to 0, ω to 50, and set the swing to 22.5°.
TABLE 2.4 Example of Bruker Area Detector Data Collection for Orthorhombic Crystals

Run   Start   End   Swing   χ    φ    Oscillation (ω)   Number of frames
1     50      -10   22.5    15   0    0.25              240
2     50      -10   22.5    15   60   0.25              240
3     50      20    22.5    75   0    0.25              120
Now rotate φ and take still frames until a 0 layer (the 0 layer is the one that passes through the beam stop) is centered on the detector (Fig. 2.37) so that the outer edge of the circle is at the edge of the detector. Define this φ angle as 0°. As data are collected, the 0 layer circle will move from the center of the detector toward the beam stop. The χ is then offset to maximize the amount of unique data. Otherwise the data on the top and bottom of the detector will be related by mirror symmetry. Since not all the data can be collected in one run (we need 90° + 22.5°), another run is done with φ rotated. To fill in the data that were missed by the limit of the detector height, a fill-in run of half the length at χ = 90° − 15°, or 75°, is used. If Bijvoet pairs are desired, then
FIG. 2.37 Starting position for orthorhombic data collection. (A) The crystal has been aligned so that one axis is vertical, coincident with the rotation axis, and the crystal is rotated until the 0 layer stretches from the beam stop to the edge of the detector. The 0 layer always intersects the beam stop. (B) χ has been rotated 15° to maximize the unique data and to minimize the effect of the blind region around the rotation axis.
TABLE 2.5 Example of Bruker Area Detector Data Collection for Orthorhombic Crystals

Run   Start   End   Swing   χ     φ     Oscillation (ω)   Number of frames
1     50      -10   22.5    15    0     0.25              240
2     50      -10   22.5    -15   180   0.25              240
3     50      -10   22.5    15    60    0.25              240
4     50      -10   22.5    -15   240   0.25              240
5     50      20    22.5    75    0     0.25              120
6     50      20    22.5    -75   180   0.25              120
interleave runs at χ = −χ, φ + 180°, leaving the other angles the same, for a total of six runs (Table 2.5).
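The interleaving rule can be written down as a small script that turns the three-run recipe into the six-run Bijvoet recipe. This is a sketch for bookkeeping only, not an interface to any detector control software; the `Run` class and its field names are invented for the example, and the angles are the values read from Table 2.4.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Run:
    omega_start: float  # degrees
    omega_end: float    # degrees
    swing: float        # detector swing angle, degrees
    chi: float          # degrees
    phi: float          # degrees
    width: float        # oscillation per frame in omega, degrees
    frames: int


# The three runs of the orthorhombic recipe (Table 2.4).
BASE_RUNS = [
    Run(50, -10, 22.5, 15, 0, 0.25, 240),
    Run(50, -10, 22.5, 15, 60, 0.25, 240),
    Run(50, 20, 22.5, 75, 0, 0.25, 120),
]


def with_bijvoet_mates(runs):
    """Follow each run with its mate at chi -> -chi, phi -> phi + 180."""
    interleaved = []
    for run in runs:
        interleaved.append(run)
        interleaved.append(replace(run, chi=-run.chi, phi=(run.phi + 180) % 360))
    return interleaved


bijvoet_runs = with_bijvoet_mates(BASE_RUNS)
```

A useful sanity check on any such table is that the ω range of each run equals the number of frames times the frame width, for example 240 × 0.25° = 60° for a sweep from 50 to -10.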
Increasing Signal-to-Noise Ratio
Other than modifications to the optics and the beam stop, there are several easy ways to lower the background for marginal crystals. The first is to decrease the width of each frame so that each reflection takes about three frames to diffract completely. The background is a continuous value as the oscillation angle changes, whereas the spot is not. Taken to the extreme, it is easy to see how this helps. If each frame is 1.0° wide and the spot diffracts for 0.25° of this oscillation, then in the pixels containing the reflection, background will have accumulated for four times longer than the reflection counts. This will greatly decrease the signal-to-noise ratio. In tests on our Bruker area detector we have found that collecting 0.1° frames instead of 0.25° frames increased the I/σ(I) ratio for weak high-resolution reflections beyond 2.0 Å by a factor of two. A second method of reducing background is to pull the detector back. The background falls off as the square of the distance from the crystal (ignoring air absorption for now). To a first approximation, the diffracted rays are parallel and do not decrease in intensity with distance. So doubling the distance from the crystal will decrease the background four times. Of course, it will also decrease the amount of data that can be collected in a single frame. If distances greater than about 15 cm are used, a helium path is necessary or the gains in background will be lost to air absorption. If you have many small crystals, or if your crystals are not radiation sensitive, an increase in signal can be had by pulling the detector back and collecting more crystal positions to make up for the lower reflections per frame.
In tests at Scripps using a Bruker area detector, we found that I/σ increases about 1% per centimeter for helium versus air for distances greater than 10 cm. So at a d of 20 cm, an increase of 10% in I/σ is expected. Hamlin-style detectors require greater distances, so helium paths are a must.
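The three rules of thumb in this section can be put into numbers. The sketch below assumes idealized behavior (a reflection that diffracts within a single frame, pure inverse-square falloff of background, and a linear 1% per centimeter helium gain beyond 10 cm); the function names are illustrative only.

```python
def background_excess(frame_width, reflection_width):
    """How many times longer background accumulates than the reflection diffracts.

    Assumes the reflection falls entirely within one frame; frames narrower
    than the reflection give no excess (ratio 1).
    """
    return max(frame_width / reflection_width, 1.0)


def relative_background(distance, reference_distance):
    """Background at `distance` relative to `reference_distance` (inverse square)."""
    return (reference_distance / distance) ** 2


def helium_gain(distance_cm, onset_cm=10.0, gain_per_cm=0.01):
    """Fractional I/sigma gain of a helium path over air beyond the onset distance."""
    return max(distance_cm - onset_cm, 0.0) * gain_per_cm


excess = background_excess(1.0, 0.25)      # the 1.0-degree frame example in the text
halved = relative_background(20.0, 10.0)   # doubling the distance
gain = helium_gain(20.0)                   # helium path at d = 20 cm
```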
2.10 IMAGE PLATE DATA COLLECTION
Image plates are relatively new but have the potential of becoming the data collection method of choice. They have a high spatial resolution of 100-150 μm, similar to film, and subtend a large angle so that more data are collected at once.¹⁸ Image plates can be used either as an alternative to film or as a replacement for the detector in an area detector. In the film mode they can be used in the same cassettes that X-ray film is used in and scanned off-line up to several hours later. In the area detector mode they are automatically scanned after each exposure by apparatus built directly into the machine. The dynamic range of an image plate is much higher than that of film: it can reach 12 bits for image plates, whereas film is limited to 8 bits in practice.¹⁹ Image plates are exposed with X-rays, as with any other detector, and the X-ray photon causes a chemical change in the plate coating that releases a fluorescence that is detected by a photomultiplier when scanned with light of the proper wavelength. Image plates are read out by a laser beam on a scanner. The quality of this scanner largely determines the limits of the image plate. The construction of a high-quality scanner is a technically difficult feat because of the mechanical precision needed and the high quality of the electronics needed to take full advantage of the image plate's capability. The photomultiplier must have low noise and use a high-quality analog-to-digital converter, and the laser used for scanning must be stable and must hit precisely when scanned. Image plates have a wider range of sensitivity with respect to X-ray wavelengths, which gives them higher counting efficiency at higher energies. This makes them the detector of choice for white-radiation Laue experiments that use very bright synchrotron light sources. Because only a few exposures are needed for Laue data sets, manual handling of the plates is not a great disadvantage.
For collecting data sets with monochromatic radiation where hundreds of exposures are needed, an automated method of scanning the plates, such as the MAR Research scanner (Fig. 2.38), is a necessity.
¹⁸ Miyahara, J., et al. (1986). Nucl. Instrum. Methods A246, 572-578.
¹⁹ This is based upon practical experience and is not a theoretical limit in either case.
FIG. 2.38 Frame from a MAR Research image plate. The crystal was rotated 1° about the horizontal axis for a 5-min exposure using a rotating anode source. The edge of the image is about 2.1-Å resolution.
Image plates are erased by exposing to white light. This means they can be handled in the room light before they are exposed to X-rays. After exposure they must be protected from light. Cosmic background radiation will slowly expose the plate, so they need to be freshly erased before being used. Exposure to very bright X-rays such as the direct beam will cause a spot that will take a long time to erase and can even show up for many exposure-erasure cycles (months). With the use of an image plate as a film replacement on a monochromatic X-ray setup, the data are collected using the rotation method as previously described. Software used for the analysis of film can be adapted easily
by removing the corrections for film sensitivity, because image plates are linear. Since image plates are becoming common for use in other applications to replace film, such as radiography of gels, an image plate scanner may be available at your institution. These scanners are perfectly adequate for replacing film in preliminary characterizations of crystals.
2.11 SYNCHROTRON RADIATION LIGHT SOURCES
The term synchrotron light is misleading because the sources of synchrotron light are usually electron storage rings. Synchrotrons are very different machines that are never used directly as X-ray sources. The first observation of synchrotron radiation was made using synchrotrons, hence the name. It is inaccurate to say "We are collecting data using a synchrotron," but the term has become so common that "synchrotron" is now synonymous with synchrotron radiation light source in protein crystallographic jargon. An excellent book on synchrotron sources and crystallography is Helliwell's Macromolecular Crystallography with Synchrotron Radiation. There is room here only to touch on the subject and to point out areas of special interest.
Differences from Standard Sources
Synchrotron radiation as available at a storage ring has a continuous spectrum in the area of interest to protein crystallographers and is very bright (Fig. 2.39). Even after tight monochromatization where only a small fraction of the total energy is used, the sources are still up to two orders of magnitude brighter than the best rotating anodes. The tight monochromatization can mean lowered backgrounds and decreased radiation damage for the same exposure. Furthermore, the optics at storage rings is usually far superior to anything used in laboratory sources, providing tightly collimated high-intensity beams. One reason for the better optics is that the source is located meters away instead of within less than a meter, giving effectively a parallel source. The combination of brighter, tighter optics makes synchrotron sources the best for very large unit cells such as are found in viruses with cells from 300 to 1000 Å. In our experience with many different crystals we have always found an increase in signal-to-noise ratio at storage rings. The ability to tune the wavelength allows the use of more optimal energies. Wavelengths near 1.0 Å show very little absorption by the capillary and the solvent around the crystal, allowing wetter mounts while obtaining better signal-to-noise ratios. Corrections due to absorption are minimized. We have not found that harder radiation decreases lifetimes; in fact lifetimes are longer, since the absorption that causes free radical damage is more efficient at lower energies.
FIG. 2.39 Synchrotron radiation source. A storage ring has high-energy electrons held in orbit by bending magnets (A). As the electrons accelerate around the curve they emit synchrotron radiation (B). Because the beam is so intense, all experiments are done in shielded hutches that are interlocked so that personnel cannot be inside while the shutters are open. A wiggler (C) is a method of increasing the brilliance of the X-rays by combining several beams from local excursions of the electron path.
Special Synchrotron Techniques
The simultaneous availability of all wavelengths led to the development of white-radiation Laue photography. Exposure times for Laue photographs can be very short (10-ms exposures for a typical lysozyme crystal at the best sources) and yet contain almost all the diffraction information in one or two photos. Furthermore, since most of the factors that need to be corrected for in reducing the data are a function of wavelength, especially absorption,
FIG. 2.40 How overlaps arise with white radiation. In white-radiation Laue photography a range of wavelengths is used simultaneously, and thus there are many Ewald spheres in diffracting conditions simultaneously. Two are shown here that differ in wavelength by a factor of 2. The resulting diffraction exits the crystal in the same direction and is recorded on the detector in the same spot, leading to an energy overlap.
the presence of the same reflection measured at different wavelengths in the same data set (Fig. 2.40) allows these parameters to be accurately accounted for by least-squares scaling. Moffat and co-workers have collected lysozyme data sets that compare favorably with data collected on a diffractometer.²⁰ The brightness of the source and high-quality optics make the storage ring an ideal place to collect data on very large unit cells like those found in viruses.
Time-Resolved Data Collection
The short exposure times needed at the storage rings have allowed the collection of time-resolved protein crystallographic data. Reactions are initiated by laser flashing or in flow cells (for very slow reactions) and then data are collected by white-radiation Laue photography at appropriate time points. When an undulator is used to intensify the beam and white-radiation Laue photographs are recorded on storage phosphors, the exposure time can be as short as microseconds. One of the chief difficulties can be the relatively high concentration of a protein crystal. This makes it difficult to deliver enough substrate, and the optical density can be very high. If light is to be used to start a photoreaction, high absorption necessitates intense light sources and causes gradients across the crystal. It is better to illuminate off the absorbance peak at a position where the crystal is still transparent to the light so that light can get to the entire crystal volume. These experiments are, therefore, technically demanding and must be done carefully to ensure that most of the crystal is synchronized; otherwise the time resolution will be lost. Reactions can also be started by diffusing in substrates using a flow-cell apparatus. In this case, the reaction must be very slow, on the order of hours, or else the diffusion time will be greater than the reaction time and the reaction will not be synchronized across the crystal.
²⁰ B., and Moffat, K. (1987). In "Computational Aspects of Protein Crystal Analysis. Proceedings of the Daresbury Study Weekend, DL/SCI/R25" (Helliwell, J. R., Machin, P. A., and Papiz, M. Z., eds.), pp. 84-89. See also, Helliwell, J. R., et al. (1989). J. Appl. Crystallogr. 22, 483-487.
2.12 DATA REDUCTION
Integration of Intensity
Integration of the intensity in a spot is a matter of separating the background counts from the reflection. Two methods are in general use:
1. Mask and count. In this method the region that is to be considered the spot is masked and the pixels within this region are summed (Fig. 2.41). The background is determined from the pixels adjacent to the spot and this value is subtracted to give the final intensity. The method works well when spots are well above background. The pixels can be counts in the case of counters, as in diffractometers and area detectors, or optical densities in the case of film.
$$ I_{hkl} = \sum_{1}^{n_P} \mathrm{counts}_P - \sum_{1}^{n_B} \mathrm{counts}_B $$
2. Profile fitting. In profile fitting a curve is fit to the data and the area under the curve is taken to be the intensity (Fig. 2.42). The curve, or profile, can either be a geometric shape such as a Gaussian or it can be derived by averaging over the brighter spots. The advantage of the latter method is that bright reflections can be used to determine the profile, which is then applied to weak reflections. Different profiles are usually used depending upon the position of the spot on the detector. For example, the detector might be separated into a 4 x 4 array and a different profile used in each of the 16 areas. Then, to find the area of the spot, this curve is best-fit to the counts found in the area where the spot is predicted to be, and the area under the curve is then used to find the integrated intensity rather than the counts themselves.
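The two integration methods can be illustrated on a one-dimensional scan through a spot. This is a toy sketch, not the algorithm of any particular data reduction package: the function names are invented, the background sum is scaled to the number of peak pixels (an assumption that reduces to a plain difference of sums when the two regions contain equal numbers of pixels), and the profile fit is an unweighted least-squares scale of a normalized profile with a known flat background.

```python
def mask_and_count(counts, peak_pixels, background_pixels):
    """Sum the masked peak pixels and subtract the scaled background sum."""
    peak_sum = sum(counts[i] for i in peak_pixels)
    background_sum = sum(counts[i] for i in background_pixels)
    # Scale the background to the number of pixels under the peak mask.
    scale = len(peak_pixels) / len(background_pixels)
    return peak_sum - scale * background_sum


def profile_fit(counts, profile, background_per_pixel):
    """Least-squares intensity for counts - background = I * profile.

    `profile` is normalized so that its values sum to 1; it would normally
    be learned by averaging strong spots in the same detector region.
    """
    net = [c - background_per_pixel for c in counts]
    return sum(p * n for p, n in zip(profile, net)) / sum(p * p for p in profile)


scan = [2, 2, 2, 5, 12, 5, 2, 2, 2]          # made-up counts across one spot
masked = mask_and_count(scan, [3, 4, 5], [0, 1, 2, 6, 7, 8])
fitted = profile_fit(scan[3:6], [3 / 16, 10 / 16, 3 / 16], 2.0)
```

With clean data the two estimates agree; the benefit of the fitted profile shows up for weak spots, where the shape taken from strong reflections constrains the result.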
FIG. 2.41 Integration by masking in one and two dimensions.
Error Estimation
An accurate estimation of the error is important. The error of a single reflection is termed its σ. Contributions to σ:
- Counting statistics: $\sigma_{\text{count}} = \sqrt{N_{\text{peak}} + N_{\text{background}}}$ (note the inclusion of background counts)
- Instability of detector; usually a constant
FIG. 2.42 Profile fitting. Profile fitting can more accurately find the intensity of a peak, especially in the example on the right, where the background is sloped.
- Profile fitting: deviation between observed and ideal shape
- Local variation in background
Other sources of errors are saturated pixels (photographic film is especially vulnerable to this), overlapped profiles, and errors in background models. Merged multiple measurements of several reflections should be weighted by σ. Different data reduction packages will determine different values of σ, however, so data from different packages are probably better averaged without σ weighting. In my experience the σ of some packages can differ by at least a factor of 2. Reflections are often rejected by the ratio of intensity to σ, I/σ(I).
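The counting-statistics term, σ-weighted merging, and an I/σ(I) rejection test can be sketched as follows. The function names are hypothetical, the weighting scheme shown (weights of 1/σ²) is one common convention assumed here rather than the scheme of any specific package, and the cutoff of 2.0 is only an illustrative default.

```python
import math


def counting_sigma(n_peak, n_background):
    """Counting-statistics sigma; background counts are included."""
    return math.sqrt(n_peak + n_background)


def merge_weighted(intensities, sigmas):
    """Merge repeated measurements with 1/sigma^2 weights (assumed scheme)."""
    weights = [1.0 / (s * s) for s in sigmas]
    total_weight = sum(weights)
    mean = sum(w * i for w, i in zip(weights, intensities)) / total_weight
    return mean, math.sqrt(1.0 / total_weight)


def passes_cutoff(intensity, sigma, min_ratio=2.0):
    """Keep a reflection only if I/sigma(I) reaches the chosen threshold."""
    return intensity / sigma >= min_ratio
```

With equal sigmas the weighted mean reduces to the plain average, which is why unweighted averaging is a reasonable fallback when the sigmas themselves are not trusted.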
Polarization Correction
The polarization correction arises from the dependence of scattering efficiency on the scattering angle. For polarized sources, the scattering efficiency is also a function of the change of polarization direction with the angle of the scattering plane. Sources can be polarized by a monochromator, so this correction is dependent upon the optics of the source used. For unpolarized radiation, $p = \tfrac{1}{2}[1 + \cos^2(2\theta)]$.
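As a small worked example, the unpolarized-beam factor can be evaluated directly. The function name is hypothetical, and the sketch covers only the unpolarized case; a monochromated or synchrotron beam needs additional terms that depend on the optics.

```python
import math


def polarization_unpolarized(two_theta_degrees):
    """Polarization factor p = (1 + cos^2(2 theta)) / 2 for an unpolarized beam.

    two_theta_degrees is the full scattering angle 2 theta, in degrees.
    """
    two_theta = math.radians(two_theta_degrees)
    return 0.5 * (1.0 + math.cos(two_theta) ** 2)
```

The factor is 1 in the forward direction and falls to 1/2 at a scattering angle of 90 degrees.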
Lorentz Correction
The Lorentz correction accounts for the rate at which a reflection passes through the Ewald sphere. Reflections near the rotation axis remain in diffracting conditions for a longer time. At some point this correction becomes so large that the reflections very close to the rotation axis are rejected.
Decay or Radiation Damage
Prolonged irradiation of a sample induces radiation damage. Decay usually affects higher-resolution reflections faster. If there is a choice, the higher-resolution data should be collected first. Decay should be monitored by collecting a set of standard reflections. If the decay exceeds about 20%, data collection should be halted. Although a decay correction can partially account for decay, different reflections can decay at different rates so that a single decay parameter cannot restore the accuracy of the data set. Radiation damage can be reduced by lowering the temperature (see Chapter 6). This slows down the free radical chain reactions that are thought to induce radiation damage. Decay is a function of time and dose. However, it is not linear with dose, and brighter sources can collect more counts before the same
amount of decay sets in. This is a great advantage of synchrotron sources. Also, once a sample is irradiated the free radical chain reactions are initiated and will continue even after the beam has been off for some time. Irradiation affects samples at different rates, and some samples are very sensitive. The presence of a metal that absorbs X-rays more efficiently, such as iron, platinum, or mercury, can speed up decay.
Absorption
Absorption is probably the largest source of uncorrectable error in data sets. The path length of the diffracted X-rays through glass, crystal, solvent, and air determines the amount of absorption. This path length is different for each reflection. Unfortunately, there is no entirely accurate way to model this absorption. Two approaches are generally used. In the first, experimental measurements are made of the absorption in different directions through the sample, and each reflection is corrected by these factors. In the second method, a least-squares fit is made to the differences between symmetry-related reflections as a function of some parameter believed to be a function of absorption. The experimental correction is easily calculated in the case of a diffractometer. In the case of two-dimensional detectors, the second method is normally used.
The overall error in a data set can be estimated by comparing symmetry-related reflections, which in the ideal case would be identical. The agreement between them is calculated as
$$R_{\text{symm}} = \frac{\sum_{hkl}\sum_i |I_i - \bar{I}|}{\sum_{hkl}\sum_i I_i}$$
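The comparison of symmetry-related reflections can be sketched in a few lines. This assumes the usual definition of the symmetry R-factor (sum of absolute deviations from the group mean, divided by the sum of the intensities); the function name and the list-of-lists input layout are hypothetical.

```python
def r_symm(groups):
    """Symmetry R-factor over groups of symmetry-related measurements.

    groups -- list of lists; each inner list holds the measured intensities
              of reflections that should be identical by symmetry.
    """
    numerator = 0.0
    denominator = 0.0
    for measurements in groups:
        if len(measurements) < 2:
            continue  # a single observation carries no agreement information
        mean_intensity = sum(measurements) / len(measurements)
        numerator += sum(abs(i - mean_intensity) for i in measurements)
        denominator += sum(measurements)
    return numerator / denominator
```

Perfectly consistent data give a value of zero; the value grows as symmetry mates disagree.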
3 COMPUTATIONAL TECHNIQUES
There are several different crystallographic software packages available and it would be impossible to cover them all. The XtalView package is used for specific examples in this book. XtalView is a window-based, visually oriented package that is especially easy for novices to learn. Options and commands are shown as buttons, sliders, and menus. All options are visible, making it easy to spot them and ideal for publishing in book form. You may not want to use XtalView; perhaps you already have a favorite package. In any case, most programs have similar options and features. For consistency, a single package is necessary for this book so that we can get right to explaining the methods and spend less time explaining the particular implementation. XtalView was written at the Research Institute of Scripps Clinic by the author. It runs under X-windows, which is available on most workstations (Figs. 3.1 and 3.2). At present it has been ported to Sun workstations (including the SparcStation series), Silicon Graphics, and DECstations running ULTRIX. DENZO, MOSFLM, and XPLOR are used as the primary examples for data collection and protein refinement, which XtalView does not include.
FIG. 3.1 XtalView xtalmgr. (A) The XtalView xtalmgr program is used to organize data and to start the individual applications. It has a graphical user interface using buttons, pulldown menus, and scrolling lists. Data are organized into Projects, which can be entered and edited using the field at the top of the window. The Crystal field is a keyword used to access the parameters for a specific crystal type such as the unit-cell parameters and the space group symmetry operators. Other applications are selected from a pulldown menu (not shown) accessed from the Applications glyph. Selecting an application causes all files with the correct extension to be listed in one of the three file lists at the bottom. Files can be selected from these lists by clicking on them with the mouse. The command line is then built up using Add Args and then the application is started with Run Command. (B) The crystal editor is used to enter the unit-cell parameters, space group information, and any other relevant information. The space group symmetry operators for all space groups are kept in a table and can be accessed either by space-group number or by symbol as found in the International Tables for Crystallography, Vol. 1. The information entered into the editor is then available to all XtalView programs by simply entering the crystal keyword (cvccp in this example).
3.1 TERMINOLOGY
Reflection
A reflection is a single X-ray-diffraction vector that is the combined scattering resulting from the individual scattering of all the electrons in the unit cell along a particular direction. It has a magnitude, |F|, that is referred to as F, a phase, α, and the Miller indices h, k, l. The diffraction vector for protein is called Fp and the same diffraction vector with a heavy atom soaked in is FpH. The diffraction vector for the heavy atom alone is fh (the lowercase letters remind us that fh can never be directly measured but is always calculated). Two separate observations of the same reflection will be called F1 and F2.
[FIG. 3.33 (continued): plot of Psi versus Phi, both axes running from -180 to 180.]
… 1.5 Å), the refinement strategies can be improved to take advantage of the increased number of data available. By going to higher resolution, we can add many more parameters to the model, including thermal anisotropy, split side chains, and riding hydrogens. The validity of these has been well known in small-molecule crystallography for many years, where R-factors of 1% are obtainable. The program to be used here is SHELX-97, the latest version of SHELX written by George Sheldrick.²⁷ SHELX has not been widely applied to proteins in the past, partly because converting the large number of macromolecular coordinates and residue structures to small-molecule conventions was troublesome. However, the new version has features that make these conversions unnecessary or automatic. It also builds the geometry restraints needed for refining structures at less than atomic resolution. SHELX-97 has several advantages for ultra-high-resolution refinements:
1. The structure factor calculation is more accurate than in XPLOR or TNT. At resolutions above about 1.8 Å, XPLOR and TNT show significant errors that are due to the use of an FFT approximation, and by 1.0 Å these errors have become a significant portion of the R-factor.²⁸
2. Anisotropic B's can be used if there are enough data. Proteins typically exhibit considerable anisotropy, and including this model increases the accuracy at the expense of more parameters.
3. SHELX includes generation of fixed or riding hydrogens. These hydrogens move with the atom they are bonded to and effectively make the structure factor calculation more accurate.
4. Partial structure is easily generated and refined. SHELX has free variables that allow correct refinement of the occupancies of the split parts.
²⁷ Sheldrick, G. M. (1998). "High Resolution Structure Refinement" in Crystallographic Computing 7 (K. Watenpaugh and P. E. Bourne, eds.). Oxford University Press, Oxford.
²⁸ In an FFT from model to structure factors, the model coordinates themselves are not transformed, but an approximation of the equivalent electron density built on a grid. In theory, an accurate enough representation can be computed, but in current practice it usually introduces a small error.
One of the chief drawbacks is that the program is considerably slower than the above-mentioned refinement software. However, with the increased speed available on today's computers this is no longer a serious impediment. The quality of the density of very-high-resolution structures is greatly increased (Plate 7).
Features of SHELX
Use of Intensity Data
Considerable experimentation by Sheldrick and others has shown that it is best to refine directly against the original intensity data (I) rather than |F|, and to include all the data, even the negative observations. The argument for this is that the intensity data, consisting of real observations, have errors that give an expected distribution which for the weakest data has a negative component; that is, because of the experimental error, some of the weak data will be negative in intensity, as they are measured to be less than the surrounding background. For example, consider a reflection that is exactly 0 in intensity. Half the time it is measured it will be greater than 0 and half the time less than 0. If we threw out the negative measurements and then averaged, we would find that the reflection now has a positive intensity. Leaving out negative intensities raises the mean of the weak highest-resolution data and thus affects the weighting scheme. This change to the weighting leads to slower convergence. Since it is impossible to take the square root of a negative number, when |F|'s are used negative observations are removed from the data set, leading directly to this problem. As discussed later, the inclusion of negative intensities is also needed for correct uncertainty calculations.
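The bias introduced by discarding negative observations can be demonstrated with a short simulation. This is only a sketch: it assumes Gaussian measurement error around a true intensity of exactly zero, and the function name, sample size, and seed are arbitrary choices for illustration.

```python
import random


def mean_with_and_without_negatives(n_measurements=10000, sigma=1.0, seed=0):
    """Simulate repeated measurements of a reflection whose true intensity is 0.

    Returns (mean of all measurements, mean after discarding negatives).
    """
    rng = random.Random(seed)
    measurements = [rng.gauss(0.0, sigma) for _ in range(n_measurements)]
    mean_all = sum(measurements) / len(measurements)
    positives = [m for m in measurements if m > 0.0]
    mean_positive_only = sum(positives) / len(positives)
    return mean_all, mean_positive_only
```

Keeping every observation gives a mean close to zero, while keeping only the positive ones gives a mean that is clearly positive, of the same order as sigma itself.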
Riding Hydrogens
Riding hydrogens are hydrogens added to the model in a fixed geometry to the heavy atoms. Thus, they are not free to refine but move with the heavier atom to which they are attached. Their chief effect is to make the structure factor calculation slightly more accurate by accounting for the small, but measurable, contribution to the scattering in the crystal. Indeed, if they are left out the heavy atom will move slightly in the direction of the left-out hydrogen to account for the missing density. Given accurate data, riding hydrogens become significant somewhere around 1.5-Å resolution. An R-free cross-validation test can be used to verify that riding hydrogens are valid. Typically a 1% drop in R-free is found when hydrogens are added.
Anisotropic Thermal Parameters
At lower resolutions a single isotropic thermal factor, B, is used to represent the thermal vibrational motion of the atom, as well as static disorder
within the crystal. A more complete model of this motion uses six parameters: three to describe the axes of motion of the vibration, and three to describe the magnitude in each direction. In reciprocal space, these motions are described by a symmetrical matrix with the terms Uij, which are the actual parameters refined. We have added a feature to the latest version of xfit to view these thermal parameters in a manner similar to the familiar program ORTEP, but in real time. Examples of this are shown in Fig. 3.34. The main obstacle to using anisotropic thermal parameters at lower resolution is the lack of sufficient data to justify adding five more parameters per atom. As can be seen in Table 3.5, increasing the resolution from 1.9 Å to 1.35 Å results in three times as much data being available. If anisotropic thermal parameters are added at 1.9 Å, the ratio of observed data to parameters becomes about 1:1 (10,741:9211), considerably less than the 2:1 minimum required for a well-conditioned least-squares minimization. The exact point at which this crossover occurs for a given protein crystal depends on the solvent content. If the solvent content is higher than 50%, then there are fewer protein atoms for the same volume unit cell and the crossover is at a lower resolution. For low-solvent content crystals, the 2:1 crossover happens at a higher resolution.
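The data-to-parameter argument is simple arithmetic, sketched below with the observation and parameter counts quoted from Table 3.5. The function names are hypothetical, and the 2:1 threshold is the rule of thumb stated in the text rather than a hard limit.

```python
def data_to_parameter_ratio(n_observations, n_parameters):
    """Ratio of observed reflections to refined parameters."""
    return n_observations / n_parameters


def is_well_conditioned(ratio, minimum_ratio=2.0):
    """Apply the 2:1 rule of thumb for least-squares minimization."""
    return ratio >= minimum_ratio


# Counts taken from Table 3.5.
ratio_anisotropic_1_9 = data_to_parameter_ratio(10741, 9211)   # anisotropic model, 1.9 A data
ratio_anisotropic_1_35 = data_to_parameter_ratio(29336, 9211)  # anisotropic model, 1.35 A data
ratio_isotropic_1_9 = data_to_parameter_ratio(10741, 3547)     # isotropic model, 1.9 A data
```

Only the isotropic model is supported by the 1.9 Å data under this rule, while the 1.35 Å data comfortably support the anisotropic model.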
Solvent Model
SHELX includes a bulk solvent model based on work by Moews and Kretsinger²⁹ that is being used increasingly in all refinements. Most refinement protocols in the past have excluded from the calculations the so-called solvent region (roughly, reflections between infinity and 7 to 5 Å). Using the bulk solvent correction, which includes just two adjustable parameters, K, a scale factor, and B, a thermal parameter, it is possible to include all the data from infinity on up with a substantial drop in R-value for the low-resolution data. This has been done in the refinement example in Table 3.5. A further advantage is that maps that were calculated without the solvent region can be subject to ripple as a result of series termination errors (see Sec. 3.12, Resolution Cutoffs), and the inclusion of the low-resolution terms removes that problem.
²⁹ Moews, P. C., and Kretsinger, R. H. (1975). Refinement of the structure of carp muscle calcium-binding parvalbumin by model building and difference Fourier analysis. J. Mol. Biol. 91, 201-228.
Split Side Chains
In high-resolution structures, especially in frozen high-resolution structures, discrete disorder of the protein can often be detected. For instance, a side chain on the surface may have two equally populated conformers. At any given moment half the side chains in the crystal will be in one orientation and half in the other. Both parts can be refined simultaneously in SHELX, and the relative occupancies of the two parts can be tied together and also refined. We have added a feature to xfit that simplifies splitting the side chains and fitting the two halves. Depending on whether one side of the split is clicked on or the root is clicked, xfit will fit half of the residue or the entire residue. Besides inspection of the electron density maps, split atoms can be found by an examination of the anisotropic thermal parameters. SHELX detects possibly split atoms and prints a list for further inspection.
FIG. 3.34 Stereo figures of the Fe3S4 cluster and the Sγ ligands at 1.35 Å illustrating the density and the corresponding thermal ellipsoids. (A) 2Fo - Fc σA-weighted electron density map contoured at 5σ. (B) Thermal ellipsoids showing the 25% probability surface (i.e., an atom can be found within this surface 25% of the time) for the cluster and 50% probability lines for the major axes. Note how all the atoms move in similar directions. The long axis is along the crystallographic c axis and, since all atoms in the crystal show this same elongation, it is probably
TABLE 3.5 Example Refinement of 7-Fe Ferredoxin (FD) Using the Methods Outlinedᵃ
Step                         Res. (Å)  N_atoms  N_H  N_W  N_par  N_obs  R     R_free  R(>4σ)  R_free(>4σ)
Start (5FD1)                 1.9-∞     868      0    30   3547   10741  44.6  48.1    43.9    47.0
1 Refine model as is         1.9-∞     868      0    30   3547   10741  27.7  33.1    27.5    32.6
2 Refit                      1.35-∞    989      0    30   3547   29336  27.2  30.5    25.6    28.9
3 Added waters               1.35-∞    958      0    188  4096   29336  19.8  23.9    18.5    22.7
4 Pruned waters              1.35-∞    1013     0    157  4087   29336  20.0  24.3    18.7    23.0
5 Anisotropic B's, refit     1.35-∞    1013     0    157  9177   29336  16.1  22.3    14.8    21.0
6 Added hydrogens            1.35-∞    1013     776  157  9177   29336  15.4  21.3    14.1    20.1
7 Diagonal block             1.35-∞    1013     776  157  9177   29336  15.4  21.5    14.0    20.3
8 Added waters, refit        1.35-∞    1011.5   776  162  9211   29336  15.3  21.3    13.9    20.1
9 Final refinement (6FD1)    1.35-∞    1011.5   776  162  9211   30880  15.0  -       13.8    -
ᵃ N_atoms, occupancy sum of asymmetric unit; N_H, number of hydrogen atoms; N_W, number of waters; N_par, number of refinement parameters; N_obs, number of observations; R = [Σ|Fo - Fc| / ΣFo] × 100; R_free, calculated for 5% of data not used in refinement; R(>4σ), including only data for which F/σ(F) > 4.0.
Cross-Validation
To check the validity of each step and the overall refinement, the statistical parameter R-free is used. Five percent of the data is held in a separate pool and kept out of the least-squares refinement. Thus, if R-free decreases it must be because the model has become better, while for the other 95% there is the danger that the R-value decreased because of overfitting of the free parameters of the model, leading to a false minimum. When parameters are added to the model, the validity can be checked by an expected drop in R-free. Another use is to check for the best value of refinement parameters. For example, the sigma applied to bond lengths can be varied in a series of refinements. As this is done, it is found that the R-value slowly decreases as the restraint is removed. This is expected because removing the restraint allows the minimizer to achieve a closer fit. However, R-free shows a shallow minimum where tightening the restraint causes an increase in R-free, and relaxing shows no improvement in R-free and eventually actually increases R-free. Thus, R-free can be used to find the correct target sigmas for bond lengths, bond angles, and thermal parameters. In the sample refinement of Table 3.5, note how the R-free drops 2% when the thermal model is changed to anisotropic, and 1% when riding hydrogens are added. As a test of R-free we tried refining a model with anisotropic thermal values at a lower resolution where the ratio of data to parameters was about 1:1. Although the R-value dropped 3%, the R-free actually increased 1% and justified our confidence that by following R-free we can avoid adding too many parameters too soon.
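The bookkeeping behind R-free can be sketched as follows. The function names, the random selection of the free set, and the seed are assumptions for illustration; refinement programs have their own conventions for flagging free reflections, and the R-factor is written here as a fraction rather than a percentage.

```python
import random


def split_free_set(n_reflections, free_fraction=0.05, seed=0):
    """Return (working indices, free indices) for cross-validation."""
    rng = random.Random(seed)
    indices = list(range(n_reflections))
    rng.shuffle(indices)
    n_free = max(1, round(n_reflections * free_fraction))
    return sorted(indices[n_free:]), sorted(indices[:n_free])


def r_factor(f_obs, f_calc):
    """R = sum|Fo - Fc| / sum(Fo), as a fraction."""
    return sum(abs(o - c) for o, c in zip(f_obs, f_calc)) / sum(f_obs)


def r_work_and_r_free(f_obs, f_calc, work, free):
    """Evaluate the R-factor separately on the working and free sets."""
    r_work = r_factor([f_obs[i] for i in work], [f_calc[i] for i in work])
    r_free = r_factor([f_obs[i] for i in free], [f_calc[i] for i in free])
    return r_work, r_free
```

The essential point is that the free reflections are chosen once, before refinement, and are never used in the least-squares target afterwards.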
Checking the Refinement
Besides following the R-value and R-free and visually inspecting maps, the resulting structure can be checked by looking at the final polypeptide geometry. For this we use PROCHECK.³⁰ SHELX does not refine any torsions and so these may be used in a manner analogous to R-free in that the torsions were not restrained to target values. In the 7-Fe ferredoxin example of Table 3.5, we found that the torsions of the main chain and side chains fell well within the PROCHECK plots. In fact we were delighted to find that the bond lengths and angle distributions were below our restraints in SHELX and even tighter than the Engh and Huber geometry³¹ in PROCHECK, clear evidence that the restraints did not overly determine the geometry but were instead determined by the extra resolution of the data.
³⁰ Laskowski, R. A., MacArthur, M. W., Moss, D. S., and Thornton, J. M. (1993). PROCHECK: A program to check the stereochemical quality of protein structures. J. Appl. Crystallogr. 26, 283-291.
³¹ Engh, R. A., and Huber, R. (1991). Accurate bond and angle parameters for X-ray protein structure refinement. Acta Crystallogr. A47, 392-400.
Positional Uncertainty Analysis
Positional uncertainty analysis involves calculating the standard deviations of the positional parameters or, as these are known in crystallographic jargon, standard uncertainties. Traditional small-molecule methods of estimating positional uncertainties³² involve normal matrix inversion. (The standard uncertainty was known as the estimated standard deviation until the International Union of Crystallography recommended changing the terminology.) Calculating standard uncertainties is quite distinct from refinement, but, because it is usually done by the same software, the processes are often confused. Traditional uncertainty analysis in proteins has consisted of little more than a Luzzati plot of R-value versus resolution (which, as Cruickshank³³ points out, is a misuse of Luzzati's method), or a σA calculation.³⁴ In either case these methods lump the entire structure into one average uncertainty. Some parts of the structure will be much worse and some much better. For something as plastic as a protein, these make very poor methods of uncertainty analysis. A better method proposed by Cruickshank allows for estimating uncertainties based on the atom type and its B-value. Uncertainties have been shown to correlate well to Cruickshank's equation by comparing uncertainties calculated by full-matrix inversion with those derived from the equation:
$$\sigma(x_i) = k \left( \frac{\sum_{j=1}^{n_{\text{atoms}}} Z_j^2 / Z_i^2}{n_{\text{observations}} - n_{\text{parameters}}} \right)^{1/2} \frac{1 + 0.04 B_i + 0.003 B_i^2}{1 + 0.04 B_W + 0.003 B_W^2} \, C^{-1/3} \, d_{\min} \, R$$
where k is about 1.0, Z is the atomic number, $B_W$ is the Wilson B for the structure, $B_i$ is the B-value of the atom in question, C is the fractional completeness of the data to $d_{\min}$, and R is the crystallographic R-factor. Note that as the number of observations increases, the uncertainties will decrease, and if the number of observations falls below the number of parameters, the equation becomes invalid. In this formula, the uncertainty for a given atom in a given structure depends mostly on the B-value for the atom, $B_i$, and the scattering factor of the atom, $Z_i$, which together determine the atom's contribution to the total scattering. Atoms with low B-values have lower uncertainties in their positions, and atoms with higher atomic number, and thus more electrons, will also have lower uncertainty. For a typical protein it becomes possible to estimate individual atom positional uncertainties somewhere above 2.0-Å resolution, with the uncertainty strongly dependent on resolution.³⁵ Cruickshank gives examples where a 2.0-Å resolution protein structure has an average uncertainty on coordinates of 0.32 Å (Cruickshank, 1996: see note 33). At 1.6 Å resolution the average uncertainty drops to about 0.13 Å, and at 1.0 Å resolution the typical uncertainty is about 0.03 Å.
³² Schwarzenbach, D., Abrahams, S. C., Flack, H. D., Gonschorek, W., Hahn, T., Huml, K., Marsh, R. E., Prince, E., Robertson, B. E., Rollett, J. S., and Wilson, A. J. C. (1989). Statistical descriptors in crystallography. Report of the International Union of Crystallography Subcommittee on Statistical Descriptors. Acta Crystallogr. A45, 63-75; Schwarzenbach, D., Abrahams, S. C., Flack, H. D., and Wilson, A. J. C. (1995). Statistical descriptors in crystallography. II. Report of a Working Group on Expression of Uncertainty in Measurement. Acta Crystallogr. A51, 565-569.
³³ Cruickshank, D. W. J. (1996). Protein precision re-examined: Luzzati plots do not estimate final errors. In "Macromolecular Refinement, Proceedings of the CCP4 Study Weekend, January 1996" (Dodson et al., eds.).
³⁴ Read, R. J. (1986). Improved Fourier coefficients for maps using phases from partial structures with errors. Acta Crystallogr. A42, 140-149.
³⁵ The uncertainties are actually dependent on the ratio of observations to refined parameters, but since the number of observed data increases with increasing resolution, for a given protein this is an accurate statement. The resolution at which this ratio crosses 1.0 depends on each crystal's characteristics. Proteins with high solvent content will have fewer parameters to refine and proteins that have more than one molecule per asymmetric unit that can be averaged will have fewer free parameters to adjust.
Calculating Standard Uncertainties
If the normal matrix of the least-squares minimization operation is inverted, the resulting matrix, after scaling, contains at each element ij, $c_{ij}\sigma_i\sigma_j$, where $c_{ij}$ is the correlation between the two parameters i and j and $\sigma_i$ and $\sigma_j$ are their standard deviations. On the diagonal of the matrix where i = j, since the correlation of a variable with itself is 1, this becomes $\sigma_i^2$, and thus inverting the matrix gives us the standard deviation on any parameter. In crystallography these standard deviations are termed the standard uncertainties. Each atom coordinate and thermal parameter will thus have a standard uncertainty associated with it. The radial uncertainty of an atom at x, y, z
with uncertainties of $\sigma_x$, $\sigma_y$, $\sigma_z$ will have a radial uncertainty of $\sqrt{\sigma_x^2 + \sigma_y^2 + \sigma_z^2}$. Similarly, the standard uncertainty on a bond can be calculated by noting that for two uncorrelated quantities that are subtracted, the standard deviation of the resultant is given by $\sigma_{1-2}^2 = \sigma_1^2 + \sigma_2^2$. Thus if we calculate the length of a bond we can calculate the standard uncertainty on the bond, $\sigma_{\text{bond}}$, with length, l, from the positional uncertainties by the equation
$$\sigma_{\text{bond}}^2 = (\sigma_{x_1}^2 + \sigma_{x_2}^2)\left(\frac{x_1 - x_2}{l}\right)^2 + (\sigma_{y_1}^2 + \sigma_{y_2}^2)\left(\frac{y_1 - y_2}{l}\right)^2 + (\sigma_{z_1}^2 + \sigma_{z_2}^2)\left(\frac{z_1 - z_2}{l}\right)^2$$
which is the sum of the positional uncertainties projected onto the bond to account for the direction dependence of the bond. Similar considerations can be used to find the uncertainty on any quantities derived from the atomic coordinates. The foregoing treatment assumes that the atoms are uncorrelated. If the off-diagonal elements are nonzero, this indicates a correlation between parameters. The general equation is more complex and takes into account the off-diagonal correlations and the uncertainties in the unit-cell measurements.
FIG. 3.35 Plot of positional uncertainties versus thermal parameter for carbon, nitrogen, and oxygen atoms from a 1.35-Å resolution structure. The upper, black line is for carbon, the middle line is for nitrogen, and the lower line is for oxygen. Note that as the atomic number for an atom increases (carbon = 6, nitrogen = 7, oxygen = 8), and thus its contribution to the total scattering, the positional uncertainty decreases. As the B increases, so does the positional uncertainty.
If two independent measurements of the same quantity are averaged, such as two bond lengths in a dimeric molecule, the standard uncertainty of the average is $\sigma_{\bar{x}} = \sqrt{\sigma_{x_1}^2 + \sigma_{x_2}^2}/2$. The general equation for n measurements is $\sigma_{\bar{x}} = \sqrt{\sum_{i=1}^{n} \sigma_{x_i}^2}/n$.
The standard uncertainties for proteins can be calculated using the program SHELX. First the structure must be refined to convergence, and then a cycle is done using the full matrix with zero Marquardt damping and all restraints switched off (including the restraints will give artificially low values). All reflection data are used; there are no deletions due to weakness and no exclusions by resolution, even of low-resolution data often excluded as the "solvent region." SHELX then calculates the standard uncertainties using the full covariance matrix and the estimated uncertainties in the unit cell. The standard uncertainties in the derived parameters, bond lengths and angles, are then calculated from the positional uncertainties. The relevant SHELX commands are
    L.S. 1
    DAMP 0 0
    BOND
Remove the commands DFIX, DANG, FLAT, and other restraints, since these will artificially lower the standard uncertainties.
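The propagation of positional uncertainties into derived quantities can be sketched as below for the uncorrelated case described above. The function names and the tuple layout for coordinates and uncertainties are hypothetical, and the sketch ignores off-diagonal correlations and unit-cell uncertainties, which a full-matrix calculation would include.

```python
import math


def radial_uncertainty(sigma_x, sigma_y, sigma_z):
    """Radial uncertainty of one atom from its three coordinate uncertainties."""
    return math.sqrt(sigma_x ** 2 + sigma_y ** 2 + sigma_z ** 2)


def bond_uncertainty(position_1, sigmas_1, position_2, sigmas_2):
    """Standard uncertainty of a bond length between two uncorrelated atoms.

    Each position is (x, y, z); each sigmas tuple is (sigma_x, sigma_y, sigma_z).
    """
    deltas = [a - b for a, b in zip(position_1, position_2)]
    length = math.sqrt(sum(d * d for d in deltas))
    # Project the summed coordinate variances onto the bond direction.
    variance = sum(
        (s1 ** 2 + s2 ** 2) * (d / length) ** 2
        for s1, s2, d in zip(sigmas_1, sigmas_2, deltas)
    )
    return math.sqrt(variance)


def uncertainty_of_mean(sigmas):
    """Standard uncertainty of the average of n independent measurements."""
    return math.sqrt(sum(s * s for s in sigmas)) / len(sigmas)
```

For a bond lying along one axis, only the uncertainties along that axis contribute, which is the direction dependence mentioned in the text.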
Block Diagonal Calculations
For several reasons, only a few proteins thus far have been analyzed by matrix inversion for the uncertainty in atomic positions. The main reason has been that until recently, most X-ray structures had not been determined to high enough resolution to have a big enough ratio of data to parameters to allow meaningful calculation of standard uncertainties. Second, even when the resolution of the data was high enough, the calculation for inverting the full least-squares matrix was too large to fit in the memories of the available computers. For a given protein the amount of memory required to hold the full least-squares matrix needed for the calculation of uncertainty is given by the equation [n(n + 1)/2] × 4 bytes, where n is the number of
214
COMPUTATIONALTECHNIQUES
parameters. 36 For a typical small protein with 1000 atoms and 200 solvent molecules, the number of parameters including x, y, and z, and six thermal parameters per atom is then 1200 atoms x 9 parameters = 10,800 parameters atom which would require 222 megabytes of memory plus an overhead of about 10 megabytes. For a medium-sized protein of 3000 atoms, the amount of memory for the matrix increases to 1.4 gigabytes! Because of the manner in which the memory is accessed during the analysis, it is not possible to set up the problem to use virtual memory efficiently. -~7This means that unless the computer has enough main memory to hold the entire matrix, continuous swapping of memory pages causes the program to slow to about 1% of full speed, which also renders the computer useless to others on multiuser systems. Recently, computers with very large memories have become available. The smaller proteins can now easily be done on a lab workstation with one gigabyte of memory. A useful approximation to the full-matrix calculation is a blockdiagonal calculation, where portions of the full matrix are extracted into smaller matrices along the diagonal. A good approximation is to use a block matrix retaining the three (x, y, z) positional parameters for all atoms without the thermal parameters, which will require 32/92 or 1/9 as much memory. This gives a considerably smaller matrix of 154 megabytes for the 3000-atom example. Since the thermal parameters contain very little information about the positions of the atoms, this is a good approximation. We have done tests showing that there is only a 1% difference between calculations done with and without thermal parameters. The SHELX command for doing the positional terms by block-diagonal standard uncertainty analysis only is: L.S.
1
DAMP
0 0
BLOC BOND
1
The memory requirements can be further reduced by breaking up the positional parameters into overlapping blocks. For example, the protein could be cut into several pieces with an overlap of two residues. If we cut the 3000-atom example into three 1020-atom pieces with an overlap of 20 atoms (3060 parameters/block), the memory requirement for each block drops to 18 megabytes. Similarly, the time to calculate the standard uncertainties is reduced. The drawback is that the standard uncertainties will be underestimated because some parts of the matrix have been omitted. The amount by which the standard uncertainty is underestimated can, however, be estimated and compensated for. Figure 3.36 shows the percentage by which the error is underestimated when the positional parameters are divided into blocks. As shown in the figure, when 12 overlapping blocks are used, the standard uncertainty is underestimated by only 16%. For large proteins, this is an excellent approximation.

36. Since the matrix is symmetric about the diagonal, the calculation requires the diagonal plus the upper half of the matrix, which is n(n + 1)/2 matrix elements, and each element requires 4 bytes of memory.

37. To evaluate a single parameter, the column and the row containing the parameter need to be accessed. Thus if the matrix is arranged to put columns sequentially, rows will be far apart, and vice versa. If the program starts to swap, a very large number of pages are continuously moved in and out of memory and the CPU grinds to a near halt.
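The memory estimates quoted in this section follow directly from the n(n + 1)/2 × 4 bytes rule and can be reproduced with a short calculation. The sketch below is illustrative only; the function name `matrix_bytes` is hypothetical, and a megabyte is assumed to mean 2^20 bytes, which is the reading that reproduces the figures in the text.

```python
def matrix_bytes(n_params, bytes_per_element=4):
    """Bytes needed for the diagonal plus upper half of an n x n matrix."""
    return n_params * (n_params + 1) // 2 * bytes_per_element

MB = 1024 ** 2  # assumption: binary megabytes
GB = 1024 ** 3

cases = {
    "full matrix, 1200 atoms x 9 parameters": 1200 * 9,
    "full matrix, 3000 atoms x 9 parameters": 3000 * 9,
    "positional only, 3000 atoms x 3 parameters": 3000 * 3,
    "one block, 1020 atoms x 3 parameters": 1020 * 3,
}

for label, n in cases.items():
    size = matrix_bytes(n)
    print(f"{label}: n = {n}, {size / MB:.1f} MB ({size / GB:.2f} GB)")
```

Under that assumption the four cases come to about 222 MB, 1.36 GB, 154.5 MB, and 17.9 MB, in agreement with the rounded values given above. The quadratic growth with n is the reason that cutting the parameters into blocks pays off so quickly.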
FIG. 3.36 Effect of block size on the underestimation of positional uncertainty parameters. [Plot: underestimation of the standard uncertainty, 0 to -20%, against no. of blocks, 0 to 14.]

SHELX Refinement Strategy

As a typical example of how to use SHELX in protein refinement we'll assume that a data set has been collected on a crystal using synchrotron radiation and cryocrystallography and that a very-high-resolution data set now exists for a protein that was previously solved at room temperature with 2.0-Å resolution data. This is a typical application of SHELX. The following example is for SHELX-97 or later versions. See Appendix B, Useful Web Sites, for the SHELX site that offers information about obtaining SHELX.

Start with a PDB format file with the coordinates to be refined and run SHELXPRO with the I option, .ins from PDB file, which creates an .ins file from the PDB coordinates. The program will prompt you for the unit cell and space group. Name this file something like prot.1.ins. Also prepare your data and convert them to SHELX .hkl format. You can do this with xprepfin. The data must be divided into a working set and a free set for R-free cross-validation. This is done with SHELXPRO and the V command, which marks the R-free reflections with a -1. For the example, this file will be prot.hkl. To turn on the R-free calculations, edit the .ins file and add the -1 flag to the CGLS command:

    CGLS 20 -1
When SHELX runs it will look for the reflection data in a file that has the same name as the .ins file but with the extension changed to .hkl. Rather than copying or moving the data file, we can create a soft link to the file. Thus we need to enter:

    ln -s prot.hkl prot.1.hkl

Now we can run SHELX with the command:

    shelxl prot.1 >& prot.1.log &

This starts SHELX in the background and puts the output into a log file. We can follow the log file to monitor progress with the command

    tail -f prot.1.log

which continuously monitors the file for new lines being added as SHELX runs. As the program runs it will create four other output files: a CIF format reflection file with an .fcf extension, a longer log file with the .lst extension, a new instruction file with the extension .res (for restart file), and a new PDB format file with the extension .pdb. When SHELX stops you can fit the output with xfit by issuing the command

    xfit prot.1.pdb prot.1.fcf

After fitting the model, write the output PDB file from xfit into prot.1.fit.pdb. With SHELXPRO you can update the .res file with the PDB file using the U command. Write the output from this into prot.2.ins. Make another soft link to prot.hkl and run the new .ins file. Repeat this process until further fitting seems to be unnecessary, adding or subtracting waters as needed (see Chapter 4, Sec. 4.4, Editing Waters).
If you are at better than 1.5-Å resolution, you can make the model anisotropic by adding the command

    ANISO_* $C $N $O $S

to change the thermal model from isotropic to anisotropic. Run this refinement and look for a drop in R-free of about 2%. After another manual fitting step, again if at very high resolution, add riding hydrogens by removing the REM comments from the HFIX commands in the .ins file. This should give a drop in R-free of about 1%. Continue refinement, if necessary, to convergence. For the last cycle, remove the -1 flag from the CGLS command to refine the final model against all the data. The R-factor should be in the teens and R-free 5-10% greater than the R-factor.
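The per-cycle bookkeeping described in this strategy (one .ins file per cycle, with a soft link giving the shared reflection file the matching name) is easy to script. The following Python sketch is only an illustration under the file-naming scheme used in the example; the helper name `prepare_cycle` is hypothetical and is not part of SHELX or XtalView.

```python
import os

def prepare_cycle(stem, cycle, hkl_name, workdir="."):
    """Link the shared reflection file to the name expected for this cycle.

    Returns the csh command line used in the text to start the refinement.
    """
    link = os.path.join(workdir, f"{stem}.{cycle}.hkl")
    if not os.path.lexists(link):
        # Equivalent to: ln -s prot.hkl prot.N.hkl
        os.symlink(hkl_name, link)
    # The command is returned, not executed, so it can be inspected first.
    return f"shelxl {stem}.{cycle} >& {stem}.{cycle}.log &"

if __name__ == "__main__":
    print(prepare_cycle("prot", 2, "prot.hkl"))
```

Returning the command string rather than running it keeps the sketch free of any assumption about where shelxl is installed.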
3.12 FITTING OF MAPS

Calculating Electron Density Maps

Resolution Cutoffs

Normally, you can specify a minimum and a maximum resolution cutoff when calculating an electron density map (Fig. 3.37). Remember that the map is being calculated using a Fourier synthesis and that this causes series termination errors for the higher-resolution terms left out. These series termination errors show up as ripples in the electron density with periods close to the highest resolution used in the map. The effect of leaving out low-resolution terms is to add long-period (low-frequency) ripples that make the map look "choppy." Low-resolution terms are often left out of refinements and, thus, they often end up left out of map calculations. How much effect this has on the interpretability of the map depends upon how high the other limit is. For instance, a 4.0- to 3.0-Å map is hard to interpret even with perfect phases, while a 4.0- to 2.0-Å map is relatively straightforward. As a rule of thumb, a 3-Å map should include data from 10 to 3 Å.
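Applying a pair of resolution cutoffs amounts to keeping only those reflections whose d-spacing lies between the two limits. The sketch below illustrates this for the rule-of-thumb limits of 10 and 3 Å. It assumes an orthogonal cell (all angles 90°) to keep the d-spacing formula short, and the function names are illustrative rather than those of any XtalView program.

```python
import math

def d_spacing(h, k, l, a, b, c):
    """d-spacing in Angstroms for an orthogonal cell with edges a, b, c."""
    inv_d2 = (h / a) ** 2 + (k / b) ** 2 + (l / c) ** 2
    if inv_d2 == 0.0:
        return math.inf  # the F(000) term has no finite d-spacing
    return 1.0 / math.sqrt(inv_d2)

def select_by_resolution(hkls, cell, d_max=10.0, d_min=3.0):
    """Keep the indices with d_min <= d <= d_max (limits in Angstroms)."""
    a, b, c = cell
    return [hkl for hkl in hkls if d_min <= d_spacing(*hkl, a, b, c) <= d_max]

if __name__ == "__main__":
    cell = (30.0, 40.0, 50.0)  # illustrative cell edges
    indices = [(1, 0, 0), (4, 0, 0), (0, 0, 10), (11, 0, 0)]
    print(select_by_resolution(indices, cell))
```

For a general cell the same selection applies, but the d-spacing must be computed from the full reciprocal-cell metric.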
Nonorthogonal Coordinates

The main problem with nonorthogonal coordinates is making certain that the map and the model are in the same coordinate frame, as discussed previously under Coordinate Systems (see Sec. 3.1). It is also common to discover after a model has been fit that the cell transformation is incorrectly specified for the refinement program, and the only indication this has happened may be a high R-factor. The XtalView system allows the user to specify a 3 x 3 matrix for the Cartesian-to-fractional coordinate transformation (for fractional-to-Cartesian the inverse matrix is computed), so that if another program is being used, the matrix from this other program can be entered. In XtalView the matrix used is the same as that of FRODO and XPLOR in the cases tested so far.
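To make the role of the 3 x 3 matrix concrete, the sketch below builds a fractional-to-Cartesian matrix from the cell constants and inverts it to obtain the Cartesian-to-fractional matrix. The convention used here (a along x, b in the xy plane) is the common PDB choice and is an assumption; it has not been checked against the matrices used by XtalView, FRODO, or XPLOR, so compare with the program's own matrix before relying on it.

```python
import math

def frac_to_cart_matrix(a, b, c, alpha, beta, gamma):
    """Fractional-to-Cartesian matrix; cell angles in degrees.

    Assumed convention: a along x, b in the xy plane.
    """
    ca, cb, cg = (math.cos(math.radians(t)) for t in (alpha, beta, gamma))
    sg = math.sin(math.radians(gamma))
    # Unit-cell volume divided by a*b*c.
    v = math.sqrt(1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg)
    return [
        [a, b * cg, c * cb],
        [0.0, b * sg, c * (ca - cb * cg) / sg],
        [0.0, 0.0, c * v / sg],
    ]

def invert_upper_triangular(m):
    """Closed-form inverse of a 3 x 3 upper-triangular matrix."""
    (m00, m01, m02), (_, m11, m12), (_, _, m22) = m
    return [
        [1.0 / m00, -m01 / (m00 * m11), (m01 * m12 - m02 * m11) / (m00 * m11 * m22)],
        [0.0, 1.0 / m11, -m12 / (m11 * m22)],
        [0.0, 0.0, 1.0 / m22],
    ]

def apply(m, v):
    """Multiply a 3 x 3 matrix by a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]

if __name__ == "__main__":
    to_cart = frac_to_cart_matrix(50.0, 60.0, 70.0, 90.0, 105.0, 90.0)
    to_frac = invert_upper_triangular(to_cart)
    xyz = apply(to_cart, [0.25, 0.5, 0.75])
    print(xyz, apply(to_frac, xyz))
```

A quick round trip like the one in the example (fractional to Cartesian and back) is a cheap way to confirm that a map and a model are using the same transformation.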
Map Boundaries

Many programs, such as FRODO or O, can display only as much of the unit cell as is stored in the precalculated map file. This means that the user must decide beforehand on the map boundaries that will cover an entire molecule. A mini-map can be used to determine suitable boundaries. In xfit there is no need for this because the program is smart enough to know that the density at 1.1 is the same as at 0.1, and the maps always contain a full unit cell.
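The statement that the density at 1.1 is the same as at 0.1 is simply the periodicity of the unit cell, and a full-cell map can exploit it by wrapping grid indices. The sketch below shows the idea with a nearest-grid-point lookup. It is an illustration of the principle only, not a description of how xfit stores or interpolates its maps, and the function name is hypothetical.

```python
def density_at(grid, fx, fy, fz):
    """Nearest-grid-point density for any fractional coordinate.

    grid is a nested list indexed [ix][iy][iz] covering exactly one unit
    cell; indices are wrapped so coordinates outside 0-1 map back in.
    """
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    ix = round(fx * nx) % nx
    iy = round(fy * ny) % ny
    iz = round(fz * nz) % nz
    return grid[ix][iy][iz]

if __name__ == "__main__":
    # Toy 10 x 10 x 10 "map" whose value encodes its own grid indices.
    n = 10
    grid = [[[100 * i + 10 * j + k for k in range(n)] for j in range(n)]
            for i in range(n)]
    print(density_at(grid, 0.1, 0.0, 0.0), density_at(grid, 1.1, 0.0, 0.0))
```

Because the lookup never leaves the stored cell, no map boundaries need to be chosen in advance.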
FIG. 3.37 Effect of different resolution cutoffs. The maps use the same data but differ in the resolution limits used. Thick lines are the model used to calculate the phases; thin lines represent the electron density contoured at 1σ. (A) 37-5.0 Å, (B) 37-4.5 Å, (C) 37-4.0 Å, (D) 37-3.7 Å, (E) 37-3.3 Å, (F) 37-3.0 Å, (G) 37-2.0 Å. The turns of the helix become apparent at 3.7 Å, and the carbonyl bulges are apparent at 3.0 Å. However, these maps are made with refined phases, and starting maps will appear to have lower resolution because of phase errors. Map (H), 5.0-3.7 Å, shows the deleterious effect of truncating the low-resolution data too severely. Compare the densities in (H) and (D).

Combined Phase Coefficients

Phase bias is a serious problem in the early stages of fitting and refinement. One way to avoid phase bias is to use only the MIR phases to calculate the maps; this guarantees that the maps are unbiased with respect to the model being fit. However, MIR phases are noisy and usually limited in resolution. Several methods have been developed for combining calculated model phases with experimental phases to allow information from both to be used. This approach can be used to increase the resolution, to allow the use of partial models, and to help reduce model bias. All methods rely on the difference between $F_o$ and $F_c$ to weight the amount of calculated phase contribution. Combined phase coefficients allow the inclusion of the low-resolution MIR phases, which are usually accurate, with calculated phases from higher resolutions. The MIR phases alone should be used in the resolution range infinity to 5.0 Å, where model phases are inaccurate. At other resolutions both phases are used in a weighted manner. The phase combination program puts out a combined figure of merit that is used to weight the map, as in the MIR case. The combined maps are smoother, especially at resolutions between 3 and 2.5 Å, than maps made from an incomplete model for phasing.

The most common phase combination procedure is Bricogne's adaptation of Sim's weighting scheme (see note 38). Two phase probability distributions are multiplied together by the following procedure. The phase probability for the MIR phases has been previously stored as the four Hendrickson-Lattman coefficients, $A_{MIR}$, $B_{MIR}$, $C_{MIR}$, $D_{MIR}$ (see note 11). The phase probabilities for the calculated phases are calculated from the equation:
$$P_c(\phi) = \exp\left[\frac{2|F_o||F_c|}{\langle F_o^2 - F_c^2 \rangle}\cos(\phi - \phi_c)\right], \qquad (3.35)$$
where $\langle F_o^2 - F_c^2 \rangle$ is the root-mean-square difference in intensities and is calculated in bins of resolution. It can be seen that when the R-factor is large, that is, when $F_o$ and $F_c$ do not agree, the contribution from the calculated phase is lowered because the denominator will be large. As the R-factor decreases, the calculated phase contribution will increase. This probability is expressed in terms of A and B, where A is the cosine part of the phase and B is the sine part of the phase at the maximum value of the probability distribution, and $\sqrt{A^2 + B^2} = mF_o$, where m is the figure of merit. These are then added to $A_{MIR}$ and $B_{MIR}$ with the relative weights set by $W_{MIR}$ and $W_{calc}$:

$$A = W_{MIR} \times A_{MIR} + W_{calc} \times A_{calc}, \qquad (3.36)$$
with a similar equation for B. The new phase is found by evaluating:

$$P_j(\phi) = \exp(A\cos\phi + B\sin\phi + C\cos 2\phi + D\sin 2\phi), \qquad (3.37)$$
which gives a new most-probable phase and figure of merit using Eqs. 3.20 and 3.21. The success of this process can partially be judged by an increase in the figure of merit. The weights should be adjusted so that the new phases are, on average, between the MIR and the calculated phases. Low-resolution phases will be closer to the MIR phases, and the higher-resolution phases will be closer to the calculated phases. The swing point, where more of the calculated phase is included than the MIR phase, should be located between 4.0 and 3.5 Å, based upon past experience. This seems to give maps that contain new information from the filtering effect of the calculated phases but are not so overwhelmed by them that no information from the MIR is left.
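The combination step of Eqs. 3.36 and 3.37 can be sketched numerically. The code below adds weighted Hendrickson-Lattman coefficients and then evaluates the combined distribution on a grid of phase angles. Two points are assumptions rather than statements from the text: the new phase and figure of merit are taken as the centroid of the distribution (assumed here to be what Eqs. 3.20 and 3.21 compute), and the weight $W_{MIR}$ is applied to all four MIR coefficients while the calculated distribution contributes only A and B. The function names are illustrative.

```python
import cmath
import math

def combine_hl(hl_mir, hl_calc, w_mir=1.0, w_calc=1.0):
    """Weighted sum of two sets of (A, B, C, D) coefficients (Eq. 3.36).

    Assumption: the weight is applied to all four coefficients; for the
    calculated phases C and D are normally zero.
    """
    return tuple(w_mir * m + w_calc * c for m, c in zip(hl_mir, hl_calc))

def centroid_phase_and_fom(a, b, c, d, steps=360):
    """Centroid phase (degrees) and figure of merit of
    P(phi) = exp(A cos phi + B sin phi + C cos 2phi + D sin 2phi)."""
    samples = []
    for i in range(steps):
        phi = 2.0 * math.pi * i / steps
        exponent = (a * math.cos(phi) + b * math.sin(phi)
                    + c * math.cos(2.0 * phi) + d * math.sin(2.0 * phi))
        samples.append((phi, exponent))
    # Subtract the largest exponent so exp() cannot overflow.
    top = max(e for _, e in samples)
    num = 0j
    den = 0.0
    for phi, e in samples:
        p = math.exp(e - top)
        num += p * cmath.exp(1j * phi)
        den += p
    centroid = num / den
    return math.degrees(cmath.phase(centroid)) % 360.0, abs(centroid)

if __name__ == "__main__":
    mir = (2.0, 0.0, 0.0, 0.0)    # illustrative: MIR phase near 0 degrees
    calc = (0.0, 2.0, 0.0, 0.0)   # illustrative: model phase near 90 degrees
    print(centroid_phase_and_fom(*combine_hl(mir, calc)))
```

With equal weights the combined phase in this example falls midway between the two input phases, which is the behavior the weights are meant to tune as a function of resolution.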
38. Bricogne, G. (1976). Acta Crystallogr. A32, 832.

Evaluating Map Quality

An experienced crystallographer can usually quickly assess the quality of a map by inspection. It is especially important for new crystallographers to try to learn this technique so that a lot of time is not wasted trying to fit poor maps.
Judging Electron Density

First look at the map on a large scale, where both protein and solvent should be visible. There should be a large contrast difference between the solvent and the protein. If the map is contoured at 1σ, there should be few long connected regions in the solvent region (look at several sections). The protein region of the map should have connected densities that are cleanly separated. The heights of these ridges should be consistent over the protein region. Excessive "peakiness" is a bad sign. It might help at this point to consider that a protein is a long polypeptide composed mostly of carbon, nitrogen, and oxygen, which all have about the same electron density. The exception is a small number of sulfur atoms. Any error in the phases will cause some volumes of the map to have too little density and other volumes to have correspondingly too much. With a practiced eye, you can quickly discern this.

Another feature to look for in judging the quality of the map is the contrast between protein and solvent regions. For this purpose, a slab of electron density over a large area is needed (Fig. 3.38). There should be a clear difference in level between the protein and solvent. The solvent should comprise a few large areas of low-level peaks rarely rising above 1-2 times the root-mean-square value (σ) of the map.
Electron Density Histograms

The previous statements can be restated more precisely. In a correctly phased map, the distribution of the densities will be that for a protein molecule, which is independent of its fold or space group, and will have a characteristic histogram. In fact, the histogram can be used to compute the probable amount of phase error by comparing the histogram of an unknown with that of correctly phased maps of known structures (Fig. 3.39). The histogram is dependent upon the resolution range used and also on the percentage solvent in the crystal. XtalView comes with a program, xedh (Fig. 3.40), that computes the histogram of a map and can be used to compare it with known histograms. This is easily enough done, except that, of course, there is a large gray area where a map may be interpretable in spite of the phase errors. However, the histogram comparison method gives an objective method of estimating phase error that, with experience, will give a guide to the "interpretability" of a set of phases. The histogram method requires no model, and it does not matter how the phases are derived.
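The idea of a density histogram can be illustrated with a short sketch that bins the map values in units of the map's root-mean-square deviation and compares two histograms. This is not the algorithm used by xedh, whose details are not given here. The bin range, the number of bins, and the distance measure (half the summed absolute difference) are arbitrary choices made for the illustration.

```python
import math

def density_histogram(values, n_bins=20, lo=-3.0, hi=7.0):
    """Fraction of grid points per bin, with density expressed in sigma units.

    Values outside the range lo-hi are counted in the first or last bin.
    """
    n = len(values)
    mean = sum(values) / n
    sigma = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if sigma == 0.0:
        raise ValueError("map is flat; the histogram is undefined")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        z = (v - mean) / sigma
        index = int((z - lo) // width)
        counts[min(max(index, 0), n_bins - 1)] += 1
    return [count / n for count in counts]

def histogram_distance(h1, h2):
    """Half the summed absolute difference: 0 for identical, 1 for disjoint."""
    return 0.5 * sum(abs(p - q) for p, q in zip(h1, h2))

if __name__ == "__main__":
    # Toy "map": mostly flat solvent with a few high protein peaks.
    test_map = [0.0] * 90 + [5.0] * 10
    print(density_histogram(test_map))
```

As noted in the text, histograms should only be compared between maps calculated over the same resolution range and with similar solvent content.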
out.fin
    infile=infile.tmp
    outfile=outfile.tmp

    # this complains about an inappropriate operation but it works
    cat > $infile

    # output of mtz2various is saved in mtz2various.log
    mtz2various hklin $infile hklout $outfile << eof > \
    mtz2various.log
    labin I(+)=I(+) SIGI(+)=SIGI(+) I(-)=I(-) SIGI(-)=SIGI(-)
    OUTPUT USER '(3I5,4F12.2)'
    END
    eof

    # place outfile on stdout
    cat $outfile
    rm $infile
    rm $outfile
Using this as a guide, you can easily convert any other CCP4 file by changing the labin line to the appropriate labels for your file. Of course you must have CCP4 executables installed on your computer and in your path for this to work. If you are just going to run it every once in a while, you can make a simpler script and load the output into xprepfin using the xtalmgr or on the command line. Here's a similar script that leaves the output in for_xprepfin.fin:

    #!/bin/csh -f
    mtz2various hklin my_file.mtz hklout for_xprepfin.fin ...

... 1.5), and especially if the B's are anisotropic, the program cannot do an accurate structure factor calculation because it
XtalView TUTORIALS
FIG. 4.38 The xfit SFCalc window is used to calculate structure factors from the model and can be used for making omit maps. Use the Shake option to reduce phase bias.
4.4 A Typical Manual Fitting Session with Xfit
uses an isotropic reverse FFT algorithm. However, if you are at the point of making your B's anisotropic and are still wondering where side chains are located, you may want to rethink your fitting strategy!
Finding Geometry Errors

The Error window (Fig. 4.39) is used to find geometry errors in the model. Bring up the Error window and click on Analyze Active Model. The geometry is analyzed and the results put into the error list so you can click through it. Also, the phi-psi plot is popped up. Residues with bad phi-psi's are marked on the phi-psi plot (and put in the error list). As you click on each error or click on Go To Next Error, the model is moved to center the residue with the error. When you click on Fit Error Res the residue is activated so you can repair it. Use Delete Error Res to send the residue to oblivion.
FIG. 4.39 The Geometry Errors window is used to analyze the geometry of the model. Click on an error to go to it.

Editing Waters

Two shortcut keys have been added specifically to make editing the water list faster, thus encouraging users to actually look at the waters. Note that in the majority of cases editing the water list and removing bad waters will cause your R-free to drop even if there is a very small increase or no change in the overall R-factor. A bad water is one for which there is no density, the density is not shaped like a water, or the B-value is higher than, say, 80.0. For high-B-value waters with sensible density, you can try halving the occupancy.

• Shift + W. Adds a water where the cursor is and at the center of the slab. The new water is in fitting mode, and its position can be adjusted using the middle mouse button. To quickly position the water, leave the Refine window up in a corner and, after you have added the water, click on the Translation button to real-space-refine the water into the center of the peak. Use a thin slab (5 Å) to get the out-of-screen direction close enough for the translation to pull it in. Use the semicolon key to end the water fitting, or go on to the next one and use the semicolon key at the end of the water-adding sequence.

• Shift + D. Deletes the last picked water/residue. Click on a water and then issue this command. If it's a water, the program deletes it. If it's a residue, the program asks for confirmation of the delete. You can undelete a water by going to the Model window, finding the deleted water, and deleting it again to toggle the DEL flag.

• Automated water addition. First make a difference map. The resolution should be better than 2.3 Å, and the map should show clear water peaks. Bring up the Waters... popup (Fig. 4.40) and set the minimum density to be the lowest you will accept for a water. Press Add Water and wait. The new waters are added to the Error popup, not because they are errors, but because this gives you a quick way to navigate through the list of added waters to see if you agree with them. Be sure to save the model.
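One of the criteria for a bad water, a B-value above about 80.0, can be screened for outside the graphics program before the waters are inspected. The sketch below reads fixed-column PDB records and lists the waters above a threshold. It is not an XtalView feature; the function name is hypothetical, and the residue names HOH and WAT are assumed to identify waters in the file. The density criteria still have to be judged by eye in xfit.

```python
def flag_high_b_waters(pdb_lines, b_max=80.0):
    """Return (chain, residue number, B) for waters with B above b_max.

    Uses the fixed PDB columns: residue name 18-20, chain 22,
    residue number 23-26, and B-value 61-66 (1-based).
    """
    flagged = []
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        if line[17:20] not in ("HOH", "WAT"):  # assumed water residue names
            continue
        b_value = float(line[60:66])
        if b_value > b_max:
            flagged.append((line[21], int(line[22:26]), b_value))
    return flagged

if __name__ == "__main__":
    import sys
    with open(sys.argv[1]) as handle:
        for chain, number, b in flag_high_b_waters(handle):
            print(f"water {chain}{number}: B = {b:.2f}")
```

Waters flagged this way are candidates for deletion or for the halved-occupancy trial mentioned above, not automatic rejects.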
4.5 INTERFACING TO OTHER PROGRAMS

This section gives specific information for interfacing XtalView to popular software.

Molscript. The xfit script commands Rotation and Translation can be used to set the viewpoint in Molscript. The translation is the inverse of Molscript's, but other than this simple change in sign, the commands can be cut and pasted into a Molscript control file. When a good view is found in xfit, use View/Make Script/Save and edit the resulting file to get the rotation and translation commands. Paste these into the Molscript file, reverse the translation, and run Molscript.
FIG. 4.40 Add water molecules with this window. Waters can also be renamed to match the nearest atom in another structure.
SHELXL. Using SHELX with XtalView is straightforward. Xprepfin writes out an .hkl file and can be used to convert from various input formats. You can then mark this file for R-free tests using SHELXPRO. For the first refinement cycle use SHELXPRO with the I option to create the .ins file. After refinement read this and the .fcf file directly into xfit. After fitting, use SHELXPRO to update the .res file with the fit .pdb file to make the next .ins file. SHELXPRO has an XtalView command (X) to write a .phs file, but this is not needed with versions of XtalView after 3.1. Xfit reads and understands the ANISOU cards and thus keeps the anisotropic U's correct between cycles.

DENZO. Xprepfin can prepare input from the scalepack output .SCA file by choosing Other as the input file type and DENZO I's from the Other menu.

CCP4. The preferred way to use XtalView with CCP4 is to use the phase files as input and not map files. This will be faster, as well as saving disk space and allowing you to make omit maps on the fly. Starting with XtalView 4.0, xfit can read the structure factors directly from .mtz files as long as you use the standard CCP4 labels for FO, FCALC, and PHIB.
However, there is a CCP4 map converter available from CCMS that was kindly provided by John Irwin. Send e-mail to ccms-help@sdsc.edu and ask. You can also convert CCP4 map files by FFTing them into structure factors and reading these into xfit. You can convert .mtz files with xprepfin or by means of the CCP4 program mtz2various. Several XtalView programs can write .mtz files, including xprepfin, xheavy, and xfit.

PHASES. Xheavy writes an input file for PHASES that will get you most of the way to running PHASES. Look at the menu on the menu button for saving a phase file. In this way you can use xheavy for the heavy-atom location and refinement and then switch to PHASES. There is not much difference in the actual phases produced.

XPLOR. Xprepfin can be used to generate an input file of $F_{obs}$ data for XPLOR. If your native data has Bijvoet pairs, be sure to use the Average F1 and F2 option in xprepfin. You can switch the segment ID with the chain ID in xfit by using the options on the Files... window. Since the chain ID field in a PDB file is only one character, the first character in the seg ID is used. This allows one to get around the fact that XPLOR loses the chain ID.
XPLOR, TNT, PROLSQ, and Other Refinement Programs

To make maps, first prepare your native data into an empty phase file with xprepfin by reading it in and setting the Fake Phs output option. This creates a file with 0.0 for the phase. Now run xfit, loading the empty phase file and your latest refined PDB file. The FFT window will pop up, but don't hit the Apply button. Instead just set the map type you want (e.g., $2mF_o - DF_c$); then go to the SfCalc window and choose the Calculate All and Scale button. Xfit will calculate $F_c$ and the phase, scale $F_o$ to $F_c$ (to put it on an absolute scale), and then FFT your density. If you want to look at an $F_o - F_c$ map as a second map, just reload the phases with the File window and repeat the procedure, setting the FFT type to $F_o - F_c$.
5 PROTEIN CRYSTALLOGRAPHY COOKBOOK
5.1 MULTIPLE ISOMORPHOUS REPLACEMENT
Multiple isomorphous replacement (MIR) is the oldest method of phasing proteins and is still very successful. The basic method has changed little since myoglobin was solved, although the detailed implementation has. The basic cycle of MIR phasing is diagrammed in Fig. 5.1.

1. Soak the crystals in heavy-atom solutions to scan for possible derivatives. Crystals that survive the soaking are tested to see if they still diffract. Those that do are scanned to see if they produce any changes in X-ray intensity.

2. If intensity changes are observed, a data set is collected. The first data should be collected quickly. If possible, a nearly complete data set to at least 5 Å should be collected within 24 h or even faster. Many heavy-atom derivatives are unstable in the X-ray beam, and either the crystals quickly degrade or they change with time. Often the best data are from the first run, and even though the crystal still diffracts strongly, later data are found to have significantly lower phasing power. Continue collecting data if the crystal has not degraded. Frozen crystals are more stable (see Chapter 6), but freezing can also change the unit cell, leading to significant nonisomorphism.
FIG. 5.1 Heavy-atom phasing scheme. [Flowchart: soak crystal in heavy-atom solution; collect partial data set; evaluate differences (above 10%? otherwise toss); finish data collection; merge and scale with native; evaluate statistics (isomorphous? otherwise toss); if phases are available, cross Fourier, or else make Patterson; Patterson solvable? (otherwise save); refine and compute SIR phases; cross Fourier other derivatives to solve and put on same origin; refine solutions and compute MIR phases; map interpretable?]
3. Evaluate the heavy-atom statistics. By looking at the statistics of intensity changes it is possible to tell whether the crystal is likely to be a good derivative. The overall percentage difference should be above 10-12%. While you may solve and refine a derivative with weaker changes, it will have little phasing power; try resoaking to see if you can raise the percentage difference with longer soaks and/or higher soaking concentrations. The centric data should have larger differences than the acentric data. The root-mean-square magnitude of the differences should fall off with resolution in roughly the same proportion as the scattering factor for the heavy atom (including the temperature factor). If the differences do not fall off, this indicates noise or nonisomorphism.

4. Solve for the positions of heavy atoms. It is necessary to solve at least one derivative's Patterson map. This may not be the Patterson map of the first derivative you find. After solving one Patterson map, you can solve the others by cross-phasing with the single isomorphous replacement (SIR) phases of the first. This also puts all the heavy atoms on the same origin.

5. Refine each heavy-atom solution. Look for additional sites. Be conservative about adding sites at this point.

6. With two or more derivatives you can co-refine the derivatives to improve the phases (if the derivatives share common sites this should be done cautiously). This gives you the first set of protein phases.

7. Make difference Fourier transforms of each derivative, preferably leaving this derivative out of the protein phase calculation to reduce bias. Look for new heavy-atom sites and confirm old ones.

8. Refine the updated solutions and co-refine to produce better protein phases.

9. Reiterate if necessary.

10. Calculate the protein electron density map and evaluate its quality. If the map is difficult to interpret, go back to step 1 and look for more derivatives or work to improve the ones that you already have. Be objective: you will not divine the structure from poor protein phases without considerable luck.
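The overall percentage difference used in step 3 to judge a candidate derivative can be sketched as follows. The text does not give the exact formula, so this example assumes the common fractional isomorphous difference on amplitudes, the sum of |F_PH - F_P| divided by the sum of F_P, computed after the derivative has been scaled to the native. The function name and the numbers are illustrative only.

```python
def percent_isomorphous_difference(pairs):
    """Overall percentage difference between native and derivative data.

    pairs holds (F_native, F_derivative) amplitudes for reflections common
    to both data sets, already on the same scale (an assumption here).
    """
    pairs = list(pairs)
    numerator = sum(abs(f_deriv - f_nat) for f_nat, f_deriv in pairs)
    denominator = sum(f_nat for f_nat, _ in pairs)
    if denominator == 0.0:
        raise ValueError("no native amplitudes to compare against")
    return 100.0 * numerator / denominator

if __name__ == "__main__":
    merged = [(100.0, 110.0), (200.0, 180.0), (50.0, 55.0)]  # made-up values
    diff = percent_isomorphous_difference(merged)
    verdict = "worth pursuing" if diff >= 10.0 else "try a longer or stronger soak"
    print(f"{diff:.1f}% overall difference: {verdict}")
```

Computing the same statistic separately for centric and acentric reflections, and in shells of resolution, gives the other checks described in step 3.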
Example 1: Patterson from Endonuclease III

Escherichia coli endonuclease III (Table 5.1) was solved at the Scripps Research Institute by MIR techniques (see note 1). The solution of a single-site derivative Patterson map is presented here. Endonuclease III crystals were soaked in thiomersal, an organomercurial, at 1 mM, for 2 days. Data were collected on an area detector and then merged with the native data. The isomorphous difference Patterson is shown in Fig. 5.2. The symmetry of the Patterson is mmm, orthogonal mirrors at 0 and 1/2 in all three directions. In space group $P2_12_12_1$, there are three Harker planes arising from the three $2_1$ screw axes (Table 5.1) at x = 1/2, y = 1/2, and z = 1/2. A heavy-atom site will give rise to three unique self-vectors on these Harker sections (plus all the peaks related by Patterson symmetry). Note in Table 5.1 that if there is a vector at 2x on one section it will be at 1/2 - 2x on the other. Thus, if we line up the Harker sections so that the common axes run in opposite directions and 0.0 is opposite 0.5, then the self-vectors on the two sections will line up (Fig. 5.2). On the Harker sections for this derivative there are just three

1. Kuo, C. F., McRee, D. E., Fisher, C. L., O'Handley, S. F., Cunningham, R. P., and Tainer, J. A. (1992). Science 258, 434-440.