Protein NMR Spectroscopy: Practical Techniques and Applications
Edited by Lu-Yun Lian, NMR Centre for Structural Biology, Institute of Integrative Biology, The University of Liverpool, Liverpool, UK, and Gordon Roberts, Henry Wellcome Laboratories of Structural Biology, Department of Biochemistry, University of Leicester, Leicester, UK
This edition first published 2011 © 2011 John Wiley & Sons Ltd

Registered office: John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, United Kingdom

For details of our global editorial offices, for customer services and for information about how to apply for permission to reuse the copyright material in this book please see our website at www.wiley.com. The right of the author to be identified as the author of this work has been asserted in accordance with the Copyright, Designs and Patents Act 1988. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK Copyright, Designs and Patents Act 1988, without the prior permission of the publisher. Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books. Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in this book. This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold on the understanding that the publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional should be sought. The publisher and the author make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of fitness for a particular purpose.
This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for every situation. In view of ongoing research, equipment modifications, changes in governmental regulations, and the constant flow of information relating to the use of experimental reagents, equipment, and devices, the reader is urged to review and evaluate the information provided in the package insert or instructions for each chemical, piece of equipment, reagent, or device for, among other things, any changes in the instructions or indication of usage and for added warnings and precautions. The fact that an organization or Website is referred to in this work as a citation and/or a potential source of further information does not mean that the author or the publisher endorses the information the organization or Website may provide or recommendations it may make. Further, readers should be aware that Internet Websites listed in this work may have changed or disappeared between when this work was written and when it is read. No warranty may be created or extended by any promotional statements for this work. Neither the publisher nor the author shall be liable for any damages arising herefrom.

Library of Congress Cataloging-in-Publication Data: Protein NMR spectroscopy : practical techniques and applications / edited by Lu-Yun Lian, Gordon Roberts. p. cm. Includes bibliographical references and index. ISBN 978-0-470-72193-3 1. Proteins–Analysis. 2. Nuclear magnetic resonance spectroscopy. I. Lian, Lu-Yun. II. Roberts, G. C. K. (Gordon Carl Kenmure) QP551.P69725 2011 547′.7–dc22 2011010948

A catalogue record for this book is available from the British Library. Print ISBN: 9780470721933 ePDF ISBN: 9781119972013 oBook ISBN: 9781119972006 ePub ISBN: 9781119972822 Mobi: 9781119972884

Set in 10/12 pt Times by Thomson Digital, Noida, India
Contents

List of Contributors

Introduction
Lu-Yun Lian and Gordon Roberts
References

1 Sample Preparation, Data Collection and Processing
Frederick W. Muskett
1.1 Introduction
1.2 Sample Preparation
1.2.1 Initial Considerations
1.2.2 Additives
1.2.3 Sample Conditions
1.2.4 Special Cases
1.2.5 NMR Sample Tubes
1.2.5.1 3 mm Tubes
1.3 Data Collection
1.3.1 Locking
1.3.2 Tuning
1.3.3 Shimming
1.3.4 Calibrating Pulses
1.3.5 Acquisition Parameters
1.3.6 Fast Acquisition Methods
1.4 Data Processing
References

2 Isotope Labelling
Mitsuhiro Takeda and Masatsune Kainosho
2.1 Introduction
2.2 Production Methods for Isotopically Labelled Proteins
2.2.1 Recombinant Protein Expression in Living Organisms
2.2.1.1 Escherichia coli
2.2.1.2 Yeast Cells
2.2.1.3 Other Host Cells
2.2.2 Cell-Free Synthesis
Protocol 1: Preparation of the Amino Acid Free S30 Extract
Protocol 2: Cell-Free Reaction on a Small Scale
2.3 Uniform Isotope Labelling of Proteins
2.3.1 Uniform 15N Labelling
2.3.2 Uniform 13C, 15N Labelling
2.3.3 2H Labelling
2.4 Selective Isotope Labelling of Proteins
2.4.1 Amino Acid Type-Selective Labelling
2.4.2 Reverse Labelling
2.4.3 Stereo-Selective Labelling
2.5 Segmental Labelling
2.6 SAIL Methods
2.6.1 Concept of SAIL
2.6.2 Practical Procedure for the SAIL Method
Protocol 3: Production of SAIL Proteins by the E. coli Cell-Free Method
2.6.3 Residue-Selective SAIL Method
Protocol 4: Optimisation of the Amount of SAIL Amino Acids for the Production of Calmodulin Selectively Labelled by SAIL Phenylalanine
2.7 Concluding Remarks
Acknowledgements
References

3 Resonance Assignments
Lu-Yun Lian and Igor L. Barsukov
3.1 Introduction
3.2 Resonance Assignment of Unlabelled Proteins
3.2.1 Spin System Assignments
3.2.2 Sequence-Specific Assignments
3.2.3 Possible Difficulties
3.3 15N-Edited Experiments
3.4 Triple Resonance
3.4.1 3D Triple Resonance
3.4.1.1 Identification of Spin Systems
3.4.1.2 Sequential Assignment
3.4.1.3 Proline Residues
3.4.2 4D Triple Resonance
3.4.3 Computer-Assisted Backbone Assignments
3.4.4 Unstructured Proteins
3.4.5 Large Proteins
3.5 Side-Chain Assignments
References

4 Measurement of Structural Restraints
Geerten W. Vuister, Nico Tjandra, Yang Shen, Alex Grishaev and Stephan Grzesiek
4.1 Introduction
4.2 NOE-Based Distance Restraints
4.2.1 Physical Background
4.2.2 NMR Experiments for Measuring the NOE
4.2.3 Set-up of NOESY Experiments
4.2.3.1 Estimation of T2s
Recipe 4.1: 1–1 Echo Experiment
Recipe 4.2: Set-up of Optimal Acquisition Times
Recipe 4.3: Set-up of a 3D 15N-Edited NOESY Experiment (Figure 4.2a)
Recipe 4.4: Set-up of a 3D 13C-Edited NOESY Experiment
4.2.4 Deriving Structural Information from NOE Cross-peaks
Recipe 4.5: Extraction of Distances Using Classes
Recipe 4.6: Extraction of Distances Using the Two-Spin Approximation
4.2.5 Information Content of NOE Restraints
4.3 Dihedral Restraints Derived from J-Couplings
4.3.1 Physical Background
4.3.2 NMR Experiments for Measuring J-Couplings
Recipe 4.7: E.COSY Experiment
Recipe 4.8: Quantitative J-Correlation
4.3.3 Deriving Structural Information from J-Couplings
4.4 Hydrogen Bond Restraints
4.4.1 NMR H-Bond Observables
4.4.2 Detection of N–H···O=C H-Bonds in Proteins
Recipe 4.9: Setting up a Long-Range HNCO Experiment for H-Bond Detection
4.5 Orientational Restraints
4.5.1 Physical Background
4.5.1.1 Dipolar Couplings in Anisotropic Solution
4.5.1.2 The Alignment Tensor
4.5.1.3 Chemical Shifts in Anisotropic Solution
4.5.2 Alignment Methods
4.5.2.1 Intrinsic Molecular Alignment
4.5.2.2 Indirect Alignment by External Media
4.5.3 Measurements and Data Analysis
4.5.4 Determination of the Alignment Tensor
4.5.4.1 Degeneracy of Solutions
4.5.4.2 Prediction of the Alignment Tensor from the Structure
4.5.5 RDCs in Structure Validation
4.5.5.1 Q-Factor
4.5.5.2 Using RDC Values for Database Screening
4.5.6 RDCs in Structure Determination
4.5.6.1 Structure Refinement
4.5.6.2 Domain Orientation
4.5.6.3 De Novo Structure Determination
4.5.7 Conclusion
4.6 Chemical Shift Structural Restraints
4.6.1 Origin of Chemical Shifts and Its Relation to Protein Structure
4.6.2 Obtaining Chemical Shifts
4.6.3 Backbone Dihedral Angle Restraints from Chemical Shifts (TALOS)
Recipe 4.10: Using the TALOS+ Program (for details see http://spin.niddk.nih.gov/bax/software/TALOS+/)
4.6.4 Protein Structure Determination from Chemical Shifts (CS-Rosetta)
Recipe 4.11: CS-Rosetta Structure Calculation
4.7 Solution Scattering Restraints
4.7.1 Physical Background
4.7.2 Shape Reconstructions from Solution Scattering Data
4.7.3 Use of SAXS in High-Resolution Structure Determination
4.7.4 Sample Preparation
4.7.5 Data Collection
4.7.6 Data Processing and Initial Analysis
Acknowledgement
References

5 Calculation of Structures from NMR Restraints
Peter Güntert
5.1 Introduction
5.2 Historical Development
5.3 Structure Calculation Algorithms
5.3.1 Molecular Dynamics Simulation versus NMR Structure Calculation
5.3.2 Potential Energy – Target Function
5.3.3 Torsion Angle Dynamics
5.3.3.1 Tree Structure
5.3.3.2 Kinetic Energy
5.3.3.3 Forces = Torques = Gradient of the Target Function
5.3.3.4 Equations of Motion
5.3.3.5 Torsional Accelerations
5.3.3.6 Time Step
5.3.4 Simulated Annealing
Protocol for Simulated Annealing
5.4 Automated NOE Assignment
5.4.1 Ambiguity of Chemical Shift Based NOESY Assignment
5.4.2 Ambiguous Distance Restraints
5.4.3 Combined Automated NOE Assignment and Structure Calculation with CYANA
5.4.4 Network-Anchoring
5.4.5 Constraint Combination
5.4.6 Structure Calculation Cycles
5.5 Nonclassical Approaches
5.5.1 Assignment-Free Methods
5.5.2 Methods Based on Residual Dipolar Couplings
5.5.3 Chemical Shift-Based Structure Determination
5.6 Fully Automated Structure Analysis
References

6 Paramagnetic Tools in Protein NMR
Peter H.J. Keizers and Marcellus Ubbink
6.1 Introduction
6.2 Types of Restraints
6.2.1 Paramagnetic Dipolar Relaxation Enhancement
6.2.2 Other Types of Relaxation
6.2.3 Residual Dipolar Couplings
6.2.4 Contact and Pseudocontact Shifts
6.3 What Metals to Use?
6.4 Paramagnetic Probes
6.4.1 Substitution of Metals
6.4.2 Free Probes
6.4.3 Nitroxide Labels
6.4.4 Metal Binding Peptides
6.4.5 Synthetic Metal Chelating Tags
Protocol for the Application of Paramagnetic NMR on Diamagnetic Proteins
6.5 Examples
6.5.1 Structure Determination of Paramagnetic Proteins
6.5.2 Structure Determination Using Artificial Paramagnets
6.5.3 Structures of Protein Complexes
6.5.4 Studying Dynamics with Paramagnetism
6.6 Conclusions and Perspective
References

7 Structural and Dynamic Information on Ligand Binding
Gordon Roberts
7.1 Introduction
7.2 Fundamentals of Exchange Effects on NMR Spectra
7.2.1 Definitions
7.2.2 Lineshape
7.2.3 Identification of the Exchange Regime
7.3 Measurement of Equilibrium and Rate Constants
7.3.1 Lineshape Analysis
7.3.1.1 Slow Exchange
7.3.1.2 Fast Exchange
7.3.2 Magnetisation Transfer Experiments
7.3.2.1 Saturation Transfer
7.3.2.2 Inversion Transfer
7.3.2.3 Two-Dimensional Exchange Spectroscopy
7.3.3 Relaxation Dispersion Experiments
7.4 Detecting Binding – NMR Screening
7.4.1 Detecting Binding by Changes in Rotational and Translational Mobility of the Ligand
7.4.2 Detecting Binding by Magnetisation Transfer
7.4.2.1 Saturation Transfer Difference (STD) Spectroscopy
7.4.2.2 Water-LOGSY
7.5 Mechanistic Information
7.5.1 Problems of Fast Exchange
7.5.2 Identification of Kinetic Mechanisms
7.5.2.1 Slow Exchange
7.5.2.2 Fast Exchange
7.6 Structural Information
7.6.1 Ligand Conformation – the Transferred NOE
7.6.1.1 Exchange Rate
7.6.1.2 Contributions from Other Species
7.6.1.3 Spin Diffusion
7.6.1.4 Structure Calculation
7.6.2 Interligand Transferred NOEs
7.6.2.1 Two Ligands Bound Simultaneously
7.6.2.2 Competitive Ligands – INPHARMA
7.6.3 Ligand Conformation – Transferred Cross-Correlated Relaxation
7.6.4 Chemical Shift Mapping – Location of the Binding Site
7.6.5 Paramagnetic Relaxation Experiments
7.6.6 Isotope-Filtered and -Edited Experiments
References

8 Macromolecular Complexes
Paul C. Driscoll
8.1 Introduction
8.2 Spectral Simplification through Differential Isotope Labelling
8.3 Basic NMR Characterisation of Complexes
Protocol for Protein–Protein Titrations
8.4 3D Structure Determination of Macromolecular Protein–Ligand Complexes
8.4.1 NOEs
8.4.2 Saturation Transfer
8.4.3 Residual Dipolar Couplings
8.4.4 Paramagnetic Relaxation Enhancements
8.4.5 Pseudo-Contact Shifts
8.4.6 Data-Driven Docking
8.4.7 Small Angle X-Ray Scattering (SAXS)
8.5 Literature Examples
8.5.1 Protein–Protein Interactions
8.5.2 Protein–DNA Interactions
8.5.3 Protein–RNA Interaction
8.5.3.1 Protein–dsRNA
8.5.3.2 Protein–ssRNA
References

9 Studying Partially Folded and Intrinsically Disordered Proteins Using NMR Residual Dipolar Couplings
Malene Ringkjøbing Jensen, Valery Ozenne, Loic Salmon, Gabrielle Nodet, Phineus Markwick, Pau Bernadó and Martin Blackledge
9.1 Introduction
9.2 Ensemble Descriptions of Unfolded Proteins
9.3 Experimental Techniques for the Characterisation of IDPs
9.4 NMR Spectroscopy of Intrinsically Disordered Proteins
9.4.1 Chemical Shifts
9.4.2 Scalar Couplings
9.4.3 Nuclear Overhauser Enhancements
9.4.4 Paramagnetic Relaxation Enhancements
9.4.5 Residual Dipolar Couplings
9.5 Residual Dipolar Couplings
9.5.1 Interpretation of RDCs in Disordered Proteins
9.5.2 RDCs in Highly Flexible Systems: Explicit Ensemble Models
9.5.3 RDCs to Detect Deviation from Random Coil Behaviour in IDPs
9.5.4 Multiple RDCs Increase the Accuracy of Determination of Local Conformational Propensity
9.5.5 Quantitative Analysis of Local Conformational Propensities from RDCs
9.5.6 Conformational Sampling in the Disordered Transactivation Domain of p53
9.6 Conclusions
References

Index
List of Contributors

Igor Barsukov, NMR Centre for Structural Biology, The University of Liverpool, School of Biological Sciences, Biosciences Building, Crown Street, Liverpool L69 7ZB, United Kingdom
Pau Bernadó, Institute for Research in Biomedicine, c/Baldiri Reixac 10, 08028 Barcelona, Spain
Martin Blackledge, Institut de Biologie Structurale, UMR 5075 CEA-CNRS-UJF, 41 Rue Jules Horowitz, Grenoble 38027, France
Paul Driscoll, Division of Molecular Structure, MRC National Institute for Medical Research, The Ridgeway, Mill Hill, London NW7 1AA, United Kingdom
Alex Grishaev, Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, 5 Memorial Drive, Bethesda, MD 20892
Stephan Grzesiek, Biozentrum, University of Basel, Klingelbergstrasse 50/70, CH-4056 Basel, Switzerland
Peter Güntert, Institut für Biophysikalische Chemie, BMRZ, J.W. Goethe-Universität, Max-von-Laue-Str. 9, 60438 Frankfurt am Main, Germany
Malene Ringkjøbing Jensen, Institut de Biologie Structurale, UMR 5075 CEA-CNRS-UJF, 41 Rue Jules Horowitz, Grenoble 38027, France
Masatsune Kainosho, Center for Structural Biology, Graduate School of Science, Nagoya University, Chikusa Ku, Furo Cho, Nagoya, Aichi 4648602, Japan
Peter H.J. Keizers, Leiden Institute of Chemistry, Leiden University, Gorlaeus Laboratories, P.O. Box 9502, 2300 RA Leiden, The Netherlands
Lu-Yun Lian, NMR Centre for Structural Biology, The University of Liverpool, School of Biological Sciences, Biosciences Building, Crown Street, Liverpool L69 7ZB, United Kingdom
Phineus Markwick, Howard Hughes Medical Institute, 9500 Gilman Drive, La Jolla, California 92093-0378, USA
Frederick W. Muskett, Henry Wellcome Laboratories of Structural Biology, Department of Biochemistry, University of Leicester, PO Box 138, Lancaster Road, Leicester LE1 9HN, United Kingdom
Gabrielle Nodet, Institut de Biologie Structurale, UMR 5075 CEA-CNRS-UJF, 41 Rue Jules Horowitz, Grenoble 38027, France
Valery Ozenne, Institut de Biologie Structurale, UMR 5075 CEA-CNRS-UJF, 41 Rue Jules Horowitz, Grenoble 38027, France
Gordon Roberts, Henry Wellcome Laboratories of Structural Biology, Department of Biochemistry, University of Leicester, PO Box 138, Lancaster Road, Leicester LE1 9HN, United Kingdom
Loic Salmon, Institut de Biologie Structurale, UMR 5075 CEA-CNRS-UJF, 41 Rue Jules Horowitz, Grenoble 38027, France
Yang Shen, Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, 5 Memorial Drive, Bethesda, MD 20892
Mitsuhiro Takeda, Center for Structural Biology, Graduate School of Science, Nagoya University, Chikusa Ku, Furo Cho, Nagoya, Aichi 4648602, Japan
Nico Tjandra, Laboratory of Molecular Biophysics, National Heart, Lung, and Blood Institute, National Institutes of Health, 50 South Drive, Bethesda, Maryland 20892, USA
Marcellus Ubbink, Leiden Institute of Chemistry, Leiden University, Gorlaeus Laboratories, PO Box 9502, 2300 RA Leiden, The Netherlands
Geerten Vuister, Protein Biophysics, Institute of Molecules and Materials, Radboud University Nijmegen, PO Box 9101, 6500 HB Nijmegen, The Netherlands; and Henry Wellcome Laboratories of Structural Biology, Department of Biochemistry, University of Leicester, PO Box 138, Lancaster Road, Leicester LE1 9HN, United Kingdom
[Flowchart, panel (a): the chemical shifts (Cα, Cβ, C′, Hα, HN, N) and sequence of each tri-peptide (i−1, i, i+1) are compared with a predefined tri-peptide database {chemical shifts, sequence, φ and ψ angles}; the best-matched tri-peptides (j−1, j, j+1)k, k = 1–10, are compared with an ANN-predicted φ/ψ distribution and, if consistent, their average (φ, ψ) and standard deviation give the predicted (φi, ψi). Panel (b): screenshot, not reproduced.]

Figure 4.21 (a) Flowchart of the TALOS+ method. (b) Graphical TALOS+ inspection interface for the TolR protein. (See details at the TALOS+ webpage http://spin.niddk.nih.gov/bax/software/TALOS+/)
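The consensus step in the TALOS+ flowchart (collect the best-matching tri-peptides, check that they agree, then average φ and ψ) can be illustrated with a short Python sketch. This is not the TALOS+ code: the function names, the 60° agreement window and the rule that at least 9 of the 10 matches must agree are hypothetical choices made for illustration, and the ANN-based consistency check of the real program is left out. Angles are averaged as circular quantities so that values near ±180° are handled correctly.

```python
import math


def circular_mean_deg(angles):
    """Circular mean of angles given in degrees, in the range [-180, 180]."""
    s = sum(math.sin(math.radians(a)) for a in angles)
    c = sum(math.cos(math.radians(a)) for a in angles)
    return math.degrees(math.atan2(s, c))


def angular_diff_deg(a, b):
    """Signed difference a - b in degrees, wrapped into [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def consensus_phi_psi(matches, max_dev=60.0, min_consistent=9):
    """TALOS-style consensus over database matches (hypothetical criteria).

    matches: (phi, psi) pairs in degrees from the best-matching tri-peptides.
    Returns (phi, psi, sd_phi, sd_psi), or None if fewer than
    `min_consistent` matches lie within `max_dev` of the circular mean.
    """
    phi_bar = circular_mean_deg([p for p, _ in matches])
    psi_bar = circular_mean_deg([q for _, q in matches])
    close = [
        (p, q)
        for p, q in matches
        if abs(angular_diff_deg(p, phi_bar)) <= max_dev
        and abs(angular_diff_deg(q, psi_bar)) <= max_dev
    ]
    if len(close) < min_consistent:
        return None  # matches disagree: no prediction for this residue
    # Average again over the consistent matches only.
    phi_bar = circular_mean_deg([p for p, _ in close])
    psi_bar = circular_mean_deg([q for _, q in close])
    n = len(close)
    sd_phi = math.sqrt(sum(angular_diff_deg(p, phi_bar) ** 2 for p, _ in close) / n)
    sd_psi = math.sqrt(sum(angular_diff_deg(q, psi_bar) ** 2 for _, q in close) / n)
    return phi_bar, psi_bar, sd_phi, sd_psi
```

For ten helical matches scattered symmetrically around (−63°, −42°) the sketch returns that pair with a small standard deviation, whereas a 5:5 mixture of helical and β-strand matches returns None, mirroring the "consistent?" branch of the flowchart.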
[Flowchart, panel (a): chemical shifts and sequence → 1. MFR fragment selection from a structural database with chemical shifts calculated by SPARTA → fragment candidates → 2. Rosetta fragment assembly → all-atom models → 3. chemical shift score calculation → rescored models → if converged, predicted structure. Panels (b)–(d): Rosetta energy plotted against Cα RMSD to TolR (b, c) or to the lowest-energy model (d). Panel (e): structure overlay, not reproduced.]

Figure 4.22 (a) Flowchart of the CS-Rosetta protocol. (b–e) CS-Rosetta structure generation for the TolR protein. (b) Plot of Rosetta full-atom energy versus Cα rmsd relative to the experimental TolR monomer structure for all CS-Rosetta models. (c) As (b), but showing the Rosetta full-atom energy rescored (augmented) by the chemical shift deviation (χ²) energy. (d) As (c), but plotting the rescored energy against the Cα rmsd from the lowest-energy CS-Rosetta model. (e) Backbone ribbon representation of the 10 lowest-energy CS-Rosetta models (dark grey) superimposed on the experimental monomer NMR structure of TolR (light grey)
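The rescoring and convergence steps in the CS-Rosetta flowchart can be sketched in a few lines of Python. The combined score (Rosetta energy plus a weighted chemical-shift χ² term), the weight of 1.0 and the convergence test (the 10 best-scoring models all lying within 2 Å Cα RMSD of the best one) are assumptions made for illustration, not the published protocol parameters, and the model dictionaries with keys `energy`, `cs_chi2` and `ca` are a hypothetical data layout. The RMSD function also assumes the models have already been superimposed.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """Cα RMSD between two models given as equal-length lists of (x, y, z).

    Assumes the models are already optimally superimposed; a real analysis
    would first superimpose them (for example with the Kabsch algorithm).
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("models must contain the same number of Cα atoms")
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))


def rescore(models, weight=1.0):
    """Sort models by Rosetta energy plus a weighted chemical-shift chi^2 term
    (lowest combined score first). `weight` is a hypothetical value."""
    return sorted(models, key=lambda m: m["energy"] + weight * m["cs_chi2"])


def is_converged(models, weight=1.0, n_top=10, cutoff=2.0):
    """Hypothetical convergence test: every one of the n_top best-scoring
    models must lie within `cutoff` (Å) Cα RMSD of the best-scoring model."""
    ranked = rescore(models, weight)
    best = ranked[0]["ca"]
    return all(ca_rmsd(m["ca"], best) <= cutoff for m in ranked[:n_top])
```

Rescoring can change which model ranks first: a model with a slightly worse Rosetta energy but a much better fit to the chemical shifts moves ahead, which is the effect compared between panels (b) and (c).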
Figure 5.7 Structures obtained by fully automated structure determination with the FLYA algorithm (blue) superimposed on the corresponding NMR structures determined by conventional methods (dark red). (a) ENTH domain At3g16270(9–135) from Arabidopsis thaliana [147]. (b) Rhodanese homology domain At4g01050(175–295) from Arabidopsis thaliana [148]. (c) Src homology domain 2 (SH2) from the human feline sarcoma oncogene Fes [143]
Figure 6.5 The complex of nitrite reductase (NiR) and pseudoazurin (Paz). NiR is shown in spacefill, with its subunits in blue, pink and green. The best twenty Paz orientations are shown as Cα traces. The Paz copper atoms are shown as green spheres and the positions of the Gd3+ ions in the CLaNP molecules as orange spheres. Reprinted from Vlasie et al. [96], Copyright (2008) with permission from Elsevier
Figure 6.6 The ensemble of complex structures of adrenodoxin (Adx) and cytochrome c (Cc) from a PCS-based simulation illustrates the degree of dynamics between the two proteins. Adx is shown in a surface representation coloured according to the electrostatic potential with red for negative and blue for positive; the Fe2S2 binding loop is in yellow. The centres of mass of Cc are represented by green spheres. Reprinted with permission from Xu et al. [125]. Copyright 2008, American Chemical Society
Figure 7.4 Relaxation dispersion. (A) Schematic representation of signal dephasing during CPMG pulse trains based on the analogy to the runners described in the text, where the y axis plots the distance of the runners from the starting position. A blue or red line indicates a spin in the major or minor state, respectively. Dashed lines correspond to spins experiencing at least one conformational transition, whereas solid lines correspond to spins experiencing no transitions. Reproduced by permission from Mittermaier and Kay, Science (2006) 312, 224–228. (B) 15N CPMG relaxation dispersion profiles obtained at 500 MHz (blue, lower) and 800 MHz (red, upper) proton Larmor frequencies for Leu7 of the Fyn SH3 domain partially saturated with its ligand, a 12-residue peptide. Data are shown for 20, 30, 35, 40, 45 and 50 °C (a)–(f), illustrating the effects on the relaxation dispersion of the changes in koff, from 11.7 s−1 at 20 °C to 331 s−1 at 50 °C. Best-fit curves were generated using a single value of pB optimised for all temperatures, koff values fitted globally, and a Δω value taken from HSQC spectra of the free and peptide-saturated states. Reproduced by permission from Demers and Mittermaier, J. Amer. Chem. Soc. (2009) 131, 4355–4367
Figure 8.6 Top: best-fit superposition of the backbone atoms of the 40 simulated annealing structures of the N-terminal domain of enzyme I (EIN) complexed to the histidine-containing phosphocarrier protein (HPr). Bottom: ribbon diagrams illustrating two views of the 40 kDa EIN–HPr complex. HPr is shown in green, the α-domain of EIN in red, and the α/β-domain and C-terminal helix of EIN in blue. Also shown in gold are the side-chains of the active-site histidine residues of both EIN and HPr. Image taken from [59]
Figure 8.7 NMR data for the titration of insulin-like growth factor-2 (IGF2) receptor (IGF2R) domain 11 with IGF2 as reported by Williams et al. [148]. Left: 2D 15N,1H-correlation spectra of IGF2R domain 11 in the absence (black) and presence (red) of IGF2. The inset panel shows an expanded view of the boxed region. Right: the pattern of IGF2-binding-dependent chemical shift perturbations for IGF2R domain 11 mapped on a molecular surface representation. Residues with shift perturbations > 0.05 ppm are red, and residues with shift perturbations < 0.05 ppm are orange. Blue indicates NH resonances that broaden and disappear (i.e. are ‘bleached’) upon IGF2 binding (cf. Figure 1.2d); grey indicates little or no change in chemical shift upon binding. Image taken from [148]
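The colour scheme in Figure 8.7 amounts to a simple classification of each residue's amide shift change, sketched below in Python. The caption does not say how the 1H and 15N changes were combined, so the weighted formula (with a 15N weighting factor of 0.14, a commonly used value), the 0.01 ppm "little or no change" cut-off and the function names are assumptions; only the 0.05 ppm threshold and the four colour categories come from the caption.

```python
import math


def combined_csp(d_h, d_n, alpha=0.14):
    """Weighted combined amide shift perturbation in ppm.

    alpha scales the 15N shift change onto the 1H scale (assumed value).
    """
    return math.sqrt(d_h ** 2 + (alpha * d_n) ** 2)


def classify_csp(residues, threshold=0.05, no_change=0.01):
    """Map residue -> colour following the Figure 8.7 scheme.

    residues: dict of name -> (delta_1H, delta_15N) in ppm, or None when the
    peak broadens and disappears ('bleaches') on binding.
    `no_change` is a hypothetical cut-off for the grey category.
    """
    colours = {}
    for name, shifts in residues.items():
        if shifts is None:
            colours[name] = "blue"      # resonance bleached on binding
            continue
        csp = combined_csp(*shifts)
        if csp > threshold:
            colours[name] = "red"       # large perturbation
        elif csp >= no_change:
            colours[name] = "orange"    # small but measurable perturbation
        else:
            colours[name] = "grey"      # little or no change
    return colours
```

Bleached residues are handled separately because they have no position in the bound spectrum from which a shift change could be measured.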
Figure 8.8 Structural models of the IGF2R-D11–IGF2 complex generated using HADDOCK. The two lowest-energy structures in each of the candidate clusters are shown, with IGF2R-D11 depicted in surface mode, the IGF2 backbone in ribbon mode and selected side-chains as sticks. The core of the IGF2 binding site is coloured blue, and the side-chain of E1544, which is known to negatively regulate IGF2 binding, is drawn in red. The orientation of IGF2 differs by approximately 20° between the two models. Image taken from [148]
Figure 8.10 NMR-derived structure of the complex formed between the Rnt1p RNase III dsRBD protein and snR47h AGNN tetraloop hairpin RNA determined by Wu et al. [155]. Left: best-fit superposition of the 15 lowest-energy NMR models, with the protein shown in blue and RNA in green. Right: schematic representation of the lowest-energy structure in the bundle, with the RNA helical backbone indicated by a thin blue cylinder, the RNA atoms shown in stick form and the protein as ribbons, with residues populating the protein–RNA interface shown as balls and sticks. Image adapted from [155]
Figure 8.11 Representations of the molecular components of the structure of the complex between Rous sarcoma virus (RSV) μΨ packaging signal RNA and Zn-binding RSV nucleocapsid (NC) protein reported by Zhou et al. [164]. The predicted secondary structure elements of the RNA and the coordination of the Zn atoms are shown. Non-native nucleotides used to enable in vitro transcription and protease cleavage of the expressed fusion protein are depicted in red and grey, respectively. Image taken from [164]
Figure 8.12 Top: two sample regions of the 3D 13C-edited NOESY-HMQC spectrum recorded for doubly 13C,15N isotope-labelled Rous sarcoma virus (RSV) nucleocapsid (NC) protein bound to unlabelled RSV μΨ packaging signal RNA investigated by Zhou et al. [164]. The crosspeaks correspond to intermolecular NOE contacts associated with residues Arg16 and Ala32 of the NC N-terminal Zn-knuckle, respectively. Bottom: overlay of the 2D 1H NOESY spectra obtained for specifically protonated GH-μΨ (black) and UH-μΨ (red) bound to NC, showing intermolecular NOE crosspeaks connecting the stem-loop C (SL-C) tetraloop RNA residues U217, G218 and G220 to the NC N-terminal Zn-knuckle residues Tyr22 and Tyr30. Image taken from [164]
Figure 8.13 (a) Rendering of the 20 NMR-derived structures of the NC:μΨ complex showing the relative convergence of the secondary structure elements, obtained by best-fit superposition of the SL-C stem carbon atoms. The result shows that the relative positions of SL-B (green), SL-C (brown), O3 (red), the linkers (orange) and the NC Zn-knuckles (blue) are well defined by the NMR data, but the position of SL-A (purple) is not; (b) and (c) show two different stereo views of a representative structure, showing the relative positions of the NC and μΨ secondary structure elements. Image taken from [164]
Figure 9.3 Residual dipolar couplings (1DNH and 2DC′NH) from the two-domain protein PX from Sendai virus. (a) 1DNH and 2DC′NH RDCs are well reproduced throughout the protein using flexible meccano (black). Experimental values are shown in grey. (b) Ensemble representation of PX. Reprinted from Malene Ringkjøbing Jensen, Phineus R.L. Markwick, Sebastian Meier, Christian Griesinger, Markus Zweckstetter, Stephan Grzesiek, Pau Bernadó, and Martin Blackledge, Quantitative determination of the conformational properties of partially folded and intrinsically disordered proteins using NMR dipolar couplings, Structure, Vol. 17(9): 1169–1185, Copyright 2009 with permission from Elsevier
Introduction Lu-Yun Lian and Gordon Roberts
Protein NMR Spectroscopy: Practical Techniques and Applications, First Edition. Edited by Lu-Yun Lian and Gordon Roberts. © 2011 John Wiley & Sons, Ltd. Published 2011 by John Wiley & Sons, Ltd.

The nuclear magnetic resonance (NMR) method is one of the principal techniques used to obtain physical, chemical, electronic and three-dimensional structural information about molecules in solution, whether small molecules, proteins, nucleic acids or carbohydrates. NMR is a physical phenomenon based upon the magnetic properties of certain atomic nuclei. When placed in a very strong magnetic field (2–21.1 T), the magnetic moments of these nuclei tend to align with the field. During an NMR experiment, this alignment is perturbed by a radiofrequency pulse (typically a few hundred megahertz). When the transmitter is turned off, the nuclei return to equilibrium and, in the process, emit a radiofrequency signal that is detected. The usefulness of this technique in biochemistry results largely from the fact that nuclei of the same element in different chemical and magnetic environments give rise to distinct spectral lines. This means that, in principle, each NMR-active atom in a large molecule such as a protein can be observed and can provide information on structure, conformation, ionisation state, pKa and dynamics. The nuclei most relevant to the study of biological macromolecules are listed in Table 1. Among the stable nuclei, the proton (1H) is the most sensitive for NMR detection. For biological studies, 13C and 15N are now just as important although, because of their low natural abundance (1.108% and 0.37%, respectively), enrichment with these stable isotopes is necessary. The first published NMR spectrum of a biological macromolecule was the 40 MHz 1H spectrum of pancreatic ribonuclease, reported in 1957. Since then, the significant milestones for NMR have included:

• Fourier transform NMR in the late 1960s;
• the development of two-dimensional NMR in the early 1970s;
• the development of INEPT/HMQC pulse sequences in the late 1970s;
• the application of NMR to solve the full three-dimensional structure of a protein in solution in the early 1980s;
• the introduction, in the late 1980s, of three- and four-dimensional heteronuclear experiments for use with 13C/15N isotopically labelled proteins, followed in the late 1990s by the TROSY experiments (which, in addition, require protein deuteration).

Table 1 Properties of nuclei of interest in NMR studies of proteins

Isotope   Spin   Frequency (MHz) at 14.0954 T   Natural abundance (%)   Relative sensitivity(a)
1H        1/2    600.13                         99.99                   1.00
2H        1      92.124                         0.0115                  9.65 × 10⁻³
3H        1/2    640.123                        0                       1.21
13C       1/2    150.903                        1.108                   1.59 × 10⁻²
14N       1      43.367                         99.636                  1.01 × 10⁻³
15N       1/2    60.834                         0.37                    1.04 × 10⁻³
17O       5/2    81.356                         0.038                   2.91 × 10⁻²
19F       1/2    564.686                        100                     0.83
31P       1/2    242.937                        100                     6.63 × 10⁻²
111Cd     1/2    127.32                         12.80                   9.66 × 10⁻³
113Cd     1/2    133.188                        12.22                   1.11 × 10⁻²

(a) For equivalent numbers of nuclei (i.e. 100% isotope).
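The frequency and sensitivity columns of Table 1 follow from two standard relations: the Larmor frequency ν = |γ|B0/2π, and a constant-field sensitivity proportional to |γ|³I(I+1) for equal numbers of nuclei. The Python sketch below reproduces a few entries from commonly quoted gyromagnetic ratios and multiplies by natural abundance to show why 13C and 15N need enrichment. The γ values, the function names and the "receptivity" helper are illustrative assumptions, not taken from the book.

```python
import math

# Gyromagnetic ratios (units of 10^7 rad s^-1 T^-1, standard literature values)
# and spin quantum numbers I for a few of the nuclei in Table 1.
NUCLEI = {
    "1H": (26.7522, 0.5),
    "2H": (4.1066, 1.0),
    "13C": (6.7283, 0.5),
    "15N": (-2.7126, 0.5),
    "31P": (10.8394, 0.5),
}


def larmor_mhz(gamma, b0_tesla):
    """Larmor frequency |gamma| * B0 / (2 pi), returned in MHz."""
    return abs(gamma) * 1e7 * b0_tesla / (2.0 * math.pi) / 1e6


def relative_sensitivity(nucleus, reference="1H"):
    """Sensitivity at constant field for equal numbers of nuclei,
    taken as proportional to |gamma|^3 * I(I+1) and normalised to `reference`."""
    gamma, spin = NUCLEI[nucleus]
    gamma_ref, spin_ref = NUCLEI[reference]
    ratio = abs(gamma) / abs(gamma_ref)
    return ratio ** 3 * (spin * (spin + 1)) / (spin_ref * (spin_ref + 1))


def receptivity(nucleus, abundance_percent):
    """Relative sensitivity scaled by natural abundance (given in %)."""
    return relative_sensitivity(nucleus) * abundance_percent / 100.0


if __name__ == "__main__":
    b0 = 14.0954  # field used in Table 1, in tesla
    for name in NUCLEI:
        print(f"{name:>4}: {larmor_mhz(NUCLEI[name][0], b0):8.3f} MHz, "
              f"relative sensitivity {relative_sensitivity(name):.3g}")
```

With these values 1H comes out at about 600.15 MHz at 14.0954 T, and `receptivity("13C", 1.108)` and `receptivity("15N", 0.37)` give roughly 1.8 × 10⁻⁴ and 4 × 10⁻⁶ of the 1H value. The small differences from the tabulated 13C and 15N frequencies probably arise because such tables are usually referenced to standard compounds rather than computed from bare γ values.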
Each stage of these developments has been accompanied by improvements in spectrometer hardware. In particular, higher magnetic field strengths, improved probeheads such as cryogenically cooled probeheads, and better electronics have together led to very substantial improvements in resolution and sensitivity. In addition, continuous advances in molecular biology and sample preparation have allowed these NMR-based improvements to be exploited, particularly through the speed and cost-effectiveness with which substantial quantities of sample can be produced and the ease with which stable isotope enrichment can be accomplished. Finally, data analysis is now more streamlined and, in the case of very high-quality data, the structure determination process, from resonance assignment to structure calculation, can be automated. These developments are essential if NMR is to remain a mainstream technique for high-resolution structure determination and to make significant contributions to structural biology. For structural biology, NMR is unique in that it can be used to study macromolecules in both the solution and solid states, and it is, furthermore, the only method that can provide information on dynamics at the atomic level. This book focuses on the use of NMR to study protein structure and interactions in solution, with the aim of providing a practical guide for users of the method. It deals with the methods, approaches and issues commonly encountered in the everyday use of NMR in structural biology. No attempt is made to describe the fundamental physics of NMR, but in some chapters the theoretical aspects of the methodology are detailed so that the methods can be applied appropriately. A full discussion of the fundamental basis of the wide range of solution NMR experiments used in structural biology can be found in [1], and other valuable introductions to modern NMR spectroscopy include [2,3].
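The effect of a higher magnetic field on resolution and sensitivity can be made concrete with a small calculation. The sketch assumes the idealised textbook scalings (chemical-shift separations in Hz grow linearly with B0, signal-to-noise per scan roughly as B0^(3/2), and signal-to-noise with the square root of the number of scans) and ignores relaxation, field-dependent line broadening and probe or sample effects. The function name is hypothetical.

```python
def field_scaling(nu_old_mhz, nu_new_mhz):
    """Idealised gains when moving from one 1H frequency (field) to another.

    Assumes S/N per scan scales as B0**1.5 and ignores relaxation,
    field-dependent line broadening, probe and sample effects.
    """
    r = nu_new_mhz / nu_old_mhz
    return {
        "hz_per_ppm_old": nu_old_mhz,  # 1 ppm equals nu0 (in MHz) expressed in Hz
        "hz_per_ppm_new": nu_new_mhz,
        "dispersion_gain": r,          # shift separations in Hz scale with B0
        "sn_gain": r ** 1.5,           # idealised S/N per scan
        "time_saving": r ** 3,         # S/N ~ sqrt(scans), so time ~ 1/(S/N)**2
    }


gains = field_scaling(600.0, 900.0)
print(gains)
```

Going from 600 to 900 MHz gives a 1.5-fold gain in dispersion (1 ppm spans 900 Hz instead of 600 Hz), about a 1.84-fold gain in signal-to-noise per scan, and roughly a 3.4-fold reduction in the time needed to reach a given signal-to-noise, under these idealised assumptions.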
The success of any application of NMR depends on correct sample preparation and on the appropriate choice of parameters for data acquisition and processing; these are covered in Chapter 1. Once initial data have been collected to assess whether a protein system is suitable for NMR studies, the next step will depend upon whether the objective is to determine the three-dimensional structure of a protein or its complex, or something more limited, such as screening for ligand binding or determining pKa values. For all but the smallest proteins, isotope labelling will be required (Chapter 2), and to go beyond purely qualitative experiments, resonance assignments (Chapter 3) will be essential. Structure determination involves the acquisition and treatment of structural restraints (Chapter 4) and the use of these to obtain structural ensembles (Chapter 5). Chapter 6 describes the additional information on protein structure or complex formation that can be obtained when the protein contains a paramagnetic species, either naturally occurring or introduced specifically for the purpose. Chapter 7 describes different approaches to the study of the binding of small molecules, ranging from screening to full structure determination of the complex; this requires an understanding of the theoretical and practical aspects of the effects of chemical exchange in NMR, which is also important in many other areas of biological NMR. Chapter 8 provides a comprehensive description of the use of NMR to study macromolecular complexes; this is a challenging area, and the chapter outlines the problems and the approaches that can be taken to overcome them. Chapter 9 focuses on structural studies of intrinsically disordered proteins. The widespread existence and significance of these proteins are becoming increasingly recognised, and NMR is currently the best method for providing detailed information on their conformational distributions.
NMR is uniquely suited to the characterisation of biomolecular dynamics. Since so many nuclei can be detected simultaneously, NMR can provide a comprehensive description of internal motions and conformational fluctuations at atomic resolution, and NMR methods have been developed to quantify motions occurring over a wide range of timescales, from picoseconds to days and months. At the same time, consideration of dynamics and the averaging processes to which they lead is an essential part of the use of NMR to obtain structural information. As a result, several chapters in this book deal with methods for obtaining dynamic information from NMR; for additional information the reader is also directed to the reviews [4–7]. Over the last few years there have been significant developments in the application of solid-state NMR techniques as a tool for determining the high-resolution structures of proteins, ranging from microcrystalline soluble proteins to protein fibrils and membrane proteins. It is now possible to assign the spectra of proteins larger than 100 amino acids using ¹³C/¹⁵N labelling [8]. However, as yet this remains an area for the expert and it is not covered in detail in this book (although some of the methods for isotope labelling described in Chapter 2 will also be relevant to solid-state studies). For useful reviews, the reader is directed to [9,10].
Note Added in Proof
Several relevant reviews have appeared while this book was in production. In particular, two useful qualitative introductions to biomacromolecular NMR for newcomers to the field would serve as valuable initial reading [11,12]. Clore [13] has reviewed the use of relaxation methods (see Chapters 6 and 7) to observe species with low populations, Wishart [14] has reviewed the use of chemical shifts in structure determination (see Chapters 4 and 5), and Dominguez et al. [15] have reviewed the use of NMR in the study of protein–RNA complexes (see Chapter 8).
References
1. Cavanagh, J., Fairbrother, W.J., Palmer, A.G. III et al. (2007) Protein NMR Spectroscopy: Principles and Practice, 2nd edn, Academic Press, San Diego.
2. Keeler, J. (2005) Understanding NMR Spectroscopy, John Wiley & Sons, Ltd, Chichester.
3. Levitt, M.H. (2001) Spin Dynamics: Basics of Nuclear Magnetic Resonance, John Wiley & Sons, Ltd, Chichester.
4. Mittermaier, A.K. and Kay, L.E. (2009) Observing biological dynamics at atomic resolution using NMR. Trends Biochem. Sci., 34, 601–611.
5. Baldwin, A.J. and Kay, L.E. (2009) NMR spectroscopy brings invisible protein states into focus. Nat. Chem. Biol., 5, 808–814.
6. Jarymowycz, V.A. and Stone, M.J. (2006) Fast time scale dynamics of protein backbones: NMR relaxation methods, applications, and functional consequences. Chem. Rev., 106, 1624–1671.
7. Igumenova, T.I., Frederick, K.K. and Wand, A.J. (2006) Characterization of the fast dynamics of protein amino acid side chains using NMR relaxation in solution. Chem. Rev., 106, 1672–1699.
8. Schuetz, A. et al. (2010) Protocols for the sequential solid-state NMR spectroscopic assignment of a uniformly labeled 25 kDa protein: HET-s(1-227). ChemBioChem, 11, 1543–1551.
9. Renault, M., Cukkemane, A. and Baldus, M. (2010) Solid-state NMR spectroscopy on complex biomolecules. Angew. Chem. Int. Ed., 49, 8346–8357.
10. McDermott, A. (2009) Structure and dynamics of membrane proteins by magic angle spinning solid-state NMR. Annu. Rev. Biophys., 38, 385–403.
11. Kwan, A.H., Mobli, M., Gooley, P.R. et al. (2011) Macromolecular NMR spectroscopy for the nonspectroscopist. FEBS J., 278, 687–703.
12. Bieri, M., Kwan, A.H., Mobli, M. et al. (2011) Macromolecular NMR spectroscopy for the nonspectroscopist: beyond macromolecular solution structure determination. FEBS J., 278, 704–715.
13. Clore, G.M. (2011) Exploring sparsely populated states of macromolecules by diamagnetic and paramagnetic NMR relaxation. Protein Sci., 20, 229–246.
14. Wishart, D.S. (2011) Interpreting protein chemical shift data. Prog. Nucl. Magn. Reson. Spectrosc., 58, 1–61.
15. Dominguez, C., Schubert, M., Duss, O. et al. (2011) Structure determination and dynamics of protein–RNA complexes by NMR spectroscopy. Prog. Nucl. Magn. Reson. Spectrosc., 58, 62–87.
1 Sample Preparation, Data Collection and Processing
Frederick W. Muskett
Purgamentum init, exit purgamentum (garbage in, garbage out)
1.1 Introduction
The power of NMR spectroscopy for the analysis of biological macromolecules is undisputed. During the last two decades, the development of spectrometers and the experiments they perform, software, and the molecular biological techniques for the expression and purification of proteins have progressed at a formidable rate. Enrichment of molecules in the three major isotopes used in NMR (¹⁵N, ¹³C and ²H) is now commonplace and the cost is no longer prohibitive. The software used to analyse the plethora of data we can generate makes spectral assignment and the extraction of data straightforward for all but the most challenging systems. With all these developments it is easy to forget some of the more fundamental requirements for obtaining good quality NMR data, namely a good sample and a well-set-up NMR experiment.
1.2 Sample Preparation
The first step, and possibly one of the most important, before embarking on an NMR-based project is the preparation of the sample. Spending some time optimising sample conditions for concentration, ionic strength, pH and temperature before collecting large amounts of data will pay dividends, particularly if the sample is difficult and expensive to produce. Ideally, the optimised sample will not only give the best possible NMR data but will also have long-term stability since, if the project requires backbone and side-chain assignments, the total acquisition time can be on the order of several weeks. The following sections outline the general requirements of a biological sample that is to be used to record NMR data. They assume that full resonance assignment is required; however, these guidelines could, and probably should, be applied to all samples, whatever the purpose of the experiments.
Protein NMR Spectroscopy: Practical Techniques and Applications, First Edition. Edited by Lu-Yun Lian and Gordon Roberts. © 2011 John Wiley & Sons, Ltd. Published 2011 by John Wiley & Sons, Ltd.
1.2.1 Initial Considerations
This optimisation can be performed in the NMR spectrometer, but much can be done using other biophysical techniques such as circular dichroism (CD) or fluorescence spectroscopy. These methods require much lower concentrations and do not require isotopic enrichment. The effects of buffer composition on the secondary structure content and melting temperature of the sample give a useful starting point. Once the initial conditions have been determined, final optimisation in the spectrometer can begin. If the sample is a protein, although much can be learned from a simple one-dimensional proton experiment, by far the most useful experiment is the ¹⁵N-edited HSQC. This experiment removes a great deal of resonance overlap, allowing the user to see in much more detail the effects of varying pH, ionic strength and temperature. NMR has intrinsically poor sensitivity and, as a result, the sample concentration needs to be in the millimolar range. For a conventional room-temperature probe the sample concentration should ideally be 1 mM, but can be as low as 0.5 mM. With cryogenically cooled probes this can be reduced to 0.2 mM and, given the right sample, to as low as 0.05 mM (depending on the experiments performed). Whilst the sample can be exchanged into the buffer intended for NMR experiments in the last step of purification (usually gel filtration chromatography), it is rarely at the required concentration. The two main methods used to increase the concentration are lyophilisation with subsequent re-suspension in a smaller volume, and ultrafiltration, or a combination of the two. Unfortunately, whether a sample will survive either method cannot be predicted; in the end one must simply try and see what happens. However, lyophilisation is generally considered the more dangerous of the two.
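To put these concentrations in perspective, the amount of protein a sample requires is simply concentration × volume × molecular mass. The Python sketch below works this through; the 15 kDa molecular mass, the assumption of complete recovery during concentration and the helper names are illustrative assumptions, not values or procedures from this chapter.

```python
def protein_mass_mg(conc_mM: float, volume_ul: float, mw_kda: float) -> float:
    """Mass of protein (mg) in a sample of the given concentration and volume."""
    moles = (conc_mM * 1e-3) * (volume_ul * 1e-6)  # (mol/L) x L = mol
    grams = moles * (mw_kda * 1e3)                 # 1 kDa = 1000 g/mol
    return grams * 1e3                             # g -> mg


def starting_volume_ml(start_conc_mM: float, target_conc_mM: float,
                       target_volume_ul: float) -> float:
    """Volume of dilute eluate needed to make the target sample.

    Assumes (optimistically) no protein is lost during ultrafiltration.
    """
    if target_conc_mM < start_conc_mM:
        raise ValueError("target must be more concentrated than the starting pool")
    return target_conc_mM * target_volume_ul / start_conc_mM / 1000.0  # ul -> ml


if __name__ == "__main__":
    # Hypothetical 15 kDa protein
    for label, conc, vol in [("room-temperature probe, 5 mm tube", 1.0, 600),
                             ("cryogenic probe, Shigemi tube", 0.2, 300)]:
        print(f"{label}: {protein_mass_mg(conc, vol, 15.0):.1f} mg")
    # Concentrating a hypothetical 0.05 mM gel-filtration pool to 600 ul at 1 mM
    print(f"eluate needed: {starting_volume_ml(0.05, 1.0, 600):.0f} ml")
```

For this hypothetical protein the two cases come to roughly 9 mg and 0.9 mg; real ultrafiltration losses mean more starting material than the idealised figure will usually be needed.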
The number and type of disposable ultrafiltration devices on the market is large, each with its own characteristics regarding compatibility with a particular sample and the volumes with which it can be used; again, try and see what happens. As a final step, either passing the sample through a 0.2 µm filter or centrifuging it in a benchtop microcentrifuge to remove any insoluble material or dust will greatly improve sample homogeneity. With modern solvent suppression techniques that can effectively eliminate the signal from the 110 M water protons, it might at first seem unnecessary to dissolve the sample in ²H₂O. However, such samples are still important for recording the experiments designed to assign protein side-chain resonances and for ¹³C-edited nuclear Overhauser effect (NOE) experiments. Even the most efficient solvent suppression techniques still leave a residual solvent signal and at the same time suppress or distort the signals of interest in that region of the spectrum. In addition, the use of ²H₂O allows one to carry out experiments in which the coherences are recorded in the directly detected dimension, where they have the highest resolution. These two advantages alone outweigh the effort required to transfer the sample into ²H₂O. The methods for exchanging the solvent for ²H₂O are the same as those for concentrating samples: either lyophilisation or repeated concentration and dilution with ²H₂O. Alternatively, if the sample is unlikely to survive those methods, it can be passed down a short desalting gel-filtration column that has been pre-equilibrated in ²H₂O.
1.2.2 Additives
As many NMR experiments require hours or days to complete, the addition of an antimicrobial agent is highly recommended. Sodium azide at a concentration of 0.02% (w/v) is almost universally used; however, in the rare cases where the azide ion interacts with the sample (e.g. some cytochromes), micromolar concentrations of an antibiotic such as ampicillin or chloramphenicol can be substituted. EDTA or AEBSF are frequently added to NMR samples at a concentration of 0.1–5 mM in order to reduce proteolysis. However, these compounds have nonexchangeable protons that can interfere with the spectrum of the sample. Excessive use of these compounds is best avoided, a better approach being to improve the purification protocol. If the protein contains free cysteines, reducing agents such as DTT or TCEP are required to stop the protein forming disulfide-linked dimers or multimers, which can result in precipitation. In addition, degassing the sample can help; however, samples inevitably re-dissolve oxygen during subsequent sample manipulations or during the course of the NMR experiment unless special care is taken to seal the NMR tubes.
1.2.3 Sample Conditions
Although the primary choice of buffer must be the one that promotes long-term stability of the sample, some buffer salts are more convenient for NMR than others. As buffer concentrations are typically 10–50 mM, any covalently bonded protons in the buffer will give rise to sharp and obtrusive signals in the spectrum. As a result, phosphate buffer is the primary choice, provided that its buffering range is appropriate for your sample and that it does not interact with your protein; many proteins bind ligands containing phosphate groups, from ATP to phosphoproteins and DNA, and may bind inorganic phosphate weakly. Otherwise, many of the more common buffer salts are available with deuterium replacing the nonlabile protons. When selecting a pH it should be borne in mind that, because of the pH dependence of amide proton exchange rates, pH values between 3 and 7 are most conducive to observing the signals arising from these groups. For many biological samples, the addition of salts (typically sodium chloride) to the buffer increases solubility and decreases aggregation. Unfortunately, in NMR the dielectric losses at high ionic strength (greater than 150 mM) are severe, particularly in cryogenically cooled probes and at high magnetic fields. Such losses can be dramatic: a twofold loss in signal-to-noise ratio requires a fourfold increase in acquisition time to obtain comparable spectra. Recently, alternatives to traditional buffer systems have been proposed, such as dipolar ions [1], low-conductivity buffers [2] or 'solubilising salts' [3]. The general applicability of these alternatives has yet to be established; however, they should be investigated if the sample has low solubility and/or requires high ionic strength for stability.
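The fourfold penalty follows because, when thermal noise dominates, signal-averaged S/N grows only with the square root of the number of scans, and hence of the acquisition time. A minimal Python sketch of that relationship is below; the function name and the 2.5-day baseline experiment in the usage lines are hypothetical.

```python
def time_factor(relative_sn: float) -> float:
    """Factor by which acquisition time must grow to restore the original S/N.

    relative_sn is the fraction of S/N retained after the loss
    (e.g. 0.5 for a twofold drop). Because S/N is proportional to
    sqrt(time), time scales as (1 / relative_sn) ** 2.
    """
    if not 0 < relative_sn <= 1:
        raise ValueError("relative_sn must be in (0, 1]")
    return 1.0 / relative_sn ** 2


if __name__ == "__main__":
    baseline_days = 2.5  # hypothetical 3D experiment recorded at low ionic strength
    for retained in (1.0, 0.7, 0.5):
        days = baseline_days * time_factor(retained)
        print(f"S/N retained {retained:.0%}: {days:.1f} days")
```

Even a 30% loss roughly doubles the time needed. In a multidimensional experiment the minimum number of scans per increment is fixed by the phase cycle, so in practice this penalty is paid in discrete steps.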
The final parameter to consider in optimising the sample conditions is the temperature at which the experiments are to be performed. As the temperature increases, the solvent viscosity falls, the rotational correlation time of the molecule decreases and so the resonances become narrower. In addition, varying the temperature will change the chemical shifts of temperature-sensitive groups and may help to resolve resonance overlaps. Typically, NMR experiments are performed in the temperature range 293–308 K, but if you have determined the thermal stability of the sample in advance (e.g. by CD spectroscopy), you will have a better idea of the attainable upper limit.
1.2.4 Special Cases
There are two types of sample that require extra attention: integral membrane proteins and samples intended for ligand titrations. The use of solution NMR methods to study membrane proteins, although not mainstream, is now feasible. The solubilisation of integral membrane proteins in detergent micelles is relatively straightforward, and the procedures for preparing membrane protein samples are essentially as described above. There is much debate about which detergents are best for preparing NMR samples, and it is apparent that no one detergent will suit all proteins. As a result, several different detergent types need to be screened at different concentrations. In addition, significant improvement of the spectrum can be achieved by using sample temperatures higher than those used for soluble proteins (>310 K). The reader is referred to some excellent reviews on sample optimisation [4–6]. The study of ligand interactions by NMR is a well-established technique, enabling the identification of a specific binding site and the determination of kinetic information (see Chapter 7). The usual method for obtaining this information is to record successive spectra with increasing concentrations of ligand whilst observing the resulting spectral changes. However, there is a danger that addition of the ligand will lead to changes in the sample conditions other than those due to ligand binding, resulting in artefactual changes in the spectrum. It can be difficult to obtain identical buffer conditions on mixing different proportions of two or more samples, even if they have been dialysed against the same buffer but in separate dialysis tubes, as many biological macromolecules have a high affinity for electrolytes. In addition, if the ligand is a small molecule it may be difficult to solubilise and impossible to dialyse into the same buffer as the macromolecule.
The major concern in mixing two samples together, or in adding a ligand to a sample, is a change in pH, as this alone can result in considerable chemical shift changes in the molecule of interest as ionisable groups change their protonation state. Fortunately, this property of ionisable groups can also be used to monitor the pH of the NMR sample. A small molecule (e.g. imidazole) added to the sample can serve as an internal pH indicator as additions of ligand are made: any pH shift will become immediately apparent as a change in the chemical shifts of the imidazole resonances. A number of such indicator molecules have recently been characterised, and the reader is referred to [7] for more details. In order to minimise the number of manipulations during a titration experiment, and so reduce the likelihood of systematic errors, the following procedure is recommended (see also Chapter 8 for a more detailed description). Two samples should be prepared, one of the biological macromolecule alone and the other containing the macromolecule together with the ligand at the highest concentration to be used in the titration (estimated, for example, from a biochemical assay).
A spectrum is recorded of each sample, giving the initial and end points of the titration series. Assuming 600 µl sample volumes and that the titration series is to be incremented in steps of 0.1 molar equivalents of ligand, 60 µl of each sample is removed and added to the other, giving mixing ratios of 1 : 9 and 9 : 1. Spectra are again recorded, giving the next highest and lowest points in the titration series. This procedure is repeated until the two samples have identical compositions (half the maximum ligand concentration), at which point a spectrum of each is recorded and compared; to keep the steps at 0.1 equivalents, the volume exchanged must increase at each round (75, 100, 150 and finally 300 µl for 600 µl samples). At this point the spectra should, of course, be identical unless an error has been made. The advantages of this method are that the concentration of the macromolecule never changes through dilution by the addition of ligand and, because there is a clearly defined end point, any errors or sudden changes in the sample conditions will not go unnoticed.
1.2.5 NMR Sample Tubes
For high-resolution NMR experiments, the homogeneity of the magnetic field in which the sample sits must of course be very high. This is most easily achieved with a small sample volume; on the other hand, sensitivity, which is always limiting in an NMR experiment, sets a limit on how small a sample can be used. The sample is usually housed in a 5 mm diameter tube; the optimal sample volume for the highest signal-to-noise ratio and magnetic field homogeneity (and hence the optimal resonance lineshape) depends on the probe design and will vary between manufacturers and between probe generations from the same manufacturer. This information should be provided by the manufacturer but is usually of the order of 600 µl in a conventional 5 mm NMR tube. However, the use of 'standard' 5 mm tubes is now in decline following the development of a reduced-volume symmetrical microtube by Shigemi Inc., commonly referred to as a 'Shigemi tube'. These tubes have a plug of susceptibility-matched glass at the bottom of the tube and an equivalent glass plunger that forms the top of the sample. The advantage of these tubes is that the sample volume need only match the active volume of the transmitter/receiver coils of the probe. The glass plugs minimise the change in magnetic susceptibility at the solvent/glass or solvent/air interface, and hence the 'end effects' associated with short samples in standard NMR tubes. Therefore, sample volumes can be reduced to 300–350 µl. The temptation to reduce the sample volume even further should be resisted, as the resonance lineshape, particularly that of the residual water, deteriorates rapidly. One drawback of these tubes is that samples tend to degas over a period of hours at temperatures much above 298 K. This results in air bubbles forming at the top of the sample, which have disastrous effects on the field homogeneity and, as a result, on the residual water resonance.
Therefore, the sample should be pre-incubated at the required temperature before it is inserted into the magnet. (Alternatively, the field homogeneity should be checked several hours after inserting the sample into the magnet and before starting a long NMR experiment.) If conventional 5 mm tubes are to be used, selecting the correct grade is important. Although the intrinsic linewidths of biological macromolecules are large (e.g. for human ubiquitin, Mr 8.5 kDa, the amide resonances are between 10 and 15 Hz wide), using cheap tubes limits the attainable field homogeneity. However, because of these larger intrinsic linewidths, extremely high-precision, and therefore expensive, NMR tubes are not really required, and those graded for use at 500 MHz and above will suffice.
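For a Lorentzian line, the full width at half height is related to the effective transverse relaxation time by Δν½ = 1/(πT2*), where T2* includes any contribution from field inhomogeneity. The short Python sketch below converts the ubiquitin amide linewidths quoted above into T2* values; the function names are illustrative only.

```python
import math


def t2_from_linewidth(fwhm_hz: float) -> float:
    """Effective T2 (s) of a Lorentzian line with the given full width at half height (Hz)."""
    return 1.0 / (math.pi * fwhm_hz)


def linewidth_from_t2(t2_s: float) -> float:
    """Full width at half height (Hz) of a Lorentzian line with effective T2 (s)."""
    return 1.0 / (math.pi * t2_s)


if __name__ == "__main__":
    for width in (10.0, 15.0):  # ubiquitin amide linewidths quoted in the text
        print(f"{width:.0f} Hz -> T2* = {t2_from_linewidth(width) * 1e3:.1f} ms")
```

Linewidths of 10 to 15 Hz correspond to T2* values of roughly 32 to 21 ms, which is why a small additional broadening from the tube matters relatively little for such samples.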
Whichever style of NMR tube is selected, none should be used straight from the box: residues of the chemicals used during their manufacture will give intense signals contaminating the spectra of the sample. Soaking in a commercially available laboratory detergent (e.g. Decon 90) is usually sufficient for both new and used tubes. However, if a sample has precipitated, the residue adhering to the glass can be particularly stubborn and soaking in a strong mineral acid may be required; considering the cost of making the sample in the first place, perhaps such tubes should simply be consigned to the glass bin. A variety of NMR tube cleaners are commercially available (e.g. Sigma-Aldrich or GPE Scientific Ltd); these are extremely useful for ensuring that tubes are thoroughly rinsed after soaking in either detergent or acid. If the sample is to be dissolved in ²H₂O, the tubes may be soaked in ²H₂O to exchange the residual water bound to the glass. Once rinsed, tubes are best dried in a stream of dry nitrogen gas, as heating the tubes can cause distortions which will adversely affect field homogeneity.
1.2.5.1 3 mm Tubes
As discussed previously, the ionic strength of a sample can have profound effects on the signal-to-noise ratio of the observed spectrum. If the sample cannot tolerate low salt concentrations, a final option is to use 3 mm NMR tubes, even if the probe is designed for 5 mm tubes. In effect, the sample adds a resistance to the receiver coils of the probe; reducing the volume of sample reduces this additional resistance and so limits the loss in overall sensitivity of the probe. Figure 1.1 shows the effect of increasing ionic strength on the signal-to-noise ratio observed in a 600 MHz cryogenically cooled probe when using a standard 5 mm, 3 mm or Shigemi tube. At low ionic strength (i.e. 400 mM), due to a combination of sample volume and the physical dimensions of the sample.
1.3 Data Collection
The first four steps required to obtain an NMR spectrum are the same regardless of the type of experiment to be carried out: establishing the field/frequency lock, tuning and matching the probe, optimising the magnetic field homogeneity (shimming) and calibrating the radiofrequency pulse lengths. It is recommended that the user carries out these four operations every time a sample is used, even if it has been used frequently or recently, as changes may indicate a problem either with the sample itself or with the spectrometer. Depending on the age of the spectrometer, these steps may now be fully or partially automated. However, knowing how to perform them manually is essential if the sample is challenging or if better than merely 'good' data are required. It is assumed throughout this section that the NMR spectrometer is well maintained, i.e. all heteronuclear pulses are properly calibrated, the variable-temperature (VT) unit is calibrated and the magnetic field homogeneity has been optimised. Additionally, no attempt is made to discuss which experiments should be collected or the theory behind them. Such a discussion is well beyond the scope of this chapter, and the reader is directed to later chapters in this volume and to the following excellent discussions [8–11].
1.3.1 Locking
Once the sample is safely in the magnet and its temperature has equilibrated, the first step is to establish a 'field-frequency lock'. Even though the magnetic field produced by the superconducting magnet is extremely stable, there is some drift that would adversely affect the lineshape of the spectrum. The function of the lock is to ensure that the magnetic field experienced by the sample remains constant. The use of deuterium as the lock nucleus is now universal and, for samples dissolved in H₂O, ²H₂O should be added to a level of 5–10%. To lock, the magnetic field is first adjusted to bring the deuterium signal on resonance with the lock frequency of the spectrometer. Care should be taken not to use too much power on the lock transmitter, otherwise the magnetic field stability will be poor and shimming will be unresponsive. The optimal lock transmitter power can be found by increasing the lock power stepwise and observing the lock signal. Once the signal becomes 'saturated', indicated by a rise and then a drop in lock level as the power is increased, the lock power should be reduced by several dB so that a stable lock signal is obtained. The lock phase should also be adjusted to give the highest signal, ensuring that the lock circuit is at its most sensitive and thus providing optimal stability.
1.3.2 Tuning
Putting a biological sample into the transmit/receive coils of an NMR probe affects the tuned circuit of the coil used to excite the sample and detect the NMR signal. Therefore, the probe needs to be tuned back to the correct resonance frequency and matched to the correct impedance. The tuning of the observe coil (usually proton) is strongly sample dependent, and proper tuning is essential to obtain the highest sensitivity. This circuit should be tuned and matched for each sample, and the procedure repeated if, for example during a titration experiment, additions are made to the sample. Conversely, the indirect (heteronuclear) channels and the lock circuit are reasonably insensitive to the sample and do not need routine tuning. Modern spectrometers provide visual feedback (the precise nature of which is manufacturer dependent) to facilitate the tuning and matching procedure, as each requires the physical adjustment of capacitors in the circuit. As the two adjustments tend to interact with each other, this process can be troublesome. However, assuming the tuning of the circuit is not too far off, if an iterative approach is adopted, i.e. the match is first fully optimised, then the tune, and so on until no further improvement is achieved, the process should be relatively straightforward.
1.3.3 Shimming
The superconducting magnet of the spectrometer alone cannot produce a magnetic field homogeneous enough for NMR experiments. Therefore, additional room-temperature electromagnetic coils, known as 'shim coils', are placed around the sample and used to cancel out the residual inhomogeneities of the main magnetic field. These inhomogeneities are either inherent to the magnet itself or are introduced when the sample is put into the bore of the magnet. The process of correcting them is known as shimming. Shimming can be carried out either manually or semi-automatically using gradient-based techniques. The effect of adjusting a shim coil can be observed in one of two ways: by observing the free induction decay (FID) or the lock level. As most biological samples will at some stage be dissolved in a nondeuterated solvent, it is usual to shim these samples using the lock level: the intense water signal decays much more rapidly than expected because of radiation damping, so shimming on the FID is particularly unproductive for these samples. Modern spectrometers may have up to 40 shim coils, named according to the field profile they generate. Because of these field profiles, the higher-order shims interact with the lower-order shims, so the shims cannot be optimised independently of one another. As a consequence, the manual shimming of an NMR magnet has become shrouded in mystery and folklore, and there are probably as many protocols for obtaining homogeneity as there are NMR spectroscopists. Coupled with the rapid development of automatic gradient-based shimming methods, this has meant that manual shimming is slowly being ousted, to the point that many structural biologists cannot shim their samples manually. But however robust the gradient-based methods are, they can, and do, fail to shim satisfactorily, and then the spectroscopist must intervene.
The following texts give extensive descriptions of the shimming process and protocols that the reader might try [8,12,13]. As almost all studies of biological macromolecules depend extensively on 2D and 3D experiments, these samples are never spun, so, unlike the protocols used for small molecules, these shimming protocols do not involve sample spinning. Biological samples usually have linewidths of 10 Hz or more, and this should be taken into account during the shimming process: there is little to be gained from spending hours shimming a sample in an attempt to reduce the solute linewidths by a fraction of a hertz.
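The point about diminishing returns can be made quantitative under one simplifying assumption: if both the natural line and the broadening from residual inhomogeneity are treated as Lorentzian, their widths simply add, and for a fixed integral the peak height is inversely proportional to the width. Real shim errors give non-Lorentzian, often asymmetric lineshapes, so the Python sketch below is only an approximation, and the 1.0 Hz and 0.5 Hz inhomogeneous contributions are hypothetical.

```python
def observed_width(natural_hz: float, inhomogeneous_hz: float) -> float:
    """FWHM of a Lorentzian convolved with a Lorentzian: the widths add."""
    return natural_hz + inhomogeneous_hz


def height_gain(natural_hz: float, inhom_before_hz: float, inhom_after_hz: float) -> float:
    """Relative peak-height gain from better shimming (height ~ 1/FWHM at fixed area)."""
    before = observed_width(natural_hz, inhom_before_hz)
    after = observed_width(natural_hz, inhom_after_hz)
    return before / after


if __name__ == "__main__":
    # Hypothetical improvement of the residual inhomogeneity from 1.0 Hz to 0.5 Hz
    for label, natural in [("protein resonance", 10.0), ("small molecule", 0.5)]:
        print(f"{label}: peak height x{height_gain(natural, 1.0, 0.5):.2f}")
```

Under these assumptions a 10 Hz protein line gains only about 5% in height, whereas the same shim improvement would make a 0.5 Hz small-molecule line 50% taller.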
However, the presence of a very strong solvent signal in samples dissolved in H₂O makes good shimming much more critical. For these samples, the lineshape of the residual water resonance is important, as a poorly shimmed sample will have long tails from the intense solvent peak obscuring solute signals. In addition, the spectrum will have a distorted baseline, which can be particularly detrimental in multidimensional experiments. An easy way to assess the quality of the shimming of an H₂O sample is to acquire a one-dimensional pre-saturation spectrum. After optimising the transmitter offset to maximise solvent suppression, the solvent signal should be easily suppressed using a pre-saturation period of 1.5 s and a B1 field of at most 100 Hz, with the residual solvent signal being no more than 150 Hz wide at its base. Such a residual signal should be achievable even on a cryogenically cooled probe, where radiation damping is even more of a problem. Failure to achieve such a lineshape will result in poor-quality spectra. Although pre-saturation is rarely used in modern NMR experiments, it is nevertheless a good test of shimming, as failure to suppress the solvent adequately by this method will not be compensated for by more sophisticated methods (e.g. WATERGATE [14]).
1.3.4 Calibrating Pulses
All but the most basic NMR experiments used to study biological samples are multipulse experiments that depend on applying pulses with the correct flip angle. Even the simplest ‘pulse and acquire’ experiment benefits from a properly calibrated 90° pulse, as maximum intensity is then obtained. The heteronuclear (i.e. 13C, 15N and 31P) pulse widths are insensitive to the sample; once calibrated, they need checking only occasionally (e.g. every six months or so, unless there has been a change in hardware) and do not need recalibrating for each sample. Calibration of heteronuclear pulses is described in detail in [15]. However, proton pulse widths are sensitive to the sample, particularly its ionic strength, and should be calibrated each time the sample is used or whenever anything is added to it. Calibration of proton pulse widths is a straightforward process provided a few simple rules are followed: place the signals of interest in the centre of the spectrum and allow a long enough relaxation delay for the signals to relax fully. Normal practice is to increase the flip angle until the first signal maximum, or the first or second signal null, is found. A coarse calibration is performed first (e.g. increasing the pulse width linearly in 4 µsec steps); once an approximate value is found, a finer-grained calibration can be performed. The null methods are more precise, as the maxima are rather broad, and if the 360° null is chosen a shorter relaxation delay can be used. Once the 360° null is found, simply dividing its pulse width by four gives the 90° pulse width. If the sample is dissolved in H2O, it is most practical to use the water resonance. However, ADC overflow will then occur even if the receiver gain is set to its lowest value, so the null methods are recommended. Pulses designed to selectively excite the solvent resonance are commonly encountered in many NMR experiments optimised for biological samples.
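The null-based calibration above can be sketched numerically. This is a minimal illustration, assuming an ideal on-resonance signal proportional to the sine of the flip angle and a hypothetical true 90° pulse of 9.6 µs; on the spectrometer the intensities would of course be read from a series of spectra. The same calibrated value also gives the RF field strength, from which the extra attenuation for a weaker, longer pulse (such as the 100 Hz pre-saturation field mentioned earlier) follows. The dB scaling used here holds only for rectangular pulses; shaped pulses need a shape-dependent correction, which the spectrometer software normally computes.

```python
import math


def simulated_intensity(pw_us, true_pw90_us=9.6):
    """Hypothetical on-resonance signal, proportional to sin(flip angle)."""
    return math.sin(0.5 * math.pi * pw_us / true_pw90_us)


def find_360_null(pws, intensities):
    """Interpolate the negative-to-positive zero crossing, i.e. the 360 degree null."""
    for i in range(len(pws) - 1):
        s0, s1 = intensities[i], intensities[i + 1]
        if s0 < 0.0 <= s1:
            return pws[i] + (pws[i + 1] - pws[i]) * (-s0) / (s1 - s0)
    raise ValueError("360 degree null not bracketed by the pulse widths scanned")


# Coarse array in 4 us steps, then a fine array in 0.1 us steps around the estimate.
coarse = [4.0 * i for i in range(1, 16)]
estimate = find_360_null(coarse, [simulated_intensity(p) for p in coarse])
fine = [estimate - 1.0 + 0.1 * i for i in range(21)]
null_360 = find_360_null(fine, [simulated_intensity(p) for p in fine])
pw90 = null_360 / 4.0  # 360 degree null divided by four


def b1_field_hz(pw90_us):
    """RF field strength (Hz) corresponding to a 90 degree pulse of pw90_us."""
    return 1.0 / (4.0 * pw90_us * 1e-6)


def extra_attenuation_db(b1_hard_hz, b1_soft_hz):
    """Power reduction (dB) from the hard-pulse field to b1_soft_hz (rectangular pulses)."""
    return 20.0 * math.log10(b1_hard_hz / b1_soft_hz)


b1_hard = b1_field_hz(pw90)
print(f"90 degree pulse: {pw90:.2f} us ({b1_hard:.0f} Hz)")
print(f"100 Hz presaturation: {extra_attenuation_db(b1_hard, 100.0):.1f} dB below the hard pulse")
print(f"2 ms soft 90: {extra_attenuation_db(b1_hard, b1_field_hz(2000.0)):.1f} dB below the hard pulse")
```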
The spin-lattice relaxation time of H2O is much longer than those of the macromolecule, and the water proton resonance is therefore partially saturated at the end of the pulse sequence. Chemical exchange and spin diffusion between protons of the molecule of interest (in proteins, the HN and Hα spins are affected most) and the partially saturated water can lead to partial saturation of the protons of the macromolecule. This can be avoided by selectively returning the water magnetisation to the z-axis, a technique known as ‘water flip-back’.
Protein NMR Spectroscopy
Such selective excitation can be achieved either by reducing the radiofrequency (RF) field strength and/or increasing the length of the pulse (referred to as selective or soft pulses), or by using a shaped pulse, in which both the amplitude and the phase are varied during the pulse to achieve the desired excitation profile (commonly used shapes are Gaussian and sinc shapes).¹ Whether selective or shaped pulses are used, their excitation profile should be such that only the solvent peak is excited, not the resonances of interest. Modern spectrometer software will calculate the approximate power of such pulses from the calibrated 90° pulse width supplied by the user. The user must then fine-tune this power to achieve optimum solvent suppression; in addition, fine-tuning of the transmitter offset may be required.
1.3.5 Acquisition Parameters
Once these four initial steps are complete, attention shifts to the acquisition parameters of the NMR experiments themselves. Many of these parameters are sample-dependent, so no hard and fast rules can be given. However, once the acquisition parameters have been optimised, they can be reused on subsequent occasions. Modern spectrometer software allows the user to write macros for common commands to facilitate spectrometer set-up and data processing; these can easily be adapted to load experimental parameters, thus avoiding mistakes that waste long hours of spectrometer time (and sample life). The parameters that apply to the vast majority of experiments (from 1D to 4D) are considered below from a practical rather than a theoretical point of view. The author has avoided manufacturer-specific terminology to avoid confusion. The transmitter offset is usually positioned so that the signals of interest are centred. If the spectrum contains an intense solvent peak, it is usual to position the transmitter offset on this resonance. Apart from its role in the solvent suppression method employed, this makes the use of convolution functions (see Data Processing, below) more convenient and reduces the possibility of spectral artefacts arising from such an intense signal. Inspection of the BMRB (Biological Magnetic Resonance Data Bank, http://www.bmrb.wisc.edu/) shows that the vast majority of proton resonances found in biological macromolecules (proteins, DNA and RNA) lie between 14 and −1.0 ppm. However, it should never be assumed that this spectral range will contain all the resonances of interest. For example, paramagnetic systems have much larger spectral widths, sometimes of the order of 100 ppm (see Chapter 6). For such samples, acquisition of all the proton resonances in a single spectrum is impossible.
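Placing the transmitter on the solvent has a direct consequence for the spectral width: the window must extend symmetrically about the carrier, so it has to reach the furthest resonance on either side. The sketch below illustrates this, assuming a hypothetical 600 MHz spectrometer, water at 4.7 ppm and the 14 to −1.0 ppm proton range quoted above. It uses the fact that 1 ppm corresponds to the Larmor frequency in MHz expressed in Hz, and assumes complex (quadrature) sampling, so that one complex point is acquired every 1/SW seconds.

```python
def sw_for_carrier(carrier_ppm, low_ppm, high_ppm, larmor_mhz):
    """Smallest window centred on the carrier that covers [low_ppm, high_ppm].

    Returns (sw_ppm, sw_hz, dwell_us). 1 ppm equals larmor_mhz Hz, and the
    dwell time assumes complex sampling (one complex point per 1/SW).
    """
    half_width = max(high_ppm - carrier_ppm, carrier_ppm - low_ppm)
    sw_ppm = 2.0 * half_width
    sw_hz = sw_ppm * larmor_mhz
    dwell_us = 1e6 / sw_hz
    return sw_ppm, sw_hz, dwell_us


# Carrier on water (assumed 4.7 ppm) versus carrier at the centre of the range.
on_water = sw_for_carrier(4.7, -1.0, 14.0, 600.0)
centred = sw_for_carrier(6.5, -1.0, 14.0, 600.0)
for label, (sw_ppm, sw_hz, dwell_us) in (("on water", on_water), ("centred", centred)):
    print(f"{label}: {sw_ppm:.1f} ppm = {sw_hz:.0f} Hz, dwell {dwell_us:.1f} us")
```

The wider window needed with the carrier on water is usually a small price for the benefits listed above.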
¹ Such pulses are also commonly used to decouple the carbonyl and Cα spins of proteins.
Initially, a relatively wide spectral width should be acquired to avoid aliasing (folding) of signals that are shifted outside the ‘normal’ range, for example as a result of bound ligands. In addition, modern digital filters are extremely efficient, so resonances that lie significantly outside the set spectral width will be filtered out and never observed. Similarly, the heteronuclear chemical shift ranges are well known (e.g. 90–140 ppm for backbone amide nitrogens and 5–85 ppm for aliphatic side-chain carbons) but should be checked for unusually shifted resonances. Arginine side-chain guanidino nitrogen signals are commonly observed in 15N-HSQC-type experiments, usually folded into the spectrum from 70 ppm; less commonly, lysine side-chain amino nitrogens are observed (folded from 20 ppm). A common practice is to reduce the spectral width of indirect heteronuclear dimensions (particularly 13C) and deliberately fold resonances in, allowing higher resolution for the same total acquisition time. If the initial incremental delay of the aliased dimension is set to 1/(2SW), where SW is the spectral width (and quadrature detection is achieved with the States-TPPI method), then resonances that have been folded an odd number of times will have the opposite phase to those that are not aliased (or have been aliased an even number of times). However, care should be taken when folding spectra, lest signals be folded on top of others and so increase resonance overlap (or cause signal cancellation) in already crowded spectra. The long correlation times of macromolecules cause their signals to relax relatively quickly, and as a consequence acquisition times are quite short (of the order of 100 ms). There is a temptation to increase the acquisition time of the FID in the mistaken belief that this will enhance the resolution of the spectrum. If data are recorded after the signal has decayed, only noise is measured, and after Fourier transformation the signal-to-noise ratio of the resulting spectrum is degraded. The optimal acquisition time is best determined experimentally: a 1D spectrum with good signal-to-noise can be compared with subsequent spectra acquired either with a shorter acquisition time or with the FID truncated during processing. A judgement must then be made as to the point at which the gain in signal-to-noise is outweighed by the loss in resolution. Acquisition times in the indirect dimension(s) of 2D, 3D and 4D experiments are set via the number of increments and are in turn related to the spectral widths in these dimensions.
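The folding rule described above can be made concrete with a small helper. This is a sketch under stated assumptions: complex (States-TPPI) data, in which a peak outside the window reappears from the opposite edge (wrap-around), and a first increment of 1/(2SW), so that each fold inverts the peak. Real-only (TPPI) data, where peaks are mirrored about the edge instead, are not covered. The 15N window (36 ppm centred at 118 ppm) is hypothetical; the 70 ppm and 20 ppm shifts are the arginine and lysine values quoted in the text.

```python
import math


def aliased_position(shift_ppm, centre_ppm, sw_ppm):
    """Apparent position, number of folds and sign of a (possibly) aliased peak.

    Assumes wrap-around aliasing of complex data and a first increment of
    1/(2SW), so odd numbers of folds give inverted peaks.
    """
    low = centre_ppm - sw_ppm / 2.0
    n_folds = math.floor((shift_ppm - low) / sw_ppm)  # 0 inside the window
    apparent = shift_ppm - n_folds * sw_ppm
    sign = -1 if n_folds % 2 else 1
    return apparent, abs(n_folds), sign


for label, shift in (("backbone amide", 120.0), ("Arg side chain", 70.0), ("Lys side chain", 20.0)):
    apparent, folds, sign = aliased_position(shift, centre_ppm=118.0, sw_ppm=36.0)
    print(f"{label}: {shift} ppm appears at {apparent:.1f} ppm, "
          f"folded {folds} time(s), {'negative' if sign < 0 else 'positive'}")
```

Checking a candidate window this way before acquisition is a quick way to see whether a folded peak will land on top of a crowded region.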
The relationship AQ = N/(2SW) (where AQ is the total acquisition time, N is the number of increments and SW is the spectral width) is well known. Theoretically, optimal resolution requires an acquisition time of up to 3/R2, and the maximum signal-to-noise is obtained at 1.26/R2, where R2 is the transverse relaxation rate [16,19]; this is rarely possible to achieve, however, so a balance must be struck. This balance is further complicated by the constraints of the phase cycle of the NMR experiment itself, which sets a minimum number of scans per increment; only integer multiples of a phase cycle should be used. As a general guide for two-dimensional homonuclear experiments, acquisition times in the indirect dimension of 40–60 msec should give adequate resolution. For heteronuclear three-dimensional experiments, both the constraints on total acquisition time (experiments usually require 3–4 days) and relaxation limit the proton acquisition time to 20 msec. For 15N dimensions, acquisition times of about 30 msec should be used, and for 13C acquisition times are usually limited to 9 msec so as not to resolve one-bond carbon–carbon couplings. If carbonyl resonances are being recorded, this can be extended to 25 msec to give better resolution in this crowded region. However, when the experiments make use of constant-time periods [17–19], it is usual to acquire the maximum number of points allowable in these dimensions. Although these acquisition times limit the resolution of the spectra, resolution can be regained during processing by judicious use of linear prediction, provided sufficient signal-to-noise is available. Invariably, an NMR experiment consists of multiple scans that are added together. Failure to leave an adequate relaxation delay between scans is likely to result in artefacts in a multidimensional experiment, as the spin systems will not be in the same state at the start of each scan. In addition, if the relaxation delay is too short, the spins will not have fully relaxed, and so the signal-to-noise ratio will decrease. Traditionally, as a compromise between signal-to-noise and an artefact-free spectrum, an interscan delay of one to one and a half times T1 is used. For a fixed total experiment time, there is a trade-off between the increase in observed signal as the relaxation delay is increased and the concomitant decrease in the total number of scans that can be acquired. This becomes more apparent with deuterated proteins, which have longer relaxation times. To double the signal-to-noise ratio of the spectrum, the number of scans acquired must increase fourfold. Therefore, it may be better to reduce the relaxation delay, and so reduce the observed signal per scan, in order to increase the total number of scans and hence the final signal-to-noise ratio. Most modern NMR probes have a maximum duty cycle, usually of the order of 15%, that should not be exceeded; as a result, relaxation delays are typically in the range of one to one and a half seconds. Additionally, steady-state (dummy) scans are used to ensure that the sample is at thermal equilibrium before data are recorded. The duration of the steady-state period depends on the experiment being performed. For example, if the experiment contains spin-lock periods or decoupling, a steady-state period of five to ten minutes may be required. Avoiding baseline distortions is particularly important in multidimensional experiments, especially when cross-peak intensities are to be measured (e.g. in nuclear Overhauser effect spectroscopy (NOESY) spectra). Baseline distortions that manifest as positive and negative ridges can be particularly damaging to a spectrum and may obscure cross-peaks altogether. These distortions can be reduced by shimming the sample to minimise the linewidth of the remaining solvent signal and by adjusting the pre-acquisition delay to remove the need for frequency-dependent phase corrections in the detected dimension.
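The acquisition-time and relaxation-delay trade-offs discussed above share a simple model, sketched here under simplifying assumptions. For an unweighted FID acquired for AQ = x/R2, the signal in the spectrum scales as 1 − exp(−x) while the rms noise scales as √x, and the ratio peaks near x = 1.26, reproducing the 1.26/R2 figure quoted above. For repeated 90° scans with a total recycle time Tr = x·T1 (relaxation delay plus acquisition) and single-exponential recovery, the signal per scan is again proportional to 1 − exp(−x) and the signal-to-noise per unit time again to (1 − exp(−x))/√x, so the optimum recycle time is about 1.26 T1, in line with the traditional one to one and a half times T1. The relaxation parameters below are hypothetical.

```python
import math


def relative_sn(x):
    """(1 - exp(-x)) / sqrt(x): signal collected over x time constants vs rms noise."""
    return (1.0 - math.exp(-x)) / math.sqrt(x)


def golden_section_max(f, lo, hi, tol=1e-9):
    """Maximum of a unimodal function f on [lo, hi] by golden-section search."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if f(c) > f(d):
            b, d = d, c
            c = b - g * (b - a)
        else:
            a, c = c, d
            d = a + g * (b - a)
    return 0.5 * (a + b)


def increments_for(aq_s, sw_hz):
    """Number of increments N from AQ = N / (2 SW)."""
    return round(2.0 * sw_hz * aq_s)


x_opt = golden_section_max(relative_sn, 0.1, 5.0)

r2 = 20.0  # hypothetical transverse relaxation rate, s^-1
t1 = 1.0   # hypothetical longitudinal relaxation time, s

print(f"optimum: {x_opt:.3f} time constants")
print(f"acquisition time for maximum S/N: {1000.0 * x_opt / r2:.0f} ms")
print(f"recycle time for maximum S/N per unit time: {x_opt * t1:.2f} s")
print("15N increments for 30 ms at SW = 2000 Hz:", increments_for(0.030, 2000.0))
```

The model ignores apodisation, phase-cycle constraints, duty-cycle limits and steady-state effects, which is why the practical values in the text are compromises around these optima.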
In the indirect dimensions, the initial incremental delay should be set to give either a 0° or a 180° first-order phase correction. In some cases baseline distortions result from corruption of the first few points of the FID, which can occur if the receiver gain has been set too high (referred to as a clipped FID); after Fourier transformation the spectrum shows ‘sinc wiggles’ or truncation artefacts. Additionally, a baseline roll may arise from the transient response of the audio filters to the signal; again, the first few points of the FID are corrupted. If only the first point is affected, the result is a constant baseline offset, but as more points are affected, the baseline distortions become more severe. To allow resonance positions to be compared between different samples and spectrometers, they are measured relative to a standard compound. Tetramethylsilane (TMS) is the universal reference for 1H NMR of organic molecules. For biological macromolecules, the situation is less straightforward, since TMS is not soluble in water. The recommended reference for biological samples is the methyl 1H resonance of 2,2-dimethyl-2-silapentane-5-sulphonic acid (DSS) at 0.00 ppm. However, DSS can interact with biological molecules, and a suitable alternative is dioxane, although its resonance, at 3.75 ppm, appears in a more crowded region of the spectrum. Once the proton shifts have been referenced, the heteronuclear chemical shifts are referenced indirectly, using tabulated ratios of the heteronuclear to 1H reference frequencies (closely related to the ratios of their gyromagnetic ratios). The software used to acquire the data usually carries out this procedure automatically. For a detailed discussion of chemical shift referencing in biological NMR, the reader is referred to [20].
1.3.6 Fast Acquisition Methods
Traditionally, multidimensional NMR experiments are acquired by linearly and systematically incrementing the indirect evolution periods. This has the advantage that no frequencies are overlooked, but it is time-inefficient, as regions of frequency space containing only noise are also explored. This can result in total acquisition times of several days. However, new acquisition methods are being developed with the goal of reducing total acquisition times without loss of information. Experimental time can be reduced from days to hours, hours to minutes and in some cases minutes to seconds. These new methods include projection [21,22], G-matrix [23] and Hadamard NMR [24,25], non-linear time-domain sampling [26,27] and fast-pulsing NMR [28,29]. All have their advantages and disadvantages, some of which will depend on the ‘traditional’ spectrum the sample gives, but all require good signal-to-noise ratios. Consequently, no recommendations can be made as to which the user should try. However, apart from fast-pulsing NMR, all require specific data processing algorithms and/or data manipulation to reconstruct a traditional multidimensional spectrum. Such algorithms are not readily available within the core data processing software. In addition, the development of spectral analysis software is lagging behind these advances, making analysis of such spectra far from routine. Hopefully, as these methods become more mainstream, so will the processing and analysis tools.
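To make the idea of non-linear (non-uniform) time-domain sampling concrete, the sketch below draws a sampling schedule in which only a subset of the indirect-dimension increments is acquired, biased towards early increments where the decaying signal is strongest. This is one common way of building such a schedule, shown under assumptions: the exponential weighting, its decay parameter and the grid sizes are hypothetical choices, not the specific methods of [26,27], and the reconstruction step that turns the sparse data into a spectrum is not shown.

```python
import numpy as np


def exponential_nus_schedule(n_grid, n_sampled, decay, seed=0):
    """Pick n_sampled of n_grid increments, biased towards early t1 values.

    The sampling weight falls as exp(-decay * k / n_grid) (a hypothetical
    choice); the first increment is always included.
    """
    rng = np.random.default_rng(seed)
    k = np.arange(n_grid)
    weights = np.exp(-decay * k / n_grid)
    weights[0] = 0.0  # increment 0 is added explicitly below
    p = weights / weights.sum()
    others = rng.choice(k, size=n_sampled - 1, replace=False, p=p)
    return np.sort(np.concatenate(([0], others)))


schedule = exponential_nus_schedule(n_grid=128, n_sampled=32, decay=3.0)
print("increments to acquire:", schedule.tolist())
print(f"fraction of the full grid: {len(schedule) / 128:.0%}")
```

Acquiring 32 of 128 increments cuts the time for that dimension fourfold, but only if the chosen reconstruction method can recover the spectrum from the sparse data with adequate signal-to-noise.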
1.4 Data Processing
Once the experiment is finished, the FID(s) are stored on the computer’s hard drive. Data processing describes the mathematical manipulations performed on the FID before its conversion from the time domain to the frequency domain by Fourier transformation and, if required, further manipulations in the frequency domain before the spectrum is ready for analysis. Although modern NMR processing software is equipped with a considerable array of tools for ‘fixing’ poor-quality data, the result will nevertheless be suboptimal and its interpretation all the harder. It is much better to collect the data well in the first place than to depend on software to ‘clean it up’ afterwards. Additionally, some care is needed in the correct use of these mathematical tools, as overuse can do more harm than good. The purpose of this section is to provide the user with some practical guidelines on the commonly used data processing tools. The choice of software for processing NMR experiments is a very personal one. The software supplied by the spectrometer manufacturers is more than adequate for processing multidimensional experiments and is equipped with the vast majority of algorithms one might want to use. In addition, powerful third-party software, such as NMRPipe [30] (http://www.nmrscience.com/nmrpipe.html), is available that can process not only ‘traditional’ multidimensional NMR experiments but also some of the fast acquisition methods mentioned above. It also has built-in functions for data analysis and is readily customisable by the user. Regardless of which software is used, the basic steps in transforming the FID into a frequency-domain spectrum that the user can analyse are the same; their use is outlined below. The FID stored on the computer consists of a time-domain signal that has been sampled at regular intervals and converted to a digital format.
The total number of points in the FID comprises both real and imaginary data, which allow the sign of each frequency relative to the transmitter offset to be determined. Therefore, the real spectrum displayed after Fourier transformation contains only half the number of points originally collected. A one-dimensional spectrum is usually displayed as a line that is in fact an interpolation between these points, so the more points that make up the line, the smoother it will be. Adding zeroes to the end of the original FID before Fourier transformation is known as zero filling; as a result, the displayed line is represented by more data points. Doubling the number of points in the FID can be repeated as many times as the user wishes (referred to as zero filling once, twice and so on). Zero filling does not adversely affect the spectrum, but beyond the first doubling (which recovers information that would otherwise be lost when the imaginary part is discarded) it does not improve the resolution either, as the measured signal remains the same; further zero filling is applied essentially for cosmetic reasons. However, the fast Fourier transform algorithms used by computer programs work best if the number of data points is a power of two. Therefore, it is usual to zero fill the time-domain data before Fourier transformation so that the total number of points is a power of two. The same applies to multidimensional experiments, which are normally displayed as contour plots: the more points, the smoother the contours will look. In one-dimensional NMR it is usual to record the FID until it has decayed into the noise. In multidimensional NMR experiments this may not be possible because of the time restrictions discussed above. Not recording the signal until it has fully decayed gives a truncated FID, which, upon Fourier transformation, produces oscillations at the base of the peak. These oscillations are referred to as sinc wiggles, as the peak shape is related to a sinc function. The more the FID is truncated, the more severe the sinc wiggles. These oscillations are undesirable, as they distort the baseline of the spectrum and can obscure nearby signals, sometimes completely. The usual solution is therefore to apply a weighting function that drives the signal to zero by the end of the FID (referred to as apodisation). Apodisation (weighting) functions are ubiquitous in biological NMR spectroscopy.
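Zero filling, truncation and apodisation can be seen together in a short simulation. The sketch below assumes a hypothetical noiseless FID (one 5 Hz-wide line 50 Hz from the carrier, SW = 2000 Hz, only 100 complex points, so the signal is still at about 45% of its initial amplitude when acquisition stops) and no phase error, so the real part of the transform is the absorption spectrum. The data are zero filled to the next power of two and then doubled three more times. As the weighting function it uses a cosine, which is the π/2-shifted sine bell described in the following paragraphs; the depth of the deepest negative excursion relative to the peak is used as a simple measure of the sinc wiggles.

```python
import numpy as np


def next_power_of_two(n):
    """Smallest power of two >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


def zero_fill(fid, extra_doublings=0):
    """Append zeros up to the next power of two, then double extra_doublings times."""
    size = next_power_of_two(len(fid)) << extra_doublings
    out = np.zeros(size, dtype=complex)
    out[:len(fid)] = fid
    return out


def absorption_spectrum(fid):
    """Real part of the FFT (no phase error assumed), zero frequency centred."""
    return np.fft.fftshift(np.fft.fft(fid)).real


sw = 2000.0
t = np.arange(100) / sw
fid = np.exp(2j * np.pi * 50.0 * t - np.pi * 5.0 * t)  # truncated: not decayed at t = 50 ms

# pi/2-shifted sine bell (a cosine): 1 at the first point, 0 at the last point.
cosine = np.cos(0.5 * np.pi * np.arange(len(fid)) / (len(fid) - 1))

raw = absorption_spectrum(zero_fill(fid, extra_doublings=3))
apodised = absorption_spectrum(zero_fill(fid * cosine, extra_doublings=3))

raw_ratio = raw.min() / raw.max()
apodised_ratio = apodised.min() / apodised.max()
print(f"points after zero filling: {len(raw)}")
print(f"deepest negative excursion, truncated: {raw_ratio:.3f}")
print(f"deepest negative excursion, apodised:  {apodised_ratio:.3f}")
```

The price of removing the wiggles is a broader line, which is the trade-off the weighting functions below try to manage.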
However, apodisation functions should be used with caution, as poorly matched functions can have side effects, broadening the resonances or reducing the signal-to-noise ratio. There are two basic types of weighting function that can be applied to an FID: sensitivity-enhancing and resolution-enhancing functions. Sensitivity-enhancing functions, which attenuate the later part of the FID, improve the signal-to-noise ratio at the expense of resolution. The simplest example of this type of weighting function is an exponential. Multiplying the FID by this function makes its envelope decay more rapidly; as a result, the resonances become broader (hence such functions are also known as line-broadening functions) and the noise is suppressed. Resolution-enhancing functions improve resolution at the expense of signal-to-noise by attenuating the first part of the FID and enhancing the latter part. To avoid degrading the signal-to-noise ratio excessively, it is usual to apply a second, decaying weighting function, commonly a Gaussian, to drive the tail of the FID to zero, so de-emphasising the noise in the final spectrum. The parameters that define these weighting functions have to be set by the user and ‘matched’ to the FID, usually by trial and error. These ‘basic’ weighting functions are rarely used in biological NMR, where sine bell functions, and to a lesser extent Gaussian functions, are the most popular. The basic sine bell function is adjusted so that it fits exactly over the FID, resulting in resolution enhancement. This rather severe function gives the typical appearance of a highly resolution-enhanced spectrum, in which the resonances are very sharp, the signal-to-noise ratio has been degraded and deep troughs are found on either side of the resonance lines. A simple modification is to shift the maximum of the function towards the beginning of the FID, up to the limit where the maximum lies at the start of the FID.
This is referred to as a 90° or π/2-shifted sine function; it is now simply a decaying function and will therefore broaden the resonances but improve the signal-to-noise ratio. The degree of shift (anywhere between 0° and 90°, usually expressed as π/x, where π/2 corresponds to 90°) is set by the user to optimise resolution without adversely affecting the signal-to-noise ratio. However, the deep troughs on either side of the resonance lines should be avoided, particularly in multidimensional experiments, where signals can be obscured by these distortions. Once the FID has been zero filled, weighted and Fourier transformed, the resulting spectrum will require phase correction to give pure absorption-mode signals. Two phase corrections are applied: one that is independent of the resonance frequency (zero-order phase correction) and one that depends on it (first-order phase correction). Resonances with zero-order phase errors have the same degree of dispersive character across the spectrum, whereas the dispersive character of resonances with first-order phase errors varies with frequency. The usual procedure is first to correct the low-frequency resonances with a zero-order phase correction and then (if required) to phase the higher-frequency resonances with a first-order phase correction. The procedure is iterative and may require two or three ‘cycles’ of phasing before the spectrum is properly phased. A large first-order phase correction will cause baseline distortions, and the problem causing it should be corrected (usually by adjusting the pre-acquisition delay). In the indirect dimensions, the phase should already have been set correctly by the experimental design; if not, the procedure is the same as for the directly detected dimension. In principle, the spectrum is now ready for analysis.
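The weighting and phasing steps described above can be written as three small functions. This is a sketch, not the implementation used by any particular processing package: the sine bell is written as sin(φ + (π − φ)k/(n − 1)) with φ = π/x, so x = 2 gives the 90° shift (a cosine that starts at 1 and ends at 0) and no shift gives the basic sine bell that starts and ends at 0; the exponential is exp(−π·lb·t), which adds lb Hz to the full width at half height of a Lorentzian line; and the phase correction applies ph0 plus ph1 scaled across the spectrum from a chosen pivot point. The FID and all parameter values are hypothetical.

```python
import numpy as np


def exponential_window(n, sw_hz, lb_hz):
    """exp(-pi * lb * t): adds lb_hz to the full width at half height of a Lorentzian."""
    t = np.arange(n) / sw_hz
    return np.exp(-np.pi * lb_hz * t)


def shifted_sine_bell(n, x=None):
    """Sine bell shifted by phi = pi / x (x=None: unshifted, x=2: 90 degree shift)."""
    phi = 0.0 if x is None else np.pi / x
    k = np.arange(n)
    return np.sin(phi + (np.pi - phi) * k / (n - 1))


def phase_correct(spectrum, ph0_deg, ph1_deg, pivot=0):
    """Zero-order phase ph0 plus first-order phase ph1 (degrees across the full width)."""
    frac = (np.arange(len(spectrum)) - pivot) / len(spectrum)
    return spectrum * np.exp(1j * np.deg2rad(ph0_deg + ph1_deg * frac))


# Hypothetical FID: 256 complex points, SW = 2000 Hz, one 8 Hz-wide line at 200 Hz.
sw = 2000.0
t = np.arange(256) / sw
fid = np.exp((2j * np.pi * 200.0 - np.pi * 8.0) * t)

broadened = np.fft.fft(fid * exponential_window(len(fid), sw, lb_hz=3.0))
shifted_60 = np.fft.fft(fid * shifted_sine_bell(len(fid), x=3))  # 60 degree shift
shifted_90 = np.fft.fft(fid * shifted_sine_bell(len(fid), x=2))  # 90 degree shift (cosine)

# Simulate a phase error, then remove it with the opposite corrections.
misphased = phase_correct(shifted_90, ph0_deg=-30.0, ph1_deg=45.0)
rephased = phase_correct(misphased, ph0_deg=30.0, ph1_deg=-45.0)
```

In practice the correction angles are not known in advance and are adjusted interactively, as described above; the round trip here simply shows that the zero- and first-order terms are independent, additive corrections.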
There are, however, two other procedures that might be used if required: a convolution filter to remove an intense solvent signal, and linear prediction to calculate additional data points for a truncated FID. The most common post-acquisition water suppression technique is the convolution difference low-pass filter [31]. Because the transmitter offset is placed on the water resonance, the water signal appears at or near zero frequency, and the method removes these low-frequency components from the spectrum. Although some baseline distortion occurs near the water signal, it is a very effective method of removing this intense signal. However, the user should be aware that the method does not discriminate between the water signal and any signals from the sample that lie within the filter bandwidth applied. It should therefore not be considered an alternative to good solvent suppression. A theoretical description of linear prediction is well beyond the scope of this chapter, and the reader is referred to the following excellent articles [32,33]. Here, some general guidance is given to provide the inexperienced user with a starting point for using linear prediction algorithms. As mentioned earlier, the indirectly detected dimensions of multidimensional NMR experiments are almost always truncated, and linear prediction provides an effective method of ‘extending’ these data, so improving the resolution and spectral quality in these dimensions. In addition, linear prediction is useful for correcting the first few points of a corrupted FID. In general, linear prediction works best for data with relatively high signal-to-noise ratios, and FIDs should not generally be extended by more than a factor of two, as artefacts can arise, as well as distortions of the signals themselves. If the 1H dimensions are Fourier transformed first, the number of signals in the heteronuclear dimensions is reduced; this simplifies the prediction problem and makes the algorithm more stable. Finally, if constant-time experiments have been recorded, the interferogram does not decay as an FID does. In this case, much better results will be obtained with the mirror-image linear prediction algorithm.
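Both post-acquisition tools can be sketched in a few lines, with the caveat that these are simplified illustrations rather than the algorithms of [31] or of any processing package. The convolution difference filter smooths the FID with a Gaussian kernel in the time domain (a low-pass filter that, with the carrier on water, approximates the water signal) and subtracts the result; normalising by the convolved ‘ones’ array is an assumed, simple way of reducing edge effects. The linear prediction function is a basic forward least-squares predictor with a warning when an FID would be extended by more than a factor of two; mirror-image prediction for constant-time data is not shown. All signals and parameters are hypothetical and noiseless.

```python
import warnings

import numpy as np


def convolution_difference(fid, sigma_pts=8.0):
    """FID minus its Gaussian-smoothed (low-pass) version."""
    half = int(3 * sigma_pts)
    k = np.arange(-half, half + 1)
    kernel = np.exp(-0.5 * (k / sigma_pts) ** 2)
    low_pass = np.convolve(fid, kernel, mode="same")
    norm = np.convolve(np.ones(len(fid)), kernel, mode="same")  # edge normalisation
    return fid - low_pass / norm


def lp_extend(fid, n_extra, order, rcond=1e-10):
    """Forward linear prediction: x[k + order] ~ sum_j a[j] * x[k + j] (least squares)."""
    fid = np.asarray(fid, dtype=complex)
    if n_extra > len(fid):
        warnings.warn("extending an FID by more than a factor of two is not recommended")
    rows = np.array([fid[k:k + order] for k in range(len(fid) - order)])
    coef, *_ = np.linalg.lstsq(rows, fid[order:], rcond=rcond)
    out = list(fid)
    for _ in range(n_extra):
        out.append(np.dot(out[-order:], coef))
    return np.array(out)


# Water (on resonance, 100x stronger) plus a weak solute line at 1000 Hz; SW = 4096 Hz.
sw = 4096.0
t = np.arange(1024) / sw
water = 100.0 * np.exp(-np.pi * 10.0 * t)
solute = np.exp((2j * np.pi * 1000.0 - np.pi * 10.0) * t)
before = np.fft.fft(water + solute)
after = np.fft.fft(convolution_difference(water + solute))
water_ratio = abs(after[0]) / abs(before[0])                  # bin 0 = 0 Hz
solute_ratio = abs(after[250]) / abs(np.fft.fft(solute)[250])  # bin 250 = 1000 Hz
print(f"water peak remaining: {water_ratio:.4f}, solute peak retained: {solute_ratio:.3f}")

# Two damped components, truncated at 64 points and predicted to 128.
n = np.arange(128)
full = np.exp((2j * np.pi * 0.11 - 0.01) * n) + 0.5 * np.exp((-2j * np.pi * 0.23 - 0.02) * n)
extended = lp_extend(full[:64], n_extra=64, order=4)
lp_error = np.max(np.abs(extended - full))
print(f"largest prediction error: {lp_error:.2e}")
```

With noiseless data the prediction is essentially exact; with real data, noise limits how far the FID can be extended, which is the reason for the factor-of-two guideline in the text.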
References
1. Lane, A.N. and Arumugam, S. (2005) Improving NMR sensitivity in room temperature and cooled probes with dipolar ions. J. Magn. Reson., 173, 339–343.
2. Kelly, A.E. et al. (2002) Low-conductivity buffers for high-sensitivity NMR measurements. J. Amer. Chem. Soc., 124, 12013–12019.
3. Hautbergue, G.M. and Golovanov, A.P. (2008) Increasing the sensitivity of cryoprobe protein NMR experiments by using the sole low-conductivity arginine glutamate salt. J. Magn. Reson., 191, 335–339.
4. Krueger-Koplin, R.D. et al. (2004) An evaluation of detergents for NMR structural studies of membrane proteins. J. Biomol. NMR, 28, 43–57.
5. Tamm, L.K. and Liang, B.Y. (2006) NMR of membrane proteins in solution. Prog. Nucl. Magn. Reson. Spectrosc., 48, 201–210.
6. Tian, C.L. et al. (2005) Membrane protein preparation for TROSY NMR screening. Meth. Enzymol., 394, 321–334.
7. Baryshnikova, O.K., Williams, T.C. and Sykes, B.D. (2008) Internal pH indicators for biomolecular NMR. J. Biomol. NMR, 41, 5–7.
8. Cavanagh, J., Fairbrother, W.J., Palmer, A.G. III, et al. (2007) Protein NMR Spectroscopy: Principles and Practice, 2nd edn, Academic Press, San Diego, p. 587.
9. Keeler, J. (2005) Understanding NMR Spectroscopy, 1st edn, John Wiley & Sons, Ltd, Chichester, p. 476.
10. Hounsell, E.F. (1995) H-1 NMR in the structural and conformational analysis of oligosaccharides and glycoconjugates. Prog. Nucl. Magn. Reson. Spectrosc., 27, 445–474.
11. Flinders, J. and Dieckmann, T. (2006) NMR spectroscopy of ribonucleic acids. Prog. Nucl. Magn. Reson. Spectrosc., 48, 137–159.
12. Conover, W.W. (1984) Topics in Carbon-13 NMR Spectroscopy, vol. 4 (ed. G.C. Levy), John Wiley & Sons, Ltd, Chichester, p. 282.
13. Chmurny, G.N.H. and Hoult, D.I. (1990) The ancient and honourable art of shimming. Concepts Magn. Reson., 2, 131–149.
14. Piotto, M., Saudek, V. and Sklenar, V. (1992) Gradient-tailored excitation for single-quantum NMR-spectroscopy of aqueous-solutions. J. Biomol. NMR, 2, 661–665.
15. Braun, S., Kalinowski, H.-O. and Berger, S. (1998) 150 and More Basic NMR Experiments, 2nd edn, Wiley VCH, Weinheim, p. 610.
16. Rovnyak, D. et al. (2004) Resolution and sensitivity of high field nuclear magnetic resonance spectroscopy. J. Biomol. NMR, 30, 1–10.
17. Bax, A. and Freeman, R. (1981) Investigation of complex networks of spin-spin coupling by two-dimensional NMR. J. Magn. Reson., 44, 542–561.
18. Rance, M. et al. (1984) Application of omega-1-decoupled 2D correlation spectra to the study of proteins. J. Magn. Reson., 59, 250–261.
19. Bax, A., Mehlkopf, A.F. and Smidt, J. (1979) Absorption spectra from phase-modulated spin echoes. J. Magn. Reson., 35, 373–377.
20. Wishart, D.S. et al. (1995) H-1, C-13 and N-15 chemical-shift referencing in biomolecular NMR. J. Biomol. NMR, 6, 135–140.
21. Kupce, E. and Freeman, R. (2003) Projection-reconstruction of three-dimensional NMR spectra. J. Amer. Chem. Soc., 125, 13958–13959.
22. Kupce, E. and Freeman, R. (2008) Hyperdimensional NMR spectroscopy. Prog. Nucl. Magn. Reson. Spectrosc., 52, 22–30.
23. Kim, S. and Szyperski, T. (2003) GFT NMR, a new approach to rapidly obtain precise high-dimensional NMR spectral information. J. Amer. Chem. Soc., 125, 1385–1393.
24. Kupce, E. and Freeman, R. (2003) Frequency-domain Hadamard spectroscopy. J. Magn. Reson., 162, 158–165.
25. Kupce, E. and Freeman, R. (2003) Fast multi-dimensional NMR of proteins. J. Biomol. NMR, 25, 349–354.
26. Rovnyak, D. et al. (2004) Accelerated acquisition of high resolution triple-resonance spectra using non-uniform sampling and maximum entropy reconstruction. J. Magn. Reson., 170, 15–21.
27. Marion, D. (2005) Fast acquisition of NMR spectra using Fourier transform of non-equispaced data. J. Biomol. NMR, 32, 141–150.
28. Schanda, P., Kupce, E. and Brutscher, B. (2005) SOFAST-HMQC experiments for recording two-dimensional heteronuclear correlation spectra of proteins within a few seconds. J. Biomol. NMR, 33, 199–211.
29. Lescop, E., Schanda, P. and Brutscher, B. (2007) A set of BEST triple-resonance experiments for time-optimized protein resonance assignment. J. Magn. Reson., 187, 163–169.
30. Delaglio, F. et al. (1995) NMRPipe - a multidimensional spectral processing system based on Unix pipes. J. Biomol. NMR, 6, 277–293.
31. Marion, D., Ikura, M. and Bax, A. (1989) Improved solvent suppression in one-dimensional and two-dimensional NMR-spectra by convolution of time-domain data. J. Magn. Reson., 84, 425–430.
32. Stephenson, D.S. (1988) Linear prediction and maximum-entropy methods in NMR spectroscopy. Prog. Nucl. Magn. Reson. Spectrosc., 20, 515–626.
33. Stern, A.S., Li, K.B. and Hoch, J.C. (2002) Modern spectrum analysis in multidimensional NMR spectroscopy: Comparison of linear-prediction extrapolation and maximum-entropy reconstruction. J. Amer. Chem. Soc., 124, 1982–1993.
2 Isotope Labelling
Mitsuhiro Takeda and Masatsune Kainosho
2.1 Introduction
The isotopic labelling of proteins is the basis of current NMR methodology, and almost all NMR studies are performed with isotope-labelled proteins. The isotopic labelling of proteins has played crucial roles in addressing the fundamental problems encountered in NMR studies of proteins: signal overlap and line-broadening. In current isotope-aided NMR studies, the atoms with isotopic compositions that are often altered are hydrogen, nitrogen and carbon, and 2 H, 13 C and 15 N nuclei are commonly used for this purpose. The aims of isotope labelling can be categorised as follows: a. Mitigation of NMR peak overlap For biological macromolecules, a huge number of 1 H resonances that are prone to overlap each other can be observed in a resolved manner by using hetero-nuclear correlation methods, such as 1 H -15 N and 1 H -13 C HSQC spectra. The substitution of 1 H atoms by 2 H further reduces the overlap of 1 H resonances. b. Enhancement of signal-to-noise ratio Replacements of 1 H by 2 H lead to reduced dipole and scalar interactions between nearby atoms. Enrichment with 13 C and 15 N is useful for the direct detection of the enriched carbon and nitrogen atoms. c. Resonance assignment Isotopic enrichment of nitrogen and carbon makes it possible to perform sequential backbone and side-chain resonance assignments that utilise the one-bond or two-bond scalar couplings. Protein NMR Spectroscopy: Practical Techniques and Applications, First Edition. Edited by Lu-Yun Lian and Gordon Roberts. 2011 John Wiley & Sons, Ltd. Published 2011 by John Wiley & Sons, Ltd.
d. Obtaining structural information. NOEs between 1H atoms, secondary chemical shifts, residual dipolar couplings and the presence of hydrogen bonds can be investigated by using 13C- and/or 15N-labelled proteins. 2H labelling also effectively reduces spin diffusion, as well as dipolar and scalar couplings.
e. Spin relaxation analysis. Dynamic information about proteins can be obtained by analysing the relaxation properties of individual 13C and 15N atoms.
There are two types of isotopic labelling: uniform and selective. In uniformly labelled proteins, all atoms of the chosen element(s) in the target protein are labelled without any selectivity, which is relatively easy to accomplish. In selectively labelled proteins, on the other hand, only selected positions are isotopically labelled, which in some cases requires special techniques. Several methods are now available for producing such isotopically labelled proteins. In this chapter, we first outline the production methods for isotopically labelled proteins, then discuss uniform and selective labelling techniques, and finally describe the stereo-array isotope labelling (SAIL) method. In addition to this chapter, there are several useful reviews on isotope labelling of proteins for NMR [1–6].
2.2 Production Methods for Isotopically Labelled Proteins
2.2.1 Recombinant Protein Expression in Living Organisms
2.2.1.1 Escherichia coli
By virtue of the molecular cloning techniques developed in the late 1980s and 1990s, target proteins can readily be produced in specific host organisms: cells transformed with DNA encoding the target gene produce the target protein efficiently. Amongst the many systems available for heterologous protein production, the Gram-negative bacterium Escherichia coli (E. coli) is the most widely used host [7,8]. E. coli can synthesise all 20 amino acids from a simple carbon source and a nitrogen salt, and thus can grow on minimal media containing a limited range of nutrients. Therefore, when supplied with isotope-labelled nutrients, E. coli cells can produce large quantities of an isotope-labelled target protein. One frequently employed system uses the bacteriophage-derived T7 promoter [9]: a plasmid encoding the target protein under the control of the T7 promoter is transformed into an E. coli strain, such as a K12 or B strain. The advantage of E. coli over other host organisms is that it grows rapidly and produces large amounts of heterologous proteins. In addition, its biosynthetic and catabolic pathways for amino acids have been well characterised, and strains carrying mutations that block key steps are readily available. At present, E. coli is usually the first choice for expressing a new protein, unless posttranslational modifications, such as glycosylation, are required for biological activity. However, overexpression of recombinant proteins often results in the formation of insoluble inclusion bodies,
which are composed of densely packed, denatured protein in the form of particles [10]. This problem is especially troublesome for proteins with numerous disulfide bonds. If the protein in the inclusion bodies can be refolded into a functional form, insolubility is not a major problem. If refolding is not possible, however, slowing the rate of protein expression by growing the cells at a lower temperature is worth considering. Expression levels vary between genes owing to many factors, such as mRNA stability, codon bias, protein degradation and folding efficiency. To circumvent these problems, optimisation of the DNA sequence (codon usage), the use of host strains lacking protease genes and the co-expression of chaperones can be considered.
2.2.1.2 Yeast Cells
A major limitation of the E. coli expression system is the difficulty of producing functional proteins that contain numerous disulfide bonds and/or require posttranslational modifications. As an alternative host organism, the yeast Pichia pastoris (P. pastoris) has gained popularity [11,12]. An advantage of P. pastoris is that it secretes heterologous proteins into the medium when a suitable expression vector is used, which facilitates their purification. In addition, correct disulfide bonds are more likely to be formed than in the E. coli system. More recently, an expression system using the hemiascomycetous yeast Kluyveromyces lactis (K. lactis) has been reported [13,14]. One of the major differences between P. pastoris and K. lactis is the promoter used to express the target gene: target protein expression is induced by adding methanol in P. pastoris and galactose in K. lactis. Target proteins expressed in yeast cells can be modified with heterogeneous, high-mannose glycans.
In some cases, the attached glycans must additionally be removed with a glycosidase to obtain a homogeneous sample, unless the glycan is functionally important.
2.2.1.3 Other Host Cells
Baculovirus-infected insect cells are also regarded as a potentially important expression system. The expression of rhodopsin labelled with specific amino acids in baculovirus-infected Spodoptera frugiperda (Sf9) cells has been reported [15,16], and amino acid type-selective labelling of the catalytic domain of c-Abl kinase in Sf9 cells has also been achieved [17]. To achieve correct folding and posttranslational modification of target proteins, the use of mammalian cells, such as Chinese hamster ovary (CHO) cells, is also a promising approach [18,19], although protein production is more difficult and expensive than in E. coli systems. In general, mammalian cells do not grow on the media used for bacteria; they require amino acids, vitamins, cofactors and, in most cases, serum. Hence, well-defined expression media supplemented with isotope-labelled amino acids are commonly used.
2.2.2 Cell-Free Synthesis
In some cases, using E. coli or yeast cells as protein production systems presents difficulties. For instance, a protein that is toxic to the host cells may not be tolerated. Cross-labelling also occurs for specific amino acids, which leads to inefficient and incorrect incorporation of the label into the target protein. Recently, cell-free synthesis systems have drawn intense interest as an alternative method for producing isotope-labelled proteins [20–24]. In cell-free synthesis, protein production is carried out in a vessel in which the protein synthesis system has been reconstituted. The E. coli cell-free system utilises an E. coli extract, which contains the protein synthesis machinery. Its advantage is that the metabolic conversion present in living cells is strongly suppressed, which enables selective labelling of target proteins for almost all amino acids. In addition, the incorporation rate of the labelled amino acids is much higher than in the in vivo system, which is especially important for the SAIL method [25]. Membrane proteins with transmembrane regions can also be expressed in a cell-free system by adding various detergents to the reaction mixture [26]. The E. coli extract is commercially available from several companies, and can also be prepared in the laboratory [24]. Cell-free synthesis using wheat germ extracts has gained attention as a promising approach for NMR sample production [27,28]. The wheat germ cell-free system is considered to be RNase-free and suitable for the expression of large proteins, and amino acid-selective labelling with this system has been reported [29]. As far as we know, the wheat germ extract is more difficult to prepare than the E. coli extract. The wheat germ extract for cell-free protein synthesis is commercially available from Cell-Free Sciences Co. (http://www.cfsciences.com/). In the production of isotope-labelled proteins, it is crucial that the E. coli S30 extract does not contain unlabelled amino acids; otherwise, the isotopically labelled amino acids supplied to the reaction are diluted.
The protocol for the preparation of the E. coli S30 extract with minimal unlabelled amino acids is described in Protocol 1.
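To see why residual unlabelled amino acids in the extract matter, the dilution can be estimated with a simple mixing calculation. The sketch below is a minimal model. It assumes that labelled and unlabelled molecules of the same amino acid are incorporated with equal probability. The function name `effective_enrichment` and the example concentrations are hypothetical and are not values from Protocol 1.

```python
def effective_enrichment(labelled_mM: float, isotope_fraction: float,
                         unlabelled_mM: float) -> float:
    """Estimate the isotope fraction actually incorporated for one amino acid.

    labelled_mM      -- concentration of the added isotope-labelled amino acid
    isotope_fraction -- isotopic enrichment of that stock (e.g. 0.98)
    unlabelled_mM    -- unlabelled amino acid carried over from the extract
    Hypothetical model: assumes no isotope discrimination during incorporation.
    """
    total_mM = labelled_mM + unlabelled_mM
    if total_mM <= 0:
        raise ValueError("total amino acid concentration must be positive")
    # Labelled molecules make up labelled_mM / total_mM of the pool, and only
    # isotope_fraction of those actually carry the isotope.
    return labelled_mM * isotope_fraction / total_mM


# Hypothetical example: 1 mM of a 98 % labelled amino acid plus 0.1 mM carry-over.
print(round(effective_enrichment(1.0, 0.98, 0.1), 3))
```

With these illustrative numbers, the incorporated enrichment falls from 98 % to about 89 %. This is the motivation for depleting free amino acids from the extract.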
Protocol 1: Preparation of the Amino Acid-Free S30 Extract
1. Inoculate a stock of E. coli (A19, BL21 Star, etc.) into 10 ml of LB medium in a 50 ml tube and grow the cells overnight at 37 °C with shaking.
2. Inoculate 10 ml of the culture into 1 l of incomplete rich medium in a 2-litre flask.
Incomplete rich medium. For 1 l, combine the following: 5.6 g KH2PO4, 28.9 g K2HPO4, 1 g Bacto yeast extract, 1.5 mg thiamine. Autoclave, then add the following: 50 ml 40 % (w/v) D-glucose, 10 ml 0.1 M Mg(OAc)2.
3. Grow the cells at 37 °C with shaking to an OD650 of 0.7. The growth rate of the cells should be monitored, since it correlates with the activity of the resulting extract.
4. Centrifuge the cells (5000 × g, 4 °C, 10 min) and wash them three times by gentle resuspension in 200 ml of ice-cold S30 buffer containing 0.05 % (v/v) 2-mercaptoethanol. Do not allow the suspension to foam.
S30 buffer. For 1 l, combine the following: 10 ml 1 M Tris-acetate (pH 8.2), 10 ml 1.4 M Mg(OAc)2, 10 ml 6 M KOAc, 1 ml 1 M DTT (add after autoclaving).
5. Gently resuspend the cell pellet in 200 ml of ice-cold S30 buffer containing 0.05 % (v/v) 2-mercaptoethanol. Centrifuge the suspension (5000 × g, 4 °C, 10 min) and weigh the E. coli pellet. Resuspend the pellet in 1.27 ml of S30 buffer per gram of E. coli.
6. Disrupt the cells with a French press at 20 000 psi (1400 kg cm⁻²). Add 30 μl of 1 M DTT to the lysate immediately after disrupting the cells. Centrifuge the lysate (30 000 × g, 4 °C, 30 min) in RNase-free centrifuge tubes. Carefully remove approximately 1.4 ml of the supernatant per gram of E. coli without disturbing the precipitate.
7. Transfer the supernatant to RNase-free tubes, centrifuge again (30 000 × g, 4 °C, 30 min) and transfer approximately 1.0 ml of the supernatant per gram of E. coli into a 50 ml tube.
8. Shake the tube at 37 °C for 80 min.
9. Dialyse the solution at 4 °C for 45 min against 2 l of S30 buffer, using dialysis tubing with an MWCO of 6000–8000. Leave a little air in the tubing so that it floats. Repeat the dialysis twice, then centrifuge the solution (15 000 × g, 4 °C, 10 min) and collect the supernatant.
10. Uniformly pack an open column (Econo-Column chromatography column, 2.5 × 20 cm) with Sephadex G-25 resin and mount the column vertically in a cold room (4 °C). Attach an Econo-Column funnel to the top of the column and pour 500 ml of S30 buffer through the funnel into the column.
11. Apply the supernatant from step 9 to the column equilibrated at 4 °C in step 10. After loading, keep supplying S30 buffer to the funnel to maintain the flow through the column. When the first fraction reaches the bottom of the column, start collecting, and collect 1.4 times the volume of the applied extract. Identify the first fraction by its yellow colour and turbidity.
12. Dialyse the eluate at 4 °C against 700 ml of an equal-weight mixture of PEG-8000 and S30 buffer. Before use, the PEG–S30 buffer (at 4 °C) should always be stirred to keep the PEG from settling.
Adjust the dialysis time so as to concentrate the extract to 0.86 times its volume. Then dialyse it at 4 °C for 60 min against 2 l of S30 buffer.
13. Transfer the extract to 1.5 ml tubes, freeze the tubes in liquid nitrogen and store them at −80 °C.
The activity of the S30 extract should be evaluated by performing small-scale cell-free reactions with some test proteins (Protocol 2). Empirically, the S30 extract can be stored at −80 °C for at least several months.
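Because several steps of Protocol 1 are specified per gram of wet cells or as multiples of a previous volume, the expected volumes can be tabulated before starting. The helper below is hypothetical. It uses the per-gram factors and multipliers given in steps 5, 6, 7, 11 and 12. By default, it assumes that the volume loaded on the column equals the step 7 supernatant. It also assumes that the 0.86-fold concentration in step 12 refers to the eluate volume. Both of these are assumptions rather than stated values.

```python
from typing import Optional


def s30_volumes(pellet_g: float, applied_ml: Optional[float] = None) -> dict:
    """Approximate working volumes (ml) for Protocol 1 from the wet pellet mass.

    pellet_g   -- wet weight of the E. coli pellet weighed in step 5
    applied_ml -- volume loaded on the G-25 column (step 11); if unknown, the
                  step 7 supernatant volume is used instead (an assumption,
                  since the dialysis in step 9 can change the volume).
    """
    resuspension = 1.27 * pellet_g      # step 5: 1.27 ml S30 buffer per g
    supernatant_1 = 1.4 * pellet_g      # step 6: ~1.4 ml supernatant per g
    supernatant_2 = 1.0 * pellet_g      # step 7: ~1.0 ml supernatant per g
    if applied_ml is None:
        applied_ml = supernatant_2
    eluate = 1.4 * applied_ml           # step 11: collect 1.4 x applied volume
    concentrated = 0.86 * eluate        # step 12: assumed 0.86 x eluate volume
    return {
        "resuspension_ml": resuspension,
        "supernatant_step6_ml": supernatant_1,
        "supernatant_step7_ml": supernatant_2,
        "eluate_ml": eluate,
        "final_extract_ml": concentrated,
    }


for name, ml in s30_volumes(5.0).items():
    print(f"{name}: {ml:.2f}")
```

For a hypothetical 5 g pellet, this gives about 6 ml of final extract. The actual recovery should be measured rather than relied on.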
Protocol 2: Cell-Free Reaction on a Small Scale
1. Prepare the reaction solution and the dialysis solution by mixing the components as follows:

Stock solution                                      Reaction solution   Dialysis solution
RNase-free water                                    123.2 μl            1160.8 μl
1.4 M NH4OAc                                        9.8 μl              39.2 μl
0.5 M Mg(OAc)2                                      15 μl               60 μl
Mixture of 20 unlabelled amino acids (1 mM each)    20 μl               80 μl
0.645 M creatine phosphate                          40 μl               160 μl
LM mixture (a)                                      125 μl              500 μl
1 mg/ml template DNA                                10 μl               –
11 mg/ml T7 RNA polymerase                          4.5 μl              –
40 units/μl RNase inhibitor                         1.25 μl             –
10 mg/ml creatine kinase                            12.5 μl             –
S30 extract                                         150 μl              –
Total volume                                        0.5 ml              2 ml
(a) LM mixture. For 200 ml, combine the following: 22 ml 2 M HEPES-KOH (pH 7.5), 33.4 ml 6 M KOAc, 210 mg DTT, 530 mg ATP, 338 mg CTP, 335 mg GTP, 310 mg UTP, 172 mg cAMP, 28 mg folinic acid, 140 mg tRNA, 64 ml 50 % (w/v) PEG-8000, and RNase-free water to 200 ml. The LM mixture can be stored frozen at −20 °C for one month or more. Because PEG-8000 interferes with SDS-PAGE, it should be removed from the cell-free reaction products by ethanol precipitation before the sample dye is added.
Thaw the frozen S30 extract on ice. Prepare the creatine phosphate in RNase-free water just before use. Heating the amino acid mixture to 60 °C helps to dissolve the amino acids; however, excessive heating of SAIL amino acids may cause racemisation, especially at high pH.
2. Pour the dialysis solution into the outer tube. Place the inner membrane apparatus of the Float-A-Lyzer inside the outer tube, and pour the reaction solution into the inner membrane.
3. Shake the tube to allow production of the target protein. The optimal temperature and incubation time should be determined with small-scale cell-free reactions.
4. Retrieve the reaction solution and the dialysis solution. If the molecular weight of the protein produced is smaller than the molecular weight cut-off of the membrane, check the outer solution for the presence of the protein.
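When a reaction larger or smaller than 0.5 ml is needed, every entry in the Protocol 2 table can be scaled by the same factor. This keeps all final concentrations and the 1:4 ratio of reaction volume to dialysis volume unchanged. The helper below is a hypothetical sketch that transcribes the table volumes in μl. Whether a different dialysis device behaves in the same way should still be checked in a pilot reaction.

```python
# Volumes in μl transcribed from the Protocol 2 table, stored as
# (reaction solution, dialysis solution); 0.0 means "not added".
RECIPE_UL = {
    "RNase-free water": (123.2, 1160.8),
    "1.4 M NH4OAc": (9.8, 39.2),
    "0.5 M Mg(OAc)2": (15.0, 60.0),
    "amino acid mixture": (20.0, 80.0),
    "0.645 M creatine phosphate": (40.0, 160.0),
    "LM mixture": (125.0, 500.0),
    "template DNA": (10.0, 0.0),
    "T7 RNA polymerase": (4.5, 0.0),
    "RNase inhibitor": (1.25, 0.0),
    "creatine kinase": (12.5, 0.0),
    "S30 extract": (150.0, 0.0),
}
NOMINAL_REACTION_UL = 500.0  # the table's 0.5 ml reaction (2 ml dialysis)


def scale_recipe(reaction_ul: float) -> dict:
    """Scale every component linearly to a new reaction volume (μl)."""
    factor = reaction_ul / NOMINAL_REACTION_UL
    return {name: (inner * factor, outer * factor)
            for name, (inner, outer) in RECIPE_UL.items()}


for name, (inner, outer) in scale_recipe(1000.0).items():
    print(f"{name:28s} {inner:8.2f} {outer:8.2f}")
```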
The yield of cell-free synthesis depends on several factors, including the sequence of the target gene, the expression plasmid, the buffer conditions, the amount of amino acids used, and the temperature. At the very least, pilot experiments should be performed in small-scale reactions with unlabelled amino acids. The level of protein production in cell-free synthesis appears to correlate to some extent with that in in vivo expression. The incubation time and temperature also affect the expression level of the target protein, and it is especially worthwhile to optimise the magnesium concentration. The use of an expression vector optimised for cell-free synthesis and the introduction of silent mutations into the target DNA sequence are also worth considering [24]. After the reaction conditions have been optimised, uniformly 15N- or 13C/15N-labelled proteins are commonly produced in a large-scale reaction. To control costs, a commercially available 15N- and/or 13C-enriched amino acid mixture, such as one derived from algae, is often used. However, the composition of the mixture should be checked before use. In our experience, glutamine, asparagine, cysteine and tryptophan are missing from algal lysates. Glutamine and asparagine do not need to be added, because they are formed from glutamate/aspartate and ammonium ions by enzymes in the extract, but tryptophan and/or cysteine must be supplemented if these residues are present in the target protein. It is also highly recommended to prepare the target protein by in vivo expression for comparison, and to compare the 1H-15N HSQC spectra of the two preparations carefully. It should be emphasised that removal of the N-terminal formyl group by peptide deformylase is likely to be incomplete in E. coli cell-free expression, producing an inhomogeneous sample. In such a case, residues close to the N-terminus give doubled resonances, corresponding to the formylated and deformylated forms.
One method to overcome this problem is to use a cleavable N-terminal tag. The N-terminal tag can also be used to increase the expression level [24].
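The check on an algal amino acid mixture described above can be automated from the protein sequence. The sketch below encodes the residue types that the text reports as missing (Gln, Asn, Cys and Trp) and the advice that only Cys and Trp need to be supplemented. The function name is hypothetical. The composition of a lysate can differ between suppliers and lots, so the `MISSING_FROM_ALGAL_LYSATE` set should be updated from the supplier's analysis.

```python
# Residue types reported in the text as missing from an algal amino acid lysate.
MISSING_FROM_ALGAL_LYSATE = {"Q", "N", "C", "W"}
# Gln and Asn are formed from Glu/Asp and ammonium during the reaction,
# so the text advises against supplementing them.
REGENERATED = {"Q", "N"}


def residues_to_supplement(sequence: str) -> set:
    """Return one-letter codes that must be added as labelled amino acids."""
    present = set(sequence.upper())
    return (MISSING_FROM_ALGAL_LYSATE - REGENERATED) & present


# Arbitrary example sequence: it contains Trp but no Cys.
print(sorted(residues_to_supplement("MKWVTFISLLFLFSSAYS")))
```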
2.3 Uniform Isotope Labelling of Proteins
2.3.1 Uniform 15N Labelling
An NMR study of a new protein often starts with the preparation of a uniformly 15N-labelled sample, which is suitable for initial characterisation because of its low cost. Uniformly 15N-labelled proteins are usually produced by growing E. coli cells transformed with the target DNA in minimal medium (M9) containing 15N-labelled ammonium chloride (15NH4Cl) as the sole nitrogen source [1]. The quality of the NMR sample is evaluated from the dispersion of the peaks, the number of peaks and the uniformity of the peak intensities. Good-quality NMR spectra are a good indication that further studies, such as structure determination, dynamics studies and interaction analyses, will succeed. If the quality of the NMR spectra is not sufficient, the sample conditions (such as buffer composition, temperature and concentration) and the construct should be optimised (see Chapter 1).
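When the number of peaks is used as a quality criterion, it helps to know roughly how many cross-peaks to expect in the 1H-15N HSQC spectrum. The sketch below uses common rules of thumb, which are assumptions rather than statements from this chapter. Proline gives no backbone amide peak, and the N-terminal amine is usually not observed because of fast exchange. Each Trp indole NH gives one peak, and each Asn or Gln NH2 group gives two. Arg and Lys side-chain signals are ignored. The function names and the example sequence are hypothetical.

```python
def expected_backbone_amide_peaks(sequence: str) -> int:
    """Rule-of-thumb number of backbone NH cross-peaks in a 1H-15N HSQC.

    Skips the N-terminal residue (amine usually exchange-broadened) and all
    prolines (no amide proton).
    """
    return sum(1 for aa in sequence.upper()[1:] if aa != "P")


def expected_sidechain_nh_peaks(sequence: str) -> int:
    """Trp indole NH: one peak each; Asn/Gln NH2: two peaks each.

    Arg and Lys side-chain NH signals are not counted (an assumption: they are
    often exchange-broadened or folded in the 15N dimension).
    """
    seq = sequence.upper()
    return seq.count("W") + 2 * (seq.count("N") + seq.count("Q"))


seq = "MPKPAGWNQ"  # hypothetical sequence
print(expected_backbone_amide_peaks(seq), expected_sidechain_nh_peaks(seq))
```

Substantially fewer peaks than expected can point to exchange broadening, aggregation or conformational heterogeneity. Substantially more peaks can indicate multiple conformations or sample inhomogeneity.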
2.3.2 Uniform 13C,15N Labelling
The 13C enrichment of a target protein further expands the scope of NMR experiments. Uniformly 13C/15N-labelled proteins are commonly produced by growing E. coli cells in minimal M9 medium containing a 15N salt and, in addition, a 13C-labelled carbon source. Although E. coli cells can utilise several compounds, including glucose, pyruvate, acetate, succinate and glycerol, as the sole carbon source, glucose is most commonly used because it gives a high expression level [4]. 13C enrichment enables a variety of heteronuclear multidimensional experiments involving the backbone and side-chain carbon atoms [30–32]. The secondary chemical shifts of the 13Cα, 13Cβ and 13C carbonyl carbons also provide information about secondary structure elements, and several excellent protocols for secondary structure prediction have been developed [33,34]. Recently, a strategy for determining the tertiary structures of proteins by using a limited number of NOE-derived constraints together with chemical shifts has been reported [35]. Although one-bond 13C-13C couplings are useful for assigning side-chain resonances, they complicate relaxation studies of 13C nuclei. In this respect, a protein with an alternating 13C-12C labelling pattern in its side chains is useful; such a protein can be prepared by growing E. coli in medium containing either [2-13C]glycerol and NaH12CO3 or [1,3-13C]glycerol and NaH12CO3 as the carbon source [36]. This approach was applied to a relaxation study of thioredoxin, together with 50 % random fractional deuteration. A notable example of this isotope labelling strategy is the structure determination of an SH3 domain by solid-state NMR [37]. Alternate 13C-12C labelling has also been shown to be useful for backbone assignment [38].
2.3.3 2H Labelling
Protein deuteration has long been regarded as a key method for studying proteins by NMR [39–41]. The magnetogyric ratio of 2H is 6.5 times lower than that of 1H, so the substitution of 1H by 2H mitigates unwanted dipolar and scalar couplings in proteins. Theoretically, substituting 1H by 2H at the α position is expected to lengthen the T2 relaxation time of the α carbon (by about 12-fold for a 50 kDa protein). Deuteration also removes scalar couplings and reduces spin diffusion. According to the level of deuteration, 2H labelling schemes can be classified into two groups: full deuteration, termed ‘perdeuteration’ [42], and random fractional deuteration (deuteration levels of around 50–90 %), which is described later in this section. Deuterated proteins are commonly produced by growing E. coli cells in minimal medium (M9) containing 2H2O and either protonated or deuterated carbon sources. Although E. coli cells can tolerate 2H2O, culturing them in 2H2O medium significantly reduces growth rate and yield, owing to deuterium isotope effects. Several deuteration protocols have been reported to overcome this problem [43–45]. The production method is selected according to the intended level of deuteration. When a deuteration level of up to 75–80 % is required, one easy method is to grow the E. coli cells in H2O medium to a high cell density, resuspend the harvested cells in medium containing the intended proportion of 2H2O, and then induce protein expression [4,46]. To achieve higher levels of deuteration, E. coli cells adapted to 2H2O medium must be used.
The adaptation process involves gradually increasing the ratio of 2H2O to 1H2O during the culture of E. coli cells; step-by-step procedures have been documented in other reviews [43,47]. Protonated glucose is often used for random fractional deuteration. The use of acetate as a carbon source has also been reported [48]; when acetate is used, the deuteration level of the protein correlates linearly with the proportion of 2H2O in the culture medium [49]. Culturing E. coli cells in 2H2O-containing medium reportedly affects carbon metabolism [50]. Nevertheless, glucose currently appears to be the most reliable carbon source for obtaining a high yield. Adding a mixture of deuterated amino acids to the culture medium effectively enhances the growth of E. coli cells and also increases the expression level of the target protein. One problem often encountered with deuterated proteins is that amide groups buried in the core of the protein are highly protected from the solvent, so that their deuterons do not exchange back to protons in 1H2O buffer. Such amide groups can often be back-exchanged by denaturing the deuterated protein in 1H2O, provided that it can be refolded. Conversely, when 1H2O medium is used, the α and some β positions become partially protonated, even when fully deuterated amino acids are supplied [51]. Recently, P. pastoris has also been used successfully to prepare deuterated proteins [52]. Deuteration expands the size range of proteins amenable to NMR analysis. In a perdeuterated protein, sequential backbone assignments can be made without the relaxation caused by aliphatic protons. Of course, side-chain protons can no longer be assigned in a completely deuterated protein, although the side-chain 13C resonances can still be assigned by using 13C-originating pulse schemes [53].
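The relaxation benefit of replacing 1H by 2H, mentioned at the start of this section, can be estimated at the order-of-magnitude level. At a fixed internuclear distance and with the same molecular motion, the dipolar relaxation that a neighbouring spin S induces on a 13C nucleus scales as γS² S(S+1). A 2H nucleus has a smaller magnetogyric ratio but a larger spin (S = 1). The sketch below uses the standard γ/2π values and hypothetical function names. It estimates only this one mechanism. The roughly 12-fold T2 gain quoted in the text is smaller because other mechanisms, such as CSA, neighbouring 13C nuclei and the remaining 2H dipolar term, still contribute.

```python
# Magnetogyric ratios divided by 2π (MHz/T) and spin quantum numbers.
GAMMA_MHZ_PER_T = {"1H": 42.577, "2H": 6.536}
SPIN = {"1H": 0.5, "2H": 1.0}


def dipolar_strength(nucleus: str) -> float:
    """Relative dipolar relaxation strength of a perturbing spin: gamma^2 * S(S+1)."""
    s = SPIN[nucleus]
    return GAMMA_MHZ_PER_T[nucleus] ** 2 * s * (s + 1)


gamma_ratio = GAMMA_MHZ_PER_T["1H"] / GAMMA_MHZ_PER_T["2H"]
reduction = dipolar_strength("1H") / dipolar_strength("2H")
print(f"gamma ratio {gamma_ratio:.2f}, dipolar relaxation reduced ~{reduction:.0f}-fold")
```

The γ ratio reproduces the factor of 6.5 quoted in the text. The spin-1 factor offsets part of the γ² gain, so the resulting reduction in dipolar relaxation is about 16-fold rather than about 42-fold.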
For the observation of 13C atoms attached to 2H atoms, 2H decoupling is often required to reduce the linewidths of the deuterated 13C signals. Irradiation with an RF field strength sufficiently larger than the inverse of the 2H T1 removes the broadening due to residual scalar interactions between 2H and 13C, and thus gives a 13C linewidth much narrower than that of a protonated 13C [54]. The complete absence of aliphatic protons also minimises spin diffusion, which facilitates the observation of long-distance NOEs. The drawback is that the only available NOEs are those between amide protons (1HN-1HN). Simulations by Venters and coworkers for human carbonic anhydrase II and human profilin showed that both backbone and side-chain 1HN-derived NOEs, as well as distance constraints longer than 6 Å, are needed to obtain a protein structure of reasonable quality [55]. Another important application of perdeuterated proteins is the cross-saturation experiment and its modified version, the transferred cross-saturation experiment [56–58]. These methods are applicable to a wide variety of biologically relevant interactions [59–61]. A number of these applications of perdeuteration are described in other chapters of this volume. The TROSY (transverse relaxation-optimised spectroscopy) technique is often used when analysing large deuterated proteins [62–64]. This method exploits the destructive interference between distinct relaxation mechanisms. For instance, interference between the 1H-15N dipolar interaction and the 15N chemical shift anisotropy (CSA) gives different linewidths to the two 1H-coupled 15N doublet components. This cross-correlation divides the coherences into two classes: one that relaxes fast, owing to constructive interference between the two mechanisms, and another that relaxes slowly, owing to destructive interference. In the TROSY approach, only the sharper line is selectively observed.
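The interference can be made concrete with the commonly used description of TROSY. In that description, the two 15N doublet components relax with terms proportional to (p − δN)² and (p + δN)². Here, p = (μ0/4π)γHγNħ/(2√2 r³) is the 1H-15N dipolar term, and δN = γN B0 Δσ/(3√2) is the 15N CSA term. The sketch below is an illustrative estimate under several assumptions. It takes r(NH) = 1.02 Å and Δσ = 160 ppm, treats the CSA as axially symmetric and collinear with the N-H bond, and ignores remote protons and spectral-density weighting. The printed ratio is therefore the ratio of the interference-sensitive terms only, not the ratio of observed linewidths. All function names are hypothetical.

```python
import math

HBAR = 1.054571817e-34      # reduced Planck constant, J s
MU0_OVER_4PI = 1e-7         # mu0 / (4 pi), T m / A
GAMMA_H = 2.6752218744e8    # 1H magnetogyric ratio, rad / (s T)
GAMMA_N = 2.7116e7          # magnitude of the 15N ratio (true sign is negative)


def dipolar_term(r_nh: float = 1.02e-10) -> float:
    """p = (mu0/4pi) gamma_H gamma_N hbar / (2 sqrt(2) r^3), in rad/s."""
    return MU0_OVER_4PI * GAMMA_H * GAMMA_N * HBAR / (2 * math.sqrt(2) * r_nh ** 3)


def csa_term(b0_tesla: float, delta_sigma: float = 160e-6) -> float:
    """delta_N = gamma_N B0 delta_sigma / (3 sqrt(2)), in rad/s (axial, collinear CSA)."""
    return GAMMA_N * b0_tesla * delta_sigma / (3 * math.sqrt(2))


def trosy_term_ratio(proton_mhz: float) -> float:
    """(p - dN)^2 / (p + dN)^2 for the narrow versus the broad 15N component."""
    b0 = 2 * math.pi * proton_mhz * 1e6 / GAMMA_H
    p, d = dipolar_term(), csa_term(b0)
    return (p - d) ** 2 / (p + d) ** 2


def optimal_proton_mhz() -> float:
    """1H frequency at which the two terms cancel exactly (p = dN)."""
    b0 = dipolar_term() / csa_term(1.0)   # csa_term is linear in B0
    return GAMMA_H * b0 / (2 * math.pi) / 1e6


for mhz in (600, 800, 900):
    print(f"{mhz} MHz: ratio {trosy_term_ratio(mhz):.3f}")
print(f"full cancellation near {optimal_proton_mhz():.0f} MHz")
```

Under these assumptions, the CSA term grows linearly with B0 while the dipolar term is field-independent. The cancellation, and hence the TROSY benefit for amide 15N, is therefore most complete at 1H frequencies of roughly 1 GHz.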
The development of the TROSY method has significantly increased the upper limit of molecular weight amenable to NMR studies [65]. TROSY was originally applied during the chemical shift evolution periods; the TROSY approach was subsequently used in the polarisation transfer step, in CRIPT (cross-correlated relaxation-induced polarisation transfer) and CRINEPT (cross-correlated relaxation-induced INEPT) [63]. To date, methods for observing the sharper lines arising from the cross-correlation of different relaxation mechanisms have been applied to amide 1H-15N moieties [62], aromatic 1H-13C moieties [66] and methyl groups [67]. If side-chain nonexchangeable protons must be observed, random fractional deuteration may be used instead, as demonstrated for a variety of proteins, such as thioredoxin and staphylococcal nuclease [68,69]. Random fractional deuteration is a compromise between the loss of 1H NMR information and the reduced linewidths of the remaining 1H resonances. A deuteration level of 50 % has been recommended for structure determination [70]. In combination with methyl-selective protonation, which is described in the next section, it allows the global fold of a large protein to be determined [71]. A serious problem with random fractionally deuterated proteins is the presence of multiple isotopomers in the methylene and methyl groups. Chemical shifts in the deuterated sample differ slightly from those in the fully protonated sample because of deuterium-induced isotope shifts: substituting 1H by 2H shifts the resonances of carbons one to three bonds away from the substituted atom upfield. For methylene groups, there are four isotopomers: CD2, CHD with 1H at the pro-R position, CHD with 1H at the pro-S position, and CH2. With a special editing technique, the coherences of unwanted isotopomers can be eliminated, which enables relaxation analysis of a specific isotopomer, such as CH2D in methyl groups and CHD in methylene groups [72–74].
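How the isotopomers are distributed at a given deuteration level can be estimated with a binomial model. The sketch below assumes that each hydrogen of a methyl or methylene group is replaced by 2H independently, with the same probability d. In practice, incorporation differs between positions and depends on the carbon source and biosynthetic route, so the model is only a rough guide. The function names are hypothetical.

```python
from math import comb


def methyl_isotopomers(d: float) -> dict:
    """Fractions of CH3, CH2D, CHD2 and CD3 if each methyl H is 2H with probability d."""
    names = ["CH3", "CH2D", "CHD2", "CD3"]
    return {name: comb(3, k) * d ** k * (1 - d) ** (3 - k)
            for k, name in enumerate(names)}


def methylene_isotopomers(d: float) -> dict:
    """Fractions of the four methylene isotopomers, with pro-R/pro-S resolved."""
    h = 1 - d
    return {
        "CH2": h * h,
        "CHD (1H pro-R)": h * d,
        "CHD (1H pro-S)": d * h,
        "CD2": d * d,
    }


for name, frac in methyl_isotopomers(0.5).items():
    print(f"{name}: {frac:.3f}")
```

At the recommended 50 % deuteration level, this model predicts that only about three-eighths of the methyl groups are present as the CH2D isotopomer used for relaxation analysis. The rest is spread over the other isotopomers, whose signals are shifted slightly by deuterium isotope effects.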
2.4 Selective Isotope Labelling of Proteins
2.4.1 Amino Acid Type-Selective Labelling
Amino acid type-selective 15N labelling is useful for reliably identifying the amino acid type of resonances observed in 1H-15N HSQC spectra. Although the well-established sequential assignment method using uniformly 13C/15N-labelled protein is a powerful tool (see Chapter 3), it is desirable to confirm the assignments by an independent approach. In an amino acid type-selectively 15N-labelled protein, the nitrogen atoms of one amino acid type are selectively enriched with 15N. Its 1H-15N HSQC spectrum therefore shows only the resonances of that residue type, which allows the amino acid type of these peaks to be assigned reliably. Amino acid-selectively 15N-labelled proteins can be produced in E. coli cells for certain residue types, and in cell-free synthesis systems for a wide range of residue types [21]. When E. coli cells are used, two potential problems are isotopic dilution and cross-labelling caused by various metabolic pathways. Provided that the selected amino acid lies at the end of a biosynthetic pathway, selective enrichment can be achieved by adding the labelled amino acid, together with a pool of the other amino acids in unlabelled form, just before inducing protein expression [75]. However, this approach does not work well for amino acids that are key intermediates in amino acid metabolism, such as Gly, Ser, Asx and Glx. To ensure controlled incorporation of an isotope-labelled amino acid into a target protein, auxotrophic strains, that is, E. coli strains with lesions in the appropriate amino acid metabolic pathways, are commonly employed [76,77]. For instance, by using E. coli deficient in the biosynthesis of shikimic acid, selective labelling of the aromatic amino acids (Phe, Tyr and Trp) can be accomplished [78]. Many different auxotrophic strains of E. coli are available from a stock centre (http://cgsc.biology.yale.edu/). Adding a large amount of unlabelled amino acids is also known to suppress metabolic conversion effectively in E. coli cells. Alternatively, a cell-free production system is well suited to amino acid-selective labelling, because the interconversion of amino acids is suppressed to a large extent in the cell-free reaction. However, isotope-labelled amino acids are generally costly compared with isotope-labelled carbon sources and ammonium salts. Amino acid-selective 14N reverse labelling, whose cost is reasonable compared with that of its counterpart, has been reported [79]. In this method, specific amino acids are added in their unlabelled form to M9 medium containing 15NH4Cl. Consequently, in the spectrum of the resulting protein, the peaks of these residue types are missing or weakened compared with those of the uniformly 15N-labelled sample. Amino acid type-selective 13C labelling is also useful in combination with 13C direct detection. Recent advances in cryogenic probe technology have enabled the direct observation of carbon atoms with high sensitivity even in large proteins, encouraging researchers to revisit this traditional technique [80,81]. The direct observation of 13C nuclei has some advantages over that of 1H nuclei.
For instance, 13C nuclei are less affected by paramagnetic relaxation than 1H nuclei, which enables the detection of atoms close to the paramagnetic centre in metal-binding proteins [82]. In NMR studies of Streptomyces subtilisin inhibitor (SSI), amino acid-selective 13C labelling of the carbonyl carbons has been employed [83,84]. A characteristic of carbonyl carbon observation is that the CSA of the carbonyl carbon is large, which causes line broadening at higher magnetic fields. Conversely, carbonyl carbon detection can be carried out by performing the NMR experiment at a lower magnetic field. Recently, deuterium-induced isotope shifts of carbonyl carbons were used to monitor the exchange rates of amide protons of residues buried in the hydrophobic core of a globular protein, leading to an investigation of the detailed mechanism of the dynamic fluctuation of a five-stranded β-sheet in SSI [84,85]. Amino acid type-selective 13C-carbonyl and 15N double labelling is a powerful tool for reliable resonance assignment. This method uses proteins in which the main-chain carbonyl carbons of one amino acid type are labelled with 13C, and the amide nitrogens of another amino acid type are labelled with 15N. The signals of residues containing a 13C'-15N linkage (a labelled carbonyl followed by a 15N-labelled residue) are identified on the basis of the 13C-15N spin coupling. This method was first demonstrated for the assignment of the methionyl carbonyl carbons of SSI. SSI contains three Met residues, at positions 70, 73 and 103, which are followed by Cys71, Val74 and Asn104, respectively. Two SSI samples were prepared: one doubly labelled with [1-13C]Met and 15N-Val, and one doubly labelled with [1-13C]Met and 15N-Cys. In the 1D 13C spectra of the two samples, three peaks from the methionyl carbonyl carbons were observed.
Based on the one-bond scalar coupling (15 Hz) to the 15N of the succeeding residue, the peak assignments were readily obtained [83]: the carbonyl peak split in the 15N-Val sample belongs to Met73, the one split in the 15N-Cys sample to Met70, and the remaining unsplit peak to Met103. The experimental procedures for this approach have been described earlier [1]. The 13C-carbonyl/15N double labelling approach can be applied effectively to very large proteins. Amino acid type identification of peaks in 1H-15N HSQC spectra can be achieved by using 19 different samples, in each of which one of the 19 non-proline residue types is selectively 15N-labelled [86], although this is obviously time-consuming and costly. The same information can be obtained with much less effort by combinatorial selective labelling (CSL) [79,87–89], in which several amino acids are simultaneously 15N- or 13C-carbonyl-labelled. The idea of combinatorial amino acid labelling originated with 14N reverse labelling [79], in which a mixture of 14N amino acids was added to M9 medium containing 15NH4Cl. Parker and coworkers developed a CSL method capable of yielding large numbers of residue-type and sequence-specific backbone amide assignments, which involves comparing the cross-peak intensities in 1H-15N HSQC and 1H-15N HNCO spectra collected for five samples containing different combinations of 13C- and 15N-labelled amino acids [87,88]. The main concern with this approach is cross-labelling between amino acid types; because a cell-free system avoids this problem, it expands the applicability of the method.
2.4.2 Reverse Labelling
As mentioned above, the depletion of a specific isotope at a selected atom, as well as selective isotopic enrichment, is an important concept in stable isotope labelling. In some cases, for example, specific protonation of selected residues in a fully deuterated background is effective for structure elucidation. Amino acid type-selective labelling often involves adding protonated 15N- or 15N,13C-labelled amino acids to a deuterated expression medium [90,91]. The procedure for residue-selective labelling in a deuterated background has been well documented [44]. By combining deuteration with selective 1H, 13C and 15N labelling of a limited number of amino acid residues, enough NOEs can be identified to determine the global folds of large proteins, as demonstrated for the structure determination of the 25 kDa tryptophan repressor from E. coli [92,93]. Although deuteration of nonexchangeable protons improves the quality of the NH-based triple resonance experiments employed for backbone assignment [94–96], selective protonation of the α position in a deuterated background is useful in some cases, because the Hα atom can then serve as the starting point of the triple resonance experiment [97]. Hα-based multidimensional experiments have an advantage over NH-based experiments in that they can be performed in 100% 2H2O buffer. The quality of both NH- and Hα-based backbone assignment experiments depends largely on the linewidth of the 13Cα resonance; in this regard, the 13Cα-13Cβ coupling (40 Hz) is unfavourable. A commercially available 2H,13C,15N-labelled amino acid mixture can be back-protonated at the α position by a simple chemical reaction [98].
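Several of the selective schemes in this chapter reduce to sequence bookkeeping that can be checked before any sample is made. The Python sketch below predicts which backbone amides should stay visible in a 1H-15N HSQC after 14N reverse labelling, and which carbonyl peaks should show the 1J(C′,N) splitting in a [1-13C]/15N double-labelled sample, using the SSI Met pairs quoted earlier in this chapter. It is a simplified model: it assumes no metabolic scrambling, treats proline and the N-terminal residue as giving no amide cross-peak, and the function names are hypothetical.

```python
"""Sequence bookkeeping for amino acid selective labelling (simplified sketch).

Assumptions: no metabolic scrambling or cross-labelling; proline and the
N-terminal residue give no backbone amide cross-peak. Function names are
hypothetical illustrations, not part of any published tool.
"""


def visible_amides(sequence, unlabelled_types=frozenset()):
    """1-based positions expected in a 1H-15N HSQC of a protein expressed with
    15NH4Cl plus the residue types in `unlabelled_types` added as 14N amino acids."""
    positions = []
    for pos, aa in enumerate(sequence, start=1):
        if pos == 1 or aa == "P":
            continue  # no observable backbone amide
        if aa in unlabelled_types:
            continue  # 14N amide: invisible in the 15N dimension
        positions.append(pos)
    return positions


def split_carbonyls(residues, co_type, n_type):
    """Positions whose [1-13C] carbonyl (residue type `co_type`) is followed by a
    15N-labelled residue of type `n_type`, i.e. peaks split by 1J(C',N)."""
    return sorted(
        pos for pos, aa in residues.items()
        if aa == co_type and residues.get(pos + 1) == n_type
    )


if __name__ == "__main__":
    seq = "MKPVLKG"  # toy sequence (hypothetical)
    uniform = set(visible_amides(seq))
    reverse = set(visible_amides(seq, {"K"}))
    print("missing after 14N-Lys reverse labelling:", sorted(uniform - reverse))

    # Met residues of SSI and their successors, as given in the text
    ssi = {70: "M", 71: "C", 73: "M", 74: "V", 103: "M", 104: "N"}
    print("split with 15N-Val:", split_carbonyls(ssi, "M", "V"))  # Met73
    print("split with 15N-Cys:", split_carbonyls(ssi, "M", "C"))  # Met70
```

Met103, whose carbonyl is split in neither sample, is then assigned by elimination, mirroring the reasoning in the SSI example.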
The β subunit of human chorionic gonadotropin has been produced in CHO cells cultured in a medium containing amino acids labelled only at the backbone (N, Cα, Hα, C′) atoms [99]. Since mammalian cells require nearly all naturally occurring amino acids, no amino acid scrambling would be expected. Backbone 13C, 15N, (50%) 2H-labelled leucine, phenylalanine and valine were synthesised and used. For selective 13Cα enrichment, glycerol-based 12C/13C alternate labelling can be used [74]. By growing E. coli cells in deuterated minimal medium with either [2-13C]glycerol and NaH13CO3 or [1,3-13C]glycerol and NaH13CO3 as carbon sources, proteins with 13C-enriched Cα atoms and 12C carbonyl and Cβ atoms are obtained. With [2-13C]glycerol, the Cα positions of Ala, Cys, Phe, Gly, His, Lys, Ser, Val, Trp and Tyr residues are efficiently enriched with 13C; with [1,3-13C]glycerol, those of Glu, Leu, Gln, Pro and Arg residues are enriched. The Cα atoms of Asp, Ile, Met, Asn and Thr are partially enriched in both cases. With this labelling pattern, transverse relaxation of 13Cα is minimised because of the absence of neighbouring 13Cβ, 1Hα and 13C′ nuclei. Using 13C direct observation, backbone assignment has been accomplished via 15N-13Cα correlations [38]. Compared with 13C-attached protons, 12C-attached protons give sharper peaks, owing to the absence of 1H-13C dipolar relaxation. If a protein is prepared in which selected amino acids are unlabelled (12C) while the others are uniformly 13C-labelled, the NOEs between the 12C residues and the 13C-labelled residues can be observed by isotope-editing and -filtering techniques. The phenylalanine aromatic resonances of the DNA-binding domain of Drosophila heat shock factor were assigned by this method [100]. This type of approach has been widely used to study many proteins, such as calcium-free calmodulin [101], the 24 kDa Dbl homology domain [102], and the 25 kDa anti-apoptotic protein Bcl-xL [103]. Side-chain methyl groups are valuable probes in NMR studies of the structures and dynamics of proteins [72,104,105].
Recent advances in labelling techniques have made it possible to introduce 1H and 13C only into the methyl positions of an otherwise deuterated protein [71,106,107]. The merit of this type of methyl labelling is its applicability to large proteins. Methyl-protonated proteins are commonly produced with an isotope-labelled precursor. Perdeuterated proteins in which the methyl groups of Ile (γ2 only), Leu, Val and Ala are selectively protonated can be produced by adding protonated pyruvate to a 2H2O-based medium [106]. The problem with pyruvate is that the methyl groups are produced as a mixture of isotopomers: 13CH3, 13CHD2, 13CH2D and 13CD3 [106]. The 13C and 1H chemical shifts of the proton-bearing isotopomers differ slightly, owing to deuterium isotope shifts of about 0.3 ppm (13C) and 0.02 ppm (1H) upfield per deuteron [108]. Hence, α-ketobutyrate (for Ile δ1) and α-ketoisovalerate (for Leu and Val) are now commonly used for methyl labelling. With commercially available ketobutyrate and ketoisovalerate precursors carrying 13CHD2 or 13CH2D groups, in addition to the 13CH3-based precursors, samples that are uniform in the isotopomer of choice can now be produced [75,109]. Methyl protonation of Ile, Leu and Val residues with precursors is well established; more recently, methyl-selective labelling of alanine residues has also been reported [110], using methyl-protonated Ala instead of a precursor. In a protein methyl-protonated in a deuterated background, the global fold can be determined from a set of methyl-methyl, methyl-NH and NH-NH distance restraints, as demonstrated for the C-terminal SH2 domain of phospholipase Cγ1 [71]. The three methyl isotopomers (13CH3, 13CHD2 and 13CH2D) have distinct relaxation properties, and the 2D correlation pulse schemes that give the best resolution for each have been reported [108]. Each isotopomer is accordingly suited to a different type of experiment.
In the case of the 13CH3 isotopomer, an HMQC pulse scheme allows the methyl groups of an extremely large protein complex, such as the 670 kDa 20S proteasome, to be observed [111]. The 13CHD2 isotopomer can be used for 1H relaxation analysis [112], while the 13CH2D isotopomer is used for 2H relaxation analysis. Because of the poor deuterium chemical shift dispersion and the rapid decay of deuterium magnetisation, the 2H relaxation times were obtained indirectly, from the relaxation of the three-spin terms IzCzDz or IzCzDy. This 2H-based relaxation approach has been used to study the picosecond-to-nanosecond dynamics of methyl-containing side chains, for example in the C-terminal SH2 domain of phospholipase Cγ1 [73]. Methyl signals can be assigned by exploiting side-chain 13C-13C connectivities [113,114]. For the 20S proteasome, the methyl assignments were transferred from those obtained for the isolated 21 kDa α monomer, and the remaining signals were assigned by introducing mutations [111].
2.4.3 Stereo-Selective Labelling
Stereospecific assignment of the diastereotopic methyl groups of Leu and Val residues is essential for defining the precise orientations of their isopropyl groups [115]. Biosynthetic fractional 13C labelling can be used for the unambiguous stereospecific assignment of these methyl groups [116–118]. In this method, a mixture of roughly 90% [12C6]-glucose and 10% uniformly labelled [13C6]-glucose is used as the sole carbon source. The method exploits the stereoselectivity of Val and Leu biosynthesis from glucose. The isopropyl group of valine and leucine is built from two pyruvate molecules derived from glucose. The pro-R methyl group (δ1 in Leu and γ1 in Val) and the adjacent carbon atom (γ in Leu and β in Val) originate from the same pyruvate molecule and therefore share the same isotopic composition. In contrast, the pro-S methyl group (δ2 in Leu and γ2 in Val) and the adjacent carbon atom originate from two different pyruvate molecules. Therefore, when 90% [12C6]-glucose and 10% [13C6]-glucose are used, a 13C in the pro-R methyl group is always directly bonded to 13C, whereas a large proportion (90%) of the 13C atoms in the pro-S methyl group are bonded to 12C. The presence or absence of the one-bond 13C-13C J coupling thus allows the isopropyl methyl groups to be assigned stereospecifically. If the diastereotopic methyl peaks overlap severely with peaks from other amino acids, such as Ile, Lys and Thr, adding the problematic amino acids in unlabelled form can overcome this problem [118].
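The labelling statistics behind this method can be made explicit. In the Python sketch below, the pro-R methyl shares a pyruvate with its neighbouring carbon whereas the pro-S methyl does not, so with 10% [13C6]-glucose only about 10% of the pro-S methyl signals carry the one-bond 13C-13C coupling, compared with all of the pro-R signals. Treating each pyruvate as either fully 13C or fully 12C is a simplifying assumption, and the function names are hypothetical.

```python
"""Stereospecific assignment by 10% fractional 13C labelling: a minimal model.

Assumption (simplified): each pyruvate is either fully 13C (from U-13C glucose)
or fully 12C, with probability f13c of being labelled.
"""
import random


def coupled_fraction_analytic(f13c):
    """Return (pro-R, pro-S): the fraction of 13C methyls with a one-bond
    13C-13C coupling to the adjacent carbon."""
    pro_r = 1.0   # methyl and adjacent carbon come from the same pyruvate
    pro_s = f13c  # adjacent carbon comes from a second, independent pyruvate
    return pro_r, pro_s


def coupled_fraction_simulated(f13c, n=100_000, seed=0):
    """Monte Carlo check of the analytic result over n isopropyl groups."""
    rng = random.Random(seed)
    labelled = {"pro_r": 0, "pro_s": 0}
    coupled = {"pro_r": 0, "pro_s": 0}
    for _ in range(n):
        pyr_a = rng.random() < f13c  # supplies pro-R methyl + adjacent carbon
        pyr_b = rng.random() < f13c  # supplies the pro-S methyl only
        if pyr_a:
            labelled["pro_r"] += 1
            coupled["pro_r"] += 1      # neighbour from the same labelled pyruvate
        if pyr_b:
            labelled["pro_s"] += 1
            coupled["pro_s"] += pyr_a  # neighbour is 13C only if pyr_a is labelled
    return (coupled["pro_r"] / labelled["pro_r"],
            coupled["pro_s"] / labelled["pro_s"])


if __name__ == "__main__":
    print("analytic :", coupled_fraction_analytic(0.10))
    print("simulated:", coupled_fraction_simulated(0.10))
```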
The use of 'block'-13C-labelled Val and Leu is a straightforward approach for the precise stereospecific assignment of their prochiral methyl groups, and has been applied to the analysis of cystatin A and the phosphoprotein CPI-17 [119,120]. Stereoselective deuteration of one methyl group in Leu residues has also been demonstrated for L. casei dihydrofolate reductase [121]. The stereospecific assignment of methylene protons is also of considerable importance. Here, an amino acid in which one 1H of the methylene group is stereoselectively replaced by 2H is incorporated into the target protein [122,123]. For example, staphylococcal nuclease H124L, in which the Gly residues were labelled with [2HR; 2-13C]Gly or [2HS; 2-13C]Gly, was prepared for NMR analysis [124]. Deuteration within the same amino acid residue narrows lines more markedly than deuteration of neighbouring residues. Thus, extensive stereospecific isotope labelling of target proteins promises considerable improvement of NMR spectra, an aim ultimately realised by the SAIL method [25].
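The linewidth gains discussed here follow from the standard relation between transverse relaxation and Lorentzian line shape, Δν½ = 1/(πT2): doubling T2 halves the linewidth. The short Python sketch below evaluates this relation; the two T2 values are hypothetical and serve only to show the scaling.

```python
import math


def lorentzian_fwhm_hz(t2_s):
    """Full width at half height (Hz) of a Lorentzian line with transverse
    relaxation time t2_s (seconds): 1 / (pi * T2)."""
    if t2_s <= 0:
        raise ValueError("T2 must be positive")
    return 1.0 / (math.pi * t2_s)


if __name__ == "__main__":
    # Hypothetical T2 values, e.g. before and after removing nearby protons
    for label, t2 in [("shorter T2", 0.020), ("longer T2", 0.040)]:
        print(f"{label}: T2 = {t2 * 1e3:.0f} ms -> {lorentzian_fwhm_hz(t2):.1f} Hz")
```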
2.5 Segmental Labelling
Segmental labelling is a promising strategy for studying large proteins. For instance, when a target protein comprises two structural domains, the N-terminal half of the polypeptide chain can be labelled with 13C and 15N while the C-terminal half remains unlabelled, thereby reducing the observable NMR signals to a manageable number. Segmental labelling differs from the amino acid selective labelling described above in that the labelled amino acids are contiguous in the primary sequence, which allows conventional sequential backbone assignment and structure determination of the labelled segment. Segmental labelling can also be used for NMR studies of interdomain interactions [125,126]. In segmental labelling schemes, one protein segment is expressed in medium containing the desired isotopic precursors or amino acids, while the remaining segment(s) are expressed in unlabelled medium; ligation of the independently labelled segments yields the segmentally labelled protein. Two methods have been used for segment ligation: protein trans-splicing (PTS) and expressed protein ligation (EPL). PTS is based on protein splicing, a post-translational processing event in which an internal protein segment, the intein, catalyses its own excision from a precursor protein and concomitantly ligates the flanking regions to form the mature protein. The first application of PTS for NMR was to the C-terminal domain of the E. coli RNA polymerase α-subunit, using the intein PI-PfuI from Pyrococcus furiosus, which can be split into N-terminal and C-terminal portions [127].
The N- and C-terminal portions of PI-PfuI can together adopt the correct intein fold and thereby catalyse protein splicing. The N-terminal segment of the target protein, fused to the N-terminal half of PI-PfuI, and the C-terminal segment, fused to the C-terminal half of PI-PfuI, were therefore mixed, heat-denatured and refolded; the active PI-PfuI intein was reconstituted, and the protein splicing reaction then occurred. The same strategy was also used for the 52 kDa β subunit of F1-ATPase [128] and for maltose-binding protein [129,130]. PTS can also be performed in E. coli cells [131]. In this method, two plasmids were used. One plasmid carried the T7/lac promoter for expression of the N-terminal portion of the Ssp DnaE intein (inteinN) fused to GB1; the other was designed to express the C-terminal portion of Ssp DnaE (inteinC) fused to a chitin-binding domain (CBD) under the control of the araBAD promoter. The E. coli cells were first grown in unlabelled medium, and expression of the C-terminal fragment (inteinC-CBD) was induced by adding L-arabinose. The cells were then harvested by centrifugation and resuspended in 15N-labelled M9 medium, and expression of the N-terminal fragment (GB1-inteinN) was induced by adding IPTG. Trans-splicing between the 15N-labelled N-terminal fragment and the unlabelled C-terminal fragment then occurred in the E. coli cells. The advantage of this method is that it requires neither separate preparation of the precursor fragments before ligation nor additional chemical reagents.
EPL is based on the reaction used in native chemical ligation, in which the C-terminal thioester of one peptide reacts with the N-terminal cysteine residue of a second peptide [132–135]. Thioester-reactive nucleophiles, such as thiols (e.g. β-mercaptoethanol, DTT or cysteine) and hydroxylamine, shift the N-S acyl equilibrium by attacking the thioester, which induces cleavage at the N-terminal junction of the intein. The choice of thiol depends on the accessibility of the catalytic pocket of the intein/extein splicing domain and on the properties of the target protein. Protein ligation requires a C-terminal thioester on one fragment and an N-terminal cysteine on the other. Such termini can be generated by expressing the protein fragments as fusions with a full-length or truncated intein and then inducing intein cleavage. Compared with the PTS method described above, EPL can be performed under milder conditions. The utility of this strategy has been demonstrated for over 40 proteins, such as the two folded domains (SH2 and SH3) of Abelson protein tyrosine kinase [136] and σ70-like factors [137]. A synthetic peptide can also be ligated into a protein in this way, so that, for example, a fluorescent probe can be introduced site-specifically into a polypeptide [138]. The IMPACT (Intein-Mediated Purification with an Affinity Chitin-binding Tag) kit is commercially available from New England Biolabs. This system uses a modified intein in conjunction with a chitin-binding domain. The target protein is produced as a fusion with the intein-CBD at its N- or C-terminus, and the fusion protein is adsorbed onto a chitin column.
The immobilised protein is then induced to undergo self-cleavage under mild conditions, resulting in the release of the target protein while the intein-CBD remains bound to the column [139].
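At the sequence level, the outcome of a ligation is simple to model. The Python sketch below joins two fragments, records which residues carry the isotope label, and lists the labelled amides expected in a 1H-15N HSQC. The Cys check reflects the EPL requirement for an N-terminal cysteine on the C-terminal fragment; the fragment sequences and helper names are hypothetical, and the model ignores splicing residues, tags and folding.

```python
"""Segmental labelling at the sequence level (illustrative sketch only)."""


def ligate(n_fragment, c_fragment, n_labelled, c_labelled, epl=True):
    """Join two fragments; return (sequence, per-residue label flags).

    With epl=True the C-terminal fragment must begin with Cys, as required for
    thioester-cysteine (native chemical) ligation.
    """
    if epl and not c_fragment.startswith("C"):
        raise ValueError("EPL needs an N-terminal Cys on the C-terminal fragment")
    sequence = n_fragment + c_fragment
    labels = [n_labelled] * len(n_fragment) + [c_labelled] * len(c_fragment)
    return sequence, labels


def labelled_hsqc_positions(sequence, labels):
    """1-based positions of 15N-labelled residues expected to give amide
    cross-peaks (proline and the N-terminal residue excluded)."""
    return [
        pos for pos, (aa, is_labelled) in enumerate(zip(sequence, labels), start=1)
        if is_labelled and aa != "P" and pos > 1
    ]


if __name__ == "__main__":
    # Hypothetical two-domain construct: label only the N-terminal segment
    seq, labels = ligate("MSDKPE", "CGVLK", n_labelled=True, c_labelled=False)
    print(seq, labelled_hsqc_positions(seq, labels))
```

Swapping the two label flags gives the complementary sample, in which only the C-terminal segment contributes cross-peaks.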
2.6 SAIL Methods
2.6.1 Concept of SAIL
Protein deuteration is key to studying large proteins by NMR. However, random fractional deuteration suffers from the presence of numerous isotopomers and the dilution of 1H. Selective labelling schemes, such as methyl-selective protonation in a deuterated background, improve this situation to some extent. However, the methyl groups are localised at specific sites in the protein, so distance constraints covering the whole molecule cannot be obtained. To overcome the size limitations of NMR structure determination without compromising the accuracy of the structure, the stereo-array isotope labelling (SAIL) method was developed [25]. The concept of the SAIL method is to use proteins composed exclusively of special amino acids that are stereo- and regio-specifically labelled with 2H, 13C and 15N (Figure 2.1). The optimal isotope labelling pattern for protein structure determination is designed as follows:
a. In methylene groups, one of the two protons is stereoselectively replaced by a deuteron. The remaining 1H then gives a sharpened peak, owing to the absence of the 1H-1H dipolar interaction and scalar coupling within the methylene group. In addition, its signal can no longer overlap with that of the replaced geminal proton, which simplifies the methylene region of the spectrum. Furthermore, precise NOE-derived constraints can readily be obtained, because the stereospecific assignment of the remaining 1H is known. Once the positions of the methylene proton and carbon are defined, the position of the substituted 2H is automatically defined by the geometry of the methylene group.
b. In methyl groups, two of the three protons are replaced by 2H. The aim of this labelling pattern is to reduce spin diffusion; it might also be used for advanced relaxation analyses of methyl groups.
c. Of the diastereotopic methyl groups of Leu and Val residues, one prochiral methyl is 13CHD2 and the other is 12CD3. The aim of this pattern is to observe one methyl group with a known stereospecific assignment.
d. In aromatic rings, the target proton-carbon moieties are 1H-13C and the other moieties are 2H and 12C, which eliminates the 13C-13C scalar couplings and the 1H-1H dipolar couplings within the ring.
Figure 2.1 SAIL amino acids. Design concepts in the SAIL amino acids (panels: methylene, methyl, prochiral methyl and aromatic ring groups)
The SAIL approach greatly simplifies NMR spectra by reducing the number of nonexchangeable protons, which are prone to overlap, to less than half the original number, thus reducing the expected number of NOE peaks by 40–45% [25,140,141]. Since the eliminated NOEs provide information on either fixed (geminal) or redundant distance constraints, the reduced proton density is not a problem for structure determination. Moreover, the NOEs involving protons with known stereospecific assignments contribute substantially to defining the conformation of the target protein. Another important advantage of SAIL proteins is their improved signal-to-noise ratio, which derives mainly from the longer T2 relaxation times resulting from the replacement of unneeded 1H and 13C with 2H and 12C, respectively. For the aromatic groups of Phe, Tyr and Trp, the absence of one-bond carbon-carbon coupling eliminates the need for a constant-time scheme, thereby shortening the pulse scheme [141,142]. To implement this method, 20 special amino acids with a complete stereo- and regio-specific isotope labelling pattern (SAIL amino acids) have been synthesised chemically and enzymatically (Figure 2.2) [25,142–144]. These SAIL amino acids are commercially available from SAIL Technologies, a company established to supply SAIL amino acids to the NMR community (www.sail-technologies.com). For more difficult cases, such as NMR studies of membrane proteins, SAIL amino acids with other isotope labelling patterns are now being designed and synthesised [145].
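Rules a to c can be turned into a rough proton count. The Python sketch below tallies nonexchangeable protons for a few residue types under uniform labelling and under those three rules (one 1H per methylene, one per methyl, one per prochiral methyl pair, methines unchanged). The per-residue group inventories are written out by hand for illustration, aromatic rings (rule d) are left out, and the actual SAIL amino acids in Figure 2.2 should be consulted rather than this simplified model.

```python
"""Nonexchangeable 1H count: uniform labelling vs SAIL rules a-c (sketch)."""
from dataclasses import dataclass


@dataclass(frozen=True)
class Groups:
    """Proton-bearing carbon groups of one residue (aromatic rings omitted)."""
    methine: int = 0          # CH, kept as 1H
    methylene: int = 0        # CH2 -> CHD (rule a)
    methyl: int = 0           # CH3 -> CHD2 (rule b)
    prochiral_pairs: int = 0  # two CH3 -> CHD2 + CD3 (rule c)


# Hand-written inventories for a few residue types (illustrative)
RESIDUES = {
    "G": Groups(methylene=1),                                # Ha2/Ha3
    "A": Groups(methine=1, methyl=1),                        # Ha; Hb3
    "S": Groups(methine=1, methylene=1),                     # Ha; Hb2/Hb3
    "V": Groups(methine=2, prochiral_pairs=1),               # Ha, Hb; g1/g2
    "L": Groups(methine=2, methylene=1, prochiral_pairs=1),  # Ha, Hg; Hb; d1/d2
    "K": Groups(methine=1, methylene=4),                     # Ha; b, g, d, e
}


def uniform_protons(g):
    return g.methine + 2 * g.methylene + 3 * g.methyl + 6 * g.prochiral_pairs


def sail_protons(g):
    return g.methine + g.methylene + g.methyl + g.prochiral_pairs


def count(sequence):
    """Return (uniform, SAIL) nonexchangeable proton totals for a sequence."""
    groups = [RESIDUES[aa] for aa in sequence]
    return sum(map(uniform_protons, groups)), sum(map(sail_protons, groups))


if __name__ == "__main__":
    ul, sail = count("GASVLK")
    print(f"uniform: {ul}, SAIL: {sail}, ratio: {sail / ul:.2f}")
```

For this toy hexapeptide the ratio is about 0.47, in line with the "less than half" reduction described above, although the ratio for a real protein depends on its amino acid composition.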
Figure 2.2 Chemical structures of the SAIL amino acids (Ala, Arg, Asn, Asp, Cys, Gln, Glu, Gly, His, Ile, Leu, Lys, Met, Phe, Pro, Ser, Thr, Trp, Tyr and Val). Symbols: H, 1H; D, 2H
2.6.2 Practical Procedure for the SAIL Method
The production of SAIL proteins starts with cell-free synthesis, using SAIL amino acids. As described in the protein production section, the advantages of the cell-free system are its minimised metabolic scrambling and high incorporation rate of the added amino acid into the target protein. Thus far, the E. coli cell-free expression system has been used for the production of SAIL proteins. Once the pilot experiment has been accomplished, the production of the SAIL protein is performed according to Protocol 3.
Protocol 3: Production of SAIL Proteins by the E. coli Cell-Free Method
1. Prepare the reaction solution and the dialysis solution by mixing the components as follows:

Stock solution                           Reaction solution   Dialysis solution
RNase-free water                         1269.5 µl           11 608 µl
1.4 M NH4OAc                             98 µl               392 µl
0.5 M Mg(OAc)2                           150 µl              600 µl
SAIL amino acid mixture (total 60 mg)    200 µl              800 µl
0.645 M creatine phosphate               400 µl              1600 µl
LM mixture                               1250 µl             5000 µl
1 mg/ml template DNA                     100 µl              -
11 mg/ml T7 RNA polymerase               45 µl               -
40 units/µl RNase inhibitor              12.5 µl             -
10 mg/ml creatine kinase                 125 µl              -
S30 extract                              1500 µl             -
Total volume                             5 ml                20 ml

2. Dissolve the SAIL amino acid mixture in water and then add it to the cell-free reaction solution. If the SAIL amino acids do not dissolve, warm the solution to 60 °C; L-tryptophan and L-tyrosine are especially likely to be poorly soluble compared with the other amino acids.
3. Cut the outer tube of the Float-A-Lyzer at a height such that the inner solution will be completely immersed in the dialysis solution when the inner membrane apparatus is placed in the outer tube. Pour the dialysis solution into the outer tube, place the inner membrane apparatus of the Float-A-Lyzer in the outer tube, and pour the reaction solution into the inner membrane. Cover the tube with Parafilm.
4. Shake the tube under the optimised conditions to allow production of the target protein.
5. Retrieve the reaction solution and the dialysis solution. If the molecular weight of the product is smaller than the molecular weight cut-off of the membrane, check the outer solution for the presence of the protein. Purify the protein according to the purification procedure for the target protein. The N-terminus of a protein produced by cell-free expression may be heterogeneous, owing to incomplete deformylation by peptide deformylase; this can be overcome by using a cleavable N-terminal tag.
6. Transfer the prepared sample into the NMR tube.
When a SAIL protein has been prepared, a 1H-15N HSQC, a 1H-13C constant-time HSQC for the aliphatic region (with 2H decoupling during 13C chemical shift evolution) and a 1H-13C HSQC for the aromatic region are commonly acquired first. For SAIL proteins, a relatively large number of time points is used in the indirect 13C dimension, and the window function should be optimised accordingly. A comparison of the spectra of the uniformly labelled and SAIL proteins is highly recommended (Figure 2.3). First, the 1H and 13C chemical shifts should differ slightly between the two samples because of isotope shift effects, which confirms the intended labelling pattern. Second, each resonance of the SAIL protein should be much narrower than in the uniformly labelled protein. The NMR experiments acquired for structure determination of SAIL proteins are essentially the same as those for uniformly labelled proteins. 2H decoupling during 13C chemical shift evolution ensures narrow linewidths in the 13C dimension. For the chemical shift evolution of aromatic carbons, a constant-time scheme is no longer required, because one-bond carbon-carbon couplings are absent (Figure 2.4). If these three spectra are of good quality, the set of NMR experiments needed for resonance assignment and structure determination should be acquired (see Chapters 3 and 4). In our laboratory, two 13C-edited NOESY-HSQCs, for the aliphatic and aromatic regions, and a 15N-edited NOESY-HSQC are acquired. The pulse sequences used in these NOESY experiments are the same as those for uniformly labelled proteins, except that 2H decoupling is applied during 13C chemical shift evolution.
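Because SAIL signals decay more slowly in the indirect 13C dimension, more t1 points can be collected before the signal has decayed into noise. The Python sketch below converts a number of complex points and a spectral width into the maximum evolution time, assuming a dwell time of 1/SW per complex point; the 40 ppm window, the roughly 150.9 MHz 13C frequency (about a 600 MHz spectrometer) and the point numbers are illustrative assumptions.

```python
"""Maximum t1 evolution time in an indirect dimension (illustrative)."""


def t1_max_s(n_complex, sw_ppm, carrier_mhz):
    """Longest sampled evolution time (s) for n_complex complex points,
    assuming a dwell time of 1/SW with SW (Hz) = sw_ppm * carrier_mhz."""
    sw_hz = sw_ppm * carrier_mhz
    return (n_complex - 1) / sw_hz


if __name__ == "__main__":
    # Assumed values: 40 ppm aliphatic 13C window at ~150.9 MHz 13C frequency
    for n in (64, 128, 256):
        t = t1_max_s(n, sw_ppm=40.0, carrier_mhz=150.9)
        print(f"{n:4d} complex points -> t1,max = {t * 1e3:.1f} ms")
```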
Since the 1H density in SAIL proteins is about half that in the corresponding uniformly labelled proteins, the optimal NOESY mixing time for a SAIL protein is expected to be longer [146]. The aromatic resonances of Phe and Tyr residues are linked to the Hβ-Cβ moieties of the same residues. Some caution is needed because the chemical shifts differ between SAIL and uniformly labelled proteins: when the TALOS program is used, the input data should be adjusted beforehand [33,34]. Details of structure determination for SAIL-labelled proteins have been described [145].
2.6.3 Residue-Selective SAIL Method
Along with full SAIL labelling, residue-selective labelling with SAIL amino acid(s) is also a powerful approach, and it has some obvious advantages over the full SAIL approach. For example, well-established and thus more robust in vivo expression systems can be used for protein production, provided that the added SAIL amino acid is not affected by metabolic scrambling. Compared with in vitro expression, however, in vivo expression requires a much larger amount of the amino acid to achieve the intended level in the target protein. One strategy to overcome this problem is to use an auxotrophic E. coli strain. By growing the auxotrophic strain in minimal medium containing a small amount of the target isotope-labelled amino acid and a large amount of the 19 other, unlabelled amino acids, a reasonable yield of the selectively labelled protein can be obtained. The minimum amount of SAIL amino acid required for efficient expression varies with the target protein and the labelled amino acid. To keep costs low, it is desirable to determine this minimum amount in a small-scale culture (Protocol 4). In favourable cases, more than 10% of the added SAIL amino acid can be incorporated into the target protein, which is the same level as in cell-free reactions. As the amount of added amino acid decreases, the E. coli cells grow more slowly, and the expression level also decreases.
Figure 2.3 1H-13C CT-HSQC spectra of maltose-binding protein (MBP). (a) Aliphatic region of methylene groups in SAIL-MBP. (b) Enlargement of the rectangular region of the methylene group marked in (a); assignments are indicated. (c, d) Corresponding regions for uniformly 13C,15N-labelled MBP. (e) Cross-sections at the positions indicated in (b) and (d) (UL and SAIL P91 γ2, UL and SAIL P334 γ2, UL P334 γ3). The spectra of SAIL-MBP and uniformly labelled MBP were acquired under the same conditions and scaled to equal noise levels
Figure 2.4 Comparison of NMR spectra of the SARS-CoV NP CTD between UL and SAIL samples in the aromatic region. (a, b) Chemical structures of the aromatic rings of UL- (a) and SAIL-phenylalanine (b). (c, d) Phenylalanine signals in 1H-13C HSQC spectra of UL- (c) and SAIL- (d) SARS-CoV NP CTD. (e, f) Tyrosine signals in 1H-13C HSQC spectra of UL- (e) and SAIL- (f) SARS-CoV NP CTD. To demonstrate the absence of 1JCC couplings in the aromatic rings of SAIL phenylalanine and tyrosine residues, all 1H-13C HSQC spectra of the aromatic regions were recorded without the constant-time technique
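The incorporation level mentioned above can be estimated from the protein yield. The Python sketch below takes the fraction of added labelled amino acid recovered in purified protein as (moles of protein × residues of that type × molar mass of the amino acid) divided by the mass added. All input values in the example are hypothetical, and in practice the molar mass of the actual labelled compound should be used, since isotope labelling changes it.

```python
"""Fraction of an added labelled amino acid recovered in purified protein."""


def incorporation_percent(yield_mg, protein_mw_da, n_residues, aa_mw_da, aa_added_mg):
    """(mmol protein * residues per chain * mg/mmol amino acid) / mg added * 100."""
    protein_mmol = yield_mg / protein_mw_da  # mg / (mg/mmol) = mmol
    aa_in_protein_mg = protein_mmol * n_residues * aa_mw_da
    return 100.0 * aa_in_protein_mg / aa_added_mg


if __name__ == "__main__":
    # Hypothetical 1 L culture: 5 mg of a 17 kDa protein with 8 target residues,
    # 10 mg labelled Phe added (165.19 g/mol used as a stand-in molar mass)
    print(f"{incorporation_percent(5, 17_000, 8, 165.19, 10):.1f} %")
```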
Protocol 4: Optimisation of the Amount of SAIL Amino Acids for the Production of Calmodulin Selectively Labelled by SAIL Phenylalanine
1. Transform the expression vector into an auxotrophic E. coli strain. We often use AB2826 (DE3) strains [78] for selective labelling by SAIL aromatic amino acids.
2. Prepare minimal medium: amino acid-containing M9 medium with different amounts of phenylalanine.
For 1 L, combine the following: Na2HPO4·12H2O 15.1 g, KH2PO4 3.0 g, 15NH4Cl 1.0 g, NaCl 0.5 g, L-alanine 400 mg, L-arginine 400 mg, L-aspartic acid 250 mg, L-cystine 50 mg, L-glutamic acid 400 mg, glycine 400 mg, L-histidine 100 mg, L-isoleucine 100 mg, L-leucine 100 mg, L-lysine 150 mg, L-methionine 50 mg, L-phenylalanine 50 mg, L-proline 150 mg, L-serine 1000 mg, L-threonine 100 mg, L-tryptophan 50 mg, L-tyrosine 100 mg, L-valine 100 mg.
The most important point of this method is that the amount of the target amino acid can be decreased without reducing the yield. For instance, the amount of phenylalanine can be decreased to around 10 mg/L.
After autoclaving, add the following: 1 M MgSO4 1 ml, 0.1 M CaCl2 1 ml, 8% thiamine 0.5 ml, 20% D-glucose 10 ml.
3. Pick colonies and inoculate them into LB medium.
4. Spin down the E. coli cells grown in LB medium, and resuspend them in the minimal medium.
5. Grow the E. coli cells to an OD600 of 0.6–0.7.
6. Induce expression by adding IPTG.
7. After a defined time, stop the culture and check protein expression by SDS-PAGE.
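The bookkeeping behind steps 2 and 7 can be sketched as follows. The first helper scales the per-litre recipe to a small test volume. The second picks the lowest amino-acid level whose yield is close to the best one. These are hypothetical helpers, not part of the published protocol. The 90% threshold and the screening values in the example are assumptions; in practice the yields would come from the SDS-PAGE check in step 7, for example as relative band intensities.

```python
# Per-litre amounts (mg) taken from the Protocol 4 recipe; only a subset is
# listed here for brevity.
RECIPE_MG_PER_L = {
    "L-phenylalanine": 50.0,
    "L-serine": 1000.0,
    "L-tyrosine": 100.0,
}


def scale_recipe(per_litre_mg, volume_ml):
    """Scale per-litre amounts (mg) to a culture volume given in ml."""
    return {name: mg * volume_ml / 1000.0 for name, mg in per_litre_mg.items()}


def minimum_sufficient_amount(screen, fraction=0.9):
    """Return the smallest tested amount whose yield is >= fraction * best yield.

    `screen` maps an amino-acid amount (mg/L) to a relative yield, e.g. an
    SDS-PAGE band intensity. The 0.9 default is an assumed threshold.
    """
    if not screen:
        raise ValueError("screen must contain at least one result")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    best = max(screen.values())
    # The amount giving the best yield always passes, so min() is never empty.
    return min(amount for amount, y in screen.items() if y >= fraction * best)


# Amounts needed for a hypothetical 50 ml test culture.
print(scale_recipe(RECIPE_MG_PER_L, 50.0))
# Hypothetical screen: phenylalanine amount (mg/L) -> relative yield.
screen = {50.0: 1.00, 25.0: 0.97, 10.0: 0.92, 5.0: 0.55}
print(minimum_sufficient_amount(screen))  # -> 10.0
```

Taking the lowest amount that stays within a set fraction of the best yield makes the trade-off explicit. Falling below that level would save labelled amino acid but cost protein.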
2.7 Concluding Remarks
The labelling of proteins with stable isotopes enhances NMR methods for the analysis of structure, dynamics and interactions. Together with the optimisation of buffer conditions, the choice of a suitable isotope labelling strategy is now a central part of sample optimisation. This choice depends on many factors: the available production method, cost, yield and the intended study. Stable isotope labelling helps to resolve crowded spectra and to sharpen individual lines, thereby yielding reliable information on protein structure, dynamics and function.
Acknowledgements
We thank our collaborators, Drs. Tsutomu Terauchi, Akira Mei Ono, Toshiya Hayano, Masato Shimizu, Takuya Torizawa, Teppei Ikeya and Peter Güntert, for their contributions to the development of the SAIL method described in this chapter. We are grateful for financial support from Core Research for Evolutional Science and Technology (JST) and the Targeted Proteins Research Program (MEXT).
References
1. Markley, J.L. and Kainosho, M. (1993) Stable isotope labeling and resonance assignments in larger proteins, in NMR of Macromolecules: A Practical Approach (ed. G.C.K. Roberts), Oxford University Press, Oxford.
2. LeMaster, D.M. (1994) Isotope labeling in solution protein assignment and structural analysis. Prog. Nucl. Magn. Reson. Spectrosc., 26, 371–419.
3. Kainosho, M. (1997) Isotope labelling of macromolecules for structure determinations. Nat. Struct. Biol., 4, 854–857.
4. Lian, L.-Y. and Middleton, D.A. (2001) Labelling approaches for protein structural studies by solution-state and solid-state NMR. Prog. Nucl. Magn. Reson. Spectrosc., 39, 171–190.
5. Goto, N.K. and Kay, L.E. (2000) New developments in isotope labeling strategies for protein solution NMR spectroscopy. Curr. Opin. Struct. Biol., 10, 585–592.
6. Ohki, S. and Kainosho, M. (2008) Stable isotope labeling methods for protein NMR spectroscopy. Prog. Nucl. Magn. Reson. Spectrosc., 53, 208–226.
7. Makrides, S.C. (1996) Strategies for achieving high-level expression of genes in Escherichia coli. Microbiol. Rev., 60, 512–538.
8. Baneyx, F. (1999) Recombinant protein expression in Escherichia coli. Curr. Opin. Biotechnol., 10, 411–421.
9. Studier, F.W., Rosenberg, A.H., Dunn, J.J. and Dubendorff, J.W. (1990) Use of T7 RNA polymerase to direct expression of cloned genes. Methods Enzymol., 185, 60–89.
10. Singh, S.M. and Panda, A.K. (2005) Solubilization and refolding of bacterial inclusion body proteins. J. Biosci. Bioeng., 99, 303–310.
11. Woods, M.J. and Komives, E.A. (1999) Production of large quantities of isotopically labeled protein in Pichia pastoris by fermentation. J. Biomol. NMR, 13, 149–159.
12. Massou, S., Puech, V., Talmont, F. et al. (1999) Heterologous expression of a deuterated membrane-integrated receptor and partial deuteration in methylotrophic yeasts. J. Biomol. NMR, 14, 231–239.
13. Colussi, P.A. and Taron, C.H. (2005) Kluyveromyces lactis LAC4 promoter variants that lack function in bacteria but retain full function in K. lactis. Appl. Environ. Microbiol., 71, 7092–7098.
14. Sugiki, T., Shimada, I. and Takahashi, H. (2008) Stable isotope labeling of protein by Kluyveromyces lactis for NMR study. J. Biomol. NMR, 42, 159–162.
15. DeLange, F., Klaassen, C.H.W., Wallace-Williams, S.E. et al. (1998) Tyrosine structural changes detected during the photoactivation of rhodopsin. J. Biol. Chem., 273, 23735–23739.
16. Creemers, A.F.L., Klaassen, C.H.W., Bovee-Geurts, P.H.M. et al. (1999) Solid state 15N NMR evidence for a complex Schiff base counterion in the visual G-protein-coupled receptor rhodopsin. Biochemistry, 38, 7195–7199.
17. Strauss, A., Bitsch, F., Cutting, B. et al. (2003) Amino-acid-type selective isotope labeling of proteins expressed in baculovirus-infected insect cells useful for NMR. J. Biomol. NMR, 26, 367–372.
18. Hansen, A.P., Petros, A.M., Mazar, A.P. et al. (1992) A practical method for uniform isotopic labeling of recombinant proteins in mammalian cells. Biochemistry, 31, 12713–12718.
19. Archer, S.J., Bax, A., Roberts, A.B. et al. (1993) Transforming growth factor beta 1: NMR signal assignments of the recombinant protein expressed and isotopically enriched using Chinese hamster ovary cells. Biochemistry, 32, 1152–1163.
20. Spirin, A.S., Baranov, V.I., Ryabova, L.A. et al. (1988) A continuous cell-free translation system capable of producing polypeptides in high yield. Science, 242, 1162–1164.
21. Kigawa, T., Muto, Y. and Yokoyama, S. (1995) Cell-free synthesis and amino acid-selective stable isotope labeling of proteins for NMR analysis. J. Biomol. NMR, 6, 129–134.
22. Kigawa, T., Yabuki, T., Yoshida, Y. et al. (1999) Cell-free production and stable-isotope labeling of milligram quantities of proteins. FEBS Lett., 442, 15–19.
23. Ozawa, K., Headlam, M.J., Schaeffer, P.M. et al. (2004) Optimization of an Escherichia coli system for cell-free synthesis of selectively 15N-labelled proteins for rapid analysis by NMR spectroscopy. Eur. J. Biochem., 271, 4084–4093.
24. Torizawa, T., Shimizu, M., Taoka, M. et al. (2004) Efficient production of isotopically labeled proteins by cell-free synthesis: A practical protocol. J. Biomol. NMR, 30, 311–325.
25. Kainosho, M., Torizawa, T., Iwashita, Y. et al. (2006) Optimal isotope labelling for NMR protein structure determinations. Nature, 440, 52–57.
26. Berrier, C., Park, K.H., Abes, S. et al. (2004) Cell-free synthesis of a functional ion channel in the absence of a membrane and in the presence of detergent. Biochemistry, 43, 12585–12591.
27. Madin, K., Sawasaki, T., Ogasawara, T. and Endo, Y. (2000) A highly efficient and robust cell-free protein synthesis system prepared from wheat embryos: plants apparently contain a suicide system directed at ribosomes. Proc. Natl. Acad. Sci. USA, 97, 559–564.
28. Endo, Y. and Sawasaki, T. (2003) High-throughput, genome-scale protein production method based on the wheat germ cell-free expression system. Biotechnol. Adv., 21, 695–713.
29. Morita, E.H., Shimizu, M., Ogasawara, T. et al. (2004) A novel way of amino acid-specific assignment in 1H-15N HSQC spectra with a wheat germ cell-free protein synthesis system. J. Biomol. NMR, 30, 37–45.
30. Ikura, M., Kay, L.E. and Bax, A. (1990) A novel approach for sequential assignment of proton, carbon-13, and nitrogen-15 spectra of larger proteins: heteronuclear triple-resonance three-dimensional NMR spectroscopy. Application to calmodulin. Biochemistry, 29, 4659–4667.
31. Kay, L.E., Ikura, M., Tschudin, R. and Bax, A. (1990) Three-dimensional triple-resonance NMR spectroscopy of isotopically enriched proteins. J. Magn. Reson., 89, 496–514.
32. Clore, G.M. and Gronenborn, A.M. (1994) Multidimensional heteronuclear nuclear magnetic resonance of proteins. Methods Enzymol., 239, 349–363.
33. Wishart, D.S., Sykes, B.D. and Richards, F.M. (1992) The chemical shift index: a fast and simple method for the assignment of protein secondary structure through NMR spectroscopy. Biochemistry, 31, 1647–1651.
34. Wishart, D.S. and Sykes, B.D. (1994) The 13C chemical-shift index: a simple method for the identification of protein secondary structure using 13C chemical-shift data. J. Biomol. NMR, 4, 171–180.
35. Shen, Y., Lange, O., Delaglio, F. et al. (2008) Consistent blind protein structure generation from NMR chemical shift data. Proc. Natl. Acad. Sci. USA, 105, 4685–4690.
36. LeMaster, D.M. and Kushlan, D.M. (1996) Dynamical mapping of E. coli thioredoxin via 13C NMR relaxation analysis. J. Am. Chem. Soc., 118, 9255–9264.
37. Castellani, F., van Rossum, B., Diehl, A. et al. (2002) Structure of a protein determined by solid-state magic-angle-spinning NMR spectroscopy. Nature, 420, 98–102.
38. Takeuchi, K., Sun, Z.Y. and Wagner, G. (2008) Alternate 13C-12C labeling for complete main-chain resonance assignments using Cα direct-detection with applicability toward fast relaxing protein systems. J. Am. Chem. Soc., 130, 17210–17211.
39. Crespi, H.L., Rosenberg, R.M. and Katz, J.J. (1968) Proton magnetic resonance of proteins fully deuterated except for 1H-leucine side chains. Science, 161, 795–796.
40. Markley, J.L., Putter, I. and Jardetzky, O. (1968) High-resolution nuclear magnetic resonance spectra of selectively deuterated staphylococcal nuclease. Science, 161, 1249–1251.
41. Kalbitzer, H.R., Leberman, R. and Wittinghofer, A. (1985) 1H-NMR spectroscopy on elongation factor Tu from Escherichia coli. FEBS Lett., 180, 40–42.
42. Venters, R.A., Farmer, B.T. II, Fierke, C.A. and Spicer, L.D. (1996) Characterizing the use of perdeuteration in NMR studies of large proteins: 13C, 15N and 1H assignments of human carbonic anhydrase II. J. Mol. Biol., 264, 1101–1116.
43. Gardner, K.H. and Kay, L.E. (1998) The use of 2H, 13C, 15N multidimensional NMR to study the structure and dynamics of proteins. Annu. Rev. Biophys. Biomol. Struct., 27, 357–406.
44. Fiaux, J., Bertelsen, E.B., Horwich, A.L. and Wüthrich, K. (2004) Uniform and residue-specific 15N-labeling of proteins on a highly deuterated background. J. Biomol. NMR, 29, 289–297.
45. Paliy, O. and Gunasekera, T.S. (2007) Growth of E. coli BL21 in minimal media with different gluconeogenic carbon sources and salt contents. Appl. Microbiol. Biotechnol., 73, 1169–1172.
46. Marley, J., Lu, M. and Bracken, C. (2001) A method for efficient isotopic labeling of recombinant proteins. J. Biomol. NMR, 20, 71–75.
47. Venters, R.A., Huang, C.C., Farmer, B.T. II et al. (1995) High-level 2H/13C/15N labeling of proteins for NMR studies. J. Biomol. NMR, 5, 339–344.
48. Venters, R.A., Calderone, T.L., Spicer, L.D. and Fierke, C.A. (1991) Uniform 13C isotope labeling of proteins with sodium acetate for NMR studies: application to human carbonic anhydrase II. Biochemistry, 30, 4491–4494.
49. Leiting, B., Marsilio, F. and O’Connell, J.F. (1998) Predictable deuteration of recombinant proteins expressed in Escherichia coli. Anal. Biochem., 265, 351–355.
50. Hochuli, M., Szyperski, T. and Wüthrich, K. (2000) Deuterium isotope effects on the central carbon metabolism of Escherichia coli cells grown on a D2O-containing minimal medium. J. Biomol. NMR, 17, 33–42.
51. Etezady-Esfarjani, T., Hiller, S., Villalba, C. and Wüthrich, K. (2007) Cell-free protein synthesis of perdeuterated proteins for NMR studies. J. Biomol. NMR, 39, 229–238.
52. Morgan, W.D., Kragt, A. and Feeney, J. (2000) Expression of deuterium-isotope-labelled protein in the yeast Pichia pastoris for NMR studies. J. Biomol. NMR, 17, 337–347.
53. Farmer, B.T. II and Venters, R.A. (1995) Assignment of side-chain 13C resonances in perdeuterated proteins. J. Am. Chem. Soc., 117, 4187–4188.
54. Grzesiek, S., Anglister, J., Ren, H. and Bax, A. (1993) Carbon-13 line narrowing by deuterium decoupling in deuterium/carbon-13/nitrogen-15 enriched proteins. Application to triple resonance 4D J connectivity of sequential amides. J. Am. Chem. Soc., 115, 4369–4370.
55. Venters, R.A., Metzler, W.J., Spicer, L.D. et al. (1995) Use of 1HN-1HN NOEs to determine protein global folds in perdeuterated proteins. J. Am. Chem. Soc., 117, 9592–9593.
56. Takahashi, H., Nakanishi, T., Kami, K. et al. (2000) A novel NMR method for determining the interfaces of large protein-protein complexes. Nat. Struct. Biol., 7, 220–223.
57. Nakanishi, T., Miyazawa, M., Sakakura, M. et al. (2002) Determination of the interface of a large protein complex by transferred cross-saturation measurements. J. Mol. Biol., 318, 245–249.
58. Shimada, I. (2005) NMR techniques for identifying the interface of a larger protein–protein complex: cross-saturation and transferred cross-saturation experiments. Methods Enzymol., 394, 483–506.
59. Takeda, M., Ogino, S., Umemoto, R. et al. (2006) Ligand-induced structural changes of the CD44 hyaluronan-binding domain revealed by NMR. J. Biol. Chem., 281, 40089–40095.
60. Takeda, M., Terasawa, H., Sakakura, M. et al. (2003) Hyaluronan recognition mode of CD44 revealed by cross-saturation and chemical shift perturbation experiments. J. Biol. Chem., 278, 43550–43555.
61. Nishida, N., Sumikawa, H., Sakakura, M. et al. (2003) Collagen-binding mode of vWF-A3 domain determined by a transferred cross-saturation experiment. Nat. Struct. Biol., 10, 53–58.
62. Pervushin, K., Riek, R., Wider, G. and Wüthrich, K. (1997) Attenuated T2 relaxation by mutual cancellation of dipole-dipole coupling and chemical shift anisotropy indicates an avenue to NMR structures of very large biological macromolecules in solution. Proc. Natl. Acad. Sci. USA, 94, 12366–12371.
63. Riek, R., Wider, G., Pervushin, K. and Wüthrich, K. (1999) Polarization transfer by cross-correlated relaxation in solution NMR with very large molecules. Proc. Natl. Acad. Sci. USA, 96, 4918–4923.
64. Salzmann, M., Pervushin, K., Wider, G. et al. (1998) TROSY in triple-resonance experiments: new perspectives for sequential NMR assignment of large proteins. Proc. Natl. Acad. Sci. USA, 95, 13585–13590.
65. Fiaux, J., Bertelsen, E.B., Horwich, A.L. and Wüthrich, K. (2002) NMR analysis of a 900K GroEL-GroES complex. Nature, 418, 207–211.
66. Pervushin, K., Riek, R., Wider, G. and Wüthrich, K. (1998) Transverse relaxation-optimized spectroscopy (TROSY) for NMR studies of aromatic spin systems in 13C-labeled proteins. J. Am. Chem. Soc., 120, 6394–6400.
67. Tugarinov, V., Hwang, P.M., Ollerenshaw, J.E. and Kay, L.E. (2003) Cross-correlated relaxation enhanced 1H-13C NMR spectroscopy of methyl groups in very high molecular weight proteins and protein complexes. J. Am. Chem. Soc., 125, 10420–10428.
68. LeMaster, D.M. and Richards, F.M. (1988) NMR sequential assignment of Escherichia coli thioredoxin utilizing random fractional deuteration. Biochemistry, 27, 142–150.
69. Torchia, D.A., Sparks, S.W. and Bax, A. (1988) Delineation of α-helical domains in deuterated staphylococcal nuclease by 2D NOE NMR spectroscopy. J. Am. Chem. Soc., 110, 2320–2321.
70. Nietlispach, D., Clowes, R.T., Broadhurst, R.W. et al. (1996) An approach to the structure determination of larger proteins using triple resonance NMR experiments in conjunction with random fractional deuteration. J. Am. Chem. Soc., 118, 407–415.
71. Gardner, K.H., Rosen, M.K. and Kay, L.E. (1997) Global folds of highly deuterated, methyl-protonated proteins by multidimensional NMR. Biochemistry, 36, 1389–1401.
72. Muhandiram, D.R., Yamazaki, T., Sykes, B.D. and Kay, L.E. (1995) Measurement of 2H T1 and T1ρ relaxation times in uniformly 13C-labeled and fractionally 2H-labeled proteins in solution. J. Am. Chem. Soc., 117, 11536–11544.
73. Kay, L.E., Muhandiram, D.R., Farrow, N.A. et al. (1996) Correlation between dynamics and high affinity binding in an SH2 domain interaction. Biochemistry, 35, 361–368.
74. LeMaster, D.M. and Kushlan, D.M. (1996) Dynamical mapping of E. coli thioredoxin via 13C NMR relaxation analysis. J. Am. Chem. Soc., 118, 9255–9264.
75. Tugarinov, V., Kanelis, V. and Kay, L.E. (2006) Isotope labeling strategies for the study of high-molecular-weight proteins by solution NMR spectroscopy. Nat. Protoc., 1, 749–754.
76. Muchmore, D.C., McIntosh, L.P., Russell, C.B. et al. (1989) Expression and nitrogen-15 labeling of proteins for proton and nitrogen-15 nuclear magnetic resonance. Methods Enzymol., 177, 44–73.
77. Waugh, D.S. (1996) Genetic tools for selective labeling of proteins with α-15N-amino acids. J. Biomol. NMR, 8, 184–192.
78. Rajesh, S., Nietlispach, D., Nakayama, H. et al. (2003) A novel method for the biosynthesis of deuterated proteins with selective protonation at the aromatic rings of Phe, Tyr and Trp. J. Biomol. NMR, 27, 81–86.
79. Shortle, D. (1994) Assignment of amino acid type in 1H-15N correlation spectra by labeling with 14N-amino acids. J. Magn. Reson. B, 105, 88–90.
80. Bertini, I., Duma, L., Felli, I.C. et al. (2004) A heteronuclear direct-detection NMR spectroscopy experiment for protein-backbone assignment. Angew. Chem., Int. Ed., 43, 2257–2259.
81. Bermel, W., Bertini, I., Felli, I.C. et al. (2006) 13C-detected protonless NMR spectroscopy of proteins in solution. Prog. Nucl. Magn. Reson. Spectrosc., 48, 25–45.
82. Babini, E., Bertini, I., Capozzi, F. et al. (2004) Direct carbon detection in paramagnetic metalloproteins to further exploit pseudocontact shift restraints. J. Am. Chem. Soc., 126, 10496–10497.
83. Kainosho, M. and Tsuji, T. (1982) Assignment of the three methionyl carbonyl carbon resonances in Streptomyces subtilisin inhibitor by a carbon-13 and nitrogen-15 double-labeling technique. A new strategy for structural studies of proteins in solution. Biochemistry, 21, 6273–6279.
84. Kainosho, M., Nagao, H. and Tsuji, T. (1987) Local structural features around the C-terminal segment of Streptomyces subtilisin inhibitor studied by carbonyl carbon nuclear magnetic resonances of three phenylalanyl residues. Biochemistry, 26, 1068–1075.
85. Uchida, K., Markley, J.L. and Kainosho, M. (2005) Carbon-13 NMR method for the detection of correlated hydrogen exchange at adjacent backbone peptide amides and its application to hydrogen exchange in five antiparallel beta strands within the hydrophobic core of Streptomyces subtilisin inhibitor (SSI). Biochemistry, 44, 11811–11820.
86. Yamazaki, T., Yoshida, M., Kanaya, S. et al. (1991) Assignments of backbone 1H, 13C, and 15N resonances and secondary structure of ribonuclease H from Escherichia coli by heteronuclear three-dimensional NMR spectroscopy. Biochemistry, 30, 6036–6047.
87. Parker, M.J., Aulton-Jones, M., Hounslow, A.M. and Craven, C.J. (2004) A combinatorial selective labeling method for the assignment of backbone amide NMR resonances. J. Am. Chem. Soc., 126, 5020–5021.
88. Wu, P.S., Ozawa, K., Jergic, S. et al. (2006) Amino-acid type identification in 15N-HSQC spectra by combinatorial selective 15N-labelling. J. Biomol. NMR, 34, 13–21.
89. Craven, C.J., Al-Owais, M. and Parker, M.J. (2007) A systematic analysis of backbone amide assignments achieved via combinatorial selective labelling of amino acids. J. Biomol. NMR, 38, 151–159.
90. Kelly, M.J., Krieger, C., Ball, L.J. et al. (1999) Application of amino acid type-specific 1H- and 14N-labeling in a 2H-, 15N-labeled background to a 47 kDa homodimer: Potential for NMR structure determination of large proteins. J. Biomol. NMR, 14, 79–83.
91. Metzler, W.J., Wittekind, M., Goldfarb, V. et al. (1996) Incorporation of 1H/13C/15N-{Ile, Leu, Val} into a perdeuterated, 15N-labeled protein: Potential in structure determination of large proteins by NMR. J. Am. Chem. Soc., 118, 6800–6801.
92. Arrowsmith, C.H., Pachter, R., Altman, R.B. et al. (1990) Sequence-specific proton NMR assignments and secondary structure in solution of Escherichia coli trp repressor. Biochemistry, 29, 6332–6341.
93. Zhang, H., Zhao, D., Revington, M. et al. (1994) The solution structures of the trp repressor-operator DNA complex. J. Mol. Biol., 238, 592–614.
94. Yamazaki, T., Lee, W., Revington, M. et al. (1994) An HNCA pulse scheme for the backbone assignment of 15N,13C,2H-labeled proteins: application to a 37-kDa Trp repressor-DNA complex. J. Am. Chem. Soc., 116, 6464–6465.
95. Yamazaki, T., Lee, W., Arrowsmith, C.H. et al. (1994) A suite of triple resonance NMR experiments for the backbone assignment of 15N, 13C, 2H labeled proteins with high sensitivity. J. Am. Chem. Soc., 116, 11655–11666.
96. Shan, X., Gardner, K.H., Muhandiram, D.R. et al. (1996) Assignment of 15N, 13Cα, 13Cβ and HN resonances in an 15N,13C,2H labeled 64 kDa Trp repressor-operator complex using triple-resonance NMR spectroscopy and 2H-decoupling. J. Am. Chem. Soc., 118, 6570–6579.
97. Löhr, F., Katsemi, V., Hartleib, J. et al. (2003) A strategy to obtain backbone resonance assignments of deuterated proteins in the presence of incomplete amide 2H/1H back-exchange. J. Biomol. NMR, 25, 291–311.
98. Yamazaki, T., Tochio, H., Furui, J. et al. (1997) Assignment of backbone resonances for larger proteins using the 13C-1H coherence of a 1Hα-, 2H-, 13C-, and 15N-labeled sample. J. Am. Chem. Soc., 119, 872–880.
99. Coughlin, P.E., Anderson, F.E., Oliver, E.J. et al. (1999) Improved resolution and sensitivity of triple-resonance NMR methods for the structural analysis of proteins by use of a backbone-labeling strategy. J. Am. Chem. Soc., 121, 11871–11874.
100. Vuister, G.W., Kim, S.J., Wu, C. and Bax, A. (1994) 2D and 3D NMR study of phenylalanine residues in proteins by reverse isotopic labeling. J. Am. Chem. Soc., 116, 9206–9210.
101. Kuboniwa, H., Tjandra, N., Grzesiek, S. et al. (1995) Solution structure of calcium-free calmodulin. Nat. Struct. Biol., 2, 768–776.
102. Aghazadeh, B., Zhu, K., Kubiseski, T.J. et al. (1998) Structure and mutagenesis of the Dbl homology domain. Nat. Struct. Biol., 5, 1098–1107.
103. Medek, A., Olejniczak, E.T., Meadows, R.P. and Fesik, S.W. (2000) An approach for high-throughput structure determination of proteins by NMR spectroscopy. J. Biomol. NMR, 18, 229–238.
104. Tugarinov, V. and Kay, L.E. (2004) An isotope labeling strategy for methyl TROSY spectroscopy. J. Biomol. NMR, 28, 165–172.
105. Tugarinov, V., Choy, W.Y., Orekhov, V.Y. and Kay, L.E. (2005) Solution NMR-derived global fold of a monomeric 82-kDa enzyme. Proc. Natl. Acad. Sci. USA, 102, 622–627.
106. Rosen, M.K., Gardner, K.H., Willis, R.C. et al. (1996) Selective methyl group protonation of perdeuterated proteins. J. Mol. Biol., 263, 627–636.
107. Gardner, K.H. and Kay, L.E. (1997) Production and incorporation of 15N, 13C, 2H (1H-δ1 methyl) isoleucine into proteins for multidimensional NMR studies. J. Am. Chem. Soc., 119, 7599–7600.
108. Ollerenshaw, J.E., Tugarinov, V., Skrynnikov, N.R. and Kay, L.E. (2005) Comparison of 13CH3, 13CH2D, and 13CHD2 methyl labeling strategies in proteins. J. Biomol. NMR, 33, 25–41.
109. Goto, N.K., Gardner, K.H., Mueller, G.A. et al. (1999) A robust and cost-effective method for the production of Val, Leu, Ile (δ1) methyl-protonated 15N-, 13C-, 2H-labeled proteins. J. Biomol. NMR, 13, 369–374.
110. Isaacson, R.L., Simpson, P.J., Liu, M. et al. (2007) A new labeling method for methyl transverse relaxation-optimized spectroscopy NMR spectra of alanine residues. J. Am. Chem. Soc., 129, 15428–15429.
111. Sprangers, R. and Kay, L.E. (2007) Quantitative dynamics and binding studies of the 20S proteasome by NMR. Nature, 445, 618–622.
112. Ishima, R., Louis, J.M. and Torchia, D.A. (2001) Optimized labeling of 13CHD2 methyl isotopomers in perdeuterated proteins: Potential advantages for 13C relaxation studies of methyl dynamics of larger proteins. J. Biomol. NMR, 21, 167–171.
113. Gardner, K.H., Zhang, X.C., Gehring, K. and Kay, L.E. (1998) Solution NMR studies of a 42 kDa Escherichia coli maltose binding protein/β-cyclodextrin complex: Chemical shift assignments and analysis. J. Am. Chem. Soc., 120, 11738–11748.
114. Tugarinov, V. and Kay, L.E. (2003) Ile, Leu, and Val methyl assignments of the 723-residue malate synthase G using a new labeling strategy and novel NMR methods. J. Am. Chem. Soc., 125, 13868–13878.
115. Wüthrich, K. (1986) NMR of Proteins and Nucleic Acids, John Wiley & Sons, Inc., New York.
116. Senn, H., Werner, B., Messerle, B.A. et al. (1989) Stereospecific assignment of the methyl 1H NMR lines of valine and leucine in polypeptides by nonrandom 13C labelling. FEBS Lett., 249, 113–118.
117. Neri, D., Szyperski, T., Otting, G. et al. (1989) Stereospecific nuclear magnetic resonance assignments of the methyl groups of valine and leucine in the DNA-binding domain of the 434 repressor by biosynthetically directed fractional carbon-13 labeling. Biochemistry, 28, 7510–7516.
118. Atreya, H.S. and Chary, K.V. (2001) Selective ‘unlabeling’ of amino acids in fractionally 13C labeled proteins: An approach for stereospecific NMR assignments of CH3 groups in Val and Leu residues. J. Biomol. NMR, 19, 267–272.
119. Tate, S., Ushioda, T., Utsunomiya, N. et al. (1995) Solution structure of a human cystatin A variant, cystatin A2-98 M65L by NMR spectroscopy. A possible role of the interactions between the N- and C-termini to maintain the inhibitory active form of cystatin A. Biochemistry, 34, 14637–14648.
120. Ohki, S., Eto, M., Kariya, E. et al. (2001) Solution NMR structure of the myosin phosphatase inhibitor protein CPI-17 shows phosphorylation-induced conformational changes responsible for activation. J. Mol. Biol., 314, 839–849.
121. Ostler, G., Soteriou, A., Moody, C.M. et al. (1993) Stereospecific assignments of the leucine methyl resonances in the 1H NMR spectrum of Lactobacillus casei dihydrofolate reductase. FEBS Lett., 318, 177–180.
122. Kainosho, M., Ajisaka, K., Kamisaku, M. et al. (1975) Conformational analysis of amino acids and peptides using specific isotope substitution. I. Conformation of L-phenylalanylglycine. Biochem. Biophys. Res. Commun., 64, 425–432.
123. Kainosho, M. and Ajisaka, K. (1975) Conformational analysis of amino acids and peptides using specific isotope substitution. II. Conformation of serine, tyrosine, phenylalanine, aspartic acid, asparagine, aspartic acid β-methyl ester in various ionization states. J. Am. Chem. Soc., 97, 5630–5631.
124. Kushlan, D.M. and LeMaster, D.M. (1993) Resolution and sensitivity enhancement of heteronuclear correlation for methylene resonances via 2H enrichment and decoupling. J. Biomol. NMR, 3, 701–708.
125. Vitali, F., Henning, A., Oberstrass, F.C. et al. (2006) Structure of the two most C-terminal RNA recognition motifs of PTB using segmental isotope labeling. EMBO J., 25, 150–162.
126. Skrisovska, L. and Allain, F.H. (2008) Improved segmental isotope labeling methods for the NMR study of multidomain or large proteins: Application to the RRMs of Npl3p and hnRNP L. J. Mol. Biol., 375, 151–164.
127. Yamazaki, T., Otomo, T., Oda, N. et al. (1998) Segmental isotope labeling for protein NMR using peptide splicing. J. Am. Chem. Soc., 120, 5591–5592.
128. Yagi, H., Tsujimoto, T., Yamazaki, T. et al. (2004) Conformational change of H+-ATPase β monomer revealed on segmental isotope labeling NMR spectroscopy. J. Am. Chem. Soc., 126, 16632–16638.
129. Otomo, T., Ito, N., Kyogoku, Y. and Yamazaki, T. (1999) NMR observation of selected segments in a larger protein: central-segmental isotope labeling through intein-mediated ligation. Biochemistry, 38, 16040–16044.
130. Otomo, T., Teruya, K., Uegaki, K. et al. (1999) Improved segmental isotope labeling of proteins and application to a larger protein. J. Biomol. NMR, 14, 105–114.
131. Züger, S. and Iwai, H. (2005) Intein-based biosynthetic incorporation of unlabeled protein tags into isotopically labeled proteins for NMR studies. Nat. Biotechnol., 23, 736–740.
132. Dawson, P.E., Muir, T.W., Clark-Lewis, I. and Kent, S.B. (1994) Synthesis of proteins by native chemical ligation. Science, 266, 776–779.
133. Dawson, P.E. and Kent, S.B. (2000) Synthesis of native proteins by chemical ligation. Annu. Rev. Biochem., 69, 923–960.
134. Muir, T.W. (2003) Semisynthesis of proteins by expressed protein ligation. Annu. Rev. Biochem., 72, 249–289.
135. David, R., Richter, M.P. and Beck-Sickinger, A.G. (2004) Expressed protein ligation. Method and applications. Eur. J. Biochem., 271, 663–677.
136. Xu, R., Ayers, B., Cowburn, D. and Muir, T.W. (1999) Chemical ligation of folded recombinant proteins: Segmental isotopic labeling of domains for NMR studies. Proc. Natl. Acad. Sci. USA, 96, 388–393.
137. Camarero, J.A., Shekhman, A., Campbell, E. et al. (2002) Autoregulation of a bacterial σ factor explored by using segmental isotopic labeling and NMR. Proc. Natl. Acad. Sci. USA, 99, 8536–8541.
138. Cotton, G.J., Ayers, B., Xu, R. and Muir, T.W. (1999) Insertion of a synthetic peptide into a recombinant protein framework: A protein biosensor. J. Am. Chem. Soc., 121, 1100–1101.
139. Chong, S., Mersha, F.B., Comb, D.G. et al. (1997) Single-column purification of free recombinant proteins using a self-cleavable affinity tag derived from a protein splicing element. Gene, 192, 271–281.
140. Takeda, M., Sugimori, N., Torizawa, T. et al. (2008) Structure of the putative 32 kDa myrosinase binding protein from Arabidopsis (At3g16450.1) determined by SAIL-NMR. FEBS J., 275, 5873–5884.
141. Takeda, M., Chang, C.K., Ikeya, T. et al. (2008) Solution structure of the C-terminal dimerization domain of SARS coronavirus nucleocapsid protein solved by the SAIL-NMR method. J. Mol. Biol., 380, 608–622.
142. Torizawa, T., Ono, A.M., Terauchi, T. and Kainosho, M. (2005) NMR assignment methods for the aromatic ring resonances of phenylalanine and tyrosine residues in proteins. J. Am. Chem. Soc., 127, 12620–12626.
143. Terauchi, T., Kobayashi, K., Okuma, K. et al. (2008) Stereoselective synthesis of triply isotope-labeled Ser, Cys, and Ala: amino acids for stereoarray isotope labeling technology. Org. Lett., 10, 2785–2787.
144. Okuma, K., Ono, A.M., Tsuchiya, S. et al. (2009) Asymmetric synthesis of (2S,3R)- and (2S,3S)-[2-13C;3-2H] glutamic acid. Tetrahedron Lett., 50, 1482–1484.
145. Ikeya, T., Terauchi, T., Güntert, P. and Kainosho, M. (2006) Evaluation of stereo-array isotope labeling (SAIL) patterns for automated structural analysis of proteins with CYANA. Magn. Reson. Chem., 44, S152–S157.
146. Takeda, M., Ikeya, T., Güntert, P. and Kainosho, M. (2007) Automated structure determination of proteins with the SAIL-FLYA NMR method. Nat. Protoc., 2, 2896–2902.
3 Resonance Assignments
Lu-Yun Lian and Igor L. Barsukov
3.1 Introduction
The assignment of the resonances in the spectrum of a protein to individual amino acids in the sequence is an essential first step towards detailed studies of the protein by NMR. Determination of a high-resolution structure requires complete proton assignments. Less complete assignments may suffice for more specific studies, especially when the structure of the protein is already known. Examples include identifying regions that interact with other molecules, characterising the dynamics of the protein and determining pKa values. The first protein resonance assignments were carried out on unlabelled proteins, using a two-stage strategy. First, through-bond (scalar coupling) connections were made between protons within an amino acid. Then sequential through-space (NOE-based) connections were made between protons in neighbouring amino acids [1]. By the early 1990s, several significant advances had changed the way protein NMR spectra were assigned and had revolutionised the use of NMR for biomolecular studies. These included the ease of producing large quantities of recombinant proteins labelled with 13C and 15N, the advent of three-dimensional NMR, major improvements in spectrometer hardware and the development of stable, progressively higher-field magnets. It became possible to assign the NMR spectrum of a protein simply by tracing through-bond connectivities between the proton, nitrogen and carbon atoms, thereby linking almost all the atoms of the entire polypeptide chain without
Protein NMR Spectroscopy: Practical Techniques and Applications, First Edition. Edited by Lu-Yun Lian and Gordon Roberts. 2011 John Wiley & Sons, Ltd. Published 2011 by John Wiley & Sons, Ltd.
56
Protein NMR Spectroscopy
recourse to through-space NOE connectivities [2]. These advances opened up the applicability of NMR for the high-resolution studies of larger proteins; the increased dimension of the NMR data to a large extent resolved the resonance overlap problem, and using connectivities based on covalent interactions to connect two adjacent residues significantly alleviated the problem of rapid transverse relaxation of protons which is characteristic of large proteins. Further progress was made in the late 1990s when TROSYexperiments were introduced [3]; again these experimental techniques were accompanied by technological developments with the availability of NMR spectrometers at field strengths of 18.4 T and above, and the possibility and affordability of making highly deuterated 13 C, 15 N labelled proteins. In addition site-specific selective labelling became more readily available, facilitating side-chains assignment even for very large proteins. In this chapter we describe first the NOE-based resonance assignment strategy which is appropriate for both unlabelled proteins (Section 3.2) and when only 15 N-labelled proteins are available (Section 3.3). We then describe the most common approaches based on triple resonance HSQC and TROSY experiments, the latter being required for proteins over 30 kDa. Section 3.4 describes the resonance assignment of the polypeptide backbone and Section 3.5 that of the side-chains. The triple resonance approach is very efficient and almost all the automatic resonance assignment programmes make use of data from these experiments. It is useful to stress that in both the NOE-based method and the main-chain directed method from the triple resonance 3D data, the essential first step is the identification of spin-systems from individual amino-acid residues and the types of amino acid from which they arise. For some amino acids the chemical shifts of the protons and carbons follow unique patterns. 
When undertaking resonance assignments, it is advisable to have available a list of the proton and carbon chemical shifts of all twenty amino acids; these are readily available from standard textbooks or databases. In the second step, the NOE-based method relies on adjacent residues being close in space, whereas the main-chain directed method relies on the covalent linkage (through-bond scalar coupling) between the 15N and 13C atoms of adjacent amino-acid residues in the polypeptide chain.
3.2 Resonance Assignment of Unlabelled Proteins
It is possible to assign the spectrum of a protein of molecular mass up to about 8 kDa using proton-only experiments, without recourse to stable isotope labelling. Although most proteins studied are recombinant, expressed predominantly in E. coli, some situations necessitate the use of unlabelled material. Examples include purified native proteins such as toxins, peptides containing unusual amino acids such as the lantibiotics, and synthetic peptides. When only 15N-labelled proteins can be made, or when 13C-labelled proteins are prohibitively expensive to produce, 3D 15N-edited experiments are used; in this case the assignment strategy is identical to that described for unlabelled proteins, with the extra 15N dimension being particularly advantageous for resolving resonance overlap, especially in proteins with a very high helical content. The assignment of the second IgG-binding domain of Streptococcus Protein G (henceforth referred to as GB1) is used here to illustrate the NOE-based sequential resonance assignment strategy [4]. For spin-system identification, double-quantum-filtered COSY (DQF-COSY) and TOCSY data were collected; for sequential assignment, NOESY spectra were acquired.
3.2.1 Spin System Assignments
As indicated above, the overall strategy is, first, to identify all the spin systems and the type of amino acid from which each originates and, second, to link the proton resonances of each amino acid in such a way as to match the protein or peptide sequence. Spin systems are delineated using through-bond (scalar) coupling connections, whereas sequential links are made using through-space (dipolar) connections. The protons within an individual amino acid constitute a 'spin system'; in practice, a spin system is recognised as a characteristic pattern of chemical shifts belonging to a particular amino acid. The most convenient way to obtain information on complete spin systems is from a TOCSY spectrum, but first the DQF-COSY spectrum is examined to identify the direct three-bond scalar connectivities. The first region of the spectrum to examine is the CαH–NH region (Figure 3.1), known as the fingerprint region. The success of the assignment process depends on observing as many as possible of the expected cross-peaks in this region. GB1 has 55 amino acids; hence the expected number of cross-peaks is 54 (55 minus the N-terminal Thr; there are no prolines in GB1). All the expected CαH–NH cross-peaks are observed. Once these connectivities are identified, TOCSY spectra recorded with 60 and 120 ms mixing times are used to delineate almost all the spin systems, developing each one from its NH resonance. The TOCSY pattern of the complete spin system for each representative type of amino acid in GB1 is shown in Figure 3.2; there are no arginine, methionine, histidine, serine, cysteine or proline residues in GB1. Figure 3.3 shows the aromatic side-chain spin systems of tyrosine, phenylalanine and tryptophan for data acquired in H2O. Also appearing in this region of the spectrum are the Asn and Gln side-chain amide NH2 resonances, which are present in the TOCSY but not in the COSY spectrum. Generally, alanine, threonine and glycine have unique, easily identifiable spin systems. Valine, isoleucine and leucine are in principle distinguishable from each other; however, because of their long side-chains and the less efficient transfer of magnetisation through the scalar couplings, the entire spin system may not be delineated from a single TOCSY spectrum. As far as the NH, CαH and CβH protons are concerned, Asp, Asn, Phe, Trp, Tyr, Cys and Ser show a simpler AMX spin system. Of the remaining long side-chains, Lys is distinguishable from the others by the presence of the CεH2 methylene resonance at around 3.02 ppm.
For larger proteins it is sometimes not possible to delineate the complete spin systems of the long side-chain amino acids by developing them from the amide proton alone. This is often due either to unfavourable intervening scalar coupling constants or to substantial attenuation of the CαH–NH cross-peaks by amide exchange. In such cases, the amide to side-chain connectivities are followed as far along the chain as possible and the identification of the spin system is completed using the aliphatic region of the spectrum. For instance, the methyl region of the TOCSY and DQF-COSY spectra is particularly useful for completing the spin systems of Leu and Ile residues: relayed connectivities from the methyl groups to the methylene resonances 'meet up' with the partial spin system developed from the amide resonances.
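The fingerprint cross-peak count quoted above (55 residues, minus the N-terminal residue, with no prolines) generalises to any sequence. The short Python sketch below is illustrative only: the function name is hypothetical, and it assumes one fingerprint cross-peak per non-proline residue after the N-terminus, ignoring glycines that can give two Hα cross-peaks.

```python
def expected_fingerprint_peaks(sequence: str) -> int:
    """Count the HN-Halpha fingerprint cross-peaks expected for a sequence.

    Assumptions (illustrative only): one cross-peak per residue carrying a
    backbone amide proton; the N-terminal residue (a rapidly exchanging NH3+
    group) and prolines (no amide proton) are excluded; possible second
    cross-peaks from glycines with non-degenerate Halpha protons are ignored.
    """
    residues = sequence.strip().upper()
    # Skip the N-terminal residue, then drop prolines from the remainder.
    return sum(1 for aa in residues[1:] if aa != "P")


# A hypothetical 55-residue, proline-free sequence gives 54 peaks,
# matching the count quoted for GB1 in the text.
print(expected_fingerprint_peaks("T" + "A" * 54))  # 54
print(expected_fingerprint_peaks("MPAPG"))          # 2 (A and G)
```

Comparing this number with the cross-peaks actually observed is a quick first check on sample quality and on whether peaks are missing through exchange broadening.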
Figure 3.1 NH amide/aromatic (F2, x-axis) versus aliphatic (F1, y-axis) region of the [1H,1H] COSY (left) and [1H,1H] TOCSY (right) spectra of the second IgG-binding domain of Protein G (GB1) at 1.5 mM in 90% H2O/10% D2O at 25 °C, pH 4.2. These spectra form the starting point for the sequential assignment of the proton spectrum of GB1. The expected 53 cross-peaks from all the amino acids (except proline and N-terminal residues) are observed in the COSY spectrum. These cross-peaks make up the fingerprint region; from these, in combination with the TOCSY spectrum, spin systems are developed to obtain the positions of as many resonances of each amino acid as possible. A second GB1 sample in 99.96% 2H2O was also prepared, mainly to simplify the aromatic region of the spectrum, where the side-chain NH2 resonances of the asparagine and glutamine residues complicated the assignment of the aromatic side-chain resonances. To deal with potential resonance overlap, and to shift the water resonance so that resonances near the water peak could be observed, datasets were also collected at two pH values, 4.2 and 3.1, and, for each pH value, at two temperatures, 25 °C and 37 °C. For all the experiments, water suppression was achieved using gradient pulses.
Although the CβH2 resonances of Asn and the CγH2 resonances of Gln can be identified from the TOCSY spectra alone, NOE connectivities between the side-chain amide signals and their respective aliphatic side-chain resonances can also be used to simplify or confirm these spin systems. This approach distinguishes Asn from Asp and Gln from Glu side-chain resonances. Similarly, once the aromatic side-chain resonances have been assigned from the distinct spin system of each of the four aromatic rings (Trp, Tyr, Phe and His) in the COSY and TOCSY spectra, these assignments can assist with the identification of the CβH2 resonances of the aromatic residues through NOE connectivities to their respective side-chain aromatic protons, as shown in Figure 3.4.
Figure 3.2 [1H,1H] TOCSY spectrum showing the connectivities that make up the spin system of a particular amino acid. The mixing times for the TOCSY experiments were 60 and 120 ms. TOCSY spectra at the shorter mixing time show shorter-range scalar connectivities; at the longer mixing time it is often possible to observe all the scalar-coupled cross-peaks that define the complete spin system of an amino acid. It is important to ensure that a sufficient number of data points is collected in the indirect dimension to obtain good resolution of the cross-peaks. Missing peaks are generally due to chemical exchange broadening. Spin systems from each of the different amino acids found in GB1 are shown. The pattern of cross-peaks, that is, the chemical shifts of the protons, can be unique for some amino acids; such recognisable patterns are very useful for assigning peaks to amino acid types prior to full sequential assignment.
3.2.2 Sequence-Specific Assignments
Sequence-specific resonance assignment is achieved through the inter-residue through-space connectivities obtained from the NOESY spectrum. The most convenient starting points for GB1 are the unique Trp, Ile and Gln residues. The complete spin systems of each of these residues can be identified from a combination of the TOCSY and NOESY experiments. The most useful NOEs for sequential assignment involve the CαH and CβH of residue i and the NH of the adjacent residue i+1, dαN(i,i+1) and dβN(i,i+1), and the NH protons of residues i and i+1, dNN(i,i+1). These inter-residue sequential NOE connectivities link the CαH, CβH and NH of residue i to the NH of residue i+1. Figure 3.5 shows examples of the dαN(i,i+1) and dNN(i,i+1) connectivities that allowed the complete sequential assignment of GB1.
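The dαN(i,i+1) walk can be expressed as a simple shift-matching search. The Python sketch below is a minimal illustration, not a published algorithm: it assumes spin systems already reduced to one HN and one Hα shift each, a hypothetical peak-list layout, and an assumed tolerance of 0.02 ppm. Degenerate Hα shifts will return several candidates, which is the ambiguity that the dNN and dβN connectivities help to resolve.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: spin-system label -> (HN shift, Halpha shift) in ppm.
SpinSystems = Dict[str, Tuple[float, float]]
# NOESY cross-peaks as (HN shift in F2, other proton shift in F1), in ppm.
NoePeaks = List[Tuple[float, float]]


def candidate_predecessors(spins: SpinSystems, noes: NoePeaks,
                           tol_hn: float = 0.02,
                           tol_ha: float = 0.02) -> Dict[str, List[str]]:
    """For each spin system j, list the spin systems i whose Halpha shift
    matches an NOE to the amide proton of j, i.e. candidate d_alphaN(i, i+1)
    connectivities placing i immediately before j."""
    result: Dict[str, List[str]] = {}
    for j, (hn_j, _) in spins.items():
        # F1 shifts of all NOEs that sit on the amide proton of j
        f1_on_j = [f1 for f2, f1 in noes if abs(f2 - hn_j) <= tol_hn]
        result[j] = [
            i for i, (_, ha_i) in spins.items()
            # skip j itself: its own HN-Halpha peak is intra-residue
            if i != j and any(abs(f1 - ha_i) <= tol_ha for f1 in f1_on_j)
        ]
    return result


# Invented shifts for illustration (not GB1 data).
spins = {"sysA": (9.20, 4.35), "sysB": (8.70, 4.90)}
noes = [(8.70, 4.35), (8.70, 4.90), (9.20, 4.35)]
print(candidate_predecessors(spins, noes))
# {'sysA': [], 'sysB': ['sysA']}
```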
Figure 3.3 The aromatic region of GB1 showing the three-bond scalar coupling (COSY) and relayed scalar coupling (TOCSY, 120 ms mixing time) patterns for a sample dissolved in 90% H2O/10% D2O. Also shown are the side-chain amide NH2 TOCSY cross-peaks. Assignment to the different aromatic amino acid types is based on the unique cross-peak pattern of each residue type.
The GB1 domain used in these experiments does not contain a proline residue. However, if a proline is present, its sequence-specific assignment is based on the NOE connectivities between the Pro CδH protons and the CαH of residue i−1.
3.2.3 Possible Difficulties
The most common difficulties encountered in the approach described above are overlapping and missing peaks. Missing peaks can result from line broadening due to chemical exchange (see Chapter 7), while resonance overlap is particularly likely to be a problem in proteins with a very high helical content, where most of the CαH resonances lie between 3.5 and 4.5 ppm and the NH resonances between 7 and 8.5 ppm. These problems can be alleviated by recording spectra at several temperatures and pH values (see Chapter 1).
3.3 15N-Edited Experiments
In the unusual situation where it is possible to make only 15N-labelled protein rather than 15N,13C-doubly labelled protein (for example, where poor expression makes 13C labelling prohibitively expensive), the approach to resonance assignment is identical to that described for the unlabelled protein. The only difference is that each spin system can be developed from its 15N–1HN two-dimensional cross-peak, which significantly improves the resolution and overcomes resonance overlap caused by degenerate amide proton resonances. Even in the unfortunate case where both the 15N and 1H resonances are degenerate for a set of residues, the extra 15N dimension still reduces the number of candidate residues at these positions to a very small number; often, changing the temperature and/or pH causes chemical shift changes sufficient to resolve this overlap. For the aromatic side-chains and proline residues, it is still necessary to analyse the COSY, TOCSY and NOESY spectra of the unlabelled protein. Figure 3.6 illustrates slices through the 15N dimension of the 15N-edited TOCSY spectrum, and Figures 3.7 and 3.8 show the use of the 15N-edited NOESY for sequential resonance assignment, using strips for the α-helical region A23–N37 and the β-strand region K4–L12. Figures 3.7 and 3.8 clearly show the characteristic strong dNN(i,i+1) connectivities of the α-helical region and dαN(i,i+1) connectivities of the β-sheet region, respectively.
Figure 3.4 [1H,1H] NOESY spectrum of GB1 showing the connectivities between the side-chains of aromatic residues, Gln and Asn, which are useful for assigning AMX spin systems. For the aromatic residues, the NOESY data are best acquired on a sample dissolved in 2H2O in order to simplify the spectrum and remove many of the connectivities involving labile amide protons. In addition, the 2H2O data distinguish the aromatic connectivities from those of Gln and Asn, since the Gln/Asn NH2 signals are absent. For sequential assignment, NOESY experiments with mixing times of 100 ms and 150 ms were acquired.
Figure 3.5 Examples of dαN(i,i+1) (upper) and dNN(i,i+1) (lower) connectivities in the 150 ms NOESY spectrum of GB1. The HN(i+1)–Hα(i) sequential connectivities for some of the residues are indicated; peaks are labelled using the format shown (e.g. I7/V6 = I7 NH/V6 Hα). For the HN(i+1)–NH(i) connectivities, the positions of the NH peaks are indicated on the diagonal, and the sequential HN(i+1)–NH(i) connectivities are indicated by the off-diagonal cross-peaks.
3.4 Triple Resonance
3.4.1 3D Triple Resonance
If uniform 13C,15N (and, when necessary, 2H) isotope labelling is feasible, powerful triple-resonance experiments are used to assign the backbone resonances. The principles of these experiments are described extensively in NMR textbooks [5,6] and reviews [7,8] and are not discussed here (see Chapter 4). Most variants of the triple-resonance experiments are included in the standard pulse sequence libraries. The main spectra used for assignment correlate the 1H and 15N resonances of each NH group with one or more carbon or proton resonances of the same or the preceding residue, as summarised in Figure 3.9. The 15N–1HN correlations are common to all the experiments and are detected independently in the 1H,15N-HSQC spectrum. The triple-resonance correlations can be divided into two categories depending on the coherence transfer pathway. The first category is based on N–C′ transfer and is restricted to connecting the NH group of residue i with the carbon atoms of the preceding residue i−1. This generates the correlations $H_i N_i C'_{i-1}$ in HNCO, $H_i N_i C^{\alpha}_{i-1}$ in HN(CO)CA and $H_i N_i C^{\alpha}_{i-1}/C^{\beta}_{i-1}$ in CBCA(CO)NH experiments. The second category involves N–Cα transfer, which is possible between $N_i$ and $C^{\alpha}_i$ of the same residue as well as between $N_i$ and $C^{\alpha}_{i-1}$ of the preceding residue, leading to the correlations $H_i N_i C'_i/C'_{i-1}$ in HN(CA)CO, $H_i N_i C^{\alpha}_i/C^{\alpha}_{i-1}$ in HNCA and $H_i N_i C^{\alpha}_i/C^{\beta}_i/C^{\alpha}_{i-1}/C^{\beta}_{i-1}$ in HNCACB experiments. Cross-peaks corresponding to the sequential correlations are usually substantially weaker, or absent, because the two-bond N–Cα coupling is smaller than the one-bond N–Cα coupling. The Cα and Cβ cross-peaks have opposite signs in HNCACB spectra; this helps to distinguish the Cα and Cβ resonances of Ser and Thr residues, but may lead to cancellation of cross-peaks if the two chemical shifts are very similar. The assignment procedure can be summarised as matching the intra-residue 13C chemical shifts identified from one NH group with the sequential chemical shifts identified from a different NH group. Normally the procedure is divided into two relatively distinct stages: (i) identification of the correlations for each HN group and grouping them into spin systems, similar to the concept described for the assignment of unlabelled proteins; and (ii) arranging these spin systems in sequential order, following the through-bond (rather than through-space) connections between residues in the polypeptide chain. The main steps of the assignment protocol are listed in Figure 3.10 and described in detail in the following text.
Figure 3.6 Example of the TOCSY connectivities observed in the 3D [15N] HSQC-TOCSY spectrum of GB1. The residue assigned to each spin system is indicated at the top of each slice. The number at the bottom left-hand corner of each slice is the nitrogen chemical shift of the amide resonance, and the position of the slice on the x-axis gives the corresponding proton shift. Extra peaks in each slice are due to overlapping peaks in the amide nitrogen and proton dimensions. The arrows for T18 and T16 indicate that the Hγ resonances are present although not plotted, whereas for T17 the Hγ resonance is not observed because of line broadening.
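The opposite signs of Cα and Cβ peaks in HNCACB spectra, described above, can be used directly when automating spin-system analysis. The Python sketch below is hypothetical: it assumes Cα peaks are phased positive and Cβ peaks negative (a convention that depends on the pulse sequence and processing) and that the strongest peak of each sign is the intra-residue one. It then flags the Ser/Thr case, in which Cβ resonates downfield of Cα (on average roughly 64 vs 58 ppm for Ser and 70 vs 62 ppm for Thr).

```python
from typing import List, Optional, Tuple


def split_hncacb_intra(peaks: List[Tuple[float, float]]) -> dict:
    """Assign the HNCACB peaks of one spin system to Calpha/Cbeta by sign.

    peaks: (13C shift in ppm, signed peak height) pairs.
    Assumption: Calpha is phased positive and Cbeta negative; flip the signs
    if the spectrum was processed with the opposite phasing.
    """
    pos = [p for p in peaks if p[1] > 0]
    neg = [p for p in peaks if p[1] < 0]
    # Keep the strongest peak of each sign (the intra-residue correlation
    # is usually stronger than the sequential one).
    ca: Optional[float] = max(pos, key=lambda p: p[1])[0] if pos else None
    cb: Optional[float] = min(neg, key=lambda p: p[1])[0] if neg else None
    # On average, essentially only Ser and Thr have Cbeta downfield of Calpha.
    ser_thr = ca is not None and cb is not None and cb > ca
    return {"ca": ca, "cb": cb, "ser_thr_candidate": ser_thr}


# Invented peaks: strong intra-residue pair plus weaker sequential pair.
print(split_hncacb_intra([(58.3, 1.0), (63.8, -0.8), (55.1, 0.3), (40.9, -0.2)]))
# {'ca': 58.3, 'cb': 63.8, 'ser_thr_candidate': True}
```

Ordering peaks by shift alone would mislabel Cα and Cβ for exactly these residues, which is why the sign information is used instead.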
Figure 3.7 Example of the dNN(i,i+1) connectivities for residues A23 to N37 observed in the 3D [15N] HSQC-NOESY spectrum of GB1. This region forms an α-helix. The assigned residue is indicated at the top of each slice. The numbers at the bottom are the nitrogen and proton chemical shifts of the amide resonance. Starting from the 1H,1H diagonal peak of a unique or easily identifiable spin system (and hence residue type), the NOE peak to the NH of the adjacent residue is identified, as shown by the dashed arrows. By arranging the slices from adjacent residues next to each other, it is possible to 'walk' along the polypeptide backbone, enabling sequence-specific resonance assignments to be made. For each slice, both dNN(i,i+1) and dNN(i,i−1) connectivities are observed; the dNN(i,i−1) connectivities are shown as solid arrows.
3.4.1.1 Identification of Spin Systems
Spin systems are usually grouped by starting from a 'root' experiment and locating the cross-peaks in the other triple-resonance experiments that have the same 1HN and 15N chemical shifts, as illustrated schematically in Figure 3.11. The best spectra to use as a root are those of highest sensitivity and resolution, preferably containing a single cross-peak per residue. For small to medium-sized proteins the 1H,15N-HSQC is the usual root spectrum, as most of its cross-peaks are normally well resolved. In cases of severe overlap, as is common for large or unfolded proteins, the HNCO spectrum provides a good alternative. Cross-peaks in the root spectrum are picked automatically, and a separate spin system is developed from each of them; obvious noise and artefact peaks need to be removed before further steps. Cross-peaks in the other triple-resonance spectra are then picked, automatically or manually, at positions close to those of the root peaks. When the automatic peak-picking procedure is used, the search range is set equal to the line width at half height in the 1H and 15N dimensions; for manual peak picking, orthogonal slices are best displayed at the 1H and 15N coordinates of the root peaks for interactive peak selection.
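Grouping 3D peaks around their root peaks is essentially a two-dimensional tolerance search. The Python sketch below is a minimal illustration with hypothetical names and peak-list layout; the default tolerances (0.03 ppm for 1H, 0.3 ppm for 15N) are assumptions. Rather than guessing, it sets aside peaks that match more than one root, since those are the overlapped cases that need inspection in both slice orientations.

```python
from typing import Dict, List, Tuple

Root = Tuple[float, float]            # (1H, 15N) of a root HSQC peak, ppm
Peak3D = Tuple[float, float, float]   # (1H, 15N, 13C) of a 3D peak, ppm


def group_by_root(roots: Dict[str, Root], peaks: List[Peak3D],
                  tol_h: float = 0.03, tol_n: float = 0.3):
    """Attach each 3D cross-peak to the root peak it matches in 1H and 15N.

    Returns (groups, ambiguous, unmatched):
      groups    - root label -> list of 13C shifts attached to it
      ambiguous - peaks matching more than one root (overlap, check by eye)
      unmatched - peaks matching no root (noise, or a missing root peak)
    """
    groups: Dict[str, List[float]] = {name: [] for name in roots}
    ambiguous, unmatched = [], []
    for h, n, c in peaks:
        hits = [name for name, (rh, rn) in roots.items()
                if abs(h - rh) <= tol_h and abs(n - rn) <= tol_n]
        if len(hits) == 1:
            groups[hits[0]].append(c)
        elif hits:
            ambiguous.append(((h, n, c), hits))
        else:
            unmatched.append((h, n, c))
    return groups, ambiguous, unmatched


# Invented peak lists for illustration.
roots = {"r1": (8.50, 120.0), "r2": (8.51, 120.5), "r3": (7.90, 118.0)}
peaks = [(8.50, 120.0, 56.1), (7.91, 118.1, 45.3),
         (8.505, 120.25, 60.0), (9.10, 130.0, 50.0)]
groups, ambiguous, unmatched = group_by_root(roots, peaks)
print(groups)     # {'r1': [56.1], 'r2': [], 'r3': [45.3]}
print(ambiguous)  # [((8.505, 120.25, 60.0), ['r1', 'r2'])]
print(unmatched)  # [(9.1, 130.0, 50.0)]
```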
Figure 3.8 Example of the dαN(i,i+1) connectivities for residues K4 to L12 observed in the 3D HSQC-NOESY spectrum of GB1. This region forms a β-sheet. The assigned residue is indicated at the top of each slice. The numbers at the bottom are the nitrogen and proton chemical shifts of the amide resonance. Starting from the NOESY cross-peak between L12 NH (residue i+1) and T11 Hα (residue i) in the right-hand panel, the horizontal arrows link each dαN(i,i+1) cross-peak with the intra-residue αN cross-peak, and the vertical arrows link the intra-residue cross-peak to the next inter-residue cross-peak. By arranging the slices from adjacent residues next to each other, it is possible to 'walk' along the polypeptide backbone, enabling sequence-specific resonance assignments to be made. There are two Hα cross-peaks for G9 but, for simplicity, only one of the two dαN(i,i+1) connectivities is shown.
The choice between manual and automatic peak picking depends primarily on the quality of the spectra. For high-sensitivity, artefact-free spectra the automatic method is fast and reliable, whereas for spectra with a low signal-to-noise ratio automated peak picking may either generate too many noise peaks or miss real peaks of low intensity. In practice, a combination of the two methods usually works best: automated peak picking at a high threshold level, followed by manual inspection at a contour level set close to the noise. Even if manual peak picking is not used, it is best to check the automatically picked peaks manually to ensure that no cross-peaks have been missed because of overlap or low intensity; noise-free and reliable peak tables can save substantial time at later stages. Peaks from different spectra are combined into spin systems by correlating them with the root cross-peaks. For well-resolved root peaks this can be done automatically by grouping together cross-peaks that lie within a certain tolerance of the root peak in the 1H and 15N dimensions. The tolerances are normally set to a fraction of the line width (in practice 0.02–0.03 ppm in the 1H and 0.2–0.3 ppm in the 15N dimension) and can be determined by inspecting strips corresponding to several well-resolved root peaks. The results of this peak correlation need to be checked manually by going through all the root peaks and displaying the corresponding slices in all spectra. For well-resolved root peaks a single slice orientation, normally the 1H,13C view, is sufficient, whereas overlapped peaks need to be examined carefully using both slice orientations (1H,13C and 15N,13C). Any incorrectly grouped peaks must be corrected manually at this stage, and new root peaks and spin systems created if needed. The number of grouped peaks is also compared with the number expected in each spectrum. In particular, a spin system should contain: one cross-peak in each of the HNCO and HN(CO)CA spectra; two cross-peaks in each of the HNCA and HN(CA)CO spectra, one of which corresponds to the cross-peak in the corresponding CO-based experiment; two cross-peaks in the CBCA(CO)NH spectrum; and four cross-peaks in the HNCACB spectrum, with opposite phases for the Cα and Cβ peaks. Some of the expected cross-peaks may be absent because of their low intensity, but a larger number of peaks than expected indicates overlap between different spin systems. The intensities of the cross-peaks also need to be checked for large deviations: very intense, sharp peaks correspond to dynamic, unstructured regions of the protein, while small peaks could indicate impurities or multiple states of the protein.
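The expected peak counts listed above translate directly into a per-spin-system sanity check. The Python sketch below is illustrative, with hypothetical names; the counts assume a non-glycine residue whose preceding residue is also not glycine, since glycine has no Cβ and so gives fewer peaks in the Cβ-detecting experiments.

```python
from typing import Dict

# Peaks expected per spin system for a non-glycine residue whose preceding
# residue is also not glycine (glycine has no Cbeta, so fewer peaks are normal).
EXPECTED_PEAKS: Dict[str, int] = {
    "HNCO": 1,
    "HN(CO)CA": 1,
    "HNCA": 2,
    "HN(CA)CO": 2,
    "CBCA(CO)NH": 2,
    "HNCACB": 4,
}


def check_peak_counts(counts: Dict[str, int]) -> Dict[str, str]:
    """Compare picked-peak counts of one spin system with expectations.

    Returns only the experiments that need attention:
    'excess' suggests overlap with another spin system;
    'missing' may just mean weak peaks, but is worth a second look.
    """
    report: Dict[str, str] = {}
    for experiment, expected in EXPECTED_PEAKS.items():
        found = counts.get(experiment, 0)
        if found > expected:
            report[experiment] = "excess"
        elif found < expected:
            report[experiment] = "missing"
    return report


counts = {"HNCO": 1, "HN(CO)CA": 1, "HNCA": 3, "HN(CA)CO": 2,
          "CBCA(CO)NH": 2, "HNCACB": 3}
print(check_peak_counts(counts))  # {'HNCA': 'excess', 'HNCACB': 'missing'}
```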
Figure 3.9 Correlations observed in triple-resonance experiments. All experiments record resonances of NH groups correlated to different intra-residue or sequential 13C or 1H resonances, marked by circles for each experiment type. The maximum number of cross-peaks per residue in each experiment corresponds to the number of circled non-HN atoms. Experiments that provide complementary intra-residue and sequential correlations are arranged horizontally.
Figure 3.10 Main steps of the assignment protocol based on triple resonance experiments
The chemical shifts of 13Cα and 13Cβ depend strongly on residue type, as summarised in Figure 3.12. In particular, Gly, Leu, Ile, Val, Thr, Ser, Ala and Pro residues tend to have highly characteristic chemical shifts that can be used to identify them once the spin systems are assembled. This is done either by manual inspection of the spin systems or automatically, by assigning scores based on the agreement between the observed values and those expected for each residue type. Comparing the number of spin systems identified for each residue type with the number of residues of that type in the sequence indicates the quality of the data and highlights problems at this early stage. If the number of spin systems unambiguously identified as a given residue type is larger than expected from the protein sequence, multiple protein forms or impurities may be present; the dominant form can usually be identified from the cross-peak intensities. Conversely, a smaller than expected number indicates missing cross-peaks or incorrectly assembled spin systems.
Figure 3.11 Spin system identification in triple-resonance spectra. Cross-peaks in a 1H,15N-HSQC spectrum are picked automatically and used as roots. For each root peak, the 1H,13C planes corresponding to its 15N chemical shift are selected in the triple-resonance spectra and cross-peaks are picked at its 1H chemical shift position. In this example all four cross-peaks are present in the HNCACB spectrum, with the sequential cross-peaks of significantly lower intensity than the intra-residue cross-peaks. Positive cross-peaks are represented by black and negative by grey contours. Note the opposite signs of the Cα and Cβ cross-peaks in the HNCACB spectrum, which help to identify the correlation type. For overlapping HSQC cross-peaks with close 1H chemical shifts, such as the two peaks highlighted by a box, orthogonal 15N,13C planes in the triple-resonance experiments are used to resolve the correlations.
Figure 3.12 Mean values and standard deviations of 13Cα and 13Cβ chemical shifts for amino acids in non-paramagnetic proteins (data from http://www.bmrb.wisc.edu/). Amino acids with distinctive chemical shifts are marked in bold italic. This information is used to determine residue types.
3.4.1.2 Sequential Assignment
At the second stage of assignment, spin systems are arranged sequentially by matching the intra-residue peaks of one spin system with the sequential peaks of another, as illustrated in Figure 3.13a. The sequential matching is normally performed within the spectral analysis software, which generates for each spin system a list of possible connections ranked according to the quality of the match. The high-scoring connections are then checked graphically by displaying strips drawn through the corresponding spin systems. Particular attention must be paid to the accuracy of the chemical shift match and to the similarity of the peak shapes. Computer matching necessarily uses tolerances larger than the differences that can be detected visually, to allow for noise and peak distortions; this can generate incorrect matches that are straightforward to identify graphically. Sequential spin systems are also checked for similar intensities and line widths, as their dynamic properties are usually very similar; large differences may indicate either that the spin systems are not sequential or that multiple forms are present.
Although any spin system can be used to start sequential matching, it is often beneficial to start from a spin system with distinctive 13C chemical shifts, as illustrated in Figure 3.13b. Since each spin system carries information on both intra-residue and sequential shifts, these values define a dipeptide fragment, which may be unique in the protein sequence. In that case the spin system can be assigned to a specific position in the sequence, as for the last spin system in Figure 3.13b, which corresponds to the unique AA fragment. For the selected spin system, sequential connections to the preceding and following spin systems are checked and, if unique, used to build a stretch of linked spin systems. The process is repeated for the newly joined spin systems until the sequential connections become ambiguous. The residue types in the stretch of connected spin systems, determined from their 13C chemical shifts, are then checked against the protein sequence, and stretches that match unique positions in the sequence are assigned to the specific residues. This partial sequence-specific assignment reduces the number of unassigned residues, while the identified sequential connections reduce the number of spin systems available for assignment, decreasing the complexity of the problem. The procedure is repeated with the remaining unassigned residues and unconnected spin systems.
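Once a stretch of spin systems has been linked, placing it in the sequence amounts to a pattern search. The Python sketch below works under stated assumptions: the residue-type windows are rough illustrative values, not the statistics of Figure 3.12, and the function names are hypothetical. Unclassified spin systems become wildcards, as in the TxxAA pattern of Figure 3.13b; a single returned position corresponds to an unambiguous assignment of the stretch.

```python
import re
from typing import List, Optional


def residue_type(ca: float, cb: Optional[float]) -> str:
    """Rough one-letter residue-type call from intra-residue 13C shifts (ppm).

    The windows are illustrative assumptions loosely based on average
    database values (Gly Calpha ~45 with no Cbeta, Ala Cbeta ~19,
    Ser Cbeta ~64, Thr Cbeta ~70); anything else is returned as 'x'.
    """
    if cb is None:
        return "G" if ca < 48.0 else "x"
    if cb < 24.0:
        return "A"
    if cb > 66.0:
        return "T"
    if cb > 60.0:
        return "S"
    return "x"


def locate_stretch(types: List[str], sequence: str) -> List[int]:
    """1-based start positions where a stretch of typed, sequentially linked
    spin systems fits the protein sequence ('x' matches any residue)."""
    pattern = "".join("." if t == "x" else t for t in types)
    # A zero-width lookahead lets overlapping matches be reported as well.
    return [m.start() + 1 for m in re.finditer(f"(?={pattern})", sequence)]


# Invented shifts for five linked spin systems, mirroring the TxxAA example.
linked = [(62.0, 69.8), (55.0, 41.0), (57.0, 30.0), (53.0, 19.1), (52.8, 18.9)]
types = [residue_type(ca, cb) for ca, cb in linked]
print("".join(types))                          # TxxAA
print(locate_stretch(types, "MSTKLTDEAAGK"))   # [6]
```

Returning all matching positions, rather than the first, makes it obvious when a stretch is still too short or too poorly typed to be placed uniquely.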
Figure 3.13 Sequence-specific assignment. (a) Finding sequential spin systems. Intra-residue peaks are identified for spin system A in the HNCACB (left) and HN(CA)CO (right) spectra, shown in grey. The 13C shifts of these peaks are matched against the sequential peaks of all other spin systems in the complementary CBCA(CO)NH (left) and HNCO (right) spectra, shown in black. Four different spin systems, B–E, with at least one matching shift are shown. Only spin system B matches all three sequential shifts and is selected as following spin system A in the protein sequence.
The complete assignment procedure is illustrated in Figure 3.14 for a hypothetical four-residue peptide fragment, LQTE, using schematic spectra for clarity. (i) Planes are selected in the 3D HNCACB and CBCA(CO)NH spectra on the basis of the 15N chemical shifts in the root HSQC spectrum, and peaks corresponding to the intra-residue and sequential 13Cα and 13Cβ nuclei are identified in the planes. Spin system B has intra-residue chemical shifts highly characteristic of Thr or Ser and can be assigned to the single Thr position in the fragment. (ii) The spin system preceding B is identified by comparing the CBCA(CO)NH chemical shifts of B with the HNCACB chemical shifts of the other spin systems. Only spin system D has matching chemical shifts, giving a unique sequential connectivity. (iii) The spin system following B, spin system C, is identified by comparing the HNCACB chemical shifts of B with the CBCA(CO)NH chemical shifts of the other spin systems. In the same way, matching chemical shifts show that spin system A precedes D and so corresponds to the first residue of the fragment. (iv) Strips from the HNCACB and CBCA(CO)NH spectra are arranged in sequential order, demonstrating the matching connectivities. In addition, the chemical shifts of spin system A have values characteristic of Leu, in agreement with the sequence.
Often in the course of the assignment it becomes apparent that the spin-system list is incomplete, as no sequential connections can be identified for some of the spin systems. The main reasons for this are: (i) no signals are detected for some residues because of line broadening and/or a low signal-to-noise ratio; (ii) spin systems have not been identified correctly because of resonance overlap; (iii) peak picking is incomplete in some of the spectra, particularly the root spectrum. The problem can be resolved by direct analysis of the triple-resonance spectra, as illustrated in Figure 3.15.
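The matching and chain-building steps of this walkthrough can be sketched in Python. This is a hypothetical illustration, not the algorithm of any particular assignment program: spin systems carry only (Cα, Cβ) pairs, glycine (no Cβ) is not handled, the 0.3 ppm 13C tolerance is an assumption, and a connection is followed only when it is the single candidate, mirroring the rule that stretches grow until the sequential connections become ambiguous. The invented shifts reproduce the A, D, B, C ordering of the LQTE example.

```python
from typing import Dict, List, Tuple

Shifts = Tuple[float, float]  # (13C-alpha, 13C-beta) in ppm
# Spin-system label -> (intra-residue shifts from HNCACB,
#                       sequential i-1 shifts from CBCA(CO)NH)
SpinTable = Dict[str, Tuple[Shifts, Shifts]]


def match_successors(systems: SpinTable, tol: float = 0.3) -> Dict[str, List[str]]:
    """For each spin system a, rank the spin systems b whose sequential
    shifts match the intra-residue shifts of a (b would follow a)."""
    succ: Dict[str, List[str]] = {}
    for a, (intra_a, _) in systems.items():
        ranked = []
        for b, (_, seq_b) in systems.items():
            if b == a:
                continue
            dev = max(abs(x - y) for x, y in zip(intra_a, seq_b))
            if dev <= tol:
                ranked.append((dev, b))
        succ[a] = [b for _, b in sorted(ranked)]
    return succ


def grow_chain(start: str, succ: Dict[str, List[str]]) -> List[str]:
    """Extend a stretch backwards and forwards while the link is unique."""
    pred = {b: [a for a, bs in succ.items() if b in bs] for b in succ}
    chain = [start]
    while len(pred[chain[0]]) == 1 and pred[chain[0]][0] not in chain:
        chain.insert(0, pred[chain[0]][0])
    while len(succ[chain[-1]]) == 1 and succ[chain[-1]][0] not in chain:
        chain.append(succ[chain[-1]][0])
    return chain


# Invented shifts for the LQTE example (A = Leu, D = Gln, B = Thr, C = Glu).
systems: SpinTable = {
    "A": ((55.5, 42.0), (60.1, 33.0)),  # its i-1 residue lies outside the fragment
    "B": ((62.0, 69.8), (56.0, 29.3)),
    "C": ((56.8, 30.2), (62.0, 69.8)),
    "D": ((56.0, 29.3), (55.5, 42.0)),
}
succ = match_successors(systems)
print(succ)                   # {'A': ['D'], 'B': ['C'], 'C': [], 'D': ['B']}
print(grow_chain("B", succ))  # ['A', 'D', 'B', 'C']
```

Note that the Gln and Glu spin systems differ by less than 1 ppm here; with a looser tolerance they would become ambiguous, which is where the additional HN(CA)CO/HNCO pair is useful.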
If new spin systems are found in this way, they are added to the list and the assignment procedure is resumed. If no new matching spin systems are found, more sensitive experiments may be required to detect the signals for that region of the protein. Note that residues following and preceding prolines will have missing sequential spin systems. The spin systems of residues that follow prolines can often be identified by the characteristic chemical shift values of their sequential Cα and Cβ cross-peaks. For small globular proteins, backbone resonance assignment is usually a straightforward process that can often be completed with just a complementary pair of HNCACB/CBCA(CO)NH experiments. For larger proteins, some regions may remain unassigned after the standard procedure because of overlapping or missing cross-peaks. In such cases additional experiments can be used to improve and/or validate the assignments. Highly sensitive complementary data are available from the [1H,15N]-NOESY-HSQC/TOCSY-HSQC experiments discussed above. When a [13C,15N]-labelled sample is available, it is more effective to start with the triple-resonance experiments because of their higher resolution and lower ambiguity, and to use the [1H,15N]-based experiments for validation and for difficult regions. Even if the triple-resonance assignment is straightforward, independent validation improves the reliability of the assignments. Additionally, a complementary pair of HBHANH/HBHA(CO)NH triple-resonance experiments can be used to resolve ambiguities and to derive assignments for the Hα and Hβ protons. These experiments provide selective correlations between the signals of the NH group and the intra-residue and/or sequential Hα and Hβ protons, and may detect them more reliably than the [1H,15N]-NOESY-HSQC/TOCSY-HSQC experiments. The sensitivity of these experiments is the main factor limiting their use, particularly for HBHANH. As a guide, the expected sensitivity is at least 50% lower than that of the corresponding HNCACB/CBCA(CO)NH experiments because of the fast relaxation of proton magnetisation. If sensitivity is limiting, HA(CA)NH/HA(CO)NH experiments can be used to detect Hα correlations. In summary, on completion of the triple-resonance assignments it is best to validate the results using (i) [1H,15N]-NOESY-HSQC/TOCSY-HSQC and (ii) HBHANH/HBHA(CO)NH experiments.
Figure 3.13 (Continued) Note the good shift match for spin system D in the HNCACB/CBCA(CO)NH pair, but a clear mismatch in the HN(CA)CO/HNCO pair. The use of multiple spectra often helps to resolve ambiguous sequential matches. (b) Assigning sequentially linked spin systems to positions in the protein sequence. Five spin systems are connected sequentially through chemical shift matches in the HNCACB/CBCA(CO)NH (left) and HN(CA)CO/HNCO (right) experiments. Three of the spin systems, highlighted in bold italics, have distinct chemical shifts that can be used to identify the sequence TxxAA corresponding to this stretch of spin systems. The protein sequence has only one corresponding fragment, TDEAA, allowing unambiguous assignment of these spin systems to positions in the protein sequence.
Figure 3.14 Illustration of the assignment procedure for a hypothetical four-residue peptide fragment LQTE. Cross-peaks are shown schematically as circles. CBCA corresponds to the HNCACB spectrum and CBCACO to the CBCA(CO)NH spectrum. Larger and smaller circles in the HNCACB spectrum represent intra-residue and sequential cross-peaks, respectively. (i) Selection of strips in the 3D spectra corresponding to the root HSQC peaks. Cross-peaks of the 3D spectra are assembled into spin systems A-D. (ii) Identification of the spin system preceding spin system B through the match between the CBCA(CO)NH peaks of B and the HNCACB peaks of other spin systems. The chemical shifts of B correspond to a Ser or Thr residue. (iii) Identification of the spin system following spin system B through the match between the HNCACB peaks of B and the CBCA(CO)NH peaks of other spin systems. (iv) Strips from the 3D spectra arranged in sequential order on the basis of the matching chemical shifts.
Figure 3.15 Identifying sequential connections by direct analysis of the HNCACB (grey) and CBCA(CO)NH (black) spectra. From left to right: to locate the spin system that follows spin system A, [1H,15N] planes of the CBCA(CO)NH spectrum are displayed at the chemical shifts of the intra-residue Cα and Cβ of A (marked). Cross-peaks with the same [1H,15N] coordinates in both planes are identified (dashed lines), and the [1H,13C] strip corresponding to these cross-peaks is compared with the strip containing spin system A. In this example only one spin system, B, has cross-peaks in both the Cα and Cβ planes. A similar procedure is applied to identify the preceding spin system, using the sequential chemical shifts Cα(i-1) and Cβ(i-1) to select the planes and the HNCACB spectrum to identify spin systems. For large proteins, or when relaxation broadening occurs, one of the peaks in the Cα and Cβ planes may be missing, in which case the spin systems corresponding to the single peaks are checked.
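The sequence-positioning step of Figure 3.13b relies on a few residue types with distinctive Cα/Cβ shifts (Gly, Ala, Ser, Thr) to place a linked fragment in the sequence. A rough sketch of that idea follows. The ppm thresholds in the hypothetical `candidate_types` function are illustrative assumptions based on typical random-coil ranges, not a validated classifier, and both the fragment shifts and the protein sequence are invented.

```python
from typing import List, Optional, Set


def candidate_types(ca: float, cb: Optional[float]) -> Optional[Set[str]]:
    """Rough residue-type classes from Cα/Cβ shifts (ppm); None means 'any' (x).

    The thresholds are illustrative assumptions, not a validated classifier.
    """
    if cb is None:
        return {"G"} if 43.0 <= ca <= 47.5 else None  # Gly: no Cβ, Cα near 45 ppm
    if 15.0 <= cb <= 22.0:
        return {"A"}                                 # Ala: Cβ near 19 ppm
    if 60.0 <= cb < 66.0:
        return {"S"}                                 # Ser: Cβ near 63-64 ppm
    if cb >= 66.0:
        return {"T"}                                 # Thr: Cβ near 70 ppm
    return None


def locate_fragment(sequence: str, candidates: List[Optional[Set[str]]]) -> List[int]:
    """Return the 1-based start positions where a linked fragment fits the sequence."""
    n = len(candidates)
    hits = []
    for start in range(len(sequence) - n + 1):
        window = sequence[start:start + n]
        if all(c is None or aa in c for aa, c in zip(window, candidates)):
            hits.append(start + 1)
    return hits


# Five sequentially linked spin systems with hypothetical (Cα, Cβ) shifts.
fragment = [(62.0, 69.8), (54.5, 41.0), (56.8, 30.0), (52.5, 19.0), (52.8, 18.9)]
pattern = [candidate_types(ca, cb) for ca, cb in fragment]  # T x x A A
sequence = "MKTAYIAKQRQISFVKSHFSRQTDEAAGLKV"                  # hypothetical protein
print(locate_fragment(sequence, pattern))                  # [23]
```

A single hit means the fragment can be placed unambiguously; several hits would call for a longer fragment or additional residue-type information.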
3.4.1.3 Proline Residues
Because prolines lack an amide proton, no intra-residue correlations are observed for proline residues in the triple-resonance experiments. However, the Cα/Cβ and Hα/Hβ resonances of prolines followed by a non-proline residue can be detected and assigned in the CBCA(CO)NH and HBHA(CO)NH spectra, respectively. Polyproline stretches can be assigned using experiments that correlate CαH signals with intra-residue or sequential resonances. The most sensitive complementary pair comprises the HACAN/HACA(CO)N experiments. Note that, in contrast to the sequential CBCA(CO)NH experiment, the HACA(CO)N experiment correlates the CαH resonances with the 15N resonance of the following, rather than the preceding, residue. Since polyproline sequences in most proteins are short, detection of a single sequential connection is often sufficient for unambiguous assignment. For longer stretches additional triple-resonance experiments can be used, although they have reduced sensitivity. Alternatively, the [1H,13C]-NOESY-HSQC experiment can be applied to detect sequential NOEs from the CδH2 group. With respect to NOE connectivities, this group behaves similarly to the backbone NH group of a non-proline residue, with Hδ(i)/Hα(i-1) and Hδ(i)/Hδ(i-1) NOEs detectable for the majority of residues. In practice, the chemical shifts of the CδH2 and CαH groups are similar, so the corresponding cross-peaks lie near the diagonal, and the signal dispersion is lower than for the HN signals. This makes the cross-peaks difficult to resolve and often requires the [1H,13C]-NOESY-HSQC spectrum to be collected in 100% 2H2O to minimise baseline distortions from the water signal.
3.4.2 4D Triple Resonance
Small and medium-sized globular proteins can usually be fully assigned with 3D triple-resonance spectra. In some cases, however, signal separation in the 3D spectra may be insufficient for establishing unambiguous sequential connectivities. The use of 4D spectra may help to resolve this ambiguity, although often at the expense of sensitivity and measurement time. Triple-resonance 4D spectroscopy offers two types of improvement. The first helps to resolve individual spin systems, as illustrated schematically in Figure 3.16a using HNCOCA correlations as an example. If both the 1H and 15N resonances of two residues overlap, only a single cross-peak is observed in the 1H-15N HSQC spectrum, and the 13Cα chemical shifts in the HN(CO)CA spectrum cannot be associated with the 13CO chemical shifts detected in the HNCO spectrum, because all the cross-peaks have identical 1H and 15N coordinates. In the 4D HNCOCA spectrum each peak has four coordinates (1H, 15N, 13Cα and 13CO), so the correct combination of CO and Cα chemical shifts is obtained directly. All triple-resonance experiments that involve two or more different 13C nuclei can be acquired in 4D mode, allowing detection of H(i)N(i)CO(i-1)Cα(i-1) correlations in 4D HNCOCA, H(i)N(i)CO(i-1)Cα(i-1)/Cβ(i-1) correlations in 4D HNCOCACB and H(i)N(i)Cα(i)Cβ(i) correlations in 4D HNCACB spectra. In practice, complete overlap of both 1H and 15N resonances is rare, and the ambiguity in the spin systems can be resolved by checking different combinations of chemical shifts, with usually only a single combination showing sequential connectivities. The second type of 4D spectrum directly correlates the HN resonances of two sequential residues, in the 4D HN(COCA)NH experiment illustrated in Figure 3.16b. The coherence transfer involves two relatively small couplings, J(N(i),CO(i-1)) and J(N(i-1),Cα(i-1)), which strongly decrease the sensitivity of the experiment as the molecular weight increases. For that reason, it is applicable only to small, unfolded or deuterated proteins.
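When two spin systems overlap completely in the HSQC, checking different combinations of chemical shifts can be written as an exhaustive search over the possible CO/Cα pairings. In the sketch below a pairing survives only if each (CO, Cα) pair matches the intra-residue shifts of some other spin system (as would be read from HN(CA)CO and HNCA spectra). The function name, spin-system labels, shifts and tolerances are all hypothetical.

```python
from itertools import permutations
from typing import Dict, List, Tuple


def supported_pairings(co_prev: List[float], ca_prev: List[float],
                       others: Dict[str, Tuple[float, float]],
                       tol_co: float = 0.1, tol_ca: float = 0.2):
    """Try every way of pairing overlapped CO(i-1) and Cα(i-1) shifts.

    co_prev and ca_prev must have the same length (one entry per overlapped
    spin system). others maps a spin-system name to its intra-residue (CO, Cα)
    shifts. A pairing is kept only if every pair matches at least one other
    spin system within tolerance.
    """
    results = []
    for perm in permutations(range(len(ca_prev))):
        pairs = [(co, ca_prev[k]) for co, k in zip(co_prev, perm)]
        matches = [[name for name, (o_co, o_ca) in others.items()
                    if abs(o_co - co) <= tol_co and abs(o_ca - ca) <= tol_ca]
                   for co, ca in pairs]
        if all(matches):  # every pair has at least one sequential partner
            results.append((pairs, matches))
    return results


# Two spin systems share one HSQC peak (hypothetical shifts, ppm).
co_prev = [175.2, 177.8]   # two HNCO peaks in the shared strip
ca_prev = [54.1, 61.3]     # two HN(CO)CA peaks in the same strip
others = {"P1": (175.2, 54.1), "P2": (177.8, 61.3), "P3": (175.25, 58.0)}
for pairs, matches in supported_pairings(co_prev, ca_prev, others):
    print(pairs, matches)  # [(175.2, 54.1), (177.8, 61.3)] [['P1'], ['P2']]
```

The number of pairings grows factorially with the number of overlapped spin systems, but for the two-fold overlap typical in practice only two combinations need to be tested.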
Figure 3.16 Use of 4D spectra in resonance assignment. (a) Resolving spin systems with simultaneous overlap of 1HN and 15N resonances. In this case the [1H,15N]-HSQC cross-peaks (left) of spin systems 1 and 2 have the same coordinates, and the cross-peaks of the triple-resonance experiments cannot be separated into individual spin systems, as illustrated schematically for the 3D HN(CO)CA and HNCO experiments. Each of the corresponding slices contains two cross-peaks, and it is not possible to decide which of the HN(CO)CA and HNCO cross-peaks belong to the same spin system. The 13Cα and 13CO chemical shifts can be correlated in the 4D HNCOCA experiment, allowing separation of the cross-peaks into spin systems. (b) Direct sequential correlation of the [1H,15N]-HSQC cross-peaks with the 4D HN(COCA)NH experiment. The [1H,15N]-HSQC cross-peaks (left) of the sequential spin systems are labelled i and i-1. The [H(i),N(i)] plane of the 4D experiment contains a single cross-peak with coordinates [H(i-1),N(i-1)], corresponding to the [1H,15N]-HSQC cross-peak of the preceding spin system. These coordinates are used to select the next 4D plane and to identify the next sequential spin system (right). By repeating the plane selections, spin systems are aligned directly in sequential order. Note that the correlations of spin system X are not present in the selected planes despite the overlap of the 1H resonances.
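The plane-walking procedure of Figure 3.16b is essentially a linked-list traversal, in which each 4D cross-peak points from an HSQC peak to its predecessor. A minimal sketch follows, assuming the 4D peak list has already been picked as (H(i), N(i), H(i-1), N(i-1)) tuples. The peak labels, shifts, tolerances and the `walk_back` name are hypothetical, and the walk stops as soon as a link is missing or ambiguous, as in the manual procedure.

```python
from typing import Dict, List, Tuple

Peak4D = Tuple[float, float, float, float]  # (H_i, N_i, H_i-1, N_i-1) in ppm


def match_hsqc(h: float, n: float, hsqc: Dict[str, Tuple[float, float]],
               tol_h: float = 0.02, tol_n: float = 0.2) -> List[str]:
    """Labels of HSQC peaks within tolerance of a (1H, 15N) coordinate."""
    return [label for label, (ph, pn) in hsqc.items()
            if abs(ph - h) <= tol_h and abs(pn - n) <= tol_n]


def walk_back(start: str, peaks4d: List[Peak4D], hsqc: Dict[str, Tuple[float, float]],
              tol_h: float = 0.02, tol_n: float = 0.2) -> List[str]:
    """Follow H(i)N(i) -> H(i-1)N(i-1) links until one is missing or ambiguous."""
    chain = [start]
    current = start
    while True:
        h, n = hsqc[current]
        preceding = set()
        # 4D cross-peaks lying in the [H_i, N_i] plane of the current HSQC peak
        for hi, ni, hp, nprev in peaks4d:
            if abs(hi - h) <= tol_h and abs(ni - n) <= tol_n:
                preceding.update(match_hsqc(hp, nprev, hsqc, tol_h, tol_n))
        preceding.discard(current)
        if len(preceding) != 1:
            break  # missing (e.g. proline) or ambiguous connection
        (nxt,) = preceding
        if nxt in chain:
            break  # guard against a loop caused by a wrong match
        chain.append(nxt)
        current = nxt
    return chain[::-1]  # report in N- to C-terminal order


hsqc = {"r10": (8.10, 120.5), "r11": (8.45, 118.2), "r12": (7.95, 124.0),
        "r13": (8.60, 115.0), "X": (8.45, 110.3)}  # X overlaps r11 in 1H only
peaks4d = [(8.45, 118.2, 8.10, 120.5), (7.95, 124.0, 8.45, 118.2),
           (8.60, 115.0, 7.95, 124.0), (8.45, 110.3, 9.00, 130.0)]
print(walk_back("r13", peaks4d, hsqc))  # ['r10', 'r11', 'r12', 'r13']
```

As in the figure, spin system X is not picked up in the r11 plane because its 15N shift differs, even though its 1H shift coincides.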
When sensitivity is sufficient, the experiment is extremely powerful, as sequential 1H-15N cross-peaks can be identified directly and unambiguously from the coordinates of the 4D cross-peak. For resolved HSQC peaks the assignment process is reduced to displaying the H(i)N(i) plane of the 4D spectrum, identifying the H(i-1)N(i-1) cross-peak, displaying the H(i-1)N(i-1) plane, identifying the H(i-2)N(i-2) cross-peak and continuing the process until the sequential connections become ambiguous. As each cross-peak in the 4D spectrum is directly associated with a 1H-15N HSQC peak, correlating the peaks in the 4D spectrum automatically generates a set of sequential spin systems. This set can then be mapped onto the protein sequence using the 13C chemical shifts detected in the 3D triple-resonance experiments. In addition to their low sensitivity, 4D experiments suffer from limited digital resolution, which increases cross-peak overlap and introduces additional ambiguity. To overcome this, the 4D spectrum is normally combined with 3D triple-resonance experiments.
3.4.3 Computer-Assisted Backbone Assignments
Algorithmically, backbone assignment based on triple-resonance data reduces to matching well-defined chemical shifts sequentially and comparing them with the values expected for different residue types. Such a procedure is highly suitable for automation, which has led to the development of a number of programs requiring varying degrees of manual input. Graphical spectral analysis packages such as Sparky [9] and CCPN Analysis [10] have internal modules to assist with manual assignment. The user is presented with a list of sequentially matching spin systems and possible positions for the spin systems in the sequence. On the basis of this information the spin systems can be connected into fragments and matched against the protein sequence. Once unique positions are found, the chemical shifts are assigned to the appropriate residues with a single mouse click. Such assistance significantly accelerates the assignment process, but the user still has to decide whether the sequential connectivities and the positions in the protein sequence are unique. High ambiguity due to low chemical shift dispersion and strong resonance overlap can make the manual procedure unreliable. The assignment can be improved by fully automated approaches with programs such as AUTOASSIGN [11] or MARS [12]. In automated procedures the connectivities and the positions in the protein sequence are optimised simultaneously for all spin systems using all the available information, rather than the subset relating to the small number of spin systems that can be analysed manually. This improves the reliability of the assignments, particularly when some of the spin systems are missing. A drawback of automated procedures is that incorrectly assigned spin systems often have long-range effects, causing assignment mismatches in several regions of the protein sequence.
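To illustrate what optimising connectivities and sequence positions simultaneously means, the toy example below scores every possible placement of all spin systems at once and keeps the best-scoring one. This is a deliberately naive exhaustive search on invented data, not the algorithm used by AUTOASSIGN or MARS. The scoring scheme (+1 per matching link, -1 per mismatch, rejection on a residue-type conflict), the 0.3 ppm tolerance and all function names are assumptions made for the sketch, and every spin system is assumed to have a Cβ.

```python
from itertools import permutations
from typing import Dict, List, Tuple


def link_ok(prev: dict, cur: dict, tol: float = 0.3) -> bool:
    """cur's Cα/Cβ(i-1) (CBCA(CO)NH) match prev's intra-residue Cα/Cβ (HNCACB)."""
    return (abs(cur["ca_prev"] - prev["ca"]) <= tol
            and abs(cur["cb_prev"] - prev["cb"]) <= tol)


def score(assignment: Dict[int, str], spins: Dict[str, dict], sequence: str) -> float:
    """+1 per matching sequential link, -1 per mismatch, -inf on a residue-type conflict."""
    total = 0.0
    for pos, name in assignment.items():
        allowed = spins[name]["types"]
        if allowed is not None and sequence[pos] not in allowed:
            return float("-inf")
        prev = assignment.get(pos - 1)
        if prev is not None:
            total += 1 if link_ok(spins[prev], spins[name]) else -1
    return total


def best_assignments(spins: Dict[str, dict], sequence: str) -> Tuple[List[dict], float]:
    """Score every placement of all spin systems together (toy sizes only)."""
    names = list(spins)
    best, best_score = [], float("-inf")
    for positions in permutations(range(len(sequence)), len(names)):
        assignment = dict(zip(positions, names))
        s = score(assignment, spins, sequence)
        if s > best_score:
            best, best_score = [assignment], s
        elif s == best_score and s > float("-inf"):
            best.append(assignment)
    return best, best_score


spins = {  # hypothetical shifts (ppm)
    "A": {"ca": 55.0, "cb": 42.0, "ca_prev": 52.5, "cb_prev": 19.0, "types": None},
    "D": {"ca": 56.0, "cb": 29.0, "ca_prev": 55.0, "cb_prev": 42.0, "types": None},
    "B": {"ca": 62.0, "cb": 69.8, "ca_prev": 56.0, "cb_prev": 29.0, "types": {"S", "T"}},
    "C": {"ca": 56.8, "cb": 30.0, "ca_prev": 62.0, "cb_prev": 69.8, "types": None},
}
best, s = best_assignments(spins, "ALQTEK")  # 0-based sequence positions
print(s, best)  # 3.0 [{1: 'A', 2: 'D', 3: 'B', 4: 'C'}]
```

Because every placement is scored, the cost grows factorially with the number of spin systems; practical programs therefore rely on heuristic or probabilistic optimisation rather than exhaustive enumeration.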
Such misassignment is particularly prominent when a minor form of the same protein is present, leading to duplicated spin systems. The results of automated assignment need to be checked graphically for mismatches in the sequential connectivities and between the residue type and the chemical shifts of each spin system. In addition, the intensities of cross-peaks from residues in sequential positions should be comparable, unless selective resonance broadening indicates intensity loss due to relaxation.
3.4.4 Unstructured Proteins
Unstructured proteins (see Chapter 9) are characterised by low chemical shift dispersion, leading to strong resonance overlap and assignment ambiguities. At the same time, their relaxation properties are favourable, allowing triple-resonance experiments to be recorded with high resolution and sensitivity. Often the resolution of the HSQC experiment is sufficient to separate the majority of the cross-peaks, and most of the ambiguities are related to the overlap of 13C resonances. To resolve such ambiguities it is often sufficient to collect spectra with high resolution in the 13C dimension, such as the HNCO/HN(CA)CO spectra. Additionally, the HBHANH experiment can be recorded with high sensitivity and resolution, offering additional sequential connectivities in combination with the HBHA(CO)NH experiment. If the increased resolution and additional sequential connectivities are still insufficient for unambiguous assignment, the 4D HN(COCA)NH experiment can be recorded to observe direct correlations between HN resonances. The overlap between HN resonances can also be reduced by recording the triple-resonance experiments at different temperatures.
3.4.5 Large Proteins
The sensitivity of triple-resonance experiments for large proteins is low because of fast relaxation. Uniform deuteration of the protein, in combination with the TROSY method for relaxation compensation, has been used successfully to overcome these relaxation limitations and has led to the backbone assignment of proteins as large as 100 kDa [13]. The assignment strategy is similar to that for smaller proteins, with the chemical shift information derived from the corresponding TROSY variants of the triple-resonance experiments [14]. The main challenge in the assignment procedure is the ambiguity caused by the large number of resonances and the lack of unique sequential combinations of residues. Ambiguities in the sequential connection of the spin systems can be resolved in 4D spectra, with the 4D HN(COCA)NH experiment often sensitive enough thanks to deuteration. The effectiveness of the assignment is dramatically improved by automated assignment software that can analyse a large number of assignment combinations. Expression of proteins in 2H2O leads to deuteration of the amide groups (see Chapter 2 for more details of protein deuteration). Most of these groups become protonated on transfer to an H2O-based buffer, although some may remain deuterated for a considerable time if the protein fold is stable. The resonances of these residues cannot be detected until a significant fraction of the group is protonated, leading to missing signals in the triple-resonance experiments. In such cases an additional set of spectra needs to be recorded once all the groups are protonated. In some cases full exchange may require special buffer conditions. Nonetheless, the reduced number of signals in the spectra at the initial stage may be beneficial for resolving ambiguities in the assignments.
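How long to wait before recording the additional spectra depends on the exchange rate of the slowest amides. If back-exchange is modelled as a simple first-order process in a large excess of H2O (an assumption; real rates depend on the protein fold, pH and temperature), the protonated fraction after time t is 1 - exp(-k_ex t). The rate constant below and the function names are hypothetical.

```python
import math


def protonated_fraction(k_ex_per_hour: float, hours: float) -> float:
    """Fraction of an initially deuterated amide that is protonated after `hours`,
    assuming first-order back-exchange into a large excess of H2O."""
    return 1.0 - math.exp(-k_ex_per_hour * hours)


def hours_to_reach(fraction: float, k_ex_per_hour: float) -> float:
    """Time for the protonated fraction to reach `fraction` (0 < fraction < 1)."""
    return -math.log(1.0 - fraction) / k_ex_per_hour


k = 0.01  # hypothetical slow-exchanging amide, k_ex = 0.01 per hour
print(round(protonated_fraction(k, 24.0), 3))  # 0.213 after one day
print(round(hours_to_reach(0.9, k), 1))        # 230.3 hours to reach 90 %
```

For such a slow amide the signal would still be weak after a day, which is why a second set of spectra is often recorded much later or under conditions that accelerate exchange.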
3.5 Side-Chain Assignments
On completion of the backbone assignments using triple-resonance experiments, both the 1H and 13C chemical shifts of the CαH and CβH groups are normally known. If the assignment was made using only 13C chemical shifts, the corresponding 1H shifts can be determined in HBHA(CO)NH/HBHANH experiments. Starting from the CαH and CβH groups, the rest of the side-chain can be assigned using a combination of H(C)CH-TOCSY and (H)CCH-TOCSY spectra, as illustrated in Figure 3.17. These experiments correlate a cross-peak in a [1H,13C]-HSQC experiment with all the proton or all the carbon resonances within the same side-chain, respectively. The H(C)CH-TOCSY spectrum is usually best measured with the acquisition dimension corresponding to that of the HSQC, to maximise the separation of resonances correlated to different groups. With sufficient sensitivity, all resonances are observed within the CαH and CβH slices, and the only uncertainty is the relation between the 1H and 13C resonances. The carbon resonances can normally be assigned to specific positions in the side-chain on the basis of their chemical shifts, and to a certain degree this is possible for protons as well. The assignments are validated by displaying the slice corresponding to the newly assigned resonances alongside the slices with confirmed assignments, and by checking for the corresponding cross-peak in the [1H,13C]-HSQC spectrum. If no match is found, a different combination of 1H and 13C shifts is tested. On completion of the procedure a set of matching strips is identified in the spectra, validating the side-chain assignments (Figure 3.17). This assignment procedure is quick and straightforward for small globular proteins, but for larger proteins it can be complicated by resonance overlap and missing cross-peaks. When cross-peaks are missing, additional resonances can be identified by displaying slices further along the side-chain. In particular, for Leu and Ile strong correlations are observed between the CαH and methyl groups, but other correlations may be absent from these slices because of the fast relaxation of non-methyl groups. However, such correlations are present in the slices corresponding to the methyl groups because of the shorter transfer path, and can be identified once the methyl groups are assigned. The more sensitive HCCH-COSY experiment is useful for detecting missing correlations and identifying direct correlations, although the cross-peak separation in this experiment is lower than in HCCH-TOCSY. For large proteins the information from HCCH-TOCSY experiments may be too ambiguous for reliable assignments, particularly for long side-chains.
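The HSQC part of the combination test described above can be expressed as a filter over all (1H, 13C) pairs read from the CαH/CβH slices. The sketch below covers only this HSQC check, not the comparison of TOCSY strips; the Ile shifts, the tolerances (0.03 ppm for 1H, 0.3 ppm for 13C) and the `consistent_pairs` name are hypothetical.

```python
from itertools import product
from typing import List, Tuple


def consistent_pairs(h_shifts: List[float], c_shifts: List[float],
                     hsqc_peaks: List[Tuple[float, float]],
                     tol_h: float = 0.03, tol_c: float = 0.3) -> List[Tuple[float, float]]:
    """Keep only (1H, 13C) combinations that coincide with a [1H,13C]-HSQC peak."""
    return [(h, c) for h, c in product(h_shifts, c_shifts)
            if any(abs(h - ph) <= tol_h and abs(c - pc) <= tol_c
                   for ph, pc in hsqc_peaks)]


# Hypothetical Ile side-chain shifts (ppm) read from the CαH/CβH slices
h_shifts = [0.85, 1.45, 0.80]  # from H(C)CH-TOCSY
c_shifts = [17.5, 27.5, 13.5]  # from (H)CCH-TOCSY
hsqc = [(0.85, 17.5), (1.45, 27.5), (0.80, 13.5), (1.20, 25.0)]
print(consistent_pairs(h_shifts, c_shifts, hsqc))
# [(0.85, 17.5), (1.45, 27.5), (0.8, 13.5)]
```

Combinations that survive this filter would then be confirmed by selecting the corresponding slices in the 3D spectra and matching them against the CαH and CβH slices.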
When the HCCH-TOCSY data are too ambiguous, the 15N-separated NOESY-HSQC can be used to resolve the ambiguities. This experiment benefits from the high dispersion of HN resonances and from high sensitivity even for large proteins, owing to the efficiency of NOE transfer. To aid the assignment, the slices of the 15N-separated NOESY-HSQC corresponding to the intra-residue and sequential NH groups are displayed alongside the H(C)CH-TOCSY strips and the peaks are cross-checked. The majority of intra-residue and sequential NOEs are usually observed in the NOESY-HSQC spectra, particularly for large proteins. As a result, cross-peaks in the H(C)CH-TOCSY spectra that have no corresponding NOESY-HSQC peak usually belong to a different side-chain. The 15N-separated NOESY-HSQC spectra may also help to detect missing side-chain resonances.
Figure 3.17 Side-chain assignments using a combination of HCCH-TOCSY (top) and (H)CCH-TOCSY (bottom) experiments. The example illustrates assignment of an Ile side-chain. Slices on the left, corresponding to the CαH and CβH groups, are selected on the basis of chemical shifts determined in the course of the backbone assignments. These slices display cross-peaks associated with the other 1H and 13C atoms of the side-chain, as marked. Combinations of chemical shifts corresponding to specific CH groups are used to select additional slices in the 3D experiments and match them against the CαH and CβH slices (right). In this example the 1H chemical shifts determine the position of the plane, while the 13C chemical shifts are used to identify the cross-peaks in the plane. The 13C resonances can usually be assigned to specific atoms in the side-chain on the basis of their chemical shifts, while assignment of the 1H resonances may be ambiguous. The ambiguity is resolved by checking different combinations of the 1H and 13C chemical shifts in the [1H,13C]-HSQC and 3D HCCH-TOCSY/(H)CCH-TOCSY spectra. An incorrect combination often does not correspond to any cross-peak in the [1H,13C]-HSQC spectrum and has no cross-peaks matching the CαH and CβH slices. Some of the cross-peaks in the 3D spectra may be missing because of low sensitivity.
For unstructured proteins with high internal mobility, resonance overlap is the main difficulty in side-chain assignment. However, the 15N-HSQC still has sufficient peak separation, and the H(CCCO)NH, H(CCCA)NH, C(CCO)NH and C(CCA)NH experiments, which correlate side-chain resonances with the signals of the backbone NH groups, provide a powerful assignment strategy. In these experiments all intra-residue or sequential 13C or 1H chemical shifts can be observed within a single slice. The resonances are assigned to specific groups in the side-chain on the basis of their chemical shifts and validated against the 13C-HSQC and HCCH-TOCSY spectra. The relatively low sensitivity of these experiments restricts their use to unstructured or small proteins, unless the proteins are deuterated to reduce relaxation.
Once completed, the assignments can be validated in a 3D 13C-separated NOESY-HSQC spectrum measured in H2O. In addition to the correlations detected in the H(C)CH-TOCSY experiment, the NOESY spectra contain well-resolved correlations between the side-chains and HN groups that can be reliably detected even in large proteins. In most cases the Hα, Hβ and methyl proton resonances show intra-residue and sequential correlations with the resonances of the corresponding HN protons. Missing cross-peaks may be caused by unfavourable relaxation properties, but if neither intra-residue nor sequential peaks are present while cross-peaks to other HN resonances are observed from the same group, the assignment is likely to be incorrect. The validation is best conducted residue by residue, by simultaneously displaying the 13C-NOESY-HSQC strips corresponding to all 1H resonances of the same side-chain.
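The validation rule for 13C-NOESY-HSQC strips can be captured as a small decision function. In this sketch, which is an illustrative assumption rather than a published criterion, a strip is "consistent" if an intra-residue or sequential HN cross-peak is present, "suspect" if only other HN cross-peaks are seen, and "inconclusive" if no HN cross-peaks are observed at all (which may simply reflect relaxation). Sequential here means the HN of residue i+1; the shifts, tolerance and the `noe_check` name are hypothetical.

```python
from typing import Iterable, Optional


def noe_check(observed_hn: Iterable[float], intra_hn: Optional[float],
              seq_hn: Optional[float], tol: float = 0.02) -> str:
    """Classify one side-chain proton strip of a 13C-NOESY-HSQC spectrum.

    observed_hn: 1H shifts of the HN cross-peaks seen in the strip.
    intra_hn:    HN shift of the same residue i.
    seq_hn:      HN shift of residue i+1 (sequential NOE).
    """
    observed = list(observed_hn)

    def seen(target: Optional[float]) -> bool:
        return target is not None and any(abs(o - target) <= tol for o in observed)

    if seen(intra_hn) or seen(seq_hn):
        return "consistent"
    if observed:
        return "suspect"       # NOEs only to other HN resonances
    return "inconclusive"      # no HN NOEs at all: may simply be relaxation


print(noe_check([8.31, 7.62], intra_hn=8.30, seq_hn=7.95))  # consistent
print(noe_check([7.10], intra_hn=8.30, seq_hn=7.95))        # suspect
print(noe_check([], intra_hn=8.30, seq_hn=7.95))            # inconclusive
```

Only the "suspect" case points to a likely misassignment; an empty strip is not evidence against the assignment on its own.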
For each residue, the positions of the intra-residue 1H chemical shifts and of the sequential 1HN shift are marked on the 13C-NOESY-HSQC strips and the detected cross-peaks are analysed. The procedure is repeated for all residues in sequential order.
Following the assignment of the aliphatic protons, the side-chain NH2 groups of Asn and Gln residues can be assigned using CBCA(CO)NH and HBHA(CO)NH experiments. Intra-residue correlations between the resonances of the NH2 group and the CαH/CβH2 groups (Asn) or the CβH2/CγH2 groups (Gln) are identified in these experiments and compared against the chemical shifts determined at the previous stage to generate the assignments. Using both 1H and 13C chemical shifts is normally sufficient to resolve ambiguities. When optimised for detection of backbone resonances, these experiments do not generate any transferable magnetisation for NH2 groups. However, the 5-10% 2H2O present in the solvent produces a corresponding fraction of NHD groups, whose correlations are detectable without any parameter adjustments, although the sensitivity of the experiment is reduced proportionately. The cross-peaks of the NHD groups are observed in 15N-HSQC spectra as low-intensity satellites displaced vertically (along the 15N dimension) from the main NH2 signals by the isotope shift. Consequently, the cross-peaks in the triple-resonance experiments are similarly displaced from the main HSQC peaks, so the displayed planes have to be adjusted accordingly to detect the peaks. If the intensity of the satellite peaks is too low, the experiments are acquired with parameters adjusted to allow detection of the main NH2 correlations. As an alternative or additional approach, 15N-NOESY-HSQC spectra can be used to detect intra-residue NOE correlations of NH2 groups, with the most intense correlations normally corresponding to interactions with the CβH2 or CγH2 groups of Asn and Gln, respectively (see Section 3.2.1 above and Figure 3.4).
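The size of the NHD satellites can be estimated from isotopomer statistics. Assuming the deuterium fraction at the exchangeable NH2 sites equals the 2H2O fraction of the solvent and ignoring isotope fractionation (both simplifications), the NH2/NHD/ND2 populations are binomial, and the satellite carries a fraction x of the signal of each NH2 proton, consistent with the proportional sensitivity loss described above. The function names are hypothetical.

```python
def nh2_isotopomers(d_fraction: float) -> dict:
    """Populations of NH2, NHD and ND2 for a deuterium fraction x at exchangeable
    sites, assuming a purely statistical (binomial) distribution."""
    x = d_fraction
    return {"NH2": (1 - x) ** 2, "NHD": 2 * x * (1 - x), "ND2": x ** 2}


def satellite_fraction(d_fraction: float) -> float:
    """Fraction of the signal of one NH2 proton site (E or Z) that comes from
    NHD species, i.e. the relative intensity of the isotope-shifted satellite."""
    x = d_fraction
    site_is_h = 1 - x             # probability that this site carries 1H
    partner_is_d = x * (1 - x)    # this site 1H and the other site 2H
    return partner_is_d / site_is_h  # simplifies to x


pops = nh2_isotopomers(0.10)  # 10 % 2H2O in the NMR buffer
print({k: round(v, 3) for k, v in pops.items()})  # {'NH2': 0.81, 'NHD': 0.18, 'ND2': 0.01}
print(round(satellite_fraction(0.10), 3))         # 0.1
```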
A number of experiments have been proposed for the resonance assignment of aromatic rings that correlate CβH2 groups with aromatic 1H and 13C resonances through coherence transfer via 13C spin couplings [15]. However, the multistage transfer pathways and the often unfavourable relaxation properties of aromatic groups make these experiments applicable only to small proteins at high concentrations. Even when 13C-labelled samples are available, the assignment of aromatic groups is still based mainly on intra-residue NOEs. The aromatic resonances are best resolved in a [1H,13C] constant-time TROSY experiment [16]. The intra-ring spin systems can be identified using 3D HCCH-COSY and HCCH-TOCSY spectra, although in some cases better signal separation can be achieved in 2D homonuclear NOESY and TOCSY experiments. The NOE correlations are resolved in 13C-NOESY-HSQC spectra with the HSQC dimensions corresponding to either the aromatic or the aliphatic groups. The choice of experiment depends on the resonance overlap in the spectra, and in difficult cases both spectra have to be used to cross-check the correlations.
References
1. Wüthrich, K., Wider, G., Wagner, G. and Braun, W. (1982) Sequential resonance assignments as a basis for determination of spatial protein structures by high-resolution protein nuclear magnetic resonance. J. Mol. Biol., 155, 311–319.
2. Clore, G.M. and Gronenborn, A.M. (1991) Applications of 3-dimensional and 4-dimensional heteronuclear NMR-spectroscopy to protein-structure determination. Prog. NMR Spectrosc., 23, 43–92.
3. Pervushin, K., Riek, R., Wider, G. and Wüthrich, K. (1997) Attenuated T2 relaxation by mutual cancellation of dipole-dipole coupling and chemical shift anisotropy indicates an avenue to NMR structures of very large biological macromolecules in solution. Proc. Natl. Acad. Sci. USA, 94, 12366–12371.
4. Lian, L.Y., Yang, J.C., Derrick, J.P. et al. (1991) Sequential 1H NMR assignments and secondary structure of an IgG-binding domain from Protein G. Biochemistry, 30, 5335–5340.
5. Cavanagh, J., Fairbrother, W.J., Rance, M. et al. (2007) Protein NMR Spectroscopy: Principles and Practice, 2nd edn, Academic Press, Amsterdam.
6. Rule, G.S. and Hitchens, T.K. (2006) Fundamentals of Protein NMR Spectroscopy, Springer.
7. Permi, P. and Annila, A. (2004) Coherence transfer in proteins. Prog. NMR Spectrosc., 44, 97–137.
8. Sattler, M., Schleucher, J. and Griesinger, C. (1999) Heteronuclear multidimensional NMR experiments for the structure determination of proteins in solution employing pulsed field gradients. Prog. NMR Spectrosc., 34, 93–158.
9. Goddard, T.D. and Kneller, D.G. (2008) SPARKY 3, University of California, San Francisco.
10. Vranken, W.F., Boucher, W., Stevens, T.J. et al. (2005) The CCPN data model for NMR spectroscopy: development of a software pipeline. Proteins: Struct. Funct. Bioinform., 59, 687–696.
11. Zimmerman, D.E., Kulikowski, C.A., Huang, Y.P. et al. (1997) Automated analysis of protein NMR assignments using methods from artificial intelligence. J. Mol. Biol., 269, 592–610.
12. Jung, Y.S. and Zweckstetter, M. (2004) MARS – robust automatic backbone assignment of proteins. J. Biomol. NMR, 30, 11–23.
13. Tzakos, A.G., Grace, C.R.R., Lukavsky, P.J. and Riek, R. (2006) NMR techniques for very large proteins and RNAs in solution. Annu. Rev. Biophys. Biomol. Struct., 35, 319–342.
14. Zhu, G. and Yao, X.J. (2008) TROSY-based NMR experiments for NMR studies of large biomolecules. Prog. NMR Spectrosc., 52, 49–68.
15. Prompers, J.J., Groenewegen, A., Hilbers, C.W. and Pepermans, H.A.M. (1998) Two-dimensional NMR experiments for the assignment of aromatic side chains in 13C-labeled proteins. J. Magn. Reson., 130, 68–75.
16. Pervushin, K., Riek, R., Wider, G. and Wüthrich, K. (1998) Transverse relaxation-optimized spectroscopy (TROSY) for NMR studies of aromatic spin systems in 13C-labeled proteins. J. Am. Chem. Soc., 120, 6394–6400.
4 Measurement of Structural Restraints
Geerten W. Vuister, Nico Tjandra, Yang Shen, Alex Grishaev and Stephan Grzesiek
4.1 Introduction
Structure determination by high-resolution NMR spectroscopy in solution has traditionally relied on distance restraints derived from the nuclear Overhauser enhancement (NOE), complemented by dihedral-angle restraints obtained from the analysis of J-couplings [1]. This approach has proven very successful in solving the three-dimensional structures of proteins up to the 20–30 kDa range [2]. More recently, additional sources of structural information have become available. These include chemical shifts [3,4], reliable information about hydrogen bonding [5], residual dipolar couplings (RDCs) [6,7] and small-angle X-ray scattering (SAXS) [8,9] or neutron scattering (SANS) data [10]. The different types of data are highly complementary, and their combined use has led to strongly improved NMR structures. Since the information from NOEs, J-couplings and chemical shifts is inherently short-range, cumulative errors from such restraints can translate into inaccurate long-range structural features [11]. This is particularly the case for non-globular biomolecules, such as larger DNA molecules or proteins and protein complexes made up of loosely associated modules. In contrast, RDCs and SAXS/SANS data provide long-range information, by reporting on the orientation of local groups relative to an overall molecular frame and by constraining the overall shape of the molecule. The combination of both types of data has opened avenues for the accurate study of a much wider range of systems, in particular large complexes, dynamic domain-domain interactions and partially or completely unfolded proteins.
Protein NMR Spectroscopy: Practical Techniques and Applications, First Edition. Edited by Lu-Yun Lian and Gordon Roberts. 2011 John Wiley & Sons, Ltd. Published 2011 by John Wiley & Sons, Ltd.
In this chapter, we describe the use of these parameters in structure determination. For each parameter, we discuss the physical basis, the practical experimental setup, how structural information is extracted from the data and potential caveats.
4.2 NOE-Based Distance Restraints
4.2.1 Physical Background
A magnetic nucleus with spin $\vec{I}$ and gyromagnetic ratio $\gamma_I$ generates a magnetic dipolar field $\vec{B}_d$, which is given by
$$\vec{B}_d = -\frac{\gamma_I \hbar \mu_0}{4\pi r^3}\left\{\vec{I} - \frac{3\vec{r}\,(\vec{I}\cdot\vec{r})}{r^2}\right\} \qquad (4.1a)$$
where $\vec{r}$ is the distance vector to the nucleus and $\mu_0$ is the vacuum permeability. Another magnetic nucleus, with spin $\vec{S}$ and gyromagnetic ratio $\gamma_S$, will feel this field as a dipole-dipole interaction (Figure 4.1) with energy
$$H_d = -\vec{\mu}_S\cdot\vec{B}_d = -(\gamma_S \hbar \vec{S})\cdot\vec{B}_d = \frac{\gamma_S \gamma_I \hbar^2 \mu_0}{4\pi r^3}\left\{\vec{S}\cdot\vec{I} - \frac{3(\vec{S}\cdot\vec{r})(\vec{I}\cdot\vec{r})}{r^2}\right\} \qquad (4.1b)$$
In an isotropic solution, rapid rotational motion averages the dipolar interaction to zero, and hence it does not contribute to the energy levels of the system. However, the fluctuating dipolar field generated by the molecular motion constitutes an effective relaxation mechanism for the spins $\vec{I}$ and $\vec{S}$, which induces transitions between the different energy levels and alters their populations. Transitions between the longitudinal spin states (z-magnetisation) occur in the form of auto-relaxation (affecting only one spin) and cross-relaxation (affecting both spins). Longitudinal auto-relaxation is also referred to as T1 or spin-lattice relaxation, whereas longitudinal dipolar cross-relaxation is the source of the nuclear Overhauser enhancement (NOE) and gives rise to cross-peaks in a NOESY spectrum.
Figure 4.1 The magnetic dipole field associated with the spin $\vec{I}$ of a magnetic nucleus exerts a force on a second magnetic nucleus with spin $\vec{S}$ in an orientation- and distance-dependent manner.
Relaxation matrix theory. Longitudinal relaxation in a system of N spins is governed by a set of coupled differential equations of the form [12,13] (boldface indicates matrices or vectors)
$$\frac{d\mathbf{I}_z(t)}{dt} = -\mathbf{R}\left(\mathbf{I}_z(t) - \mathbf{I}_z^0\right) \qquad (4.2)$$
where Iz(t) represents the time-dependent vector of the z-magnetisations of all N spins, I⁰z is its value at thermal equilibrium and R is the relaxation matrix, which contains the auto- and cross-relaxation rate constants. For a system of dipolar-coupled, homonuclear spin-1/2 nuclei, the diagonal and off-diagonal elements of R are given by

$$R_{ii} = \rho_{ii} = \frac{1}{10}\gamma^4\hbar^2\left(\frac{\mu_0}{4\pi}\right)^2\sum_{j\neq i}\left[J_{ij}(0) + 3J_{ij}(\omega_0) + 6J_{ij}(2\omega_0)\right] + R_{ii,\mathrm{leak}} \tag{4.3a}$$

$$R_{ij} = \sigma_{ij} = -\frac{1}{10}\gamma^4\hbar^2\left(\frac{\mu_0}{4\pi}\right)^2\left[J_{ij}(0) - 6J_{ij}(2\omega_0)\right] \quad \text{for } i \neq j \tag{4.3b}$$
where ρii denotes the auto-relaxation rate, with Rii,leak as an additional term to account for all non-dipolar relaxation mechanisms, σij represents the cross-relaxation rate between spins i and j, and ω0 is the Larmor frequency. The dipolar field fluctuations enter into Equations 4.3a and 4.3b in the form of the spectral density function Jij(ω), which corresponds to the Fourier transform of the autocorrelation function of the fluctuating dipolar field between spins i and j. Thus the dipolar relaxation rates are proportional to the square of the dipole–dipole interaction (Equation 4.1b), and dipolar relaxation is most effective for high-γ nuclei; i.e. amongst ¹H, ¹⁵N and ¹³C it is most effective for ¹H. For an isotropic rigid rotator with rotational correlation time τc, the spectral density Jij(ω) is given by

$$J_{ij}(\omega) = \frac{1}{r_{ij}^6}\,\frac{\tau_c}{1 + (\omega\tau_c)^2} \tag{4.4}$$
which shows the proportionality of the NOE to the inverse sixth power of the internuclear distance, rij. For molecules in the slow-tumbling limit, i.e. ω0τc ≫ 1, which applies to most biomolecules with a molecular weight above 3–4 kDa, the Jij(2ω0) spectral density term of Equation 4.3b can safely be neglected, resulting in

$$\sigma_{ij} \approx -\frac{1}{10}\gamma^4\hbar^2\left(\frac{\mu_0}{4\pi}\right)^2\frac{\tau_c}{r_{ij}^6} \tag{4.5}$$
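To get a feel for the magnitude of these rates, Equations 4.3b–4.5 can be evaluated numerically. The sketch below is a minimal illustration rather than an analysis tool; it assumes ¹H–¹H pairs, the rigid-rotor spectral density of Equation 4.4, a 600 MHz ¹H Larmor frequency, τc = 15 ns and CODATA values of the physical constants. The function names are hypothetical. It compares the full cross-relaxation rate of Equation 4.3b with the slow-tumbling approximation of Equation 4.5.

```python
import math

# Physical constants in SI units (CODATA values)
GAMMA_H = 2.6752218744e8   # 1H gyromagnetic ratio, rad s^-1 T^-1
HBAR = 1.054571817e-34     # reduced Planck constant, J s
MU0_OVER_4PI = 1.0e-7      # mu0 / (4 pi), H/m

# Common prefactor gamma^4 hbar^2 (mu0/4pi)^2 / 10 of Equations 4.3b and 4.5
PREFACTOR = GAMMA_H**4 * HBAR**2 * MU0_OVER_4PI**2 / 10.0


def spectral_density(omega, tau_c, r):
    """Rigid-rotor spectral density of Equation 4.4 (omega in rad/s, r in m)."""
    return tau_c / (r**6 * (1.0 + (omega * tau_c) ** 2))


def sigma_full(r, tau_c, omega0):
    """Cross-relaxation rate of Equation 4.3b, in s^-1."""
    j0 = spectral_density(0.0, tau_c, r)
    j2 = spectral_density(2.0 * omega0, tau_c, r)
    return -PREFACTOR * (j0 - 6.0 * j2)


def sigma_slow(r, tau_c):
    """Slow-tumbling approximation of Equation 4.5, in s^-1."""
    return -PREFACTOR * tau_c / r**6


omega0 = 2.0 * math.pi * 600.0e6   # assumed 1H Larmor frequency (600 MHz), rad/s
tau_c = 15.0e-9                    # assumed rotational correlation time, s
for r_nm in (0.18, 0.25, 0.50):
    r = r_nm * 1.0e-9
    print(f"r = {r_nm:.2f} nm: Eq. 4.3b gives {sigma_full(r, tau_c, omega0):8.3f} s^-1, "
          f"Eq. 4.5 gives {sigma_slow(r, tau_c):8.3f} s^-1")
```

Under these assumptions both expressions give about −3.5 s⁻¹ for r = 0.25 nm, and doubling the distance reduces |σij| by 2⁶ = 64. For a small, fast-tumbling molecule (for example τc = 50 ps) the 6Jij(2ω0) term dominates and σij changes sign, which is why Equation 4.5 is restricted to the slow-tumbling limit.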
4.2.2 NMR Experiments for Measuring the NOE
The NOE described by the cross-relaxation rate σij is best measured using a transient NOE experiment. For this purpose, a deviation from the equilibrium z-magnetisation is created at a certain point in the pulse sequence, and magnetisation is then allowed to exchange via cross-relaxation during a subsequent NOE mixing period. Denoting by ΔIz(t) = Iz(t) − I⁰z the deviation from thermal equilibrium, the magnetisation after the NOE mixing time τm is given by the formal solution of Equation 4.2

$$\Delta\mathbf{I}_z(\tau_m) = \exp(-\mathbf{R}\tau_m)\,\Delta\mathbf{I}_z(0) \tag{4.6}$$

where ΔIz(0) is the deviation at the beginning of the mixing period and the matrix exponential exp(−Rτm) is the NOE transfer matrix. In the conventional two-dimensional (2D) version of this NOESY (NOE SpectroscopY) experiment, the frequencies of protons before and after the NOE transfer are measured, yielding a 2D (F1,F2) NOE spectrum, which identifies the cross-relaxing protons i and j at frequency positions (FHi,FHj) and (FHj,FHi). The intensities of the peaks are proportional to ΔIz(τm), thus yielding a measure of σij and rij via Equations 4.3–4.6. Obviously, the correct assignment of an NOE cross-peak is essential for its use as a structural restraint, and misinterpreted NOEs can have disastrous effects on the structure determination. The introduction of ¹³C and ¹⁵N isotope labelling has greatly facilitated correct NOE assignment by editing the 2D NOESY in additional dimensions by the heteronuclei attached to the interacting protons. A commonly used experiment is the 3D ¹⁵N-edited NOESY-HSQC (Figure 4.2a). The sequence consists of a concatenation of an NOE and a ¹H-¹⁵N HSQC building block. Selective water flip-back pulses ensure scan-to-scan preservation of the water z-magnetisation, thus avoiding saturation transfer to the protein that would otherwise result in signal attenuation [15]. The resulting 3D NOESY-HSQC spectrum yields H ↔ HN cross-peaks (Figure 4.2b) at frequencies (FHi,FNj,FHj), where Hj,Nj constitute an amide proton–nitrogen pair connected by the ¹JHN coupling. Since NOEs between two carbon-attached protons are not detected in a ¹⁵N-edited NOESY-HSQC, this crucial information on side-chain contacts is usually obtained from a complementary 3D ¹³C-edited NOESY-HSQC or NOESY-HMQC experiment. The latter yields H ↔ HC cross-peaks at frequencies (FHi,FCj,FHj), where Hj,Cj now constitute a directly bonded proton–carbon pair (Figure 4.2b).
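Equation 4.6 can be evaluated directly once R is known. The following is a minimal sketch, not a production tool, assuming a hypothetical rigid three-proton system (spins 1–2 and 2–3 at 0.25 nm, 1–3 at 0.45 nm), the rigid-rotor spectral density of Equation 4.4, no leakage relaxation (Rii,leak = 0), a 600 MHz ¹H frequency and τc = 15 ns. Because R is symmetric for homonuclear spins, exp(−Rτm) is computed from its eigendecomposition with NumPy's `linalg.eigh`. Comparing the 1 → 3 element of the transfer matrix with the initial-rate (two-spin) estimate −σ13τm shows how indirect transfer via spin 2 (spin diffusion) can dominate a weak NOE.

```python
import numpy as np

GAMMA_H = 2.6752218744e8   # 1H gyromagnetic ratio, rad s^-1 T^-1
HBAR = 1.054571817e-34     # reduced Planck constant, J s
MU0_OVER_4PI = 1.0e-7      # mu0 / (4 pi), H/m
K = GAMMA_H**4 * HBAR**2 * MU0_OVER_4PI**2 / 10.0


def J(omega, tau_c, r):
    """Rigid-rotor spectral density (Equation 4.4)."""
    return tau_c / (r**6 * (1.0 + (omega * tau_c) ** 2))


def relaxation_matrix(dist, tau_c, omega0, r_leak=0.0):
    """Relaxation matrix R of Equations 4.3a/4.3b for N homonuclear spins.

    dist: symmetric N x N array of internuclear distances in metres
    (the diagonal is ignored); r_leak: non-dipolar leakage rate in s^-1.
    """
    n = dist.shape[0]
    R = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            r = dist[i, j]
            j0, j1, j2 = J(0.0, tau_c, r), J(omega0, tau_c, r), J(2.0 * omega0, tau_c, r)
            R[i, j] = -K * (j0 - 6.0 * j2)              # sigma_ij, Eq. 4.3b
            R[i, i] += K * (j0 + 3.0 * j1 + 6.0 * j2)   # dipolar part of rho_ii, Eq. 4.3a
        R[i, i] += r_leak
    return R


def noe_transfer_matrix(R, tau_m):
    """exp(-R tau_m) of Equation 4.6, using R = V diag(w) V^T (R symmetric)."""
    w, V = np.linalg.eigh(R)
    return V @ np.diag(np.exp(-w * tau_m)) @ V.T


# Hypothetical linear three-proton system (distances in nm, converted to m)
d = np.array([[0.00, 0.25, 0.45],
              [0.25, 0.00, 0.25],
              [0.45, 0.25, 0.00]]) * 1.0e-9
R = relaxation_matrix(d, tau_c=15.0e-9, omega0=2.0 * np.pi * 600.0e6)

for tau_m in (0.02, 0.10, 0.30):
    A = noe_transfer_matrix(R, tau_m)
    two_spin = -R[2, 0] * tau_m   # initial-rate estimate for transfer 1 -> 3
    print(f"tau_m = {tau_m * 1e3:3.0f} ms: full 1->3 transfer = {A[2, 0]:.4f}, "
          f"two-spin estimate = {two_spin:.4f}")
```

With these assumptions the full calculation gives a markedly larger 1 → 3 transfer than the two-spin estimate at all three mixing times, because the strong 1–2 and 2–3 cross-relaxation pathways relay magnetisation to spin 3. This illustrates why moderate mixing times are chosen to limit spin diffusion.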
The increase to four dimensions appears as a natural extension of the 3D NOESYs for the unambiguous identification of NOE cross-peaks. Such an extension is easily achieved by the addition of a further HMQC or HSQC element (Figure 4.2b). Three complementary versions exist, detecting NOEs either between two nitrogen-attached protons (NH ↔ HN), between a carbon-attached proton and a nitrogen-attached proton (CH ↔ HN) or between two carbon-attached protons (CH ↔ HC). However, practical considerations reduce the usefulness of 4D NOESY experiments: (i) for realistic total experimental times on the order of several days, the additional sampling of the fourth dimension limits the achievable digital resolution in all indirect dimensions, i.e. the maximal acquisition times in the indirect t1, t2 and t3 dimensions remain below the physical limit set by the effective lifetime of the resonances (see below). (ii) The addition of an extra dimension reduces the sensitivity by a factor of √2 relative to the 3D counterpart, because both the real and the imaginary parts of the fourth dimension must be sampled. (iii) The additional pulses and delays needed for transfer to the ¹³C or ¹⁵N nuclei result in further losses. For these reasons, it is often advisable to limit the dimensionality of an NMR experiment. Thus three-dimensional projections of the 4D NOESYs such as N ↔ HN, C ↔ HN or C ↔ HC are often more useful as a complement to the 'regular' H ↔ HN or H ↔ HC 3D NOESY experiments. Recent developments employing sparse sampling techniques partially overcome some of the limitations with respect to the loss of digital resolution [16,17].

Figure 4.2 (a) Pulse sequence of the 3D ¹⁵N-edited NOESY-HSQC experiment (H ↔ HN) for ¹⁵N- or ¹⁵N/¹³C-labelled samples. Narrow and wide pulses have flip angles of 90° and 180°, respectively, and are applied with phase x unless specified otherwise. The ¹H, ¹⁵N and ¹³C carriers are set to 4.7 (H₂O), 116.5 and 70 ppm, respectively. ¹H pulses have an RF field strength of 28 kHz, with the exception of the water-selective flip-back pulses, which are applied as 2-ms sinc 90° pulses (semi-ellipses) and low-power, rectangular 1-ms 90° pulses (smaller rectangles). The ¹⁵N pulses have an RF field strength of 6.25 kHz. ¹⁵N WALTZ-16 decoupling during acquisition is applied at an RF field strength of 1.5 kHz. The 90° ¹³C pulses have a strength of 22 kHz; the ¹³C 180° pulse (semi-ellipse) is implemented as a 400–500 μs hyperbolic-secant pulse. For t1 values shorter than the length of this pulse, the pulse can be applied overlapping with the ¹H 90° pulses or prior to the first ¹H pulse. ¹³Cα and ¹³C′ 180° decoupling pulses are applied at 56 and 177 ppm, respectively, and have an RF field strength of 14.0 kHz. On cryoprobes, ¹³C and ¹⁵N 180° pulses are usually applied back-to-back rather than simultaneously, to avoid arcing. As an alternative, the final soft 90°(−x)–hard 180°(x)–soft 90°(−x) WATERGATE scheme can be replaced by a 3–9–19 sequence (see [14] for details). Gradient durations (z-direction, sine-bell shaped; 30 G/cm at centre): G1,2,3,4,5 = 6.0, 5.0, 3.0, 2.0, 0.4 ms. Delays: δ = 2.25 ms; τm, see text. Phases: φ1 = (45°, 225°); φ2 = 2(x), 2(−x); φ3 = 4(x), 4(y), 4(−x), 4(−y); receiver = x, 2(−x), x. Quadrature detection in the indirect dimensions is achieved by incrementing phases φ1 (¹H) and φ2 (¹⁵N) in the usual States-TPPI manner. (b) Schematic overview of the nuclei connected in 2D, 3D and 4D NOE experiments

4.2.3 Set-up of NOESY Experiments
4.2.3.1 Estimation of T2s
For the proper set-up of any NMR experiment, it is essential to have an approximate idea of the relaxation times of the different nuclei involved in the magnetisation transfer pathways and in the frequency detection periods. A simple estimate of amide proton T2s can be obtained from the 1–1 spin-echo experiment (Figure 4.3a [18]). This experiment is a selective spin-echo with good water suppression that can also be used on unlabelled samples in H₂O, and hence is very suitable for a general, fast characterisation of any biomacromolecule. It works as follows: the ¹H carrier is set to the water frequency. The two pairs of 1–1 or jump-return 90°φ/90°−φ pulses, which are separated by delays τ and 2τ, respectively, leave the water along its equilibrium position in the positive z-direction. For nuclei that resonate at a frequency offset of (4τ)⁻¹ from the water, the first and second 1–1 pairs act as selective 90° and 180° pulses, respectively. The delay τ is set such that the excitation maximum is around 8.5 ppm, i.e. the centre of the ¹HN region. The two Δ periods surrounding the selective 180° pulse are the spin-echo relaxation delays during which ¹HN transverse relaxation is monitored. Two experiments are carried out with different Δ values, and T2 is calculated from the intensity ratios (Recipe 4.1). Note that the experiment also decouples the ¹Hα protons from the HN protons, since the selective 180° pulse does not excite the ¹Hα resonances in the vicinity of the water frequency. For a rigid, isotropically tumbling molecule, the rotational correlation time τc can be estimated from the ¹HN T2 as τc [ns] ≈ 1/(5 T2 [s]). In the slow-tumbling limit, the T2s of all nuclei of a rigid rotator are inversely proportional to τc and hence proportional to each other. Thus the T2s of the various ¹H, ¹³C and ¹⁵N nuclei of a protein can be estimated from the ¹HN T2. Figure 4.3b shows typical T2s for a 15-ns tumbler that can serve as a guide for other proteins.

Figure 4.3 Estimation of T2 values in proteins from a 1–1 spin-echo experiment. (a) Result of the 1–1 echo experiment applied to the HIV-1 Nef protein. The inset shows the pulse sequence of the experiment. The carrier is set on water. The delay τ is adjusted such that the excitation maximum is in the centre of the ¹HN region: |νmax| = (4τ)⁻¹, where νmax denotes the frequency offset from the water. Phases: φ1 = x, y, −x, −y; φ2 = 4(x), 4(y), 4(−x), 4(−y); receiver = x, −y, −x, y, −x, y, x, −y. The delay Δ is varied to estimate T2. The Nef spectrum was recorded with delays Δ1,2 of 0.1 and 2.9 ms, respectively. The intensity ratio I2/I1 of the left (downfield) amide proton resonances is about 0.65, yielding a T2 value of 13 ms and a correlation time τc of about 15 ns (see text). (b) Typical T2 values (ms, at 600 MHz ¹H) of backbone and side-chain nuclei in a protein with a 15-ns τc

Recipe 4.1: 1–1 Echo Experiment
1. Ensure proper shimming and tuning. Calibrate a high-power ¹H 90° pulse. Set the ¹H carrier to water. Set the ¹⁵N carrier and decoupling for ¹⁵N-labelled samples.
2. Set τ such that the excitation maximum is around 8.5 ppm (82 μs at 800 MHz) and Δ to a small value (100 μs).
3. Set the power of all proton pulses and the lengths of the first pulses of the two 1–1 pairs (the 90°φ1 and 90°φ2 pulses) as calibrated. Tweak the lengths and phases of the second pulses (90°−φ1, 90°−φ2) such that minimal water excitation results. Typically, these pulses are slightly shorter (by about 0.05 μs) than the first pulses, since radiation damping during the τ periods already moves the water towards the positive z-axis. Typical phase adjustments are 1–2°, depending on the accuracy of the water frequency setting.
4. Run the 1–1 echo experiment with a short relaxation delay (Δ1 ≈ 100 μs) and with a longer delay (Δ2). Estimate T2 from the intensity ratio of resonances in the downfield part of the ¹HN spectrum as T2 = 2(Δ2 − Δ1)/ln(I1/I2). Note that the downfield part usually corresponds to strongly hydrogen-bonded amide protons in the folded part of the protein. The best sensitivity for T2 estimation is obtained when Δ2 is close to T2/2.

Recipe 4.2: Set-up of Optimal Acquisition Times
The acquisition periods of an NMR experiment should be set up such that the resolution is maximised according to the available signal. Note that the resolution achieved is a function of both the maximal acquisition time tmax and the signal-to-noise ratio. Good results can be obtained by following simple rules:
1. The acquisition time tmax in the directly observed dimension should be about 3 T2 of the observed resonances. In this case, good digital filtering is obtained with a 60°-shifted sine-square bell function.
2. For decaying signals in an indirectly detected dimension, the acquisition time tmax should be set to a value at which the signal has decayed to about 1/3. When the decay is caused solely by relaxation, this corresponds approximately to the T2 of the observed resonances. Note, however, that the decay can also be due to unresolved J-couplings, for example JCC for ¹³C resonances. In this case, too, decay to about 1/3 is a good compromise.
Good digital filtering is obtained for such a signal with a 60°-shifted sine-bell function.
3. Rule 2 obviously does not apply to constant-time experiments, where no decay occurs. For such cases, tmax should be set to the maximal achievable time in the constant-time period.
4. The initial delay t0 of the first sampled time point requires additional consideration. Chemical shift evolution occurs during t0, which necessitates an appropriate phase correction during processing. For conventional quadrature detection and phasing definitions, usually ph1 = 360° · t0/Δt, where ph1 is the first-order phase and Δt is the time increment. The zero-order phase ph0 is then usually given by ph0 = −0.5 ph1, if there is no additional contribution to ph0 from the hardware, phase shifts or Bloch–Siegert effects. Note that on some systems (e.g. NMRPipe [19]) the sign of the phases is inverted. Furthermore, the initial delay t0 is best set to 0, Δt/2 or Δt, since all other values result in curved spectral baselines due to the intricacies of the discrete Fourier transform [20,21]. For the calculation of t0, all initial chemical shift evolution needs to be taken into account, for example also the evolution during the flanking 90° pulses in a typical P90–t1–P90 evolution period. For rectangular 90° pulses of duration P90, the chemical shift evolution time is equivalent to 2P90/π. Finally, during processing the first data point needs to be multiplied by a factor of 0.5, 1 or 1 for an initial delay t0 of 0, Δt/2 or Δt, respectively. In the last case (t0 = Δt), a constant baseline correction in the frequency domain is required, because the signal at time zero (equal to the integral over the frequency domain) is missing.

Recipe 4.3: Set-up of a 3D ¹⁵N-Edited NOESY Experiment (Figure 4.2a)
1. Ensure proper shimming and tuning. Calibrate ¹H, ¹³C and ¹⁵N pulses.
2. For best water suppression, optimise the WATERGATE scheme [22] at the end of the HSQC part in a separate experiment. For this, apply a single-scan soft 90°(−x) – hard 180°(x) – soft 90°(−x) – acquire sequence without any gradients. Set the ¹H carrier to the water resonance. Set power levels and lengths of the pulses as calibrated (hard 90° pulse: 7–10 μs; soft 90° pulse: 1 ms). Optimise the duration and phase of the soft pulses for minimal water excitation. Use these values for the full 3D ¹⁵N-NOESY-HSQC experiment.
3. Use the same ¹H carrier, pulse durations and phase values for the ¹⁵N-NOESY-HSQC experiment. Set the ¹⁵N carrier to the middle of the ¹⁵N region (116.5 ppm). For ¹³C-labelled samples, set the ¹³C carrier to 70 ppm for the 180° pulse during t1, and to 56/177 ppm for the two 180° pulses during t2.
Checking the individual 2D planes
4. Record an HSQC test plane of the ¹⁵N-edited NOESY. For this, omit the t1 incrementation. For the largest signal, set t1 ≈ 0 and omit the φ3 decoupling pulses on ¹⁵N and ¹³C. Other settings: τm = 80 ms, ¹H (F3) spectral width 14–16 ppm (to cover all ¹HN resonances), ¹⁵N (F2) spectral width 20–30 ppm, recycle delay 0.9–1.0 s. Set the number of increments in t2 and t3 such that the maximal acquisition times tmax are about 1× and 3× the estimated T2 of the resonances, respectively (Recipes 4.1 and 4.2). Recording the experiment with a small number of scans (2 or 4) should yield a spectrum with a sensitivity similar to that of a normal HSQC. Check that the phases in the indirect F2 dimension correspond to the settings according to the initial time delay (Recipe 4.2, point 4).
Initially, the phases in the directly detected t3 dimension may be arbitrary. From the ph1 value obtained, a correction for the initial time delay can be calculated as Δt3 · ph1/360°, where Δt3 is the dwell time. The sweep width in the F2 dimension may be optimised to minimise peak overlap. However, the total acquisition time t2,max should be kept at about T2. Note that some reduction in sweep width (leading to folding) may be beneficial to minimise the total experimental time required.
5. Record a ¹H(F1)–¹HN(F3) test plane of the ¹⁵N-edited NOESY. For this, omit the t2 incrementation. For the largest signal, set t2 ≈ 0 and omit the ¹³C decoupling pulses in the t2 evolution interval. Set the ¹H (F1) spectral width to about 10 ppm and the maximal acquisition time t1,max to the estimated T2. Other parameters are as under point 4. Record the experiment with a small number of scans. Transform the spectrum: mainly the diagonal will be visible, since cross-peak intensity is usually on the order of a few percent of the diagonal. Check that the phases of the diagonal peaks in the indirect F1 dimension correspond to the settings according to the initial time delay (Recipe 4.2, point 4). An additional +45° or −45° is required on ph0 due to the phase offset of the φ1 pulse settings. The sweep width in the F1 dimension may be optimised in a similar way to that indicated under point 4.
Optimising the mixing time
6. Mixing times of about 80–100 ms are usually a good compromise for proteins in the 10–40 kDa range, maximising the intensity of the cross-peaks while keeping spin diffusion at moderate levels. For other situations (deuterated, larger or smaller proteins), maximal cross-peak sensitivity can be obtained by setting the mixing time to approximately the decay time of the diagonal peaks, i.e. the selective T1,sel. An estimate of T1,sel can be obtained by recording the first increment of the ¹⁵N-edited NOESY with a short (e.g. 20 ms) and a long mixing time (e.g. 120 ms) and a sufficient number of scans. Since most of the signal arises from diagonal resonances, T1,sel can be estimated from the ratio of intensities of the 1D spectra after Fourier transformation. As in Recipe 4.1, the downfield side of the amide resonances should be used, since these resonances represent the folded protein. Using this method, optimal NOE mixing times on the order of several hundred ms are found for protons on the background of a deuterated protein [23].
Run the final experiment
7. Run the 3D experiment with mixing time, sweep widths and acquisition times as optimised in the previous steps. Since the indirect dimension acquisition times are limited to …

$$U = \begin{cases} \ldots \\ 0.33\ \text{nm} & \text{for medium NOEs} \\ 0.50\ \text{nm} & \text{for weak NOEs} \end{cases}$$

Recipe 4.6: Extraction of Distances Using the Two-Spin Approximation
1. Identify NOE cross-peaks between protons in fixed geometry (e.g. Tyr, Phe, Trp aromatic ring protons) and in regular secondary structure elements (e.g. those corresponding to dαN(i,i+1) = 0.21 nm in β-sheets and dαN(i,i+3) = 0.34 nm in α-helices).
2. Using the cross-peaks of step 1, generate a simple calibration curve of cross-peak intensity (Ipeak) vs distance (D).
3. Using the calibration curve of step 2, derive D from Ipeak for every cross-peak of interest.
4. Define a lower (L) and an upper bound (U) using a relative error, such as L = 0.8 D, U = 1.2 D.

4.2.5 Information Content of NOE Restraints
Modern NMR structure determination often includes restraints from NOEs, J-couplings, chemical shifts, residual dipolar couplings and other observables. However, NOE-derived distances probably still constitute the most important source of structural information. Figure 4.4 illustrates why these restraints are so important. The observation of an NOE immediately implies a short distance (… |ⁿJCH| > |ⁿJCC| > |ⁿJCN| for similar bond geometries.

4.3.2 NMR Experiments for Measuring J-Couplings
The requirement to measure small J-couplings (typically … 1.5/JSX; the splitting in F2 should be sampled with the highest possible resolution, i.e. t2,max = (1–3) T2 of the relevant nuclei.
2. Process with sufficient digital resolution in the dimension from which the J-coupling is to be measured, i.e. use zero filling by at least a factor of 4.
3. Pick peaks using a peak picker that interpolates between data points for best resolution. Pick only well-resolved peaks with a sufficient signal-to-noise ratio, for example >4.
4. Ensure that peak frequency data files are written with sufficient precision, for example