ACCELERATION AND IMPROVEMENT OF PROTEIN IDENTIFICATION BY MASS SPECTROMETRY
Acceleration and Improvement of Protein Identification by Mass Spectrometry Edited by
WILLY VINCENT BIENVENUT Biochemistry Institute, Protein Analysis Facility, Lausanne University, Switzerland
A C.I.P. Catalogue record for this book is available from the Library of Congress.
ISBN 1-4020-3318-4 (HB) ISBN 1-4020-3319-2 (e-book)
Published by Springer, P.O. Box 17, 3300 AA Dordrecht, The Netherlands. Sold and distributed in North, Central and South America by Springer, 101 Philip Drive, Norwell, MA 02061, U.S.A. In all other countries, sold and distributed by Springer, P.O. Box 322, 3300 AH Dordrecht, The Netherlands.
Printed on acid-free paper
All Rights Reserved © 2005 Springer No part of this work may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, microfilming, recording or otherwise, without written permission from the Publisher, with the exception of any material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Printed in the Netherlands.
"Conformity is the jailer of freedom and the enemy of growth." John F. Kennedy
DEDICATION
To my parents for their help and support during all the years.
TABLE OF CONTENTS
PREFACE
ACKNOWLEDGEMENTS
LIST OF CONTRIBUTORS
CHAPTER 1
W.V. Bienvenut
Introduction: Protein analysis using mass spectrometry
1. Introduction: from genome to proteomic analysis
2. Protein separation
2.1. Introduction
2.2. Electrophoretic separation
2.2.1. Gel separation
2.2.1.1. Molecular mass separation
2.2.1.2. Isoelectric focusing separation technique
2.2.1.3. Bi-Dimensional separation technique
2.2.1.4. Visualisation/staining methods for gel separated proteins
Organic dyes
Metallic ions staining
Covalently immobilized dyes
Radioisotope labelling
2.2.2. Capillary electrophoresis separation
2.3. Liquid chromatography
2.4. Multidimensional chromatography separation
2.5. Conclusion
3. Proteins electroblotting from gel to polymer membrane
3.1. Introduction
3.2. Transfer systems
3.3. Composition and influence of the blotting buffer and solvents
3.3.1. Buffers composition
3.3.2. Effects of the SDS and methanol contained in the buffer solution
3.3.3. Other influencing parameters
3.4. Membranes staining
3.4.1. Non-denaturing organic staining process
3.4.2. Radiolabelled protein detection
3.4.3. Denaturing staining process
3.5. Conclusion
4. Proteins identification
4.1. Introduction
4.2. Nuclear proteins identification procedures
4.3. Enzymatic cleavage of proteins
4.3.1. Introduction
4.3.2. Enzymatic cleavage description
4.3.2.1. Treatment and digestion of transblotted proteins
4.3.2.2. Treatment and digestion of gel separated proteins
4.3.2.3. Utilisation of immobilized endoproteinases
4.3.3. Trypsin
4.3.3.1. Enzymatic activity measurement
4.3.3.2. Cleavage specificity
4.3.4. Endoproteinase Lys-C
4.3.5. Chymotrypsin
4.3.6. Pepsin
4.3.7. Bacterial endopeptidases
4.3.8. Conclusion
4.4. Chemical cleavage of the proteins
4.4.1. Introduction
4.4.2. Acidic hydrolysis
4.4.3. Cyanogen bromide
4.4.4. Cleavage at the carbonyl side of the Trp
4.4.5. Cleavage at Cys residues
4.4.6. Conclusion
4.5. Sample preparation and clean-up for MALDI-MS analysis
4.5.1. Chromatographic treatment
4.5.2. Preparation of samples for MALDI-MS analysis
4.5.2.1. Dry droplet method
4.5.2.2. Spin-coated drying
4.5.2.3. Slow crystallisation
4.5.2.4. Fast evaporation method
4.5.2.5. Crystalline germ method
4.5.2.6. Sprayed matrix
4.5.3. Sample desalting procedures
4.5.4. Conclusion
4.6. Proteins identification using mass spectrometry
4.6.1. Protein identification using PMF technique
4.6.1.1. Method description
4.6.1.2. MALDI-TOF-MS analysis technique
The laser
The matrix
The co-matrix
4.6.1.3. MALDI ionisation mechanism
4.6.1.4. Time of flight separation of the ions and its improvement
4.6.1.5. Signal detection and data acquisition
4.6.1.6. Separation and detection using ICR-FT
4.6.1.7. Signal reproducibility
4.6.1.8. Suppression effects
4.6.1.9. Quantification by mass spectrometry
4.6.1.10. Data treatment for protein identification
MALDI-MS spectrum calibration
Identification tools
Principal affecting factors during data processing
Other possible criteria for protein identification not directly integrated to identification tools
Interpretation of results and limits of validity
4.6.2. Protein identification from internal peptide sequence
4.6.2.1. Introduction
4.6.2.2. ESI-MS/MS analysis
5. Advanced techniques for protein identification
5.1. Introduction
5.2. Chemical modifications
5.2.1. Introduction
5.2.2. Reaction involving free amino groups of the peptides/proteins
5.2.2.1. Acetylation of the amino groups
5.2.2.2. Lys specific reactions
5.2.2.3. Iso-thiocyanate treatment of the free amino groups for N-terminal cleavage (Edman type reaction)
5.2.3. Reaction involving free carboxylic groups of the peptides/proteins
5.2.4. Labile hydrogen atoms exchange to deuterium atoms
5.2.5. Cysteine alkylation
5.2.6. Peptides modification using charged modifications
5.2.6.1. Positively charged modifications
5.2.6.2. Negatively charged modifications
5.2.6.3. Conclusion
5.2.7. Stable isotope labelling during the digestion
5.2.8. Conclusion
5.2. Biochemical approach
5.3. In-vivo labelling
6. Automated approach
CHAPTER 2
Molecular scanner development: Toward clinical molecular scanner for proteome research: Parallel protein chemical processing before and during western-blot.
Reprinted with permission from Bienvenut, W., Sanchez, J., Karmime, A., Rouge, V., Rose, K., Binz, P., et al. (1999). Toward a clinical molecular scanner for proteome research: parallel protein chemical processing before and during western blot. Anal Chem, 71(21), 4800-4807. Copyright 1999, American Chemical Society.
Abstract
Keywords
1. Introduction
2. Experimental section
2.1. Reagents
2.2. Covalent attachment of trypsin and blocking of the IAV membrane
2.3. Activity measurement of trypsin covalently bound to the IAV membrane
2.4. 1-DE and 2-DE separation
2.5. In Gel Digestion
2.6. On membrane Digestion
2.7. OSDT process
2.8. PIGD
2.9. DPD combined method
2.10. MALDI-TOF-MS
2.11. Post-acquisition processing and software identification tools
3. Results
3.1. Activity measurement of trypsin covalently bound to the IAV membrane
3.2. IGD
3.3. OMD
3.4. OSDT
3.5. PIGD
3.6. DPD applied to 1-DE
3.7. Comparative digestion between OSDT, PIGD and the DPD applied to 2-DE
3.8. DPD applied to 2-DE
4. Discussion and conclusion
5. Acknowledgments
6. References
CHAPTER 3
Quantitation during electroblotting step: Enhanced protein recovery after electrotransfer using square wave alternating voltage.
Reprint by permission of Elsevier Science from Bienvenut, W., Deon, C., Sanchez, J., & Hochstrasser, D. (2002). Enhanced protein recovery after electrotransfer using square wave alternating voltage. Anal Biochem, 307(2), 297-303. Copyright 2002.
Abstract
Keywords
1. Introduction
2. Material and methods
2.1. Mono-dimensional electrophoresis (1-DE)
2.2. Electroblot
2.3. Detection, quantification and statistics
2.4. [14C] signal linearity and influence of the accumulation time
3. Results and discussion
3.1. Comparison of the electric field and buffer composition effects
3.2. Statistical test for the transfer reproducibility
3.3. Gel residual protein after transblotting process
4. Concluding remarks
6. Acknowledgement
7. References
CHAPTER 4
Signal treatment and virtual imaging (1/2): A molecular scanner to highly automated research and to display proteome images.
Reprinted with permission from Binz, P., Muller, M., Walther, D., Bienvenut, W., Gras, R., Hoogland, C., et al. (1999). A molecular scanner to automate proteomic research and to display proteome images. Anal Chem, 71(21), 4981-4988. Copyright 1999, American Chemical Society.
Abstract
Keywords
1. Introduction
2. Experimental section
2.1. Materials and reagents
2.2. Description of the method
3. Results and discussion
3.1. Representation of the analysis of a 1-dimensional scan of 1-DE
3.2. Representation of the analysis of a two-dimensional scan from a single band of 1-DE
3.3. Identification by two-dimensional scan of human plasma proteins separated by 2-DE
4. Discussion
5. Conclusion
6. Acknowledgement
7. References
CHAPTER 5
Signal treatment and virtual imaging (2/2): Visualization and analysis of molecular scanner peptide mass spectra.
Reprint by permission of Elsevier Science from Mueller, M., Gras, R., Appel, R. D., Bienvenut, W. V., & Hochstrasser, D. F. (2002). Visualization and analysis of molecular scanner peptide mass spectra. J Am Soc Mass Spectrom, 13(3), 221-231. Copyright 2002, by the American Society of Mass Spectrometry.
Abstract
1. Introduction
2. Methods
3. Results and discussion
3.1. Visualization of spectra
3.2. Chemical noise
3.3. Calibration
3.4. Identification and clustering of masses
4. Conclusion
5. Acknowledgements
6. References
CHAPTER 6
Improvement in the peptide mass fingerprint protein identification (1/2): Hydrogen/deuterium exchange for higher specificity of protein identification by peptide mass fingerprinting
Reprinted by permission of John Wiley & Sons, Inc., from Bienvenut, W., Hoogland, C., Greco, A., Heller, M., Gasteiger, E., Appel, R., et al. (2002). Hydrogen/deuterium exchange for higher specificity of protein identification by peptide mass fingerprinting. Rapid Commun. Mass Spectrom., 16(6), 616-626. Copyright 2002.
Abstract
1. Introduction
2. Methods
2.1. Chemicals
2.2. Protein separations
2.3. In-gel protein digestion
2.4. MALDI-ToF MS analysis
2.5. H/D exchange on the MALDI sample plate
3. Results and discussion
3.1. Visualization of spectra
3.2. Chemical noise
3.3. Calibration
3.4. Identification and clustering of masses
3.5. Application of the technique to tryptic bovine serum albumin digest
3.6. Application of the technique to an unknown protein digest
4. Discussion and conclusion
4.1. Influence of the matrix compound
4.2. Influence of the physico-chemical characteristic of the solvent
4.3. Influence of the amino acid composition of the peptide
4.4. Application of the technique as a validating and discriminating method
5. Challenge and future developments
6. Acknowledgements
7. References
CHAPTER 7
Improvement in the peptide mass fingerprint protein identification (2/2): MALDI-MS/MS with high resolution and sensitivity for identification and characterization of proteins
Reprinted by permission of Wiley-Liss, Inc, a subsidiary of John Wiley & Sons, Inc., from Bienvenut, W., Deon, C., Pasquarello, C., Campbell, J., Sanchez, J., Vestal, M., et al. (2002). Matrix-assisted laser desorption/ionization-tandem mass spectrometry with high resolution and sensitivity for identification and characterization of proteins. Proteomics, 2(7), 868-876. Copyright 2002.
Abstract
Keywords
1. Introduction
2. Materials and methods
2.1. Reagents and apparatus
2.2. Protein solubilisation for preparative 2-D PAGE
2.3. 2-D PAGE
2.4. Image analysis
2.5. Protein digestion
2.6. Sample preparation
2.7. Database interrogation
3. Results
3.1. Peptide sequences discrimination
3.2. De Novo sequencing
3.3. Tryptophan oxidation
4. Conclusion
6. Acknowledgements
5. References
CHAPTER 8
Proteomic and mass spectrometry: Some aspects and recent developments.
Bienvenut, W. V., Mueller, M., Palagi, P. M., Gasteiger, E., Heller, M., Jung, E., Giron, M., et al. (2001). Proteomic and mass spectrometry: some aspects and recent developments, In J. N. Housby (Ed.), Mass spectrometry and genomic analysis (1st ed., Vol. 2, pp. 93-145). Dordrecht: Kluwer academic press.
1. Introduction to proteomics
2. Protein biochemical and chemical processing followed by mass spectrometric analysis
2.1. 2-DE gel protein separation
2.2. Protein identification using peptide mass fingerprinting and robots
2.2.1. MALDI-MS analysis
2.2.2. MS/MS analysis
2.2.2.1. MALDI-RETOF-PSD MS analysis
2.2.2.2. ESI-MS/MS analysis
2.2.3. Improvement of the identification by chemical modification of peptides
2.2.3.1. Esterification
2.2.3.2. H/D exchange: quantitation of labile protons on peptides
2.3. The molecular scanner approach
2.3.1. Double parallel digestion process
2.3.2. 14C quantitation of the transferred product and diffusion
2.3.2.1. Comparison of the influence of the electric field on the protein recovery
2.3.2.2. DPD quantification test
3. Protein identification using bioinformatics tools
3.1. Protein identification by PMF tools using MS data
3.1.1. Peak detection
3.1.2. Identification tools
3.2. MS/MS Ions Search
3.3. De novo sequencing
3.4. Other tools related to protein identification
3.5. Data storage and treatment with LIMS
3.6. Concluding remarks
4. Bioinformatics tools for the molecular scanner
4.1. Peak detection and spectrum intensity images
4.2. Protein identification
4.3. Validation of identifications
4.4. Concluding remarks
5. Conclusion
6. Acknowledgements
7. References
CHAPTER 9
Conclusions and perspectives
APPENDIX
Abbreviations used in this book
Abbreviations for usual amino acids and chemical constants
Index
PREFACE
Now that the human genome has been fully sequenced, the need for efficient protein analysis and characterization tools has never been so critical. Firstly, computer algorithms have been used to predict genes and it is accepted that as much as 10% of them might have been missed. Only final gene products, i.e. the proteins, prove that gene sequences with signal sequences, introns and exons are correct. Secondly, it is nearly impossible at present to predict with high accuracy the final polypeptide product and its co- and post-translational modifications. A protein's partial characterization therefore allows a definite identification of the protein's processing, such as the amino-acid sequence modification induced by the editing of the mRNA during alternative splicing. Thirdly, there is a lack of correlation between the expression levels of mRNA and proteins. Their respective half-lives are very different, as are their levels of expression.
The major difficulty in analysing proteins is the tremendous diversity of their chemical and other properties. Their concentrations vary by more than 12 orders of magnitude in body fluids and by more than 7 orders of magnitude in cells. For example, in blood, the concentration of albumin is in the millimolar range and that of TNF (tumour necrosis factor) in the femtomolar range. While the pI (isoelectric point) of DNA or mRNA is around 4.2 to 4.5, the pI of proteins extends from less than 3 to more than 12. Whereas the solubility of nucleic acids is excellent, proteins, and especially membrane proteins, can be excessively hydrophobic. Consequently, no single method is available to fully analyse a complex mixture of proteins. In addition, no amplification process such as PCR or RT-PCR exists in the protein world. Therefore, extremely sensitive methods are required to detect the low-abundance proteins.
Many methods to separate, identify and partially characterize polypeptides have been available for a long time.
Until recently, many of them required a relatively high concentration or amount of protein material. Miniaturization of the analytical techniques does not necessarily solve the difficulty of detecting low-abundance proteins. For example, if one injects a volume of nanolitres into equipment with attomole sensitivity, the limit of detection in terms of concentration is still in the nanomolar range, well above the concentration of many physiologically relevant proteins of interest. Consequently, the critical step in working with complex protein samples is to select efficient pre-fractionation and separation techniques. Often the best methods are based on affinity pre-purification and a combination of chromatography and/or electrophoresis. Many approaches could be used to detect the proteins and some of their modifications. Several developments in the field of mass spectrometry offer a new avenue, especially in the area of large-scale protein identification and partial characterization. Multi-compartment equipment allows the precise selection of precursor ions (peptides), their efficient fragmentation and final characterization (sequence and modifications) and can also provide accurate quantification methods. The latest improvements are at both the hardware and software levels to provide fully automated and rapid identification methods.
This book is timely. It reviews in a concise form most techniques that should be known by scientists working in a proteomics laboratory or analysing proteins of interest. It first reviews the electrophoretic and chromatographic separation methods. It then summarizes the quantification and identification methods such as immunoblotting, protein chemistry, peptide fingerprinting or sequencing by fragmentation. Several chapters highlight fascinating developments in the field of mass spectrometry and related techniques. The text shows the reader the perspective of this relatively new field of proteomics. Finally, the book lists numerous references to critical work done many years ago and unavailable on computer databases. It should therefore be part of every laboratory's library.
Prof. Denis F. HOCHSTRASSER
ACKNOWLEDGEMENTS
First of all, I would like to thank all the people, scientists and non-scientists alike, who have contributed to the development of this work. Secondly, I would like to give special thanks to:
- Professor Denis F. Hochstrasser from the Medical University Department of Pathology, Science University at the Department of Pharmacology and responsible for the Clinical Chemistry Central Laboratory at the cantonal hospital of Geneva (Switzerland) for accommodating a research position in his laboratory, thereby improving my knowledge of the chemist's role in protein chemistry, biochemistry and mass spectrometry techniques;
- Dr Jean-Charles Sanchez, responsible for the bi-dimensional electrophoresis laboratory at the cantonal hospital of Geneva, for integrating me into his research group and for his critical approach;
- Professor Jean-Luc Veuthey, from Geneva University of Science, responsible for the Pharmacology Section of the Pharmacy and Pharmaceutical Analysis Unit, who accepted the co-direction of my thesis project;
- Professor Jacques Weber, dean of the science faculty, and Dr Jérôme Garin, Research Director at the CEA centre (Grenoble, F), who took time to judge this thesis;
- Véronique Converset, Abderahim Karmime, Gérald Rossellat and Salvo Paesano, technicians at the University Cantonal hospital of Geneva, who conducted some of the experiments involved in this project;
- Dr Séverine Frutiger-Hughes from the Pathology Department and Dr Graham Hughes from the Biochemistry Department of the Medical University for their excellent knowledge of protein chemistry and helpful discussions;
- Danièle Roiron, head of Prof. Hochstrasser's secretariat, and Dr Catherine Zimmerman from the Clinical Chemistry Central Laboratory at the cantonal hospital of Geneva (Switzerland), for their discussions;
- Professor Keith Rose, Scientific Director of GeneProt (Geneva, CH), for his help in organising my work;
- Professor Darryl Pappin from Applied Biosystems (Framingham, MA, USA), who received me into his laboratory (ICRF, London, UK) and taught me the techniques of peptide chemical modification;
- Dr Manfredo Quadroni for his help during the preparation of this manuscript.
Finally, I would like to thank all of the R&D laboratory as well as personnel from the Clinical Chemistry Central Laboratory for their help over the past 5 years and I address a large thank-you to my parents for all the sacrifices they had to make throughout the years...
LIST OF CONTRIBUTORS
Appel Ron D.: Swiss Institute of Bioinformatics, University Medical Centre, Rue Michel-Servet 1, CH-1211 Geneva 4, Switzerland
Binz Pierre-Alain: Swiss Institute of Bioinformatics, CH-1211 Geneva 14, Switzerland
Campbell Jennifer M.: Applied Biosystems, 500 Old Connecticut Path, Framingham, MA 01701
Déon Catherine: Central Clinical Chemistry Laboratory, Pathology department, Geneva University Hospital, Rue Micheli-du-Crest 24, CH-1211 Geneva 14
Diaz Jean-Jacques: INSERM U369, Faculté de Médecine Lyon-R.T.H. Laennec, 7, Rue Guillaume Paradin, 69372 LYON CEDEX 08, France
Gasteiger Elisabeth: Swiss Institute of Bioinformatics, University Medical Centre, Rue Michel-Servet 1, CH-1211 Geneva 4, Switzerland
Gays Steven: Swiss Institute of Bioinformatics, University Medical Centre, Rue Michel-Servet 1, CH-1211 Geneva 4, Switzerland
Giron Marc: Swiss Institute of Bioinformatics, University Medical Centre, Rue Michel-Servet 1, CH-1211 Geneva 4, Switzerland
Gras Robin: Swiss Institute of Bioinformatics, University Medical Centre, Rue Michel-Servet 1, CH-1211 Geneva 4, Switzerland
Greco Anna: INSERM U369, Faculté de Médecine Lyon-R.T.H. Laennec, 7, Rue Guillaume Paradin, 69372 LYON CEDEX 08, France
Heller Manfred: Central Clinical Chemistry Laboratory, Geneva University Hospital, CH-1211 Geneva 14, Switzerland
Hochstrasser Denis F.: Central Clinical Chemistry Laboratory, Pathology department, Geneva University Hospital, Rue Micheli-du-Crest 24, CH-1211 Geneva 14
Hoogland Christine: Swiss Institute of Bioinformatics, University Medical Centre, Rue Michel-Servet 1, CH-1211 Geneva 4, Switzerland
Hughes Graham J.: University Medical Centre, Rue Michel-Servet 1, CH-1211 Geneva 4, Switzerland
Jung Eva E.: Swiss Institute of Bioinformatics, University Medical Centre, Rue Michel-Servet 1, CH-1211 Geneva 4, Switzerland
Karmime Abderahim: Central Clinical Chemistry Laboratory, Geneva University Hospital, CH-1211 Geneva 14, Switzerland
Müller Markus: Swiss Institute of Bioinformatics, University Medical Centre, Rue Michel-Servet 1, CH-1211 Geneva 4, Switzerland
Palagi Patricia M.: Swiss Institute of Bioinformatics, University Medical Centre, Rue Michel-Servet 1, CH-1211 Geneva 4, Switzerland
Pasquarello Carla: Central Clinical Chemistry Laboratory, Pathology department, Geneva University Hospital, Rue Micheli-du-Crest 24, CH-1211 Geneva 14
Rose Keith: University Medical Centre, Rue Michel-Servet 1, CH-1211 Geneva 4, Switzerland
Converset Véronique (previously Rouge Veronique): Central Clinical Chemistry Laboratory, Geneva University Hospital, CH-1211 Geneva 14, Switzerland
Sanchez Jean-Charles: Central Clinical Chemistry Laboratory, Pathology department, Geneva University Hospital, Rue Micheli-du-Crest 24, CH-1211 Geneva 14
Vestal Marvin L.: Applied Biosystems, 500 Old Connecticut Path, Framingham, MA 01701
CHAPTER 1 INTRODUCTION Protein analysis using mass spectrometry
WV. Bienvenut
1. INTRODUCTION: FROM GENOME TO PROTEOMIC ANALYSIS
The development during the 1980s of new mass spectrometry techniques such as "Matrix-Assisted Laser Desorption/Ionisation" (MALDI) (Karas & Hillenkamp, 1988; Tanaka et al., 1988) and "ElectroSpray Ionisation" (ESI) (Aleksandrov et al., 1984; Fenn, Mann, Meng, Wong, & Whitehouse, 1989; Yamashita & Fenn, 1984) allowed the analysis of large organic polymers. Both techniques have the advantage of producing stable molecular ions from biomolecules such as proteins and oligonucleotides. Limits of detection are as low as 0.1 to 100 fmol, which makes them very sensitive tools compatible with low-abundance substrates.
[Figure 1: bar chart. X-axis: Years (1993 to 2003). Y-axis: Number of articles (0 to 1200).]
Figure 1. Occurrence of the words "proteome" and "proteomic" in the PubMed databank.
1 W. V. Bienvenut (ed.), Acceleration and Improvement of Protein Identification by Mass Spectrometry, 1–118. © 2005 Springer. Printed in the Netherlands.
[Figure 2: SDS-PAGE images of Gel A and Gel B, lanes 1 to 6. Mr standards (kDa): PHS2 (98), BSA (65), OVAL (45), CAH2 (31), ITRA (24), LYC (14).]
Figure 2. Nucleolar protein separation by SDS-PAGE followed by Coomassie blue staining (from Dr J.J. Diaz (Scherl et al., 2002)). The acrylamide concentration was 12.5 % for gel A, which preferentially resolves low-MW proteins, and 8 % for gel B, which preferentially resolves high-MW proteins. Proteins are distributed over a range of 10 kDa to 120 kDa on gel A and 30 kDa to 150 kDa on gel B. MW standard proteins are visible in lanes 1 and 6 and can be used to estimate the Mr of the separated proteins visible in lanes 3 and 4. Lanes 2 and 5 are empty.
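The caption notes that the MW standards can be used to estimate the Mr of the separated proteins. A common way to do this is to fit log10(Mr) of the standards against their relative migration distance (Rf) and interpolate the unknown band. The sketch below illustrates that idea only: the standard masses are those labelled on Figure 2, but the Rf values and the function names (`fit_line`, `calibrate`, `estimate_mr`) are hypothetical and were not measured from the gel.

```python
import math

# Mr standards (kDa) as labelled on Figure 2.
STANDARDS_KDA = {"PHS2": 98, "BSA": 65, "OVAL": 45, "CAH2": 31, "ITRA": 24, "LYC": 14}

# HYPOTHETICAL relative migration distances (band distance / dye-front distance).
# Illustrative values only; real values must be measured on the gel image.
HYPOTHETICAL_RF = {"PHS2": 0.12, "BSA": 0.25, "OVAL": 0.38,
                   "CAH2": 0.55, "ITRA": 0.66, "LYC": 0.88}


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def calibrate(standards_kda, rf_values):
    """Fit log10(Mr) against Rf for the marker proteins."""
    names = sorted(standards_kda)
    xs = [rf_values[name] for name in names]
    ys = [math.log10(standards_kda[name]) for name in names]
    return fit_line(xs, ys)


def estimate_mr(rf, slope, intercept):
    """Estimate the apparent Mr (kDa) of a band from its Rf."""
    return 10 ** (slope * rf + intercept)


slope, intercept = calibrate(STANDARDS_KDA, HYPOTHETICAL_RF)
print(f"Apparent Mr at Rf 0.45: {estimate_mr(0.45, slope, intercept):.1f} kDa")
```

The log-linear relationship is only an approximation that holds over a limited mass range for a given acrylamide concentration, so estimates outside the range spanned by the standards should be treated with caution.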
Such techniques completely changed the approach to protein characterisation. In 1993, five groups around the world (Henzel et al., 1993; James, Quadroni, Carafoli, & Gonnet, 1993; Mann, Hojrup, & Roepstorff, 1993; Pappin et al., 1995; Yates, III, Speicher, Griffin, & Hunkapiller, 1993) demonstrated the possibility of using the peptide mass fingerprint obtained after a specific enzymatic cleavage, e.g. by trypsin, to identify the target protein. This simple technique was sufficient for confident protein identification. The process is based on comparing the theoretical protein fragment masses, generated in silico from sequence databases, with the experimental values obtained by mass spectrometry. Genome sequencing resources are therefore extremely important for such an approach (Lander et al., 2001; Venter et al., 2001).
In 1994, during the first congress "From Genome to Proteome" in Siena, Marc Wilkins proposed the use of the term proteome to describe the proteins expressed by the genome (Wilkins, Pasquali et al., 1996; Williams & Hochstrasser, 1997) at a particular time in a given tissue, species, etc. Since 1994, the words "proteome" and "proteomic" have been used increasingly (Figure 1), illustrating the scientific interest in the translation products of the genome: the proteins or proteomes. Nevertheless, although protein characterisation and identification techniques are robust and used worldwide, the protein mixtures must be purified before such analyses.
2. PROTEIN SEPARATION
2.1. Introduction
All cells contain protein mixtures that are complex in terms of pI, MW and hydrophobicity and that span a huge range of concentrations. As an example, biological fluids as simple as milk or human serum contain mainly casein and albumin, respectively, but hundreds or thousands of different proteins are also present in such mixtures at various concentrations. In human plasma, the albumin concentration is around 35–50 g per litre, corresponding to 500–750 µM (Doumas, Watson, & Biggs, 1971), whereas the vitamin D binding protein concentration is around 200 mg per litre, corresponding to 4 µM (Dahl et al., 2003). In some fluids, such as vaginal secretions, Gaucherand et al. (Gaucherand, Guibaud, Rudigoz, & Wong, 1994) have quantified α-fetoprotein concentrations for the diagnosis of premature rupture of membranes. The threshold was determined to be 30 µg/l of this protein, which corresponds to a concentration of roughly 0.4–0.5 nM. Purification and/or separation steps are therefore needed before the characterisation step.
2.2. Electrophoretic separation
One of the most frequently used techniques for protein separation is based on their amphoteric character. Depending on the sample pH, proteins carry negative and/or positive charges. Macroscopically, a protein has a net positive charge at acidic pH (below its pI), a net negative charge at basic pH (above its pI), and is neutral when the pH of the medium is identical to its isoelectric point (pI). Thus, depending on the sample pH, proteins are charged and can migrate under the influence of an electric field.
Usually a solid support (polyacrylamide gel) or liquid medium (buffer in a silica capillary) is used for such separation.
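The dependence of a protein's net charge on pH, which drives electrophoretic migration, can be illustrated with the Henderson–Hasselbalch relation applied to each ionisable group. The following is a minimal sketch under stated assumptions: the pKa values are generic textbook-style approximations chosen for illustration (not taken from this book), every group is treated as independent of its environment, and the function names `net_charge` and `isoelectric_point` are hypothetical.

```python
# ASSUMED approximate pKa values for ionisable groups (illustrative only).
PKA_POSITIVE = {"Nterm": 9.0, "K": 10.5, "R": 12.5, "H": 6.0}              # +1 when protonated
PKA_NEGATIVE = {"Cterm": 3.1, "D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}     # -1 when deprotonated


def net_charge(sequence, ph):
    """Approximate net charge of a polypeptide (one-letter codes) at a given pH."""
    seq = sequence.upper()
    charge = 0.0
    for group, pka in PKA_POSITIVE.items():
        count = 1 if group == "Nterm" else seq.count(group)
        # Fraction of the basic group that is protonated (charged).
        charge += count / (1.0 + 10 ** (ph - pka))
    for group, pka in PKA_NEGATIVE.items():
        count = 1 if group == "Cterm" else seq.count(group)
        # Fraction of the acidic group that is deprotonated (charged).
        charge -= count / (1.0 + 10 ** (pka - ph))
    return charge


def isoelectric_point(sequence, low=0.0, high=14.0, tolerance=1e-4):
    """Estimate the pI by bisection; net charge decreases monotonically with pH."""
    while high - low > tolerance:
        middle = (low + high) / 2.0
        if net_charge(sequence, middle) > 0:
            low = middle
        else:
            high = middle
    return (low + high) / 2.0


peptide = "ACDEFGHIKLMNPQRSTVWY"  # arbitrary example sequence
print(f"Net charge at pH 2: {net_charge(peptide, 2.0):+.2f}")
print(f"Net charge at pH 12: {net_charge(peptide, 12.0):+.2f}")
print(f"Estimated pI: {isoelectric_point(peptide):.2f}")
```

In this model the net charge is positive below the estimated pI and negative above it, so the direction of migration in an electric field reverses on either side of the pI. Real proteins deviate from such estimates because of folding, neighbouring-group effects and post-translational modifications.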
Figure 3. Bi-dimensional separation of proteins contained in a human plasma sample followed by silver staining. pI range is from 4 to 10 and MW range is 5 to 200 kDa (Copyright SWISS 2D-PAGE, http://ch.expasy.org/cgi-bin/map2/noid?PLASMA_HUMAN).
2.2.1. Gel separation
Two main techniques are used for protein separation, and both exploit the physical and chemical properties of the proteins being analysed:
- Separation as a function of the proteins' molecular volume, which is usually considered a sufficiently good approximation of the molecular weight,
- Separation as a function of the proteins' isoelectric points.
If, both techniques can be used separately, their combination allows the efficient separation of hundreds to thousands of proteins in a single process due to the orthogonal approach. 2.2.1.1. Molecular mass separation Protein separation using molecular volume (or mass equivalent) is one of the oldest partition techniques (Shapiro, Vinuela, & Maizel, 1967; Weber & Osborne, 1969). However, all the proteins contained in a sample show a wide range of pI (typically from pH 2–3 to 11–12) so that, at the same pH value, the proteins do not all show a similar charge density ratio (charge per mass unit). To obtain protein migration depending only on the molecular mass, all proteins must have a similar charge density ratio. To achieve this, proteins contained in a sample are denatured and mixed with a detergent such as sodium dodecyl sulfate (SDS). These charged molecules interact with the protein backbones at the rate of 1.4 g of SDS per gram of protein, which corresponds approximately to 1 molecule of SDS to every 2 AAs (Pitt-Rivers & Impiombato, 1968). This separation technique requires a matrix of polyacrylamide obtained by copolymerisation of acrylamide and a cross linker such as piperazine diacrylyl. The concentration of the cross linker will define the pore size, which directly influences the separation efficiency. Cross linker concentration can be homogeneous in the whole gel (e.g. Figure 2) or may vary to produce a “y-axis” gradient (e.g. Figure 3). The latter preparation is much more difficult to produce, but such gels can more accurately separate proteins across a wide range of mass (typically from 5 to 200 kDa). A few minor modifications have been adopted since Laemmli (Laemmli, 1970) first described this technique. After the staining step, separated proteins appear as parallel bands on mono-dimensional gels (Figure 2) that contain from one to dozens proteins (Scherl et al., 2002) or as spots for 2-DE (Figure 3). 2.2.1.2. 
Isoelectric focusing separation technique Proteins are amphoteric molecules, which means that proteins have both acidic and basic properties; at particular pH values corresponding to the isoelectric points (pI), such components have a net charge equivalent to zero. Their solubility is the lowest at that pH, and under such conditions proteins precipitate and a further solubilisation step is necessary if the sample is to be used for a second separation technique. IsoElectric Focussing (IEF) is a protein/polypeptide separation technique based on the amphoteric chemical properties corresponding the protein pI. Protein samples are mixed with a buffer able to form a charge on all of the material (usually at basic pH) and then loaded on a pH gradient. In an electric field, proteins migrate and concentrate to their pI, where they stop and usually precipitate. Older techniques used polyacrylamide gel containing a mobile buffer, also called Carrier Ampholytes (CAs). Under the effect of the electric field, the CAs create a pH gradient (Seiler, Thobe, & Werner, 1970) on which proteins can be separated. This technique is very powerful but its major drawback was the reproducibility of
the pH gradient, owing to the mobility of the CAs during the focusing step (also influenced by temperature). New buffering substances called Immobilines™ are copolymerised with the acrylamide (Bjellqvist et al., 1982; Bossi, Righetti, & Chiari, 1994; Rosengren, Bjellqvist, & Gasparic, 1976), which improves the reproducibility of both the pH gradient and the separation. A drawback of this technique is that the amount of protein that can be separated on preparative gels is limited in comparison with the CA system. This separation technique can be used alone (Etienne et al., 1999; Towbin, Staehelin, & Gordon, 1979), but it is usually combined with a second separation technique such as SDS-PAGE, in which proteins are separated according to their molecular volume, a good approximation of the molecular weight.
2.2.1.3. Bi-Dimensional separation technique
This separation technique is a combination of the two previously described techniques:
- Isoelectric focusing,
- SDS-PAGE.
The combination strongly increases the resolving power, so that thousands of proteins/polypeptides can be separated in a single process. The method was first proposed in 1970 by Kenrik & Margolis, who used it for native protein separations (Kenrik & Margolis, 1970). Because native protein separations are relatively uncommon, the method was adapted for denatured samples. This development was carried out by O’Farrell, Klose and Scheele in 1975 (Klose, 1975; O'Farrell, 1975; Scheele, 1975). By convention, the pH gradient corresponds to the X-axis and the MW separation to the Y-axis. The method is highly efficient in the pH ranges from 3.5 to 10 and/or from 4 to 7 (Gorg, Postel, & Gunther, 1988). Narrower pH domains spanning only one pH unit were successfully used by Tonella et al. (Tonella et al., 1998) to visualise the proteomic expression in limited ranges.
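The isoelectric point that drives the IEF dimension can be illustrated numerically. The sketch below is only an illustration and not a method taken from this chapter: it estimates the net charge of a polypeptide from Henderson-Hasselbalch terms and locates the pH of zero net charge by bisection. The pKa values and the example sequence are assumptions chosen for illustration; published pKa sets differ and give slightly different pI values.

```python
# Illustrative pKa values for ionisable groups (assumed; published sets differ).
PKA_BASIC = {"Nterm": 9.0, "K": 10.5, "R": 12.5, "H": 6.0}
PKA_ACIDIC = {"Cterm": 3.1, "D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}


def net_charge(sequence: str, ph: float) -> float:
    """Net charge of a linear polypeptide at a given pH."""
    charge = 0.0
    for group, pka in PKA_BASIC.items():
        count = 1 if group == "Nterm" else sequence.count(group)
        # Fraction of each basic group that is protonated (+1).
        charge += count / (1.0 + 10.0 ** (ph - pka))
    for group, pka in PKA_ACIDIC.items():
        count = 1 if group == "Cterm" else sequence.count(group)
        # Fraction of each acidic group that is deprotonated (-1).
        charge -= count / (1.0 + 10.0 ** (pka - ph))
    return charge


def isoelectric_point(sequence: str, tol: float = 1e-4) -> float:
    """pH at which the net charge crosses zero, found by bisection on 0-14."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(sequence, mid) > 0.0:
            lo = mid  # still positively charged: pI lies at higher pH
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    peptide = "ACDEFGHIKLMNPQRSTVWYK"  # hypothetical sequence
    pi = isoelectric_point(peptide)
    print(f"estimated pI = {pi:.2f}, net charge at pI = {net_charge(peptide, pi):+.4f}")
```

Because the net charge falls monotonically as the pH rises, bisection is sufficient here. With narrow-range gradients, a protein whose estimated pI lies outside the strip range would not be expected to focus on that strip.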
An approach involving several 2-DE separations over narrow pH ranges can separate a larger number of proteins than a single 2-DE separation over the 3.5 to 10 pH range. Nevertheless, it is limited by the commercial availability of such IEF gradients. For example, pH values higher than 9–10 are difficult to reach and results are not always reproducible.
2.2.1.4. Visualisation/staining methods for gel separated proteins
Protein visualisation is important because it directly influences protein detection and subsequent processing, such as the excision of proteins for PMF. Two different approaches are used for protein staining. The first relies on metallic ions, e.g. silver or zinc. Alternatively, organic dyes, namely Coomassie brilliant blue (CBB), SYPRO® and Amido-Black (AB), are commonly used. A non-exhaustive list of staining agents is given in Table 1.
Organic dyes
Coomassie staining is probably the most widely used method for protein detection after SDS-PAGE. Two Coomassie dyes can be used:
- Coomassie brilliant blue (CBB R250),
- Coomassie brilliant blue colloidal (CBB G250),
which differ only by two methyl groups. The limit of detection of these staining techniques is 8–10 ng for the colloidal CBB version and 50–100 ng for the standard one. However, this staining protocol does not allow direct quantification of the material contained in the gel because of the large variability in staining intensity observed for different proteins (Chrambach, Reisfeld, Wyckoff, & Zaccar, 1967; Neuhoff et al., 1990). The dye molecules bind to proteins through an interaction of the dye sulfonate groups with basic residues of the polypeptides, e.g. the ε-amino groups of lysine residues. Indeed, the staining response is most likely linked to the concentration of basic sites at the surface of the proteins (Salih & Zenobi, 1998), but hydrophobic interactions also contribute to the staining process. In some cases, several dye molecules may aggregate on a single basic position (Tal, Silberstain, & Nusser, 1985). Nevertheless, staining by CBB is quite reproducible and shows good linearity within a limited range, so that the amount of a given protein can be determined fairly accurately given a calibration curve and a good scanner for densitometry. Finally, CBB staining is compatible with mass spectrometry for protein characterisation and is easy to carry out, fast and cheap. All these factors account for its popularity (Galvani, Bordini, Piubelli, & Hamdan, 2000; Shevchenko, Loboda, Shevchenko, Ens, & Standing, 2000).
SYPRO® dyes are recently developed protein staining agents. Protein visualisation is obtained by interaction between the protein and a complex of europium or ruthenium with an organic ligand, e.g. bathophenanthroline. These stains are fluorescent and lower the limit of detection.
In the case of SYPRO® ruby (Malone, Radabaugh, Leimgruber, & Gerstenecker, 2001), limits of detection are below the silver stain level, with 0.25–8 ng of protein (Yan, Harry, Spibey, & Dunn, 2000). Other SYPRO® stains are available, namely:
- SYPRO® red (Steinberg, Haugland, & Singer, 1996; Steinberg, Jones, Haugland, & Singer, 1996): limit of detection similar to that of SYPRO® ruby (0.5–10 ng protein),
- SYPRO® orange (Malone et al., 2001; Steinberg, Haugland et al., 1996; Steinberg, Jones et al., 1996): limit of detection between 4 and 10 ng protein,
- SYPRO® tangerine (Steinberg et al., 2000): limit of detection between 5 and 25 ng protein.
Table 1. Comparison of different stains for protein detection on gels or membranes and compatibility with different proteomic analyses.
Staining agent | Sensitivity (ng/band) | Reversibility | PMF compatibility | References
Amido-Black | 50–100 | NA | NA | (Chrambach et al., 1967)
Colloidal silver with glutaraldehyde | 1–10 | No | No | (Rabilloud, 1990, 1992; Switzer, Merril, & Shifrin, 1979)
Colloidal silver without glutaraldehyde | 1.5 | NA | Yes | (Shevchenko, Jensen et al., 1996)
PMF compatible silver (commercial kit from Pharmacia) | NA | NA | Yes | (Yan et al., 2000)
Coomassie brilliant blue (CBB R250) | 50–100 | NA | Yes | (Chrambach et al., 1967; Shevchenko, Jensen et al., 1996)
Coomassie colloidal blue (CBB G250) | 8–10 | NA | Yes | (V Neuhoff, Amold, Taube, & Ernhardt, 1988; Neuhoff et al., 1990)
Copper | NA | NA | Yes | (Lee, Levin, & Branton, 1987)
2-Methoxy-2,4-diphenyl-3(2H)-furanone | NA | NA | No | (Alba & Daban, 1998)
Nile red | 5–25 | NA | NA | (Daban, 2001; Daban, Bartholomé, & Samsó, 1991)
Radioisotope labelling | NA | NA | Yes | (S. Patterson, Thomas, & Bradshaw, 1996)
SYPRO® orange | 4–10 | NA | Yes | (Malone et al., 2001; Steinberg, Haugland et al., 1996; Steinberg, Jones et al., 1996)
SYPRO® red | 0.5–10 | NA | Yes | (Steinberg, Haugland et al., 1996; Steinberg, Jones et al., 1996)
SYPRO® ruby | 0.25–8 | NA | Yes | (Malone et al., 2001)
SYPRO® tangerine | 4–10 | NA | Yes | (Steinberg et al., 2000)
Zinc | 7–15 | Yes | Yes | (Fernandez-Patron et al., 1994)
The major advantage of such dyes is their good compatibility with subsequent MS analysis, with the exception of SYPRO® tangerine (Lauber et al., 2001), which shows recovery yields similar to those of silver staining without glutaraldehyde. Furthermore, since detection is based on emission rather than absorption, the linear range for quantitation is greatly improved. The first major drawback of SYPRO® dyes is their cost, especially for SYPRO® ruby. A study (Malone et al., 2001) that compared MALDI spectra quality and cost per gel highlighted SYPRO® orange, which is less sensitive (1 to 10 times, depending on the protein) but much more cost effective ($6.00/gel compared to $133.00/gel for SYPRO® ruby). Moreover, no differences were observed in the mass spectra of stained proteins. Another, more general problem with fluorescent dyes is that visualisation has to be performed under ultraviolet light; special scanners are therefore required to maximise detection performance. Manual inspection of gels is also sometimes difficult, and it is harder to verify spot excision accuracy when working with spot cutters.
Amido-Black is a popular dye for staining proteins transferred onto membranes. It has also been used for in-gel staining (Chrambach et al., 1967) but, owing to its poor sensitivity, Coomassie blue is generally preferred.
Nile red is an unusual dye (Daban et al., 1991) that binds proteins mostly by hydrophobic interactions. An advantage of this dye is that the stained proteins can still be electroblotted. It has been used directly for in-gel protein staining, followed by electrotransfer to a PVDF membrane (Daban, 2001).
India ink (Lek, Yang, Wang, & Cheng, 1995) and Ponceau red (Gianazza et al., 1995) are also used as protein stains, but mostly for the visualisation of proteins transferred onto PVDF or nitrocellulose membranes (Breggren et al., 1999).
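The text notes that CBB staining is linear only within a limited range and that fluorescent stains extend the linear range for quantitation. A minimal sketch of how a densitometric calibration curve might be applied is given below. The standard amounts, the intensities and the function names are hypothetical, and an ordinary least-squares line is assumed rather than taken from any of the cited protocols.

```python
def fit_line(amounts, intensities):
    """Ordinary least-squares fit of intensity = slope * amount + intercept."""
    n = len(amounts)
    mean_x = sum(amounts) / n
    mean_y = sum(intensities) / n
    sxx = sum((x - mean_x) ** 2 for x in amounts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(amounts, intensities))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def amount_from_intensity(intensity, slope, intercept, lowest, highest):
    """Invert the calibration line; refuse to extrapolate beyond the standards."""
    amount = (intensity - intercept) / slope
    if not lowest <= amount <= highest:
        raise ValueError("intensity falls outside the calibrated (linear) range")
    return amount


if __name__ == "__main__":
    # Hypothetical standards of one protein (ng per band) and scanner readings.
    standards_ng = [100, 200, 400, 800]
    readings = [0.11, 0.21, 0.41, 0.81]
    slope, intercept = fit_line(standards_ng, readings)
    print(amount_from_intensity(0.31, slope, intercept, 100, 800))
```

The explicit range check reflects the limited linear range mentioned for CBB. Since staining intensity varies from protein to protein, such a curve is only meaningful for the protein used as standard.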
Interactions between proteins and dyes are mainly non-covalent: electrostatic interactions and non-specific interactions such as hydrogen bonds and van der Waals forces (Salih & Zenobi, 1998). Ionic interactions occur between the sulfonate group of the dye and basic residues such as His, Lys and Arg. Hydrophobic interactions occur between the phenyl groups of the stains and hydrophobic parts of the proteins. For example, Tal et al. (Tal et al., 1985) clearly showed that a single molecule of lysozyme is able to bind up to 48 molecules of CBB R-250, whereas only 28 basic residues were available in this sequence. It must also be noted that the staining intensity is not related to protein concentration alone but also depends on the number of basic residues, the hydrophobic part of the sequence and, additionally, the size of the stained polypeptide. Since the protein is unknown, it is very difficult to determine the amount of material quantitatively.
Metallic ion staining
Silver staining is a widely used protein visualisation technique and is considered a denaturing method. The principle is to use the ability of the carboxylic groups of the proteins to bind silver ions, which are then reduced to metal, producing a brown-black metallic blur at the position of the focussed protein. A large number of
different protocols have been reported in the literature, covering a large range of sensitivities. For example, colloidal silver staining using glutaraldehyde as a sensitiser and cross-linking agent can detect down to 1–10 ng of separated protein (Rabilloud, 1990, 1992; Sanchez & Hochstrasser, 1998; Switzer et al., 1979). This staining protocol is one of the most sensitive, but it is a long and tricky procedure. To be compatible with mass spectrometric analysis, such staining must be conducted without glutaraldehyde (Galvani et al., 2000; Jungblut & Seifert, 1990; Shevchenko, Wilm, Vorm, & Mann, 1996; Yan et al., 2000), since this reagent cross-links the amino groups of the proteins, producing complex and unidentifiable peptides (Lauber et al., 2001). Several kits for fast MS-compatible silver staining are commercially available.
The negative staining process using zinc ions with imidazole buffer is also compatible with MS techniques (Fernandez-Patron et al., 1994) and can be completed in less than 15 minutes (Fernandez-Patron, Calero et al., 1995). The principle is that zinc cations form insoluble complexes with imidazole molecules (Fernandez, Gharahdaghi, & Mische, 1998), producing a white background all over the SDS-PAGE gel. At the position of focussed proteins, the SDS bound to the protein by hydrophobic interaction inhibits the formation of the Zn/imidazole complex. As a result, the protein positions appear transparent on a white gel. To perform this staining, the gel is first incubated in a solution containing SDS/imidazole to improve staining contrast (Ortiz et al., 1992) and then in a zinc cation solution. The limit of detection of this technique is 7–15 ng of protein loaded, depending mostly on the protein concerned (Fernandez-Patron et al., 1994). Another advantage of this procedure is the reversibility of the staining reaction. Indeed, to liberate the polypeptides for further analysis, e.g.
PMF protein identification, the whole gel or gel plug can be incubated in a zinc-chelating solution (typically EDTA or citric acid), which disrupts the complexes. The major drawbacks of this staining process are the low contrast of the gel image (despite the SDS incubation step), which sometimes makes it difficult to localise spot positions precisely, and the impossibility of performing quantitation by densitometry. In 1995, Fernandez-Patron et al. (Fernandez-Patron, Hardy, Sosa, Seoane, & Castellanos, 1995) proposed a double staining protocol using first Coomassie and then Zn/imidazole. Again, the advantages of this combined technique are the speed of the staining procedure, the compatibility with MS analyses and the visualisation of proteins by two different staining processes; the Zn/imidazole process especially highlights proteins that were not visible with CBB G-250 staining.
Covalently immobilized dyes
Dye molecules can also be covalently linked to the protein. For example, Alba and Daban (Alba & Daban, 1998) proposed the use of 2-methoxy-2,4-diphenyl-3(2H)-furanone, a non-fluorescent compound. These molecules react with primary amino groups to produce fluorescent derivatives. This technique has the great advantage of
showing low background, since only bound dye molecules give a signal. However, owing to the modification of the ε-amino group of lysine, subsequent trypsin digestion proceeds with very poor efficiency. One of the emerging methods for protein visualisation based on covalently linked dyes is “differential in-gel electrophoresis” (DIGE). This technique was proposed in 1997, when Unlu et al. (Unlu, Morgan, & Minden, 1997) reported a study in which two samples were separated on the same gel. Both samples, e.g. from a diseased and a healthy patient (Zhou et al., 2002), are treated before gel separation with different fluorescent dyes that bind covalently to the ε-amino groups of Lys. The emission wavelength is different for each dye. The gel is scanned twice, and the two images are recorded and superimposed. This technique is very interesting for the accurate determination of differential protein expression, since it overcomes all problems related to the reproducibility of 2D-PAGE migration. Disadvantages of DIGE are the high cost of the reagents and scanners required for the analysis, as well as some concerns about positional accuracy for spot cutting, since cysteine modifications induce a small but significant mass shift. Labelling of proteins is kept at sub-stoichiometric levels (below 5%) to maximise recovery and prevent spot spreading.
Radioisotope labelling
Radioisotope-labelled AAs are frequently used for protein visualisation. In most cases, such a technique can be applied only if the sample is obtained from cell or bacterial cultures (Patterson et al., 1996) (see section 5.4).
To conclude, silver staining is probably one of the preferred methods for the visualisation of proteins previously separated by PAGE, owing to its good sensitivity. However, such chemical modifications are not easily compatible with MS analysis.
SYPRO® ruby shows a sensitivity similar to that of silver staining, with the advantages that it responds linearly to the protein concentration over a larger range than silver stain does and that the protein samples are compatible with MS analysis. The commercially available stain SYPRO® ruby is expensive compared to CBB R250 or G250, SYPRO® orange or negative zinc staining, which are also compatible with MS analysis.
2.2.2. Capillary electrophoresis separation
Although bi-dimensional electrophoresis (2-DE) is a powerful technique for the simultaneous separation of hundreds to thousands of proteins/polypeptides contained in complex biological samples (Herbert, Sanchez, & Bini, 1997), it is time-consuming and expensive, and gel-to-gel reproducibility is not easy to obtain. Capillary electrophoresis has a lower resolution than 2-D PAGE separation on complex samples, but the technique allows separation of proteins/polypeptides in a few minutes. As for SDS-PAGE separation, the analytes are separated under high electric fields (1000 V/cm) in capillary columns of […]
40 kDa) but suffered from a major drawback: loss of resolution for low molecular weight polypeptides (< 60 kDa) through diffusion during the digestion process. The third method examined was the combination of PIGD and OSDT procedures. This combination, called "Double Parallel Digestion" (DPD), led to greatly improved digestion of high molecular weight and basic proteins without losses of low MW polypeptides. Peptides liberated during transblotting of proteins through the immobilised trypsin membrane were trapped on a PVDF membrane and identified by mass spectrometry in scanning mode (see Chapter 4 (Binz, Wilkins et al., 1999)).
119 W. V. Bienvenut (ed.), Acceleration and Improvement of Protein Identification by Mass Spectrometry, 119–137. © 2005 Springer. Printed in the Netherlands.
BIENVENUT ET AL.
KEYWORDS
Parallel protein digestion, semi-dry electroblot, 2-DE, MALDI-TOF-MS, peptide mass fingerprint, double parallel digestion, molecular scanner, automation, IAV-trypsin, proteome
MOLECULAR SCANNER DEVELOPMENT
1. INTRODUCTION
Several genomes have already been fully sequenced and many others will be in the near future. Although it is possible to extract from genome data a complete set of the potentially expressed protein amino acid sequences, in many cases this information is not sufficient to unravel the function of a newly discovered gene product: the proteins also need to be identified and characterised (Hochstrasser, 1998; Williams & Hochstrasser, 1997). The task of identifying and characterising all proteins expressed by a genome is tremendous (Williams & Hochstrasser, 1997). The word proteome has been coined to refer to the expressed protein complement of a genome (Wilkins, Pasquali et al., 1996). Massively parallel protein identification and characterisation techniques are required. Several groups around the world have developed methods using liquid chromatography mass spectrometry (LC-MS) to sequentially identify and partially characterise proteins from complex biological samples (Figeys, Ducret, Yates, & Aebersold, 1996; Wilm & Mann, 1996). Matrix Assisted Laser Desorption/Ionisation-Time of Flight (MALDI-TOF) techniques have been developed to analyse intact proteins or their peptide fingerprints (Jungblut et al., 1996; Pappin, Coull, & Koster, 1990; Scheler et al., 1998; Shevchenko, Wilm et al., 1996). Several software programs have been developed to assist protein identification by comparing mass spectra obtained from mass spectrometry (MS) or MS-MS experiments with theoretical spectra from protein and DNA databases (Binz, Wilkins et al., 1999). Recently, Eckerskorn et al. (Eckerskorn et al., 1997) demonstrated the possibility of scanning a transblotted membrane with a MALDI-TOF mass spectrometer (MALDI-TOF-MS) equipped with an infrared laser and detecting intact proteins. The detection sensitivity was equal to or better than that obtained by silver staining. Ogorzalek Loo et al.
(Ogorzalek Loo et al., 1997a) analysed proteins directly from a polyacrylamide gel with good sensitivity and mass accuracy. Peptide mass fingerprinting (PMF), a method of choice in proteome studies, requires specific chemical or enzymatic digestion followed by MS of the resulting peptides. Up to now, the digestion step has been a sequential process in which robotics can be used for spot excision, such as the "spot picker" proposed by Traini et al. (Traini et al., 1998b). We wished to investigate whether it was possible to digest all the proteins separated on a bi-dimensional electrophoresis (2-DE) gel simultaneously, and whether all resulting protein fragments could be transferred to a membrane without loss of spatial resolution or of subsequent mass spectrometric sensitivity. Were all this possible, proteins separated by 2-DE could be digested in parallel and transblotted to a membrane, and MALDI-TOF-MS scanning of the membrane would then provide a massively parallel way to rapidly and partially
characterise thousands of proteins with an appropriate integrated software system (Hochstrasser, 1998) able to process mass spectra and to create a fully annotated image. In this article, we present three approaches to parallel sample preparation. The first method (OSDT) was developed to digest proteins previously separated by 1-DE or 2-DE during the transblotting process. The second one (PIGD) involves applying the standard IGD procedure to the whole gel, followed by transblotting. The third method, called double parallel digestion (DPD), is the combination of the two previous methods. These three methods are compared with the standard sequential methods for protein digestion, i.e. in-gel (IGD) or on-membrane digestion (OMD).
2. EXPERIMENTAL SECTION
2.1. Reagents
Sequencing-grade modified trypsin was purchased from Promega (Madison, WI, USA). Immobilon™ AV membranes were purchased from Millipore (Bedford, MA, USA). Acrylogel-PIP 2.6%C solution was purchased from BDH (Poole, England). Trans-Blot® PVDF membrane and Broad range Mr sodium dodecylsulfate polyacrylamide gel electrophoresis (SDS-PAGE) standards containing bovine pancreatic trypsin inhibitor (BPT1, 6.5 kDa), chicken lysozyme (LYC, 14.3 kDa), soybean trypsin inhibitor (ITRA, 20.1 kDa), bovine carbonic anhydrase (CAH2, 28.9 kDa), chicken ovalbumin (OVAL, 42.7 kDa), bovine serum albumin (ALBU, 66.4 kDa), rabbit phosphorylase b (PHS2, 97.2 kDa), E. coli β-galactosidase (BGAL, 116.4 kDa) and rabbit myosin (MYSS, 223 kDa) were purchased from Bio-Rad (Richmond, CA, USA). Trifluoroacetic acid (TFA), tris(hydroxymethyl)aminomethane (Tris), 3-[cyclohexylamino]-1-propanesulfonic acid (CAPS), trypsin (type IX from porcine pancreas, dialysed and lyophilised) and α-cyano-4-hydroxy-trans-cinnamic acid (ACCA) were purchased from Sigma (St. Louis, MO, USA) and were of analytical grade.
Acetonitrile (ACN), calcium chloride, ethanolamine, glycine, α-tosyl-L-arginine methylester (TAME) and SDS were purchased from Fluka (Buchs, Switzerland) and were of analytical grade except for ACN (preparative HPLC grade). Ethanol, hydrochloric acid, methanol, sodium bicarbonate, sodium chloride, sodium dihydrogen phosphate and polyoxyethylene sorbitan monolaurate (Tween 20) were purchased from Merck (Darmstadt, Germany). MilliQ water (Millipore) was used when necessary. Immobilised pH gradient strips were purchased from Amersham Pharmacia Biotech (Uppsala, Sweden).
2.2. Covalent attachment of trypsin and blocking of the IAV membrane
The IAV membrane is a commercially available modified PVDF membrane. Its activated carboxyl groups are reactive towards nucleophiles such as amine groups of proteins
or peptides. Trypsin was immobilised on this membrane according to the manufacturer’s instructions (Immobilon Tech Protocol: TP014, TP015, TP018). Briefly, a 10 × 12 cm² IAV membrane was wetted in a solution of trypsin (2.0 mg/ml in 20 mM sodium dihydrogen phosphate buffer, pH 7.8) and then incubated in a rotating hybridiser HB-2D (Techne, Cambridge, England) at room temperature for 3 hours. The membrane was washed 3 times rapidly and vigorously in 10 ml of PBS-Tween solution (20 mM sodium dihydrogen phosphate, 140 mM sodium chloride and 0.5% Tween 20, pH 7.4) to remove unreacted trypsin, then incubated for 3 hours with 10 ml of ethanolamine (1 M in 1 M sodium bicarbonate buffer pH 9.5, final pH 10.5) at 4°C to block the remaining activated carboxyl groups of the membrane. After this capping step, the membrane was washed 3 times rapidly and vigorously in 10 ml of PBS-Tween solution and then twice for 30 minutes in 10 ml of PBS-Tween solution. Membranes were stored at 4°C in a 46 mM Tris-HCl, 1 mM calcium chloride and 0.1% sodium azide buffer solution, pH 8.1.
2.3. Activity measurement of trypsin covalently bound to the IAV membrane
The tryptic activity of the IAV-trypsin membrane was determined using the trypsin assay reagent TAME. One cm² of IAV-trypsin membrane was immersed in a mixture composed of 2.6 ml of 460 mM Tris-HCl, 11.5 mM calcium chloride, pH 8.1 solution, 0.3 ml of 10 mM TAME solution and 0.1 ml of 1 mM HCl solution. After 40 seconds of vigorous stirring, the optical density of the solution was measured at 247 nm with a UV-visible spectrophotometer (Ultrospec III, Amersham Pharmacia Biotech). A second measurement was performed after 3 minutes of constant vigorous stirring. The value of ΔA247/min was used to calculate the equivalent amount of active trypsin (expressed per unit surface area) as described previously [14].
2.4. 1-DE and 2-DE separation
1-DE was conducted essentially according to Laemmli (Laemmli, 1970) with 12% T and 2.6% C for linear polyacrylamide.
Protein migration was carried out using a MiniProtean II electrophoresis apparatus (Bio-Rad) operated at 200 V for 45 minutes. For mini 2-DE, protein separation from human plasma was conducted according to Sanchez et al. (Sanchez & Hochstrasser, 1998) using immobilised pH gradient strips 3.5-10 and 5-5.5. When necessary, the gel was stained with Coomassie Brilliant Blue (CBB) R250 (0.1% w/v), methanol (30% v/v) and acetic acid (10% v/v) for 30 minutes and destained with repeated washes of methanol (40% v/v) and acetic acid (10% v/v).
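The gel composition quoted for 1-DE (12% T, 2.6% C) follows the usual convention in which %T is the total monomer mass (acrylamide plus cross-linker) in grams per 100 ml of gel solution and %C is the cross-linker share of that total. As a worked illustration, the sketch below converts the two percentages into masses. The function name and the 100 ml volume are assumptions made for the example; in practice a premixed stock such as the Acrylogel-PIP solution would simply be diluted.

```python
def gel_monomer_masses(percent_t, percent_c, volume_ml):
    """Return (acrylamide_g, crosslinker_g) for a gel of the given volume.

    percent_t: total monomer concentration, g per 100 ml.
    percent_c: cross-linker as a percentage of the total monomer mass.
    """
    total_g = percent_t / 100.0 * volume_ml
    crosslinker_g = total_g * percent_c / 100.0
    return total_g - crosslinker_g, crosslinker_g


if __name__ == "__main__":
    acrylamide_g, crosslinker_g = gel_monomer_masses(12.0, 2.6, 100.0)
    print(f"acrylamide: {acrylamide_g:.3f} g, cross-linker: {crosslinker_g:.3f} g")
```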
2.5. In Gel Digestion
Protein spots were excised from the gel and then digested with trypsin using previously published procedures (Sanchez & Hochstrasser, 1998; Shevchenko, Wilm et al., 1996), modified as described below. The piece of gel was first destained with 200 µl of 50 mM ammonium bicarbonate, 50% ACN for 1 hour at 37°C. The destaining solution was removed and the gel was dried in a vacuum centrifuge (Speed Vac, Savant). Gel pieces were reswollen with 20 µl of 20 mM ammonium bicarbonate and 4 µl of 0.1 µg/µl trypsin. After overnight incubation at room temperature, the gel was dried under high vacuum to evaporate solvent and volatile salts. Then 20-40 µl of 50% ACN, 0.3% TFA were added and the gel was sonicated for 15 minutes to extract the peptides. A control extraction (blank) was performed using a piece of the gel from a region between the protein bands.
2.6. On membrane Digestion
Proteins previously separated by 1-DE or 2-DE were electroblotted onto a PVDF membrane using the semi-dry method, essentially as previously described (Jin & Cerletti, 1992), using 10 mM CAPS pH 11 or half-strength Towbin’s buffer (½Towbin) pH 8.4, both with 0.01% SDS in 10% methanol, in a laboratory-made semi-dry apparatus. Transfer was complete after 3 hours at 1 mA/cm². PVDF membranes were stained with amido black (0.5% w/v), isopropanol (25% v/v) and acetic acid (10% v/v) for 1 minute and destained by repeated washing with deionised water. Tryptic digestion was performed according to previous work (Pappin et al., 1996), modified as described below. Pieces of membrane were excised and destained with 500 µl of 50% methanol for 2 hours at room temperature. Following removal of the supernatant and drying of the membrane, 10 µl of 50 mM ammonium bicarbonate, 30% ACN and 4 µl of 0.1 µg/µl trypsin were added and incubated overnight at room temperature.
The supernatant was collected and the membrane was extracted with 20 µl of 80% ACN for 15 minutes with sonication to recover the peptides from the PVDF. The extract was pooled with the previous supernatant. After drying in the vacuum centrifuge, the digested material was resuspended in 30% ACN, 0.1% TFA. A control extraction (blank) was performed using a piece of the gel from a region between the protein bands.
2.7. OSDT process
Immediately after the SDS-PAGE protein separation, gels were soaked in deionised water for 5 minutes and then equilibrated for 10 minutes in ½Towbin buffer containing 0.01% (w/v) SDS. Electrotransfer was carried out in a laboratory-made semi-dry apparatus overnight at room temperature. In order to increase the migration time of
the protein through the IAV membranes during transfer (and thereby allow more time for digestion to take place), we used an asymmetrical alternating voltage. We selected a square-wave alternating voltage: +12.5 V for 125 ms followed by -5 V for 125 ms, repetitively. The transblotting process was completed after 12-18 hours. To perform the digestion during the electroblotting, a double layer of IAV-trypsin membrane was intercalated between the polyacrylamide gel (where the protein resided) and the PVDF membrane (which acted as the collecting surface) to create a transblot-digestion sandwich. After the transfer procedure, the PVDF membranes were washed in deionised water for 5 minutes (and stained when required).
2.8. PIGD
Immediately after SDS-PAGE protein separation, gels were soaked 3 times in deionised water for 5 minutes. The entire wet gel or a selected part of it was air-dried at room temperature overnight. The gel was then rehydrated and incubated at 35°C with a volume (corresponding to 3-5 times the initial volume of the gel) of 0.1 mg/ml trypsin in 10 mM Tris-HCl, pH 8.2. After 30 minutes of incubation for rehydration and partial protein digestion, the excess trypsin solution was removed. The gel was then incubated for a further 30 minutes at 35°C to complete the digestion. Proteins and peptides contained in the gel were electroblotted onto PVDF membranes using the procedure described above.
2.9. DPD combined method
After migration, gels were soaked and dried as for the PIGD procedure. They were rehydrated with 0.05 mg/ml trypsin in 10 mM Tris-HCl, pH 8.2 for 30 minutes at 35°C. At this stage, the gel was transblotted onto a PVDF membrane using the OSDT process.
2.10. MALDI-TOF-MS
MS measurements from PVDF membranes and liquid solutions were conducted with a MALDI-TOF mass spectrometer Voyager™ Elite (PerSeptive Biosystems, Framingham MA, USA) equipped with a 337 nm nitrogen laser.
The analyser was used in the reflectron mode at an accelerating voltage of 20 kV, a delayed extraction parameter of 140 ns and a low mass gate of 850 Da. Laser power was set slightly above the threshold for molecular ion production. Spectra were obtained by summation of 10 to 256 consecutive laser shots. For both IGD and OMD, solutions were used directly without further sample preparation or cleanup prior to MALDI-TOF-MS analysis. One µl of the digested protein solution was loaded on the MALDI stainless steel sample plate, and 1 µl of matrix solution (4 mg/ml ACCA in 30% ACN, 0.1% TFA) was added and air-dried. Autolysis products of trypsin were used as
internal calibrants (singly protonated peptides 98-107 and 58-77). For PVDF membranes, two different methods were used: sequential or automatic. The sequential method was used only to obtain a single MALDI spectrum from a limited portion of the membrane. Small pieces of PVDF (1 × 1 to 2 × 4 mm²) were cut and fixed with silicone grease to an appropriately modified MALDI sample plate. One µl of matrix solution (5 mg/ml ACCA in 70% MeOH) was deposited onto the PVDF membrane. For internal calibration purposes, the matrix solution also contained two synthetic peptides. The development of an automated procedure that enables scanning of the membrane is described in detail in the article by Binz et al. (Binz, Muller et al., 1999).
2.11. Post-acquisition processing and software identification tools
Measured masses were submitted to the PMF search tool PeptIdent (Binz, Wilkins et al., 1999; Wilkins et al., 1999) (http://www.expasy.ch/tools/) located on the ExPASy server (http://ch.expasy.org). Some restrictions were applied. The apparent masses of the parent protein based on electrophoretic migration were used with a margin of ±20%. The species of origin of the various proteins were also taken as known. No pI limits were introduced to restrict the search for the SDS-PAGE standards. For 2-DE, pI and MW values were determined by gel matching to the human plasma SWISS-2DPAGE master gel available on the ExPASy server (http://www.expasy.ch/ch2d/). These values were used with a tolerance of ±1 pI unit and 30% of the MW. The peptide mass tolerance used was ±0.2 Da for both 1-DE and 2-DE. Cysteine and methionine modifications were chosen on the web submission form depending on the chemical treatment applied to the protein sample. FindMod (http://www.expasy.ch/tools/) was used for peptide identification when comparing digestion efficiency. To compare the efficiency of the different techniques, we calculated the percentage of amino acids covered by the identified fragments for each protein.
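The sequence coverage and missed-cleavage counts used to compare the digestion methods can be sketched as follows. This is an illustrative reimplementation and not the PeptIdent or FindMod algorithm: it assumes the common trypsin rule (cleavage after K or R, except before P), unmodified residues, singly protonated monoisotopic masses, and the ±0.2 Da tolerance quoted above. The example sequence and all function names are hypothetical.

```python
import re

# Monoisotopic residue masses in Da.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER, PROTON = 18.01056, 1.00728


def tryptic_peptides(seq, max_missed=2):
    """Return (start, end, missed_cleavages) for each theoretical peptide."""
    # Cleavage sites: after K or R unless the next residue is P (assumed rule).
    cuts = [0] + [m.end() for m in re.finditer(r"[KR](?!P)", seq)]
    if cuts[-1] != len(seq):
        cuts.append(len(seq))
    return [
        (cuts[i], cuts[j], j - i - 1)
        for i in range(len(cuts) - 1)
        for j in range(i + 1, min(i + 2 + max_missed, len(cuts)))
    ]


def mh_mass(peptide):
    """Monoisotopic mass of the singly protonated, unmodified peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER + PROTON


def coverage_and_missed_cleavages(seq, measured, tol=0.2, max_missed=2):
    """Return (% sequence coverage, [% matched peptides with 0, 1, 2... MC])."""
    covered = [False] * len(seq)
    mc_counts = [0] * (max_missed + 1)
    for start, end, missed in tryptic_peptides(seq, max_missed):
        theoretical = mh_mass(seq[start:end])
        if any(abs(theoretical - m) <= tol for m in measured):
            mc_counts[missed] += 1
            covered[start:end] = [True] * (end - start)
    matched = sum(mc_counts)
    coverage = 100.0 * sum(covered) / len(seq)
    mc_percent = [100.0 * n / matched if matched else 0.0 for n in mc_counts]
    return coverage, mc_percent


if __name__ == "__main__":
    protein = "ACDEFGHIKLMNPQRSTVWYK"  # hypothetical sequence
    peaks = [mh_mass("ACDEFGHIK"), mh_mass("LMNPQRSTVWYK")]  # simulated peak list
    print(coverage_and_missed_cleavages(protein, peaks))
```

The missed-cleavage percentages are interpreted here as the share of matched peptides carrying 0, 1 or 2 missed cleavages. The sketch ignores cysteine and methionine modifications and the 850 Da low mass gate, both of which would change which peptides can be matched in practice.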
The better the digestion efficiency, the fewer missed cleavages (MC) were found.
3. RESULTS
3.1. Activity measurement of trypsin covalently bound to the IAV membrane
The enzyme surface density determined by the TAME test was 0.90 ± 0.20 µg of active trypsin per cm² (53 TAME tests and 18 membranes tested). No correlation was found between the initial trypsin concentration during membrane preparation and the surface density (Corr.: -0.18). Activity remained stable up to a
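Proteins | IGD % Cov | IGD 0 MC | IGD 1 MC | IGD 2 MC | OMD % Cov | OMD 0 MC | OMD 1 MC | OMD 2 MC | OSDT % Cov | OSDT 0 MC | OSDT 1 MC | OSDT 2 MC | PIGD % Cov | PIGD 0 MC | PIGD 1 MC | PIGD 2 MC | DPD % Cov | DPD 0 MC | DPD 1 MC | DPD 2 MC
MYSS | 18.1 | 44.8 | 33.3 | 22.2 | - | - | - | - | - | - | - | - | - | - | - | - | 8.5 | 61.5 | 38.5 | 0.0
BGAL | 35.3 | 90.0 | 10 | 0.0 | 9.7 | 87.5 | 12.5 | 0.0 | 32.4 | 50.0 | 50.0 | 0.0 | 27.0 | 74.4 | 19.9 | 5.7 | 35.0 | 87.0 | 13.0 | 0.0
PHS2 | 43.6 | 82.3 | 11.8 | 8.8 | 29.5 | 81.2 | 18.8 | 0.0 | 16.8 | 67.0 | 24.4 | 8.6 | 28.8 | 63.8 | 30.1 | 6.1 | 27.6 | 61.1 | 27.8 | 11.1
ALBU | 49.4 | 87.5 | 12.5 | 0.0 | 21.3 | 57.4 | 33.2 | 9.4 | 12.8 | 55.2 | 42.9 | 15.7 | 35.4 | 56.5 | 21.1 | 22.3 | 25.4 | 57.1 | 42.9 | 0.0
OVAL | 35.1 | 83.3 | 16.7 | 0.0 | 46.3 | 73.2 | 26.8 | 0.0 | 27.5 | 80.2 | 19.8 | 0.0 | 25.1 | 87.5 | 12.5 | 0.0 | 26.0 | 16.7 | 83.3 | 0.0
CAH2 | 39.4 | 100.0 | 0.0 | 0.0 | 42.3 | 85.4 | 8.3 | 6.3 | 25.8 | 75.6 | 21.4 | 3.3 | 26.3 | 87.5 | 12.5 | 0.0 | 24.7 | 100.0 | 0.0 | 0.0
ITRA | 64.1 | 100.0 | 0.0 | 0.0 | 44.2 | 84.2 | 15.8 | 0.0 | 25.6 | 73.6 | 26.4 | 0.0 | 26.9 | 71.4 | 28.6 | 0.0 | 13.1 | 71.4 | 28.6 | 0.0
LYC | 61.2 | 90 | 10.0 | 0.0 | 56.6 | 95.8 | 4.2 | 0.0 | - | - | - | - | 55.6 | 43.2 | 56.8 | 0.0 | 16.6 | 85.7 | 14.3 | 0.0
BPT1 | 60.3 | 66.7 | 33.3 | 0.0 | 60.3 | 66.7 | 33.3 | 0.0 | - | - | - | - | 69.3 | 75.0 | 25.0 | 0.0 | 48.3 | 100 | 0.0 | 0.0
Av. | 45.2 | 82.7 | 13.8 | 3.5 | 38.8 | 78.9 | 19.1 | 2.0 | 23.5 | 67.1 | 30.6 | 2.3 | 34.4 | 69.9 | 25.8 | 4.3 | 25.0 | 78.6 | 20.2 | 2.0
± Value | 14.3 | - | - | - | 16.3 | - | - | - | 6.7 | - | - | - | 17.0 | - | - | - | 11.9 | - | - | -
Table 1. Results of IGD, OMD, OSDT, PIGD and DPD expressed as percentage of sequence coverage (% Cov) and efficiency of the digestion (percentage of peptides with 0, 1 or 2 missed cleavages: 0 MC, 1 MC and 2 MC). -: no value reported.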
MOLECULAR SCANNER DEVELOPMENT 127
year when membranes were stored in 46 mM Tris-HCl, 1 mM CaCl2, 0.1% azide buffer at 4°C. Membranes could be reused: tryptic activity decreased slightly after at least 3 cycles of transblot-digestion but was still sufficient.
3.2. IGD
Figure 1A shows the 1-DE separation of the broad range MW standard proteins. Protein IGD was performed on a gel with a similar separation, using 1 µg of protein per band. The results of this digestion are summarised in Table 1. The average protein sequence coverage obtained with tryptic digestion is 45.2 ± 14.0% (9 tested proteins). These results are better than those obtained by Scheler et al. (Scheler et al., 1998) for tryptic IGD of CBB R 250 stained proteins (27.8 ± 3.0% coverage, 4 tested proteins) and similar to their results for tryptic digestion of CBB G 250 stained proteins (39.5 ± 13.5% coverage, 9 tested proteins). Mass spectra gave sufficient information to identify all 9 studied proteins with the PMF technique.
Figure 1. 1-DE separation of broad range molecular weight standards on 12% PAGE (Coomassie Blue stained) (A) or PVDF membranes (amido black stained) obtained after different transblotting processes: standard transblotting using ½ Towbin buffer without SDS (B) or with 0.01% SDS (C), using CAPS buffer with 0.01% SDS (D), and different digestion methods including OSDT (E), PIGD (F) and DPD (G)
3.3. OMD This technique can afford high recovery of protein during the transblotting process, although high MW and basic proteins present transblotting difficulties. Addition of a
small amount of SDS helps the migration of high MW proteins but has little to no effect on basic proteins (Mozdzanowski & Speicher, 1992). Addition of 0.01% SDS to the ½ Towbin buffer was found to have a positive effect on the transfer of phosphorylase b and β-galactosidase (Figure 1B-C). For the more basic proteins (pI > 8.5), a more basic buffer such as CAPS pH 11 is much more effective than ½ Towbin, pH 8.4. As an example, in Figure 1B, myosin (MW = 220 kDa), lysozyme (pI = 9.3) and pancreatic trypsin inhibitor (pI = 9.2) are not detectable when ½ Towbin is used, in spite of the presence of 0.01% SDS. CAPS buffer is more effective for the transblot of basic proteins (Figure 1D), but high MW proteins such as myosin are still not detectable. Because some proteins are difficult to transblot from 1-DE or 2-DE gels to the PVDF membrane, it is not possible to identify all proteins. The results of OMD obtained for transblotted proteins using the best conditions (CAPS buffer, 0.01% SDS) are summarised in Table 1. Mass spectra gave sufficient information to identify 8 of the 9 proteins studied (myosin is missing) with the PMF technique. The amino acid sequence coverage was 38.8 ± 16.3% (9 tested proteins). IGD and OMD were thus very similar for sequence coverage and also for digestion efficiency, with 70 to 80% of peptides with no MC, around 20% with 1 MC and 0 to 5% with 2 MC. These two methods, commonly used for protein identification by PMF, were the reference methods with which our new approaches were compared.
3.4. OSDT
The electrotransfer process was optimised using different voltage profiles. The continuous voltages usually applied for the electroblot process were not satisfactory in terms of protein digestion in the range studied (10 to 40 V, data not shown). We postulated that proteins did not stay long enough, and were not sufficiently "shaken", in the tryptic interface to be efficiently digested.
In order to reduce the effective migration rate of proteins through the IAV membranes during the transfer, we applied an asymmetrical alternating voltage with a square wave form. With this square wave voltage, the effective voltage applied was 3.7 V and the transblotting process was completed after 12-18 hours. During this period, in which the IAV-trypsin membranes were intercalated between the SDS-PAGE gel and the PVDF membrane, only a small loss of protein or polypeptide resolution by diffusion is evident (Figure 1E, to be compared with the normal transblot, Figure 1D). However, since this process is based on electroblotting, the problems of extracting high MW and basic proteins from the gel described above still apply. This is shown in Figure 1E, where pancreatic trypsin inhibitor, lysozyme and myosin are not visualised on the collecting PVDF membrane. The results obtained with this combined digestion-transfer method show lower sequence coverage (23.5 ± 6.7%, 27 tested proteins) as well as lower digestion efficiency (more missed cleavages) than IGD (45.2 ± 14.0% coverage) or OMD (38.8 ± 16.3% coverage). Mass spectra gave sufficient information to identify 6 of the 9 proteins studied with the PMF technique.
3.5. PIGD
The rehydrated gel was transblotted using standard conditions, with CAPS buffer to improve the transfer of digested basic or high MW proteins. Most of the digested proteins were transblotted (Figure 1F) to the PVDF membrane under the best electroblotting conditions. The results obtained with this technique in terms of sequence coverage were good for basic and high MW proteins (69.3% coverage for pancreatic trypsin inhibitor, 55.6% for lysozyme, 35.4% for albumin, 28.8% for PHS2, see Table 1). In terms of digestion efficiency, this method is similar to OSDT with 67% of 0 MC, 30% of 1 MC and 2% of 2 MC. Mass spectra gave sufficient information to identify 8 of the 9 proteins studied with the PMF technique.
3.6. DPD applied to 1-DE
Peptide masses corresponding to the matched peptides using PeptIdent tool are labelled. The combination of the 2 techniques led to a great improvement of protein digestion and transfer (Figure 1G) of polypeptide fragments to the collecting membrane. The results of this method are summarised in Table 1. In terms of sequence coverage, the results obtained are lower than those of the standard and PIGD methods but similar to those of the OSDT technique. It appears that the digestion quality was similar to the sequential digestion method with 80% of 0 MC, 20% of 1 MC and 0-4% of 2 MC (compared with IGD and OMD, Table 1). Mass spectra gave sufficient information to identify all 9 studied proteins with the PMF technique.
Table 2: Results for PIGD, OSDT and the DPD applied to the mini 2-DE gel*
Parameter | PIGD | OSDT | DPD
% Coverage | 46.5 ± 13.9 | 38.6 ± 14.8 | 67.2 ± 11.3
0 MC | 29.6 | 38.3 | 40
1 MC | 63 | 28.3 | 50
2 MC | 7.4 | 33.3 | 10
No. of identified peptides | 12 ± 5 | 10 ± 4 | 20 ± 3
Average intensity of the 5 highest peaks | 18500 ± 7500 | 9500 ± 8000 | 35000 ± 6000
* Analyses were performed under the same conditions using the APO A1 spot
Figure 2: Fragment of the amido black stained PVDF membrane and MALDI-TOF-MS spectra of APA1 obtained directly from the collecting PVDF membrane with the 3 different digestion processes: A) PIGD, B) OSDT and C) DPD
3.7. Comparative digestion between OSDT, PIGD and the DPD applied to 2-DE We compared the performance of each technique with proteins of human plasma separated by mini 2-DE (Figure 2). One selected protein (Apolipoprotein A-1, APA1) was analyzed in each experiment. The MALDI-TOF-MS mass spectra obtained with the three different techniques are shown in Figure 2 and results are summarized in Table 2.
This comparison clearly shows the advantage of the combined method in terms of sequence coverage, which was similar for PIGD (46.5 ± 13.9%, 3 samples) and OSDT (38.6 ± 14.8%, 3 samples) but significantly higher for the combined technique (67.2 ± 11.3%, 3 samples). Signal intensity was also higher for the combined technique.
3.8. DPD applied to 2-DE
The DPD technique was applied to a mini 2-DE separation of an E. coli sample as described in materials and methods. The same sample was run on 3 gels: one was used for Coomassie blue staining (Figure 3 A) and the second was electrotransferred to a PVDF membrane which was stained with amido black (Figure 3 B). The DPD technique was applied to the last gel (Figure 3 C) and the collecting PVDF membrane was stained with amido black; however, no spot was visible after this destaining step. A 9 x 13 mm area (representing a pI range from 5.1 to 5.2 and a MW range from 35 to 45 kDa) was cut from the collecting membrane and scanned every 300 µm by MALDI-MS. The 1536 spectra obtained were used to recreate the MS intensity image (Figure 3 D). Each spectrum was also used for protein identification. Four different proteins located in 6 different positions were identified. These scanning results were confirmed by IGD of the corresponding spot of the Coomassie blue stained gel. Using this technique, the overlap of 3 proteins (IDH_ECOLI, AC: P08200; PGK_ECOLI, AC: P11665; METK_ECOLI, AC: P04384) was clearly visualised with the imaging software MELANIE (Appel, Palagi et al., 1997; Appel, Vargas, Palagi, Walther, & Hochstrasser, 1997).
4. DISCUSSION AND CONCLUSION
As a consequence of the increasing importance of proteome mapping for biological and clinical (Hochstrasser, 1997) applications, new high throughput identification methods are needed.
The different techniques proposed here highlight IGD as the gold standard for protein identification using PMF technique in terms of percent coverage of the sequence (45.2 ± 14.0%) and the number of proteins identified (9 out of 9). At present, the traditional IGD method, involving sequential gel spot excision and digestion, is a bottleneck for protein identification (Hochstrasser, 1998; Houthaeve et al., 1997) Different ways to speed up the process have been proposed, e.g. using robotics for cutting spots from gels and for automating sample handling (Traini et al., 1998b). Although quicker and more reproducible than the manual procedure, the method remains a sequential one. In addition, the size of the spot to be excised is usually defined a priori and cannot be adapted as a function of protein quantity as well as protein overlapping. In contrast to these sequential approaches we have proposed a parallel method of protein digestion. Proteins of 1-DE or 2-DE gels are treated simultaneously, thus providing a highly parallel digestion technique. Two new approaches were studied separately and also combined. All methods
produce a collection of digested protein fragments on a PVDF membrane after a transblotting process. PVDF membranes stained with amido black are shown in Figure 1 B to G. Intensities of the stained proteins differ depending on the digestion technique, probably due to the different relative staining efficiencies of peptides and proteins. It is well known that protein-dye interactions are mostly electrostatic and non-specific in nature, i.e. Van der Waals or hydrogen bonds (Salih & Zenobi, 1998). Dyes such as amido black (sulfonate derivatives) act mainly through electrostatic interactions with the basic residues (lysine, arginine, histidine and N-terminus amino group) of polypeptides. A minor part of the complex formation is due to low energy interactions. With large proteins, 1 dye molecule reacts with 1 amino group to create a negatively charged multidye (Salih & Zenobi, 1998) to which further dye molecules become associated: in the case of lysozyme with 18 basic residues, 48 dye molecules can be bound (Tal et al., 1985). With small peptides, on the other hand, 1 dye molecule can complex more than one peptide (e.g. 2 peptides for one amido black molecule) and no low energy interactions can be developed (Salih & Zenobi, 1998). Thus, the staining intensity of a protein decreases continuously as a function of the extent of digestion. Figure 2 shows apolipoprotein A1 after 2-DE and parallel digestion. For the OSDT sample, the staining intensity is the highest but the MALDI-MS spectrum intensity is the lowest, whereas for the DPD sample the spectrum intensity is the highest and the staining intensity of the protein on the collecting membrane is the lowest. In the OSDT approach, proteins are extracted from the gel and digested during the transfer to the PVDF collecting membrane. Trypsin immobilised on IAV membranes was found suitable.
These membranes were originally designed for covalent protein microsequence analysis (Coull, Pappin, Mark, Aebersold, & Koster, 1991; Pappin, Coull, & Koster, 1990) and were also used for polypeptide or enzyme immobilisation (Canas, Dai, Lackland, Poretz, & Stein, 1993; Seo et al., 1993). In our procedure, trypsin was attached covalently to the IAV membrane and then used in the transblotting sandwich. Other enzymes could be similarly immobilised on the IAV membrane. A compromise must be found between a buffer which facilitates efficient polypeptide transfer (especially of high MW and basic fragments) and one in which trypsin is active. Due to the limited pH range for optimum trypsin activity, a basic buffer like CAPS buffer cannot be used. With ½ Towbin at a pH suitable for tryptic activity, the transblotting buffer was not at the optimum pH and composition for the transfer of basic and high MW proteins. Low MW proteins (below 60 kDa) showed higher sequence coverage than larger proteins (except BGAL: in spite of its high MW, this protein was transferred under all blotting conditions). Basic pI and high MW proteins generally presented problems for the transblot process and therefore for high yield digestion. As a consequence, OSDT is particularly well adapted to low MW polypeptides ( 40 kDa, PIGD). The methods are thus complementary and the combination of PIGD and OSDT was tested successfully. In this combined method, the first step involved modifying the physical and chemical properties of the proteins by partial enzymatic digestion. The resulting fragments were then transblotted (and further digested) using the OSDT procedure. By comparison to the "gold standard" (IGD), the percent sequence coverage obtained with the DPD technique is so far lower (25.0 ± 11.9% for the DPD and 45.2 ± 14.0% for the IGD) but was sufficient to allow identification of all 9 proteins in spite of their mixed characteristics (high and low MW, basic and hydrophobic).
When this technique was applied to a mini 2-DE gel of E. coli, the
scanning process allowed us to obtain spectra from overlapping 2-DE proteins (IDH_ECOLI, AC: P08200; PGK_ECOLI, AC: P11665; METK_ECOLI, AC: P04384). The spatial resolution of this technique results from the relatively narrow dimension of the laser beam compared to the size of the gel pieces used for IGD. Another advantage of this method in comparison with IGD is that the digestion is highly parallel: with DPD, thousands of proteins may be digested simultaneously overnight. It might prove possible to add other enzymatic activities to the digestion sandwich. One could envisage using a phosphatase or a glycosidase followed by an endoproteinase. A major drawback of this technique would appear to be the low intensity of peptide staining compared to protein staining, which limits visualisation of the gel separation after polypeptide electroblotting. However, this parallel digestion approach was developed for detection by MALDI-TOF MS scanning. A molecular scanner providing virtual visualisation of the collecting membrane limits the impact of this problem and is proposed by Binz et al. (Binz, Muller et al., 1999). This scanning method is part of a highly automated integrated system involving automated scanning by MALDI-TOF MS, spectra treatment, identification of proteins by PMF and creation of a fully annotated 2-DE map.
5. ACKNOWLEDGMENTS
This work was supported by the Swiss National Fund for Scientific Research (grant 32-49314.96) and the Montus Foundation. PAB acknowledges financial support from the Helmut Horten Foundation.
6. REFERENCES
Appel, R., Palagi, P., Walther, D., Vargas, J., Sanchez, J., Ravier, F., et al. (1997). Melanie II--a third-generation software package for analysis of two-dimensional electrophoresis images: I. Features and user interface. Electrophoresis, 18(15), 2724-2734.
Appel, R., Vargas, J., Palagi, P., Walther, D., & Hochstrasser, D. (1997).
Melanie II--a third-generation software package for analysis of two-dimensional electrophoresis images: II. Algorithms. Electrophoresis, 18(15), 2735-2748.
Bienvenut, W., Sanchez, J., Karmime, A., Rouge, V., Rose, K., Binz, P., et al. (1999). Toward a clinical molecular scanner for proteome research: parallel protein chemical processing before and during western blot. Anal Chem, 71(21), 4800-4807.
Binz, P., Muller, M., Walther, D., Bienvenut, W., Gras, R., Hoogland, C., et al. (1999). A molecular scanner to automate proteomic research and to display proteome images. Anal Chem, 71(21), 4981-4988.
Binz, P., Wilkins, M., Gasteiger, E., Bairoch, A., Appel, R., & Hochstrasser, D. (1999). In R. Kellner, F. Lottspeich & H. Meyer (Eds.), Microcharacterisation of proteins (2nd ed., pp. 277-300). Berlin: Wiley-VCH.
Canas, B., Dai, Z., Lackland, H., Poretz, R., & Stein, S. (1993). Covalent attachment of peptides to membranes for dot-blot analysis of glycosylation sites and epitopes. Anal. Biochem., 211(2), 179-182.
Coull, J., Pappin, D., Mark, J., Aebersold, R., & Koster, H. (1991). Anal. Biochem., 194, 110-120.
Eckerskorn, C., Strupat, K., Schleuder, D., Hochstrasser, D., Sanchez, J., Lottspeich, F., et al. (1997). Analysis of proteins by direct scanning IR-MALDI-MS after 2-D PAGE separation and electroblotting. Anal. Chem., 69, 2888-2892.
Figeys, D., Ducret, A., Yates, J., & Aebersold, R. (1996). Protein identification by solid phase microextraction-capillary zone electrophoresis-microelectrospray-tandem mass spectrometry. Nature Biotechnology, 14(11), 1579-1583.
Hellman, U., Wernsted, C., Gonez, J., & Heldin, C. H. (1995). Improvement of an in-gel digestion procedure for the micropreparation of internal protein-fragments for amino acid sequencing. Anal. Biochem., 224(1), 451-455.
Hochstrasser, D. (1997). In M. Wilkins, K. Williams, A. RD & D. Hochstrasser (Eds.), Proteome research: new frontiers in functional genomics. Berlin: Springer-VCH.
Hochstrasser, D. (1998). Proteome in perspective. Clin. Chem. Lab. Med., 36(11), 825-836.
Houthaeve, T., Gausepohl, H., Ashman, K., Nillson, T., & Mann, M. (1997). Automated protein preparation techniques using a digest robot. J. Prot. Chem., 16(5), 343-348.
Jin, Y., & Cerletti, N. (1992). Appl. Theor. Electrophor., 3, 1342-1351.
Jungblut, P., Thiede, B., Zimmy-Arndt, U., Muller, E., Scheler, C., Wittmann-Liebold, B., et al. (1996). Resolution power of 2-DE and identification of proteins from gels. Electrophoresis, 17(5), 839-847.
Laemmli, U. K. (1970). Cleavage of structural proteins during the assembly of the head of bacteriophage T4. Nature, 227(259), 680-685.
Ogorzalek Loo, R., Mitchell, C., Stevenson, T., Martin, S., Hines, W., Juhasz, P., et al. (1997). Electrophoresis, 18, 382-390.
Pappin, D., Coull, J., & Koster, H. (1990). In J. Villafranca (Ed.), Current research in protein chemistry (pp. 191-202). San Francisco: Academic Press.
Pappin, D., Coull, J., & Koster, H. (1990). Solid-phase sequence analysis of proteins electroblotted or spotted onto polyvinylidene difluoride membranes. Anal. Biochem., 187(1), 10-19.
Pappin, D., Rahman, D., Hansen, H., Bartlet-Jones, M., Jeffery, W., & Bleasby, A. (1996). In A. Burlingame & S. Carr (Eds.), Mass spectrometry in the biological science (pp. 135-150). Totowa, NJ: Humana press.
Rosenfeld, J., Capdevielle, J., Guillemot, J., & Ferrara, P. (1992). In-gel digestion of proteins for internal sequence analysis after one- or two-dimensional gel electrophoresis. Analytical Biochemistry, 203(1), 173-179.
Salih, B., & Zenobi, R. (1998). MALDI mass spectrometry of dye-peptide and dye-protein complexes. Anal. Chem., 70, 1536-1543.
Sanchez, J., & Hochstrasser, D. (1998). In A. Link (Ed.), Method in molecular biology: 2-d proteome analysis protocol (Vol. 112, pp. 227-233). Totowa, NJ: Humana press.
Scheler, C., Lamer, S., Pan, Z., Li, X., Salnikov, J., & Jungblut, P. (1998). Peptide mass fingerprint sequence coverage from differently stained proteins on 2-DE patterns by MALDI-MS. Electrophoresis, 19, 918-927.
Seo, M. L., Kim, J. S., Lee, S. S., Bae, Z. U., Lee, H. L., & Park, T. M. (1993). Amperometric enzyme electrode for the determination of NH4+. J. Korean Chem. Soc., 37(11), 937-942.
Shevchenko, A., Wilm, M., Vorm, O., & Mann, M. (1996). Mass spectrometric sequencing of proteins silver-stained polyacrylamide gels. Analytical Chemistry, 68(5), 850-858.
Tal, M., Silberstain, A., & Nusser, E. (1985). Why does Coomassie brilliant blue R interact differently with different proteins? A partial answer. J. Biol Chem, 260(18), 9976-9980.
Traini, M., Gooley, A. A., Ou, K., Wilkins, M. R., Tonella, L., Sanchez, J.-C., et al. (1998). Towards an automated approach for protein identification in proteome projects. Electrophoresis, 19, 1941-1949.
Wilkins, M., et al. (1999). High throughput mass spectrometry discovery of protein post translational modification. J. Mol. Biol., 289, 645-657.
Wilkins, M., Pasquali, C., Appel, R., Ou, K., Golaz, O., Sanchez, J., et al. (1996). From proteins to proteomes: large scale protein identification by 2-D electrophoresis and amino acids analysis. Bio/techniques, 14, 61-65.
Williams, K., & Hochstrasser, D. (1997). In Proteome Research: New Frontiers in functional genomics (pp. 1-12). Berlin: Springer-Verlag.
Wilm, M., & Mann, M. (1996). Analytical properties of the nanoelectrospray ion source. Anal Chem, 68(1), 1-8.
CHAPTER 3 QUANTITATION DURING ELECTROBLOTTING STEP Enhanced Protein Recovery after Electrotransfer using Square Wave Alternating Voltage. Reprint with permission from (Bienvenut, Deon, Sanchez et al., 2002) copyright 2002, with permission from Elsevier Science
WV. Bienvenut, C. Deon, J-C. Sanchez, DF. Hochstrasser
ABSTRACT
Protein identification is becoming a complement to the available fully sequenced genomes. To meet the challenge, newly developed techniques for high throughput protein identification using matrix-assisted laser desorption/ionisation mass spectrometry (MALDI-MS) and peptide mass fingerprint are needed. Two years ago, a parallel protein digestion process was proposed. It provided a collecting polyvinylidene difluoride (PVDF) membrane able to be scanned by MALDI. Acquired data were used to recreate a virtual multidimensional image. The voltage used during this protein electroblotting technique was an unusual square wave alternating voltage (SWAV). The goal of the current study is to evaluate quantitatively the efficiency of the SWAV compared with a classical electroblot process on intact proteins. The effects of the pulsed electric field and the buffer composition were compared to a standard continuous transblotting process defined as the gold standard. Combination of the pulsed asymmetric electric field with 3-(cyclohexylamino)-1-propane-sulfonic acid (CAPS) buffers showed an average 65% increase in protein recovery. Moreover, the strongest effect is observed for high Mr proteins. In conclusion, the present study highlights a positive influence of the "shaking" effect of the asymmetric alternating voltage on gel protein extraction.
KEYWORDS
Protein recovery / Electroblot / Electroelution / Square wave alternating voltage / Quantification
139 W. V. Bienvenut (ed.), Acceleration and Improvement of Protein Identification by Mass Spectrometry, 139–150. © 2005 Springer. Printed in the Netherlands.
1. INTRODUCTION
The rapid development of genome sequencing projects is producing a huge number of potentially expressed proteins from many species (Blattner, 1997; Consortium, 1998; Venter, 2001; Consortium, 2001). By translation, most DNA sequences give the primary structures of potential proteins. These information sources are important for further investigations and ultimate comprehension of organism physiology (Achaz et al., 2000; Hood, 1999). However, identification and characterization of these proteins is an immense challenge due to the huge number of potential samples (proteins) in a single gel. Mass spectrometric techniques such as MALDI-MS (Karas & Hillenkamp, 1988) and electrospray tandem mass spectrometry (Aleksandrov et al., 1984; Yamashita & Fenn, 1984) developed during the last decade allow rapid identification of target proteins with acceptable accuracy (Joubert-Caron et al., 2000; Kaji et al., 2000; Raymackers et al., 2000). New techniques for genuinely high throughput are needed, and our laboratory proposed a technique in which proteins are endoproteolytically cleaved in parallel. This technique was named "Double Parallel Digestion" (DPD). The collecting membrane was directly scanned by MALDI-MS and a virtual multidimensional image was obtained by bioinformatic treatment of the MALDI-MS data (Bienvenut et al., 1999; Binz, Muller et al., 1999). Because our electroblotting conditions differ greatly from standard electroblotting techniques using continuous current or voltage, it was of interest to determine the influence of SWAV electroblotting with ½ Towbin buffer (Eckerskorn & Lottspeich, 1990). The differences in protein recovery under different electric fields (continuous current and SWAV) and/or buffers (½ Towbin and CAPS) were evaluated.
2. MATERIAL AND METHODS
2.1.
Mono-dimensional electrophoresis (1-DE)
1-DE was conducted essentially according to Laemmli (Laemmli, 1970) with 12% T and 2.6% C linear polyacrylamide gels. Three different protein mixtures were used for this study. The first two mixtures were the commonly used wide range Mr standard proteins containing nine unlabelled proteins (MYSS: mixture of myosin from rabbit skeletal muscle (SWISS-PROT accession numbers: Q28641 and P02562); BGAL: β-galactosidase from E. coli (SWISS-PROT accession number: P00722); PHS2: phosphorylase b from rabbit skeletal muscle (SWISS-PROT accession number: P00489); BSA: bovine serum albumin (SWISS-PROT accession number: P02769); OVAL: ovalbumin from chicken hen egg white (SWISS-PROT accession number: P01012); CAH2: carbonic anhydrase type 2 from bovine serum (SWISS-PROT accession number: P00921); ITRA: trypsin inhibitor
from soybean (SWISS-PROT accession number: P01070); LYC: lysozyme from chicken hen egg white (SWISS-PROT accession number: P00698); BPT1: trypsin inhibitor from bovine pancreas (SWISS-PROT accession number: P00974)) and the low range Mr standard proteins containing six unlabelled proteins (PHS2, BSA, OVAL, CAH2, ITRA, LYC). Both were obtained from Bio-Rad (Hercules, CA, USA). The third mixture of six [14C] labelled proteins (MYSS, PHS2, BSA, OVAL, CAH2, LYC) was purchased from Amersham Pharmacia Biotech (Uppsala, Sweden). For all of these experiments, 1 µg of each unlabelled protein or 50 nCi of each [14C] radiolabelled protein was loaded on a single lane of the 1-DE gel. For all mixtures, proteins were diluted to the correct concentration in 3% β-mercaptoethanol in Tris-HCl (60 mM, pH 6.8), glycerol (10% v/v), sodium dodecylsulfate (SDS 2% w/v) and reduced at 95°C for 5 minutes before 1-DE migration. Protein migration was carried out using the Mini-Protean II electrophoresis apparatus (Bio-Rad) operated at 200 V for 45 to 50 minutes. When necessary, the gel was stained with Coomassie brilliant blue R250 (CBB R250 0.1% w/v), water (60% v/v), methanol (30% v/v) and acetic acid (10% v/v) for 30 minutes and destained with repeated washes of water (50% v/v), methanol (40% v/v) and acetic acid (10% v/v).
2.2. Electroblot
Immediately after the 1-DE protein separation, gels were soaked in deionised water for five minutes, and then equilibrated twice for five minutes in the cathodic blotting buffer. Trans-Blot PVDF membranes (Bio-Rad) were equilibrated in the anodic buffer for 5 minutes. The different buffers used for this study and their compositions are detailed in Table 1. Electrotransfer was carried out overnight in a laboratory-made semi-dry apparatus at room temperature. A double layer of PVDF membrane was used to verify that no protein crossed the first membrane.
As previously described (Sanchez & Hochstrasser, 1998), the standard blotting technique uses a continuous current corresponding to 1 mA/cm2 of transferred gel with heterogeneous CAPS buffer. This technique is referred to as the gold standard method for further comparison because of its extensive use for protein electroblotting (Bolt & Mahoney, 1997; Jungblut et al., 1990; Mozdzanowski & Speicher, 1992; Neumann & Mullner, 1998; Sanchez & Hochstrasser, 1998). The voltage applied to the transblotting sandwich in the parallel digestion technique (13) was an asymmetrical alternating voltage. This square wave alternating voltage (SWAV) delivers +12 V for 125 ms followed by -5 V for 125 ms, repetitively. It corresponds to a 4 Hz signal and an average voltage of 3.5 V. After the electroblotting step, the membranes were washed rapidly with deionized water and air-dried. When necessary after the transblotting operation, gels were stained with CBB R250 as previously described. PVDF membranes were stained with AB (Amido black 0.5% w/v), isopropanol (25% v/v) and acetic acid (10% v/v) for 1 minute and destained by repeated washes in deionized water.
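The frequency and average voltage quoted for the SWAV follow directly from the waveform parameters. The short Python sketch below reproduces that arithmetic and samples the waveform; the function names are hypothetical and introduced only for illustration, while the default values (+12 V and -5 V, 125 ms each) are those given in the text.

```python
def swav_summary(v_pos=12.0, t_pos=0.125, v_neg=-5.0, t_neg=0.125):
    """Return (frequency in Hz, time-averaged voltage in V) of the square wave."""
    period = t_pos + t_neg                      # one full cycle, in seconds
    frequency = 1.0 / period
    mean_voltage = (v_pos * t_pos + v_neg * t_neg) / period
    return frequency, mean_voltage


def swav_voltage(t, v_pos=12.0, t_pos=0.125, v_neg=-5.0, t_neg=0.125):
    """Instantaneous voltage at time t (seconds), starting with the positive phase."""
    phase = t % (t_pos + t_neg)
    return v_pos if phase < t_pos else v_neg


frequency, mean_voltage = swav_summary()
# frequency is 4.0 Hz and mean_voltage is 3.5 V for the default parameters
```

Because the two phases have equal duration, the average reduces to (12 - 5) / 2 = 3.5 V; with unequal phase durations the time-weighted form used in `swav_summary` would still apply.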
2.3. Detection, quantification and statistics
The protein electroblotting technique has been widely used and subjected to many investigations in order to quantify protein recovery on the collecting membrane (18-23). Usually, this was carried out with [14C] radiolabelled proteins, which emit β-particles of low energy that are easily absorbed by the environment. Due to the thickness of the gel, it is not possible to obtain an accurate measurement of the β signal emitted by the proteins. Therefore, an absolute quantification of protein recovery on the collecting membrane was not possible. To overcome this problem, the signals acquired on the collecting membranes were compared with a reference obtained from a 1-DE separation of the [14C] labelled protein standard mixture. Detection of the [14C] radioactivity was performed with a Phosphor-Imager apparatus (Molecular Dynamics, AP Biotech, Uppsala, Sweden). Control experiments were also conducted with unlabelled proteins. The electroblotted material collected onto the PVDF membranes was stained with AB and detection of the bands was achieved with an optical laser scanner (Molecular Dynamics, AP Biotech, Uppsala, Sweden). Melanie 3 software (GeneBio, Geneva, CH) (24) was used for image treatment and band quantification. The band volume and/or optical density (OD) were used throughout the study for recovery comparison. When possible, statistical studies were conducted using the F test and Student's t test.
Table 1: Electroblotting buffer composition
Name | Anodic buffer | Cathodic buffer
Heterogeneous CAPS (Traini et al., 1998b) | 10 mM CAPS, pH 11, 20% MeOH (v/v) | 10 mM CAPS, pH 11, 5% MeOH (v/v)
Heterogeneous ½ Towbin (Eckerskorn & Lottspeich, 1990) | 13 mM Tris, 100 mM glycine, 20% MeOH (v/v) | 13 mM Tris, 100 mM glycine, 5% MeOH (v/v)
Homogeneous ½ Towbin (Kaji et al., 2000) | 13 mM Tris, 100 mM glycine, 12.5% MeOH (v/v) | 13 mM Tris, 100 mM glycine, 12.5% MeOH (v/v)
2.4. [14C] signal linearity and influence of the accumulation time
Detection of [14C] labelled samples required a long exposure to provide a valid signal, despite the use of a sensitive phosphor-imager support. To eliminate dependence on the exposure time, the signal intensity was never used directly but was always compared with a reference to obtain a ratio. One lane of [14C] labelled proteins separated by 1-DE was air dried between two cellophane sheets using an Easy Breeze Air Gel dryer (Hoefer, AP Biotech, Uppsala, Sweden). This reference gel was exposed on the storage phosphor screen for various periods from 4 to 70 hours. The signal response was linear and proportional to the exposure time for periods between 20 and 70 hours (data not shown). All further analyses were done using an accumulation time within that range. The ratio between the signal of the reference gel and the signal of the sample was calculated so that ratios from different samples could be compared directly, independently of the exposure time.

3. RESULTS AND DISCUSSION
Because of the large difference between the gold standard (Sanchez & Hochstrasser, 1998) and the SWAV electroblotting technique (Bienvenut et al., 1999), this study was conducted to determine the impact of the transblotting conditions, i.e. the applied electric field and the buffer composition, on protein recovery. The results are described in the following sections.

3.1. Comparison of the electric field and buffer composition effects
Figure 1: Images obtained during the comparison of protein recovery between gold standard and SWAV transfer. [14C] labelled proteins (A-D) or Bio-Rad Mr standard proteins followed by AB staining (E-F) were separated by 1-DE, then electroblotted onto PVDF membranes using different electric fields and buffers: A and E) Heterogeneous CAPS, 1 mA/cm2; B) Heterogeneous CAPS, SWAV; C) Heterogeneous ½ Towbin, SWAV; D and F) Homogeneous ½ Towbin, SWAV.
Two major parameters were tested in this section: the electric field and the buffer used for protein electroblotting, both described in section 2.2. The recovery of [14C] labelled proteins with the three different buffers was compared to the gold standard method. The phosphorimages of the collecting PVDF membranes (Figure 1A-D) showed seven bands corresponding to the six separated proteins (MYSS, PHS2, ALBU, OVAL, CAH2, LYC) plus the migration front. The ratios of the band intensities were calculated by dividing the intensity obtained with the SWAV transblotting process by the intensity obtained with the gold standard technique. The resulting increases for the six proteins present in the [14C] labelled mixture are given in Figure 2.

[Figure 2 (bar chart). y-axis: % increase of protein recovery (0 to 250). x-axis: proteins MYSS, PHS2, BSA, OVAL, CAH2, LYC, Average. Series: B/A, C/A, D/A, F/E.]
Figure 2: Percentage increase in protein recovery with the SWAV transblotting process compared to the gold standard method, for [14C] labelled proteins (B/A, C/A, D/A) and Bio-Rad Mr standard proteins (F/E), calculated from the images shown in Figure 1. B/A, C/A and D/A are the ratios of the [14C] labelled protein signal intensities of samples B, C and D (Figure 1) to the intensity of sample A (gold standard). A similar ratio is calculated for the AB stained proteins (F/E). Since BGAL (36% increase) and ITRA (21% increase) were not present in the [14C] labelled protein standard mixture, their ratios are not shown. The MYSS ratio could not be calculated for F/E because this protein band is not visible on the PVDF membrane after staining. The figure shows the influence of buffer composition on protein recovery when SWAV is used, in comparison with the gold standard method. Direct comparison of SWAV and continuous current with the heterogeneous CAPS buffer shows an average 65% increase in [14C] labelled protein recovery, whereas 46% and 35% are obtained with heterogeneous and homogeneous ½ Towbin buffers, respectively. For both series of proteins, the strongest increase is obtained for the high Mr proteins (MYSS or PHS2 for the [14C] labelled proteins and BGAL or PHS2 for the Bio-Rad Mr standard proteins).
It can be noted that the heterogeneous CAPS buffer with SWAV showed the highest mean recovery, with a 65% increase compared with the same buffer used with continuous current transfer (gold standard method). This result highlights the positive influence of SWAV versus continuous current. The use of ½ Towbin buffer in heterogeneous or homogeneous composition gave smaller increases than the heterogeneous CAPS buffer (46% and 35%, respectively), although recovery was still higher than with the gold standard method. The same experiment was conducted with Bio-Rad Mr standard proteins followed by AB staining of the PVDF membrane (Figure 1E and F). Eight bands are visible, corresponding to BGAL, PHS2, BSA, OVAL, CAH2, ITRA, LYC and the migration front, which also contained BPT1. MYSS, due to its high Mr, was usually not extracted from the gel and thus remained undetectable on the membrane. This was confirmed by the MYSS band that is visible in the CBB R250 stained gel after the transfer (data not shown). It must be noted that the second layer of capture membrane never showed any trace of protein in either experiment ([14C] labelled proteins and AB stained proteins; data not shown). The average increase in protein recovery was 35% and 24% for [14C] labelled and AB stained proteins, respectively. Nevertheless, because the recovery values vary greatly between proteins, this average is not a good representation of the transfer. When the calculation is restricted to the values obtained for both samples ([14C] labelled and AB stained proteins), the increase in protein recovery is 26%. The increase is not identical over the whole Mr range, and a positive impact was found for high Mr proteins (MYSS and PHS2 for [14C] labelled proteins, BGAL and PHS2 for the AB stained membranes).

Table 2: Parameters used for the comparison of protein recovery as a function of applied electric field and transblotting buffer. Experiment 1 is the reference for the comparison ratios summarized in Figure 1. Experiments 2, 3 and 4 use the SWAV with different buffers, which allows their influence to be compared. *: Gold standard
Experiment | 1* | 2 | 3 | 4
Buffer | Heterogeneous CAPS | Heterogeneous CAPS | Heterogeneous ½ Towbin | Homogeneous ½ Towbin
Electric field | 1 mA/cm2 | SWAV | SWAV | SWAV
3.2. Statistical test for the transfer reproducibility
In order to verify the reproducibility of the observed increase in protein recovery, the electroblotting experiments conducted in section 3.1 were repeated (n=6) using the low range Mr protein standard from Bio-Rad followed by AB staining of the PVDF membranes. Table 3 details the results obtained in this study. The average 23% increase in protein recovery was in agreement with the previous result (Figure 2). This statistical analysis showed clearly that the increase in protein recovery for three out of the six proteins (PHS2, OVAL and CAH2) was significant (p < 0.05). The average increase for these proteins is more than 20% and is also significant (p < 0.05). For the low Mr proteins, a net benefit of the SWAV over the gold standard method was not clearly established: the differences were not statistically significant for LYC and ITRA. This was also the case for BSA. It must be noted that this protein is highly soluble in aqueous solution and easily transferred under normal conditions.

Table 3: Reproducibility of the increase in protein recovery between electrotransfer using the gold standard method and SWAV transfer with ½ Towbin, applied to unlabelled proteins (low range Mr protein standard). The band volumes of both experiments were compared to determine whether the % increase in protein recovery was statistically significant. This was done using an F test to determine whether the SDs of the two band volume quantifications were comparable, followed by a Student's t test to verify the significance. The statistical result for the experiment repeated six times is shown in the last column: (+) p < 0.05, (-) non significant. Protein band volumes are given in the two central columns as "average value ± SD" (n=6). All six proteins as well as the average value showed an increase in protein recovery when the SWAV transfer method was used, but only the increases for PHS2, OVAL, CAH2 and the average value are significant (p < 0.05).
Proteins | Protein band volume using gold standard electrotransfer technique | Protein band volume using SWAV electrotransfer technique | % of increase (SWAV vs. 1 mA/cm2)
PHS2 | 548 ± 72 | 767 ± 117 | 40 (+)
BSA | 913 ± 161 | 1099 ± 261 | 20 (-)
OVAL | 653 ± 118 | 938 ± 94 | 44 (+)
CAH2 | 882 ± 91 | 1093 ± 113 | 24 (+)
ITRA | 839 ± 110 | 869 ± 142 | 4 (-)
LYC | 533 ± 125 | 595 ± 294 | 12 (-)
Average | 728 ± 91 | 893 ± 144 | 23 (+)
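The last column of Table 3 can be recomputed from the band volumes. The sketch below is only an illustration of that calculation: it assumes an equal-variance (pooled) two-sample Student's t statistic with n = 6 per group and a two-sided 5% threshold (critical value 2.228 for 10 degrees of freedom), and it skips the preliminary F test mentioned in the caption. The exact test settings used by the authors are not stated, so these choices are assumptions.

```python
import math

# Band volumes from Table 3 as (mean, SD) pairs: (gold standard, SWAV), n = 6 each.
TABLE_3 = {
    "PHS2": ((548, 72), (767, 117)),
    "BSA": ((913, 161), (1099, 261)),
    "OVAL": ((653, 118), (938, 94)),
    "CAH2": ((882, 91), (1093, 113)),
    "ITRA": ((839, 110), (869, 142)),
    "LYC": ((533, 125), (595, 294)),
    "Average": ((728, 91), (893, 144)),
}
N_REPLICATES = 6
# Assumed threshold: two-sided 5% critical value of Student's t, 2 * 6 - 2 = 10 d.f.
T_CRITICAL = 2.228


def percent_increase(gold_mean, swav_mean):
    """Relative gain of SWAV over the gold standard, in percent."""
    return 100.0 * (swav_mean / gold_mean - 1.0)


def pooled_t(mean_a, sd_a, mean_b, sd_b, n):
    """Equal-variance two-sample t statistic for two groups of equal size n."""
    pooled_variance = (sd_a ** 2 + sd_b ** 2) / 2.0
    return (mean_b - mean_a) / math.sqrt(pooled_variance * 2.0 / n)


def summarize(table=TABLE_3, n=N_REPLICATES):
    """Map each protein to (rounded % increase, significant at the assumed threshold)."""
    summary = {}
    for name, ((g_mean, g_sd), (s_mean, s_sd)) in table.items():
        increase = round(percent_increase(g_mean, s_mean))
        t_value = pooled_t(g_mean, g_sd, s_mean, s_sd, n)
        summary[name] = (increase, t_value > T_CRITICAL)
    return summary


for name, (increase, significant) in summarize().items():
    print(f"{name:8s} {increase:3d}% {'(+)' if significant else '(-)'}")
```

Under these assumptions the recomputed percentages and the (+)/(-) pattern agree with the table. For LYC the two SDs differ markedly, which is the situation the F test is meant to flag, so the pooled statistic should be read with caution there.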
3.3. Gel residual protein after transblotting process
Previous results showed an increase of material recovery, mostly for high Mr proteins. Consequently, the material remaining in the gel after the transblotting step must also be affected. To verify and quantify the amount of protein remaining in the gel after the electroblotting step, gels were air dried as described in section 2.4. The phosphorimages of the gels after the gold standard transfer (1 mA/cm2 with heterogeneous CAPS buffer) and the SWAV transfer (homogeneous ½ Towbin with SDS) are shown in Figure 3. Ratios between the signal volume of the proteins remaining in the electroblotted gels (gold standard and SWAV electroblot) and the signal volume of the reference gel (unblotted gel) are shown in Figure 4. The material remaining in the gel after the gold standard electrotransfer corresponded on average to 34 ± 14% (n=6) of the material contained in the unblotted gel, whereas an average of 17 ± 9% (n=6) remained after the SWAV transfer. The difference between these two samples was significant (p < 0.025). Nevertheless, a large disparity between proteins (SD = ± 14% and ± 9%, n = 6) can be observed in Figure 4. High molecular weight proteins such as MYSS are more affected by this problem, with up to 58% of the material remaining in the gel. In section 3.2, the comparison of protein recovery using gold standard electroblotting conditions and the SWAV technique showed higher recovery for the second method. The quantification of the material remaining in the gel confirmed this observation.

[Figure image not reproduced; recovered labels: lanes A (gold standard), B, C, D ([14C] labelled proteins) and E, F (AB stained proteins); bands MYSS, BGAL, PHS2, BSA, OVAL, CAH2, ITRA, LYC.]
Figure 3: Phosphorimages of the 14C labelled proteins remaining in the gel after the electroblot process. A, Control unblotted gel used as a reference gel to calculate volume ratio in Figure 4; B, Gel after gold standard transfer (1 mA/cm2 of gel surface using heterogeneous CAPS); C, Gel after SWAV transfer using homogeneous ½ Towbin. In-gel remaining material is lower after the SWAV electrotransfer than after the gold standard electrotransfer. It is clearly visible that the high Mr proteins are more affected by this effect.
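The mean residual fractions reported in section 3.3 (34% after the gold standard transfer, 17% after SWAV) can be turned into a rough consistency check against the recovery gains measured on the membranes. The sketch below assumes that everything leaving the gel is captured on the PVDF membrane; this is an assumption, although the absence of protein on the second membrane layer reported in section 3.1 is compatible with it. The function name is illustrative.

```python
def expected_gain_percent(residual_gold, residual_swav):
    """Expected % gain on the membrane if all material leaving the gel is captured.

    Both arguments are fractions (0 to 1) of the unblotted signal left in the gel.
    """
    transferred_gold = 1.0 - residual_gold
    transferred_swav = 1.0 - residual_swav
    return 100.0 * (transferred_swav / transferred_gold - 1.0)


# Mean residual fractions quoted in section 3.3.
print(round(expected_gain_percent(0.34, 0.17)))  # 26
```

Since the inputs are averages over proteins with large SDs, the result is only an order-of-magnitude check, but it falls in the same range as the average increases in recovery reported in sections 3.1 and 3.2.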
Quantification of the material remaining in the gel after the electroblotting step confirmed the advantage of SWAV over continuous current. The proposed voltage can increase protein recovery on the PVDF membrane by up to 200%.

4. CONCLUDING REMARKS
The present study had two objectives: first, to determine the effect of square wave alternating voltage versus continuous current on protein electroblotting, and second, to evaluate the influence of electroblotting buffer composition. The SWAV with heterogeneous CAPS buffer showed a strong beneficial effect of the pulsed voltage, with a 65% average increase in protein recovery. The strongest effect was found for the high Mr proteins, i.e. MYSS, BGAL and PHS2. The effect was less pronounced for smaller proteins (< 60 kDa). It was also found that the buffer composition influenced the level of protein recovery. For example, compared to CAPS buffer with SWAV, the heterogeneous and homogeneous ½ Towbin buffers showed only 46% and 35% average increases in protein recovery, respectively. The material remaining in the gel after the electroblotting step also confirmed the higher recovery for high Mr proteins. Comparison of retained material between the gold standard method and the SWAV method showed a decrease of the material remaining in the gel with SWAV, mostly for the larger proteins. Use of the SWAV could be generalized, since the average recovery of intact protein is 65% higher than with the gold standard method. This pulsed electric field technique is of great interest for any post-separation analysis using PVDF as a matrix. More generally, this technique is applicable to protein recovery from gels, e.g. electroelution.

[Figure 4 (bar chart). y-axis: ratio of the proteins remaining in the gel after electroblot to the reference gel (0 to 0.7). x-axis: proteins MYSS, PHS2, BSA, OVAL, CAH2, LYC, Mean value. Series: gel after gold standard electroblotting; gel after SWAV electrotransfer. Bar value labels are not reliably recoverable from the extraction.]
Figure 4: Signal volume ratio of the 14C labelled proteins remaining in the gel after the gold standard and the SWAV electroblotting processes to the unblotted reference gel (values obtained from the gels shown in Figure 3).
6. ACKNOWLEDGEMENT
This work was supported by the Swiss National Fund for Scientific Research (grant 31-59095.99). The authors acknowledge Prof. Jacques Deshusses, Dr. Richard W. James, Dr. Manfred Heller, Dr. Patricia Palagi, Dr. Christine Hoogland, Dr. Sonja Voordijk and Applied Biosystems for their technical support.

7. REFERENCES
Achaz, G., Coissac, E., Viari, A., & Netter, P. (2000). Mol Biol Evol, 17, 1268-1275.
Aleksandrov, M., Gall, L., Krasnov, V., Nikolae, V., Pavlenko, V., Shkurov, V., et al. (1984). Bioorg. Khim., 10, 710.
Bienvenut, W., Deon, C., Sanchez, J., & Hochstrasser, D. (2002). Enhanced protein recovery after electrotransfer using square wave alternating voltage. Anal Biochem, 307(2), 297-303.
Bienvenut, W., Sanchez, J., Karmime, A., Rouge, V., Rose, K., Binz, P., et al. (1999). Toward a clinical molecular scanner for proteome research: parallel protein chemical processing before and during western blot. Anal Chem, 71(21), 4800-4807.
Binz, P., Muller, M., Walther, D., Bienvenut, W., Gras, R., Hoogland, C., et al. (1999). A molecular scanner to automate proteomic research and to display proteome images. Anal Chem, 71(21), 4981-4988.
Blattner, F., Plunkett, G. r., Bloch, C., Perna, N., Burland, V., Riley, M., et al. (1997). Science, 277, 1453-1474.
Bolt, M., & Mahoney, P. (1997). High-efficiency blotting of proteins of diverse sizes following SDS-PAGE. Anal. Biochem., 247, 185-192.
Consortium, I. H. G. S. (2001). Initial sequencing and analysis of the human genome. Nature, 409(6822), 860-921.
Eckerskorn, C., & Lottspeich, F. (1990). Combination of two-dimensional gel electrophoresis with microsequencing and amino acid composition analysis: improvement of speed and sensitivity in protein characterization. Electrophoresis, 11, 554-561.
Hood, D. (1999). Parasitology, 118, S3-S9.
Joubert-Caron, R., Le Caer, J., Montandon, F., Poirier, F., Pontet, M., Imam, N., et al. (2000). Protein analysis by mass spectrometry and sequence database searching: a proteomic approach to identify human lymphoblastoid cell line proteins. Electrophoresis, 21(12), 2566-2575.
Jungblut, P., Eckerskorn, C., Lottspeich, F., & Klose, J. (1990). Blotting efficiency investigated by using two-dimensional electrophoresis, hydrophobic membranes and proteins from different sources. Electrophoresis, 11(7), 581-588.
Kaji, H., Tsuji, T., Mawuenyega, K., Wakamiya, A., Taoka, M., & Isobe, T. (2000). Profiling of Caenorhabditis elegans proteins using two-dimensional gel electrophoresis and matrix assisted laser desorption/ionization-time of flight-mass spectrometry. Electrophoresis, 21(9), 1755-1765.
Karas, M., & Hillenkamp, F. (1988). Laser desorption ionization of proteins with molecular masses exceeding 10,000 daltons. Anal Chem, 60(20), 2299-2301.
Laemmli, U. K. (1970). Cleavage of structural proteins during the assembly of the head of bacteriophage T4. Nature, 227(259), 680-685.
Mozdzanowski, J., & Speicher, D. (1992). Microsequence analysis of electroblotted proteins. I. Comparison of electroblotting recoveries using different types of PVDF membranes. Anal Biochem, 207(1), 11-18.
Neumann, H., & Mullner, S. (1998). Two replica blotting methods for fast immunological analysis of common proteins in two-dimensional electrophoresis. Electrophoresis, 19, 752-757.
Raymackers, J., Daniels, A., De Brabandere, V., Missiaen, C., Dauwe, M., Verhaert, P., et al. (2000). Identification of two-dimensionally separated human cerebrospinal fluid proteins by N-terminal sequencing, matrix-assisted laser desorption/ionization-mass spectrometry, nanoliquid chromatography-electrospray ionization-time of flight-mass spectrometry, and tandem mass spectrometry. Electrophoresis, 21(11), 2266-2283.
Sanchez, J., & Hochstrasser, D. (1998). In A. Link (Ed.), Methods in molecular biology: 2-D proteome analysis protocols (Vol. 112, pp. 227-233). Totowa, NJ: Humana Press.
The C elegans Sequencing Consortium. (1998). Science, 282, 2012-2018.
Traini, M., Gooley, A. A., Ou, K., Wilkins, M. R., Tonella, L., Sanchez, J.-C., et al. (1998). Towards an automated approach for protein identification in proteome projects. Electrophoresis, 19, 1941-1949.
Venter, J. C., Adams, M. D., Myers, E. W., Li, P. W., Mural, R. J., Sutton, G. G., et al. (2001). The sequence of the human genome. Science, 291(5507), 1304-1351.
Yamashita, M., & Fenn, J. (1984). Phys. Chem., 88, 4451-4459.
CHAPTER 4
SIGNAL TREATMENT AND VIRTUAL IMAGES PRODUCTION (1/2)
A Molecular Scanner to Highly Automate Proteomic Research and to Display Proteome Images
Reproduced with permission of (Binz, Muller et al., 1999). Copyright (1999) American Chemical Society
PA. Binz, M. Muller, D. Walther, WV. Bienvenut, R. Gras, C. Hoogland, G. Bouchet, E. Gasteiger, R. Fabbretti, S. Gay, P. Palagi, MR. Wilkins, V. Rouge, L. Tonella, S. Paesano, G. Rossellat, A. Karmime, A. Bairoch, JC. Sanchez, RD. Appel, DF Hochstrasser
ABSTRACT
Identification and characterization of all proteins expressed by a genome in biological samples represent major challenges in proteomics. Today's commonly used high throughput approaches combine two-dimensional electrophoresis (2-DE) with peptide mass fingerprinting (PMF) analysis. Although automation is often possible, a number of limitations still adversely affect the rate of protein identification and annotation in 2-DE databases: the sequential excision of pieces of gel containing protein; the enzymatic digestion step; the interpretation of mass spectra (reliability of identifications); and the manual updating of 2-DE databases. We present a highly automated method that generates a fully annotated 2-DE map. Using a parallel process, all proteins of a 2-DE gel are first simultaneously digested proteolytically and electro-transferred onto a polyvinylidene difluoride (PVDF) membrane. The membrane is then directly scanned by MALDI-TOF MS. After automated protein identification from the obtained peptide mass fingerprints using PeptIdent software (http://www.expasy.ch/tools/peptident.html), a fully annotated 2-D map is created online. It is a multi-dimensional representation of a proteome that contains interpreted PMF data in addition to protein identification results. This "MS-imaging" method represents a major step towards the development of a clinical molecular scanner.
151 W. V. Bienvenut (ed.), Acceleration and Improvement of Protein Identification by Mass Spectrometry, 151–168. © 2005 Springer. Printed in the Netherlands.
BINZ ET AL.
KEYWORDS Molecular scanner, high throughput analysis, parallel protein digestion, bioinformatics, MALDI-TOF MS, imaging, peptide mass fingerprinting, database searching, proteome, DPD, OSDT
SIGNAL TREATMENT AND VIRTUAL IMAGING (1/2)
1. INTRODUCTION
Today's genome sequencing projects provide a huge amount of information in the form of nucleotide sequences that are stored in dedicated databases. As a first approach, this information can be interpreted to obtain the encoded amino acid (AA) sequences of all potentially expressed proteins. In their active forms, proteins often differ from the predicted AA sequence, as they can be processed or carry post-translational modifications. Most of these modifications are not predictable from gene sequences. In fact, a single gene sequence may give rise to more than ten structurally different proteins (Wilkins, Sanchez, Williams, & Hochstrasser, 1996). As an example, α-1-antitrypsin is known to exist in at least 22 different forms in the human plasma master image in the SWISS-2DPAGE database (Hoogland et al., 2000). By extrapolation, this yields between 500,000 and one million different protein forms expressed in humans. The description of a proteome (Wilkins et al., 1995), involving the identification of all proteins contained in a biological sample, therefore represents a real experimental challenge. Methods involving high-resolution protein separation, parallelisation of sample preparation, automation of experimental processes and of database comparison, as well as powerful and specific visualization tools, need to be developed and integrated (Hochstrasser, 1998; Williams & Hochstrasser, 1997). Identifying a protein from a complex biological sample requires at least three steps. The protein is first isolated. Then very specific experimental attributes, such as a peptide mass fingerprint (PMF) or partial amino acid sequences, are determined. In the third step, identification is attempted by matching these attributes with those computed for all entries in a protein sequence database.
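The third step, matching measured attributes against values computed from a sequence database, can be sketched for the PMF case as follows. This is a minimal illustration and not the algorithm of any particular identification tool: it assumes complete tryptic cleavage (after K or R, not before P), unmodified peptides, singly protonated monoisotopic masses, and a score that is simply the number of matched masses. The two database entries and all function names are hypothetical.

```python
import re

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def tryptic_peptides(sequence):
    """Cleave after K or R, except before P; no missed cleavages are considered."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def mh_plus(peptide):
    """Monoisotopic [M+H]+ mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def count_matches(observed, sequence, tol_ppm=100.0):
    """Number of observed masses that match a theoretical tryptic peptide mass."""
    theoretical = [mh_plus(p) for p in tryptic_peptides(sequence)]
    return sum(
        any(abs(obs - theo) <= theo * tol_ppm * 1e-6 for theo in theoretical)
        for obs in observed
    )


def rank(observed, database, tol_ppm=100.0):
    """Rank database entries by decreasing number of matched masses."""
    scores = {name: count_matches(observed, seq, tol_ppm) for name, seq in database.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical two-entry database and a simulated fingerprint of the first entry.
toy_database = {"toy_protein_a": "MDPNKCSTWHFR", "toy_protein_b": "GASPVTKLLNDR"}
fingerprint = [mh_plus(p) for p in tryptic_peptides(toy_database["toy_protein_a"])]
print(rank(fingerprint, toy_database))
```

Real identification tools additionally weigh missed cleavages, chemical modifications, pI and Mr constraints and the size of the database, so a plain match count should be read only as the simplest possible scoring scheme.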
The 2-DE technique is a method of choice to separate a large number of proteins with high resolution in a single procedure, particularly when narrow range pH gradients are used (Sanchez & Hochstrasser, 1998; Scheler et al., 1998). It provides a graphical representation of a proteome, where each protein form present in the so-called 2-DE map is represented by a spot or a series of spots and can be described by a pI, an apparent molecular weight and an intensity-related value. Among the different methods used routinely, the PMF approach is generally accepted to be currently by far the most effective and rapid way to identify proteins from a 2-DE gel. In this method, protein spots are excised and the proteins are proteolytically digested. The resulting peptides are measured by mass spectrometry and then matched against a database of theoretical peptide mass fingerprints deduced from protein sequences. A score is calculated which represents the similarity between the experimental and the theoretical peptide masses. In principle, the protein with the highest score should be the correct identification. Various problems emerge with regard to the analysis of very complex biological samples such as human tissue. How can we attain a reasonable throughput when performing a proteolytic digestion of all proteins after 2-DE? How can we reduce the number of manipulations required for the sample preparation before MS measurement? How can we simultaneously reduce the sizes of the samples and therefore increase their number? How can we handle the huge amount of experimental data and represent the result in a simple and comprehensive way?

A number of solutions have been proposed to answer these questions. Various approaches have been described to automate and accelerate the method. Traini et al. (Traini et al., 1998b) proposed the use of a prototype robotics system to image and excise a few hundred spots from a stained polyvinylidene difluoride (PVDF) blot. The protein samples were then enzymatically digested with an automated liquid handling system. The mass spectra of the peptide mass fingerprints were acquired using MALDI-TOF MS in automated mode. Proteins were identified using automated interrogation software. Even though this approach is automated, the time-consuming digestion process is partially sequential and involves expensive sample handling, due to material costs. In addition, since the size of a sample is limited by the size of the excised spot, problems occur when overlapping spots are present on a gel. In order to reduce sample handling and to decrease the analyzed sample size to that of the MALDI-TOF MS laser beam impact (a spot of a few tens of µm in diameter), gels or membranes containing peptides or proteins have been used for direct MALDI-TOF MS measurements. Ogorzalek Loo et al. (Ogorzalek Loo et al., 1997b) measured protein masses directly from thin layer isoelectrofocusing gels.
Various types of membranes were also used as sample supports for peptide or protein mass determinations, such as polyethylene (Blackledge & Alexander, 1995), non-porous polyurethane (McComb et al., 1998; McComb et al., 1997), PVDF (Immobilon PSQ or Trans-Blot) (Eckerskorn et al., 1997; Fabris et al., 1995; Schreiner et al., 1996; Vestling & Fenselau, 1994) or the charged membrane Immobilon CD (Schreiner et al., 1996). Use of these sample supports allows the MS instrument to measure spectra separated by distances in the micrometer range. This opens the possibility to scan such a surface and create intensity images using the intensities of the MS signals, and therefore to localize single peptides or proteins (Bienvenut et al., 1999; Caprioli, Farmer, & Gile, 1997; Eckerskorn et al., 1997). In order to further increase the throughput of protein identification and to offer a flexible and powerful proteomic visualization tool, we designed a highly automated method that can create a fully annotated 2-D map starting from a 2-DE gel. This technology is called the "molecular scanner". It combines parallel methods for protein digestion and electro-transfer (using the one-step digestion-transfer (OSDT) or the double parallel digestion (DPD) techniques as described by Bienvenut et al. (Bienvenut et al., 1999)) with peptide mass fingerprinting approaches to identify proteins directly from PVDF membranes, the surface of which is scanned with MALDI-TOF MS. With a set of dedicated tools, it allows a proteome to be created, analyzed and visualized as a multi-dimensional image. This provides the technological basis for the development of a clinical molecular scanner, which will be adapted and dedicated to medical diagnostics (Hochstrasser et al., 1991).

2. EXPERIMENTAL SECTION
2.1. Materials and reagents
IAV-trypsin membranes were prepared as described in Bienvenut et al. (Bienvenut et al., 1999). Low range SDS-PAGE standards and Trans-Blot® PVDF membrane were purchased from Bio-Rad (Richmond, CA, USA). Trifluoroacetic acid (TFA) and α-cyano-4-hydroxy-trans-cinnamic acid (ACCA) were purchased from SIGMA (St. Louis, MO, USA). Acetonitrile (AcCN), HPLC grade, was purchased from Fluka (Buchs, Switzerland). Methanol puriss pa was purchased from Merck (Darmstadt, Germany). High vacuum grease was purchased from Labofur GmbH (Bern, Switzerland).

2.2. Description of the method
The method can be divided into 4 main sections (Figure 1).
A) Separation and digestion of the proteins. One-dimensional separation of SDS-PAGE standards and mini 2-DE of human plasma were performed according to Laemmli (Laemmli, 1970) and Sanchez (Sanchez & Hochstrasser, 1998), respectively. All proteins were proteolytically digested with trypsin and electro-blotted onto a PVDF membrane, using the OSDT parallel process as described by Bienvenut et al. (Bienvenut et al., 1999) (Figure 1A). The collecting PVDF membrane thus contained sets of digestion products of all proteins, each of them localized at discrete positions on the surface. IAV-trypsin membrane was prepared as described in Bienvenut et al. (Bienvenut et al., 1999). Where needed, PVDF membranes were stained with amido black after OSDT.
B) Acquisition of the peptide mass fingerprinting data. Matrix solution made of 5 mg/ml ACCA in 50% AcCN, 0.1% TFA or of 10 mg/ml ACCA in 70% MeOH was sprayed on the PVDF membrane until the membrane became wet. After air-drying, the membrane was stuck on a modified MALDI sample plate using high vacuum grease.
The stainless steel surface of the MALDI MS sample plate was flattened to allow the deposition of a 4 x 4 cm2 PVDF membrane. An array of positions was defined on the membrane. The membrane was then scanned by the MS, i.e. a mass spectrum was acquired at each position of the array (Figure 1B). The distance between separate MS acquisitions on the grid was constant for a given experiment (ranging between 0.2 and 0.5 mm). Mass spectra were acquired on a Voyager™ Elite MALDI-TOF mass spectrometer (Perseptive Biosystems, Framingham, MA, USA) equipped with a 337 nm nitrogen laser and a Delayed Extraction device. The accelerating voltage was 18 kV, a delayed extraction parameter of 140 ns was selected and the m/z value selected for the low mass gate was generally 850. Laser power was set about 20% above threshold. The diameter of the laser beam on the membrane was about 100 µm. Between 40 and 100 spectra were accumulated, depending on the amount of material analyzed. The set of coordinates of the laser shots on the sample plate and the naming of the MS files were controlled by special software to overcome limitations in the maximum number of coordinates and spectra allowed by the Voyager 4.03 acquisition software.

[Figure 1 (workflow scheme, image not reproduced). Recovered labels: 2D-gel containing the proteins; interface with immobilised endopeptidase, i.e. IAV-trypsin; PVDF membrane collecting the digested products; electrotransfer under alternative electric field; membrane + matrix solution sprayed on modified MALDI plate; laser; TOF-detector; automatic peak detection and calibration; peptide mass fingerprints: mass data + MS intensity data; identification in SWISS-PROT / TrEMBL with PeptIdent; set of identification data; annotated 2D-image; examples of typical queries: "Show identified proteins", "Where is protein x? Plot as number of identified masses", "Where are the masses x, y located? Plot as MS intensity. Smooth the image".]
Figure 1: Scheme of the molecular scanner. A) Parallel digestion and simultaneous electro-transfer of proteins from a 2-D PAGE using the DPD/OSDT method (Bienvenut et al., 1999). B) MALDI-TOF MS scanning of the PVDF collecting membrane after spraying with matrix solution. (x_i, y_j) refers to the position where MS spectra were measured on the PVDF membrane. C) Identification procedure. The peak detection and mass calibration yield sets of PMF. The MS signal measured at each (x_i, y_j) coordinate is represented by its m/z value m_ix and its MS intensity I_ix. The x_i and y_i values are interpreted as pI and Mr values. The PMF data are submitted to PeptIdent. Identification results are collected together with the PMF data. D) A visualization tool allows the analyzed data to be represented in different forms. Three examples of typical queries and representations are described here. (D1) An MS intensity image can be created that contains the identification data as database labels. It is generated in a Melanie readable format. (D2) Another option allows a particular protein to be searched for and visualized as an intensity plot. In this plot, the intensity represents the number of masses identified as belonging to the protein at each position. (D3) The program further allows a search for a set of predefined masses, and generates an intensity image where the intensity represents the total intensity of the found mass peaks at each (x,y) position. This image can be smoothed if needed.

C) Processing of the MS data and protein identification. A flexible and interactive tool was developed to automatically treat all MS data consecutively and to perform the various steps of the analysis, starting with peak detection and calibration (Figure 1C). The positions on the sample plate were converted to apparent molecular mass (Mr) and pI values. The PMF data of all spectra, together with the calculated pI and Mr and other user defined parameters (such as mass tolerance, chemical modifications considered, species taken into account, etc.), were automatically sent over the Internet for protein identification to PeptIdent, a PMF identification tool developed in Geneva (Binz, Wilkins et al., 1999) and available at the ExPASy server (http://www.expasy.ch, (Appel, Bairoch, & Hochstrasser, 1994)).
D) Analysis of the results: creation of virtual maps.
The identification results of PeptIdent were represented as an annotated image. All outputs of PeptIdent were acquired and stored in a modified format. The program generated a first virtual, annotated “2-D map”, a 3-D image where the x and y coordinates related to pI and Mr values, respectively. The z values were represented in gray scale and reflected the intensity of the MS spectra, defined as the sum of the intensities of the MS signals in the considered spectrum. The range of m/z values to be considered was predefined. The intensity scale was chosen to be linear or logarithmic, and the image was smoothed in some cases. The image file was stored in a graphical format that can be read by the Melanie 2-DE image analysis software package (Appel, Palagi et al., 1997; Appel, Vargas et al., 1997). The image also contained the identification results, which can be highlighted as labels in Melanie (Figure 1D). The number of distinct attributes contained in the image reflects the number of dimensions the image virtually contains. These are: pI, Mr, identification labels (SWISS-PROT or TrEMBL AC numbers, ID labels), peptide masses and MS intensities. In addition, for all potentially identified proteins, the annotations from PeptIdent (number of missed cleavages, annotated modifications, chemical modifications of Cys and Met residues, peptide sequences) are also available.
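The construction of the z values described above can be sketched as follows. This is only an illustrative sketch, not the original tool: the function names, the dictionary layout keyed by grid position, the `log10(1 + z)` form of the logarithmic scale and the 3 x 3 mean filter used for smoothing are all assumptions, since the text does not specify these details.

```python
import math

def summed_intensity(peaks, mz_min, mz_max):
    """Sum the intensities of the peaks whose m/z lies within [mz_min, mz_max]."""
    return sum(intensity for mz, intensity in peaks if mz_min <= mz <= mz_max)

def intensity_image(spectra, n_rows, n_cols, mz_min, mz_max, log_scale=False):
    """Build a grid of z values from spectra keyed by (row, col) grid position.

    Pixels without a spectrum stay at 0. The logarithmic scale is taken as
    log10(1 + z), an assumption made here so that empty pixels map to 0.
    """
    image = [[0.0] * n_cols for _ in range(n_rows)]
    for (row, col), peaks in spectra.items():
        z = summed_intensity(peaks, mz_min, mz_max)
        image[row][col] = math.log10(1.0 + z) if log_scale else float(z)
    return image

def smooth(image):
    """Apply a 3 x 3 mean filter (assumed kernel); borders use existing neighbours only."""
    n_rows, n_cols = len(image), len(image[0])
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            values = [
                image[rr][cc]
                for rr in range(max(0, r - 1), min(n_rows, r + 2))
                for cc in range(max(0, c - 1), min(n_cols, c + 2))
            ]
            out[r][c] = sum(values) / len(values)
    return out

# Hypothetical two-pixel scan: each spectrum is a list of (m/z, intensity) pairs.
spectra = {
    (0, 0): [(900.0, 50.0), (1500.0, 200.0)],
    (0, 1): [(1200.0, 100.0), (2000.0, 300.0)],
}
linear = intensity_image(spectra, 1, 2, mz_min=1100.0, mz_max=3000.0)
log_img = intensity_image(spectra, 1, 2, mz_min=1100.0, mz_max=3000.0, log_scale=True)
smoothed = smooth(linear)
```

In this toy example the peak at m/z 900 falls outside the predefined range and does not contribute to the pixel value, which mirrors the exclusion of the low-mass matrix signals.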
From all the data contained in this multi-dimensional image, the user can choose to filter and visualize only particular aspects (Figure 1D). Proteins or peptides can be searched on the image by filtering part of the total information. Thus, a protein can be visualized by the positions where it has been identified. The z intensity can be binary (black / white for present / absent, respectively) or a gray level. The darkness then represents either the number of peptides found to match the protein in the identification process using PeptIdent, or the sum of the MS intensities of the peptide masses matching the queried protein. Instead of searching for a protein, the user can specify and visualize a set of peptide masses. In this case, the image intensity scale can be defined from the number or the MS intensities of the masses detected out of the chosen list (Figure 1D).

3. RESULTS AND DISCUSSION

3.1. Representation of the analysis of a 1-dimensional scan of 1-DE

In order to set up the various experimental parameters of the method, we performed a number of analyses on a mixture of molecular weight standards separated by SDS-PAGE and treated by the DPD or OSDT method. The selected collecting membrane was PVDF. The membranes were initially stained with amido black to visualize the positions of the peptide fingerprinting bands (Figure 2A). Matrix solution was sprayed on the whole surface of the membrane. About 1.5 ml was used to spray a 4.4 x 0.5 cm PVDF membrane. The volume of matrix solution effectively deposited on the membrane was estimated to be 1-2 µl/mm2. After air drying, the membrane was scanned in one dimension with MALDI-TOF MS. The summed intensity of the detected MS signals, for a given mass range, was plotted against the axis coordinate along the membrane (Figure 2B). The intensity of the MS spectra obtained from the stained membrane varied along the scanning axis.
The positions of the 4 intensity maxima on the MS profiles correlated with the positions of the 4 stained bands. The MS profiles revealed distinctly resolved bands, suggesting that the separation of the peptide fingerprints was conserved during the DPD or OSDT step and during matrix deposition. The peptide-containing areas are separated by blank areas showing no MS intensity (position 2 in Figure 2A and MS spectrum in Figure 2D). No significant broadening of the bands was observed in comparison to the corresponding undigested, electro-transferred, stained protein bands. This suggests that the peptides do not diffuse significantly during the digestion, during the transfer process or on the membrane upon matrix application. From this membrane, protein identification was performed from each MS spectrum at a maximum of MS intensity (Figure 2D). All 4 proteins (positions 1, 3, 4 and 5 in Figure 2A) could be unambiguously identified in SWISS-PROT using PeptIdent.

Different matrix solvents and application techniques of the matrix solution were compared. As an example, the methanol-containing matrix solution wetted the surface of the PVDF in a much more homogeneous manner than the acetonitrile-containing solution. More intense MS spectra and more homogeneous MS profiles were obtained with the methanolic matrix solution (data not shown).
[Figure 2, panels A) to D). Plot axes: Counts versus scanning axis (x 10^-4 inch). Spectrum label recovered from panel D: CAH2_BOVIN.]
Figure 2. Result of a one-dimensional scan. Low range standard proteins were separated by 1-DE and then treated with the OSDT procedure. The one-dimensional MS scan was performed on an amido black stained PVDF membrane along the longitudinal dashed line of the PVDF image A. Plots B and C are MS intensity profiles. They represent the intensity of MS signals (number of counts from the MALDI-TOF MS detector) as a function of the position on the membrane (x unit is 10^-4 inch). In the line plot, only m/z values greater than 1100 were considered. The intensities due to the two internal standards at 1498.82 +/- 1 and 2095.08 +/- 1 Da were excluded. MS spectra measured at the intensity maxima of the plot (positions 1, 3, 4 and 5 in A) and in the background (position 2 in A) are shown in D. From these spectra, the four standard proteins could be identified with PeptIdent, as labeled with their SWISS-PROT ID identifiers on the respective PMF MS spectra. Plot C is made of single ion intensity profiles. The selected ions were chosen from the set of peptides specifically matching one of the four identified proteins. Values of 2198.2 (+), 1774.0 (*), 1440.0 (o) and 1426.7 (x) m/z were considered with a window of +/- 1 Da. Matrix solution was 10 mg/ml ACCA in 70% methanol. 110 spectra, each accumulated 100 times, were acquired over a total scanning length of 4.4 cm.
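The two kinds of profile described in the caption of Figure 2 (a summed-intensity profile above m/z 1100 with the internal standards excluded, and single ion profiles with a window of +/- 1 Da) can be sketched as below. The data layout, the function names and the example peaks are hypothetical; only the m/z threshold, the two internal standard masses, the selected ion values and the +/- 1 Da window are taken from the caption.

```python
# Internal standard masses and mass window quoted in the caption of Figure 2.
INTERNAL_STANDARDS = (1498.82, 2095.08)
WINDOW_DA = 1.0

def summed_profile_point(peaks, mz_min=1100.0):
    """Sum peak intensities above mz_min, skipping peaks within the window of a standard."""
    return sum(
        intensity
        for mz, intensity in peaks
        if mz > mz_min and all(abs(mz - std) > WINDOW_DA for std in INTERNAL_STANDARDS)
    )

def single_ion_point(peaks, target_mz):
    """Sum the intensities of the peaks lying within the window around target_mz."""
    return sum(intensity for mz, intensity in peaks if abs(mz - target_mz) <= WINDOW_DA)

def scan_profiles(scan, target_ions):
    """Return the summed profile and one profile per selected ion.

    scan is a list of (position, peaks); peaks is a list of (m/z, intensity).
    """
    total = [(pos, summed_profile_point(peaks)) for pos, peaks in scan]
    ions = {
        target: [(pos, single_ion_point(peaks, target)) for pos, peaks in scan]
        for target in target_ions
    }
    return total, ions

# Hypothetical scan with two positions along the scanning axis.
scan = [
    (0, [(1000.0, 10.0), (1498.8, 500.0), (2198.3, 40.0)]),
    (100, [(1774.2, 70.0), (2095.1, 300.0)]),
]
total, ions = scan_profiles(scan, target_ions=(2198.2, 1774.0, 1440.0, 1426.7))
```

Plotting `total` against position would correspond to a summed-intensity profile, and each entry of `ions` to one single ion trace.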
3.2. Representation of the analysis of a two-dimensional scan from a single band of 1-DE

Similarly, Figure 3 shows the result of a 2-D scan, and its interpretation, performed on a single protein band. Low range SDS standards were separated on 1-DE, processed with the OSDT method, and the PVDF was stained with amido black. A 0.8 x 0.6 cm2 area of membrane containing the digested soybean trypsin inhibitor was scanned with a resolution of 0.5 mm. The amido black stained image of the band after OSDT on a PVDF membrane (Figure 3B) was compared with an MS intensity image calculated from all MS spectra (Figure 3C and Figure 3D), where only m/z values higher than 1100 Da were considered (there are disturbing matrix signals below 1100 Da). A 3-D plot (Figure 3A) representing the absolute intensities as a function of the x,y position on the membrane was created with Matlab 5.2 (MathWorks, Inc., 24 Prime Parkway, Natick, Massachusetts, USA), and an MS intensity image was calculated under the same conditions (Figure 3C). The MS intensity profile followed relatively smooth curves, suggesting a relatively homogeneous quality of the matrix crystallization. This approach highlighted the possibility of creating intensity profiles from MS intensity values and of describing spot areas. Addition of the matrix solution did not seem to lead to any significant diffusion of peptides on the matrix surface. The sensitivity of such an MS staining was equal to or better than that of amido black staining of the transferred protein. The intensity images obtained from these values show that the experimental noise, which may be due to inhomogeneity of the matrix crystallization quality on the membrane or to the low doping of the matrix by the analytes, was limited. From the 192 acquired spectra, the soybean trypsin inhibitor (SWISS-PROT ID: ITRA_SOYBN, AC: P01070) could be unambiguously identified 31 times (white dots in Figure 3D) with a minimum of 5 matched m/z values, representing a contiguous region in the center of the 2-D image.

Figure 3. Two-dimensional MS scan of 1-DE: the soybean trypsin inhibitor band. From the same membrane as in Figure 2, a piece of 1.1 x 0.9 cm2 was cut around the soybean trypsin inhibitor band and sprayed with a 10 mg/ml ACCA solution in 70% methanol. An array of 16 x 12 points was defined around the center of the band, with a distance between spots of 500 µm. A) 3-D MS intensity profile. All m/z higher than 1100 Da were considered to create the smoothed image. B) Amido black stained image. C) MS intensity image. D) MS intensity image, plotted in a logarithmic scale. The white dots represent the positions where ITRA_SOYBN was unambiguously identified by PeptIdent with a minimum of 5 matching m/z values.

3.3. Identification by two-dimensional scan of human plasma proteins separated by 2-DE

From a human plasma sample separated on a mini-2-DE and transferred using the OSDT method, we cut a section of the collecting PVDF membrane around the amido black stained spot of apolipoprotein A1 (SWISS-PROT AC P02647). The smoothed MS intensity image is shown in Figure 4B. Of the 195 MS files measured from pixels at 380 µm steps, 77 yielded apolipoprotein A1 as the identified protein with a minimum of 5 peptide masses matched. The image shows that MS intensities are detected around the area corresponding to the amido black stained visible surface.
The observed signal corresponded to the most intense peaks of the spectra from which apolipoprotein A1 was identified, thus suggesting that “MS staining” is more sensitive than the amido black staining method.

Figure 4. 2-D scan of a plasma mini-2-DE after OSDT. Possibilities to extract proteomic information from a 2-D MS scan. 250 µg of human plasma were separated on mini-2-DE. Proteins were digested and transferred onto PVDF using the OSDT procedure. A) Image of the membrane stained with amido black. B) Smoothed MS intensity image of the region where apolipoprotein A1 (SWISS-PROT AC P02647) was identified. The circle indicates the size of the spot visible by amido black staining. This shows that MS intensities are still detected where amido black staining is blank. C) Enlargement of image A), showing the positions corresponding to the proteins identified in the area. The labels are the SWISS-2DPAGE ID names: AACT_HUMAN (alpha-1-antichymotrypsin, AC: P01011), VTDB_HUMAN (vitamin D binding protein, AC: P02774), ALBU_HUMAN (serum albumin, AC: P02768), A1AT_HUMAN (alpha-1-antitrypsin, AC: P01009), FIBG_HUMAN (fibrinogen gamma chain, AC: P02679), IGHA_HUMAN (immunoglobulin alpha chain, AC: P99002). D) Raw MS intensity image of the region zoomed from the amido black image. E) Same image, but smoothed. F) In this MS image, the intensity is related to the number of MS signals used to identify the query protein, i.e. AACT_HUMAN (SWISS-PROT AC P01011). One of the MS spectra of the spot is also shown. G) Smoothed MS image, where the intensity represents the summed intensity of the MS signals used to identify VTDB_HUMAN (SWISS-PROT AC P02774) at each pixel. One of the MS spectra allowing the identification of the protein is shown. H) MS intensity image of the same region, where only MS signals belonging to immunoglobulin alpha chain (IGHA_HUMAN) peptides are considered.

Various representations from a more complex protein mixture are shown in Figure 4. Another section of the amido black stained PVDF membrane, obtained after OSDT treatment of a mini-2-DE gel from a human plasma sample, was also scanned with the MALDI-TOF MS. Its size was about 1.83 x 0.37 cm2 (Figure 4A and Figure 4C). A total of 890 spectra were measured at 400 µm steps. The chosen section of the scanned PVDF membrane contained a set of overlapping spots and trains of spots, as deduced from the known distribution of identified proteins in the human plasma image in the SWISS-2DPAGE database (http://www.expasy.ch/cgi-bin/map2/big?PLASMA_HUMAN). It also contained contamination from the adjacent and very abundant albumin, centered above the upper right corner of the excised PVDF surface. In addition, a probably high number of proteins whose sequences are absent from databases were also present in this sample. The MS intensity image reveals a continuous background of MS signals, represented by a grey background (Figure 4D, Figure 4E). This suggests that a lot of peptide material is measured on the whole surface, and that the protein spots are not isolated entities. The analysis tool made it possible, however, to filter this complex feature and to extract spots corresponding to single proteins. Protein spots can therefore be isolated from chemical noise. As examples, Figure 4F and Figure 4G show two different regions of the image from which the alpha-1-antichymotrypsin
(SWISS-PROT ID: AACT_HUMAN, AC: P01011) and the vitamin D binding protein (SWISS-PROT ID: VTDB_HUMAN, AC: P02774) were identified and visualized as two isolated spots, respectively. The figure shows two ways of representing specific “intensity values”. In the first (the AACT_HUMAN spot in Figure 4F), the intensity of each pixel is proportional to the absolute number of peptides identified for the protein. This intensity is related to the confidence of the identification. In the second (the VTDB_HUMAN spot in Figure 4G), the intensity is proportional to the sum of the MS intensities of the peptide peaks identified for the protein, and was then smoothed. Here the intensity is more closely related to the protein concentration than in the first representation. In this way, we have graphically extracted the contribution of the two proteins from the total MS intensity image shown in Figure 4C and Figure 4D. Some proteins, such as the immunoglobulin alpha chain, are highly abundant and present in multiple forms. They are detected on a large part of the area, thus yielding chemical noise for other proteins (Figure 4H). A number of proteins were clearly identified from this sample, and their relative positions on the membrane correlated with those identified in the human plasma master gel in the SWISS-2DPAGE database (Figure 4C).

4. DISCUSSION

The technique presented, known as the molecular scanner, provides a powerful tool for proteomics research. Firstly, it is a high throughput method dedicated to protein identification using peptide mass fingerprinting, or other methods in the future, and applied to the entire 2-DE gel. It uses a parallel method of protein digestion. Thus, in one experimental step, thousands of proteins can be chemically processed or digested simultaneously, under identical experimental conditions.
The obtained sample can be directly used for MS measurements. This method limits losses of material caused by sample manipulation. The size of each MS sample is reduced to the size of the laser beam used in the MALDI-TOF MS, i.e. about 10^-2 mm2. A single protein 2-DE spot can therefore be represented by more than 100 spectra. Secondly, the PMF analysis is fully automated and modular: any step can be modified, i.e. the choice of the peak detection algorithm, of the calibration procedure, of the masses considered for identification, of the arguments sent to PeptIdent and of the image representation.

The molecular scanner provides virtual images which can be considered as graphical projections of an automatically generated proteomic database. The database can be searched by protein identifiers (i.e. protein ‘name’), or by mass-related identification results. The user can choose to visualize a single protein by searching the positions where the protein has been identified. As the position of a set of masses can be searched, a protein can be visualized as a function of the number and/or the intensity of MS signals matched by PeptIdent for this protein. Where a protein yields a train of spots on a 2-DE gel, the spot corresponding to one particular
form of the protein can be isolated by searching for a specific peptide mass in the spectra. This allows the systematic analysis of post-translational modifications. In this respect, all PeptIdent results could be used as input data for a characterization step using FindMod (Wilkins et al., 1999). FindMod is a tool which interprets unannotated MS signals for a given protein and PMF data. It looks, by mass difference, for the occurrence of post-translational modifications using a set of intelligent rules, as well as for potential amino acid substitutions. It can therefore be systematically linked to PeptIdent: after the identification step, it helps to further characterize and discriminate all spots of a train. In the future, different potential post-translational modifications will be automatically highlighted in various colors on the image obtained by this scanner.

The high resolution obtained by the MS scanning becomes particularly useful when overlapping spots occur. Such a region can be interpreted as a mixture of proteins. Reconstitution of intensity envelopes from peptide mass fingerprinting makes it possible to discriminate two or more overlapping spots. Each spot can then be visualized by choosing the peptide masses specific to this particular protein form in order to create an image, or the spots can be represented by different coloring systems.

As an additional feature, neither the gel nor the PVDF membrane needs to be chemically stained. The MS intensity acts as a ‘coloring’ agent. Since spots can be localized, the image can be compared, aligned and matched with other gel images or PVDF images stained with conventional methods. As for chemical staining methods, the intensity of the MS signals is proportional neither to the amount of protein loaded nor to the amount of amino acid contained in the different spots. It depends on the desorption process and on the ionisation yields.
Thus, the intensity of the MS signals only partially correlates with the intensity of an amido black staining (see Figure 3 of the accompanying paper (18)) or with the absolute amount of material. In the 1-D scan (Figure 2) the SDS gel was loaded with 1 µg of each protein. Therefore no estimation of protein amount can be deduced from a single MS image. However, comparative studies may be performed between several MS images in cases where identical spots are compared.

The illustrated experiments gave a preliminary idea of the sensitivity of the method. In 1-D experiments we loaded 1 µg of each molecular weight standard, which corresponds to about 10 to 33 pmol of protein (see Figure 2). The size of the bands, visible on the control PVDF membrane (i.e. a membrane obtained without protein digestion and stained with amido black), was about 15 mm2. All proteins could be identified very clearly. The sensitivity was here 66 ng/mm2, i.e. 0.66 to 2 pmol/mm2. As a protein spot on a mini-2-D gel covers about this area, one could extrapolate that the detection limit for a clear identification currently lies in the low picomole range, with no optimization of the method. Another experiment was performed by loading 0.2 µg of each molecular weight standard. All proteins could be identified, but on fewer pixels (not shown). Moreover, one single MS measurement covers about 10^-2 mm2. Although the efficiencies of the peptide extraction, co-crystallization and ionization processes are not known, every identification was performed on about 10 fmol of
initial protein. Note that these calculations only give a rough estimate of the current sensitivity of the identification process. It should also be noted that spectra measured at pixels neighboring those of identified proteins still contain detectable peaks. The sensitivity of peptide peak detection is therefore higher than the sensitivity of protein identification.

An interesting observation has to be mentioned. When comparing spectra measured after an in-gel digestion and after a membrane scan, differences are noticeable in terms of the peptides detected. Even if most of the mass signals are present in both cases, the intensity of the signals can differ strongly. Some signals are present in the in-gel spectrum and absent in the scanned spectrum. Unfortunately, there is no evident relationship between the presence and intensity of a signal and a physico-chemical property of the corresponding peptides. However, most of the peptides detected with the two methods are completely digested peptides, and the two methods generally cover a similar percentage of the sequence. More detailed studies will have to be undertaken.

As MS information represents an additional specific property for a spot, two images containing MS information could be aligned by matching their mass spectra and/or their resulting identifications in the Melanie 2-DE image analysis software. This procedure could replace and/or confirm manual and software-based alignment of matched gel images.

The development of the molecular scanner required us to develop and integrate high throughput methods for sample preparation and analysis. Specific bioinformatics tools had to be created as well. The molecular scanner was designed as a set of interconnecting modules, which can be exchanged and modified in a very flexible manner. It can therefore easily be adapted for improvements and modifications.

The current bottleneck of this technique is the time necessary to scan the membrane with the mass spectrometer.
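The two back-of-the-envelope figures of this discussion, the amount of protein under one laser spot and the scanning time, can be reproduced with a short sketch. It assumes that the protein is spread evenly over the band and that the scanning time is simply the number of grid positions times the number of shots divided by the laser repetition rate; both are simplifications, and the function names are hypothetical. The input values (15 mm2 bands, 10 to 33 pmol loaded, 10^-2 mm2 per measurement, 0.4 mm steps, 64 shots per position, 3 Hz laser) are those quoted in this section.

```python
def amount_per_laser_spot_fmol(loaded_pmol, band_area_mm2, spot_area_mm2):
    """Protein amount under one laser spot, assuming an even spread over the band."""
    return loaded_pmol / band_area_mm2 * spot_area_mm2 * 1000.0  # pmol -> fmol

def scan_time_hours(width_mm, height_mm, step_mm, shots_per_position, laser_hz):
    """Scanning time if it is dominated by the laser repetition rate (assumption)."""
    n_x = round(width_mm / step_mm)
    n_y = round(height_mm / step_mm)
    return n_x * n_y * shots_per_position / laser_hz / 3600.0

# Sensitivity: 10 to 33 pmol on a 15 mm2 band, 10^-2 mm2 per MS measurement.
low_fmol = amount_per_laser_spot_fmol(10.0, 15.0, 1e-2)
high_fmol = amount_per_laser_spot_fmol(33.0, 15.0, 1e-2)

# Scanning time: 0.4 mm steps, 64 shots per position, 3 Hz laser.
small_hours = scan_time_hours(40.0, 40.0, 0.4, 64, 3.0)    # 4 x 4 cm2 surface
large_hours = scan_time_hours(160.0, 160.0, 0.4, 64, 3.0)  # 16 x 16 cm2 membrane
large_days = large_hours / 24.0
```

Under these assumptions each measurement samples roughly 7 to 22 fmol, in line with the estimate of about 10 fmol, and the 4 x 4 cm2 scan comes out at roughly 59 hours, the same order as the measured 55 hours. The time scales with the scanned area, so the larger membrane takes 16 times longer.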
Without optimization, the MS scanning time of a 4 x 4 cm2 surface is about 55 hours at a 0.4 mm resolution with 64 laser shots per position. This means that a full 16 x 16 cm2 membrane would require, under the same conditions, more than 36 days of continuous measurements and about 40 GB of storage to archive the raw data. As people tend to stretch the pI axis using narrow pH gradient strips in the first dimension, the separation power for the protein spots would increase, but so would the measurement time needed. The acquisition rate of the MS spectra is currently limited by the 3 Hz frequency of the laser and by a fixed number of laser shots per pixel. To accelerate it, one should at least be able to control the number of laser shots by software, i.e. to skip acquisition where spectra are empty or where the signal-to-noise ratio is above a given threshold. This may yield a gain of a factor of 2 to 5. Due to ion statistics, it is difficult to reduce drastically the number of laser shots per pixel. As the detector is inactive at least 99% of the time, the acquisition frequency should be increased, either by an increase of the laser repetition rate or by the use of multiple lasers at neighboring positions on the membrane. As time is required to allow relaxation of the crystals between two laser shots, there is a physical limit to the pulse rate alone. As the specificity of a protein identification strongly depends on the mass accuracy, efforts can also be
focused on the comparison of mass patterns in neighboring pixels. Finally, as MS technology develops, we anticipate that the full scan of a 10 x 10 cm2 mini-2-DE gel will be performed in a few hours.

5. CONCLUSION

In medicine, the development of computer-assisted tomography methods made it possible to visualize the complexity of the human body as a volume of anatomically related organs and tissues. The cellular components of a tissue can today be described using immunohistology and immunocytology. There is an obvious need to describe the protein content of a cell or of a biological fluid. The molecular scanner makes it possible to analyze many proteins in such a complex system. It reports, at the molecular level, the complexity of the protein content. The presentation of a proteome as a searchable database, which can be visualized as user-defined 3-D images, provides a powerful tool for comparative analysis in proteomics. The method, initially starting from 2-DE separation of proteins, can be adapted to other fields such as protein chips or other multidimensional separation methods. It can also be applied in clinical diagnostics where modifications occurring to proteins, i.e. mutations or changes in post-translational modifications, have to be monitored. These changes may be observed as changes in the PMF patterns, although they may not influence the migration of the protein itself. In addition to the presented approach, high throughput MS/MS sequencing methods (25, 26) or chip technology could represent complementary features. They yield additional information and provide huge amounts of data to be analyzed through visualization methods such as the one proposed here. Finally, this technique can be combined with additional types of analysis. The same surface can be reused for new analyses, such as an MS scan under different conditions, or with another laser, e.g. an IR laser (13).
In the case of particularly interesting spots, one can use the known coordinates of the spot to perform additional chemistry on this particular area. The spot of interest can similarly be cut out using a dedicated excision system and submitted to further analyses, such as MS/MS. The molecular scanner is therefore a tool which can be fully integrated in any more general proteomics analysis process.

6. ACKNOWLEDGEMENT

We are deeply grateful to the Helmut Horten Foundation for its financial support. This work was also supported by the Swiss National Fund for Scientific Research (grants 32-49314.96 and 31-52974.97) and the Montus Foundation. We are also very thankful to Dr. Keith Rose for his technical and critical help.
7. REFERENCES

Appel, R., Bairoch, A., & Hochstrasser, D. (1994). Trends Biochem. Sci., 19, 258-260.
Appel, R., Palagi, P., Walther, D., Vargas, J., Sanchez, J., Ravier, F., et al. (1997). Melanie II--a third-generation software package for analysis of two-dimensional electrophoresis images: I. Features and user interface. Electrophoresis, 18(15), 2724-2734.
Appel, R., Vargas, J., Palagi, P., Walther, D., & Hochstrasser, D. (1997). Melanie II--a third-generation software package for analysis of two-dimensional electrophoresis images: II. Algorithms. Electrophoresis, 18(15), 2735-2748.
Bienvenut, W., Sanchez, J., Karmime, A., Rouge, V., Rose, K., Binz, P., et al. (1999). Toward a clinical molecular scanner for proteome research: parallel protein chemical processing before and during western blot. Anal. Chem., 71(21), 4800-4807.
Binz, P., Muller, M., Walther, D., Bienvenut, W., Gras, R., Hoogland, C., et al. (1999). A molecular scanner to automate proteomic research and to display proteome images. Anal. Chem., 71(21), 4981-4988.
Binz, P., Wilkins, M., Gasteiger, E., Bairoch, A., Appel, R., & Hochstrasser, D. (1999). In R. Kellner, F. Lottspeich & H. Meyer (Eds.), Microcharacterisation of proteins (2nd ed., pp. 277-300). Berlin: Wiley-VCH.
Blackledge, J., & Alexander, A. (1995). Polyethylene membrane as a sample support for direct MALDI MS of high mass proteins. Anal. Chem., 67, 843-848.
Caprioli, R., Farmer, T., & Gile, J. (1997). Molecular imaging of biological samples: localization of peptides and proteins using MALDI-TOF-MS. Anal. Chem., 69, 4751-4760.
Eckerskorn, C., Strupat, K., Schleuder, D., Hochstrasser, D., Sanchez, J., Lottspeich, F., et al. (1997). Analysis of proteins by direct scanning IR-MALDI-MS after 2-D PAGE separation and electroblotting. Anal. Chem., 69, 2888-2892.
Fabris, D., Vestling, M., Cordero, M., Doroshenko, V., Cotter, R., & C, F. (1995). Rapid Commun. Mass Spectrom., 9(11), 1051-1055.
Hochstrasser, D. (1998). Proteome in perspective. Clin. Chem. Lab. Med., 36(11), 825-836.
Hochstrasser, D., Appel, R., Vargas, R., Perrier, R., Vurlod, J., Ravier, F., et al. (1991). MD-Computing, 8, 85-91.
Hoogland, C., Sanchez, J., Tonella, L., Binz, P., Bairoch, A., Hochstrasser, D., et al. (2000). 28(1), 286-288.
Laemmli, U. K. (1970). Cleavage of structural proteins during the assembly of the head of bacteriophage T4. Nature, 227(259), 680-685.
McComb, M., Oleschuk, R., Chow, A., Ens, W., Standing, K., Perreault, H., et al. (1998). Characterization of hemoglobin variants by MALDI-TOF MS using a polyurethane membrane as the sample support. Anal. Chem., 70(24), 300.
McComb, M. E., Oleschuk, R. D., Manley, D. M., Donald, L., Chow, A., O'Neil, J. D. J., et al. (1997). Use of a non-porous polyurethane membrane as a sample support for MALDI-TOF MS of peptides and proteins. Rapid Commun. Mass Spectrom., 11, 1716-1722.
Ogorzalek Loo, R., Mitchell, C., Stevenson, T., Martin, S., Hines, W., Juhasz, P., et al. (1997). Sensitivity and mass accuracy for proteins analyzed directly from polyacrylamide gels: implications for proteome mapping. Electrophoresis, 18(3-4), 382-390.
Sanchez, J., & Hochstrasser, D. (1998). In A. Link (Ed.), Methods in molecular biology: 2-D proteome analysis protocols (Vol. 112, pp. 227-233). Totowa, NJ: Humana Press.
Scheler, C., Lamer, S., Pan, Z., Li, X., Salnikov, J., & Jungblut, P. (1998). Peptide mass fingerprint sequence coverage from differently stained proteins on 2-DE patterns by MALDI-MS. Electrophoresis, 19, 918-927.
Schreiner, M., Strupat, K., Lottspeich, F., & Eckerskorn, C. (1996). UV-MALDI-MS of electroblotted proteins. Electrophoresis, 17, 954-961.
Traini, M., Gooley, A. A., Ou, K., Wilkins, M. R., Tonella, L., Sanchez, J.-C., et al. (1998). Towards an automated approach for protein identification in proteome projects. Electrophoresis, 19, 1941-1949.
Vestling, M., & Fenselau, C. (1994). PVDF: an interface for gel electrophoresis and MALDI-MS. Biochem. Soc. Trans., 22(2), 547-551.
Wilkins, M., et al. (1999). High throughput mass spectrometry discovery of protein post translational modification. J. Mol. Biol., 289, 645-657.
Wilkins, M., Sanchez, J., Gooley, A., Appel, R., Humphery-Smith, J., Hochstrasser, D., et al. (1995). Progress with proteome projects: Why all proteins expressed by genome should be identified and how to do it. Biotechnology & Genetic Engineering Reviews, 13, 19-50.
Wilkins, M., Sanchez, J., Williams, K., & Hochstrasser, D. (1996). Current challenges and futures applications for protein maps and post-translational vector maps in proteome project. Electrophoresis, 17, 830-838.
Williams, K., & Hochstrasser, D. (1997). In Proteome Research: New Frontiers in functional genomics (pp. 1-12). Berlin: Springer-Verlag.
CHAPTER 5
SIGNAL TREATMENT AND VIRTUAL IMAGES PRODUCTION (2/2): Visualization and Analysis of Molecular Scanner Peptide Mass Spectra (Muller et al., 2002)
Muller M, Gras R, Appel RD, Bienvenut WV, Hochstrasser DF
ABSTRACT

The molecular scanner combines protein separation using gel electrophoresis with peptide mass fingerprinting (PMF) techniques to identify proteins in a highly automated manner. Proteins separated in a 2-dimensional polyacrylamide gel (2D-PAGE) are digested ‘in parallel’ and transferred onto a membrane keeping their relative positions. The membrane is then sprayed with a matrix and inserted into a matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometer, which measures a peptide mass fingerprint at each site on the scanned grid. First, visualization of PMF data allows surveying all fingerprints at once and provides very useful information on the presence of chemical noise. Chemical noise is shown to be a potential source of erroneous identifications and is therefore purged from the mass fingerprints. Then, the correlation between neighboring spectra is used to recalibrate the peptide masses. Finally, a method that clusters peptide masses according to the similarity of the spatial distributions of their signal intensities is presented. This method allows discarding many of the false positives that usually go along with PMF identifications and allows identifying many weakly expressed proteins present in the gel.
169 W. V. Bienvenut (ed.), Acceleration and Improvement of Protein Identification by Mass Spectrometry, 169–188. © 2005 Springer. Printed in the Netherlands.
MULLER ET AL.
1. INTRODUCTION
At present, as complete genomes for an increasing number of organisms are available, attention must be focused on the proteins encoded by the genes. In contrast to the static genome, the proteome of an organism is a highly dynamic and connected network, and new analytical methods have to be developed in order to describe its spatial and temporal changes and interactions (Godovac-Zimmermann & Brown, 2001). An important step in this task is the high throughput identification of proteins, which nowadays mostly relies on efficient protein separation, mass spectrometry, protein sequence databases and bioinformatics (Bienvenut et al., 2001).
One of the most important methods for protein separation is 2-dimensional polyacrylamide gel electrophoresis (2D-PAGE) (Bjellqvist et al., 1982). This technique separates thousands of proteins simultaneously according to their isoelectric point (pI) and molecular weight (Mr) and displays them on a two-dimensional map. Mass spectrometry (MS) has become one of the most powerful techniques to identify organic molecules. Among various applications, peptide mass fingerprinting (PMF) is frequently used because, combined with matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometry (Karas & Hillenkamp, 1988; Tanaka et al., 1988), it provides a rapid and sensitive method for protein identification. PMF compares the list of experimental peptide masses obtained by specific endoproteolytic digestion of a protein (the peptide mass fingerprint) with the theoretical mass values calculated by in silico digestion of protein sequences. A score evaluates how well the theoretical masses match the fingerprint (Henzel et al., 1993; James et al., 1993; Mann, Höjrup, & Roepstorff, 1993; Pappin et al., 1993; Yates, III et al., 1993).
Gras et al. (Gras et al., 1999) presented a PMF identification algorithm based on a scoring scheme that takes into account important parameters such as mass accuracy, protein coverage by matching peptides, number of missed cleavage sites and the deviation of the measured pI and Mr values (if available) from theoretical predictions. In order to learn the weights of these parameters for the PMF identification score, a set of 91 PMF test spectra was used and optimal values of these weights were calculated by means of a genetic algorithm. Eriksson et al. (Eriksson, Chait, & Fenyo, 2000) investigated the influence of different experimental parameters on the statistical thresholds used to discern false matches for two different scoring schemes. Since the experimental mass fingerprint can match the theoretical peptide masses of a protein by chance, there is always a certain probability of false identifications in PMF. There is a trade-off between the sensitivity and the specificity of a database search: if the search is too restrictive, it might miss some proteins (false negatives), and if it is not restrictive enough, it might find too many erroneous matches (false positives).
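The comparison between measured and theoretical peptide masses described above can be illustrated with a minimal sketch. It is not the SmartIdent scoring scheme: it only counts measured masses that fall within a ppm tolerance of an in silico tryptic peptide. The function names are hypothetical, the cleavage rule (after K or R, not before P) is the usual trypsin convention and is an assumption here, and the residue masses are standard monoisotopic values.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide
PROTON = 1.00728   # singly protonated ion, as usually observed in MALDI


def tryptic_peptides(sequence, missed_cleavages=1):
    """In silico digestion: cleave after K or R unless followed by P."""
    fragments, start = [], 0
    for i, aa in enumerate(sequence):
        is_last = i == len(sequence) - 1
        if is_last or (aa in "KR" and sequence[i + 1] != "P"):
            fragments.append(sequence[start:i + 1])
            start = i + 1
    peptides = []
    for i in range(len(fragments)):
        # Join up to `missed_cleavages` additional neighbouring fragments.
        last = min(i + missed_cleavages, len(fragments) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(fragments[i:j + 1]))
    return peptides


def peptide_mh(peptide):
    """Monoisotopic [M+H]+ mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def count_matches(measured, sequence, tol_ppm=200.0, missed_cleavages=1):
    """Number of measured masses within tol_ppm of a theoretical peptide."""
    theoretical = [peptide_mh(p)
                   for p in tryptic_peptides(sequence, missed_cleavages)]
    return sum(
        any(abs(m - t) / t * 1e6 <= tol_ppm for t in theoretical)
        for m in measured
    )
```

A larger tolerance or more allowed missed cleavages raises the match count for true and false candidates alike, which is the sensitivity versus specificity trade-off mentioned in the text.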
SIGNAL TREATMENT AND VIRTUAL IMAGINGT (1/2)
The precision of mass measurements certainly influences the sensitivity and specificity of PMF identification. Since the resolution of mass spectrometers has improved, calibration errors are now the limiting factor. These errors originate from uncertainties in the estimation of experimental parameters such as electric field strengths and initial ion velocities. Calibration of mass spectra is not a trivial problem even if internal standards are used. For TOF instruments, the function that relates the flight time to the m/z value and the algorithm used to calculate the calibration parameters have to be chosen carefully in order to obtain good precision. Christian et al. (Christian, Arnold, & Reilly, 2000) described a method that is based on physical flight time equations (Juhasz, Vestal, & Martin, 1997; Vestal & Juhasz, 1998) and a simplex method to search for the optimal instrument parameters. This approach proved to be more robust than the usual curve fitting methods, especially in the mass range where no standard masses were available.
Several partially automated methods have been proposed to excise protein spots from a stained gel, to submit the excised material to endoproteolytic digestion and to extract peptides from the excised gel (Lopez, 2000). The peptides are then loaded onto a MALDI sample plate and introduced into a mass spectrometer for PMF acquisition (Traini et al., 1998b). These methods have the drawback that the location of protein spots must be known prior to excision, and that the excision precision is limited (> 1 mm). Recently, Binz et al. (Binz, Muller et al., 1999) introduced a new and highly automated approach, dubbed the molecular scanner, which combines 2D-PAGE separation techniques with PMF methods. In this approach, the proteins were digested first in the gel itself and then during transfer onto a collecting polyvinylidene fluoride (PVDF) membrane (Bienvenut et al., 1999).
This membrane was sprayed with a matrix solution (α-cyano-4-hydroxycinnamic acid), and the co-crystallisation of the matrix and the peptides allowed MALDI-MS analysis. Since diffusion in this process was not relevant, the location of the peptides on the PVDF membrane corresponded to the location of their proteins in the gel (Pacholski & Winograd, 1999). The membrane was then scanned by a MALDI-TOF mass spectrometer. For each scanned point the acquired peptide mass fingerprint was submitted to a PMF identification program, which returned a list of matching proteins. A threshold that was based on a statistical analysis of erroneous identifications was used to distinguish false identifications by their average identification score (Bienvenut et al., 2001). This method provided good results for the most abundant proteins, but it had difficulty distinguishing weakly expressed proteins from noise. A graphical display allowed the matching proteins to be visualised on a two-dimensional map.
High throughput methods can produce a large amount of mass spectrometric data, and multidimensional visualization of these data is becoming more and more important. It allows data to be surveyed and suggests algorithmic solutions. One example is secondary ion mass spectrometry (SIMS), where natural tissues can be scanned with a spatial resolution of less than 100 nm and the resulting
spectra can be used to visualize the 2- or 3-dimensional distributions of secondary ions (Pacholski & Winograd, 1999). Stoeckli et al. (Stoeckli, Chaurand, Hallahan, & Caprioli, 2001) coated frozen thin sections of tissue with a solution of MALDI matrix, then dried and introduced them into a mass spectrometer, which scanned the sample. For a human brain tissue, an area of 8.5 mm x 8 mm was scanned with a grid spacing of 100 µm, and the positions of 45 ions were recorded and rendered as 2-dimensional images.
In this paper, visualization of all mass fingerprints provides important information on the presence of chemical noise, which is shown to be a potential source of false matches in the PMF identification procedure. The correlation of neighbouring spectra is used to recalibrate the mass fingerprints. In order to simplify PMF identifications, an algorithm calculates distributions of peptide signal intensities and joins the masses with similar distributions into clusters. These clusters represent protein spots, and many of them yield a clear PMF identification. These methods were developed in the framework of the molecular scanner, but we think that they are of more general interest since they deal with issues such as chemical noise, calibration, weak signal detection and how contextual information can be used to improve results.
2. METHODS
In this experiment, 1 mg of E. coli proteins was separated by 2D-PAGE. After in-gel digestion, the proteins were submitted to a digestion-transfer and trapped on a PVDF membrane (Bio-Rad, Richmond CA). A portion with a size of approximately 9x13 mm (corresponding to a pI range of 5.1-5.2 and a Mr range of 35’000-45’000 Da) was cut out from the membrane and pasted on the sampling plate of a MALDI-TOF mass spectrometer (Voyager Elite, Applied Biosystems, Framingham MA), which was equipped with a 337-nm UV laser.
5 mg/mL of α-cyano-4-hydroxycinnamic acid (4-HCCA from Sigma, St-Louis MO) dissolved in 70% methanol was sprayed on the PVDF membrane. Then the membrane was scanned on a 48x32 grid with a sampling distance of 0.25 mm. At each site, 64 laser shots were fired at a frequency of 3 Hz, leading to a total acquisition time of about 9 hours. The disk space needed to store all the spectra was 350 MB, which could be compressed to 3 MB after peptide signal detection if just the mass fingerprints were stored. More details of the molecular scanner experiment discussed in this article can be found in (Bienvenut et al., 1999). The algorithms used for peptide signal detection and the PMF identification program SmartIdent are described in Gras et al. (Gras et al., 1999). Since the concentration of some proteins was low, only a few of their peptide masses were detectable and the minimal number of matching masses for the PMF search was set to 2 if deconvoluted peptide mass lists were used and to 3 otherwise (since the standard version of SmartIdent requires at least 3 matching masses, it was adapted to the needs of this experiment). The number of missed cleavages was set to one and
only chemical modifications of cysteine and methionine were considered. The mass tolerance was set to 200 ppm. A reduced version of Swiss-Prot (Release 39.22 of 20-Jun-2001) that contained all 4740 proteins from E. coli was searched for PMF identification. Calculations were performed on a 500 MHz Pentium with 128 MB RAM on Windows NT. Programs were written in C++, and Virtual Reality Modeling Language (VRML 2.0, http://www.sdsc.edu/vrml) was used for visualization. VRML is a software standard that defines the format of data files sent over the Internet for visualization and animation, and is therefore supported by Internet browsers. Netscape® Communicator 4.7 was used to render the VRML data files and ‘m/z’-software by Proteometrics to render single spectra.
3. RESULTS AND DISCUSSION
3.1. Visualization of spectra
The data obtained in the molecular scanner experiment consisted of a set of mass spectra: one for each scan point. The first aim was to get an idea of how the data were structured. Since there were 1536 spectra, it was impossible to inspect and compare them by means of conventional visualization tools, which are only able to render a few spectra at a time. We designed a method that circumvents this problem and allows all spectra to be inspected at once. Each mass detected in a spectrum can be associated with a point in a 3-dimensional space (Figure 1a) where the horizontal plane corresponds to the scanned membrane and the vertical axis to the mass value. In Figure 1b all masses between 800 Da and 1000 Da are marked as points, revealing that some masses were detected on a contiguous region of the scanned membrane, while others were found only on isolated lattice sites. For the main part of this paper we considered only masses that could be reproducibly detected in a neighborhood, because this provided more reliable results than working with all masses.
Therefore a filter discarded a mass from a mass fingerprint if it could not be detected in the majority of the 8 surrounding sites. All lattice sites were treated simultaneously and this process was repeated until a stable configuration was obtained, i.e. the filter can be represented as a synchronous cellular automaton (Toffoli & Margolus, 1987). This filter is different from a filter that selects the most intense peptide signals in an isolated spectrum since it takes into account the spatial correlation of the data. There were several low intensity peptide signals detected on a contiguous region that proved to be essential for the identification of a protein. The masses that pass this filter and do not belong to chemical noise (see below) are called contiguous masses and are depicted in Figure 1c.
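The neighborhood filter can be sketched as a synchronous update on a boolean grid that marks, for one mass, the sites where it was detected. This is a minimal illustration under stated assumptions and not the authors' C++ implementation: the function name is hypothetical, masses are assumed to be already matched across spectra, and sites outside the grid count as "not detected". The threshold `min_neighbors` is also an assumption. The text says "the majority of the 8 surrounding sites", but a strict majority (5 of 8), iterated to stability under these boundary conditions, removes every finite region, because the top-left site of any region has at most 4 occupied neighbours. The sketch therefore defaults to 4.

```python
def contiguity_filter(detected, min_neighbors=4):
    """Iteratively drop sites with fewer than `min_neighbors` occupied
    neighbours among the 8 surrounding sites, until nothing changes.

    detected: list of rows of booleans for ONE mass (True = detected).
    Returns a new grid; the input is left untouched.
    """
    rows, cols = len(detected), len(detected[0])
    grid = [list(row) for row in detected]
    while True:
        # Synchronous update: every site is evaluated on the old grid.
        new_grid = [[False] * cols for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                if not grid[r][c]:
                    continue
                neighbours = sum(
                    grid[rr][cc]
                    for rr in range(max(r - 1, 0), min(r + 2, rows))
                    for cc in range(max(c - 1, 0), min(c + 2, cols))
                    if (rr, cc) != (r, c)
                )
                new_grid[r][c] = neighbours >= min_neighbors
        if new_grid == grid:      # stable configuration reached
            return grid
        grid = new_grid


# Example: a 4x4 spot plus one isolated detection on a 7x7 grid.
example = [[False] * 7 for _ in range(7)]
for r in range(1, 5):
    for c in range(1, 5):
        example[r][c] = True
example[6][6] = True
filtered = contiguity_filter(example)
```

In this example the isolated site disappears and the spot keeps its body but loses its four corner sites, which shows how the rule favours compact, localized regions over scattered detections.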
Figure 1. (a) The pI axis goes from 5.1 to 5.2, whereas the Mr axis is inverted and goes from 45’000 Da to 35’000 Da. Masses of one spectrum (m1,…,m5) are schematically depicted. (b) Masses between 800 Da and 1000 Da. The peptide signal detection threshold was set to the optimal value used for the identification, where small signals are also detected (signal height > 2.2*noise). (c) Contiguous masses between 800 Da and 1000 Da. Only the masses that were detected in a contiguous, but well localized region are shown. (d) 800-1000 Da portion of a spectrum from the upper right part of the scanned membrane. Only an arbitrary selection of detected peptide signals is labeled.
3.2. Chemical noise Figure 1b reveals an interesting feature: some masses cover the entire membrane while others are localized in spots. Figure 1d shows that the localized peptide signals at 951.5 Da and 999.7 Da are not distinguishable from ubiquitous masses at 804.4 Da, 820.4 Da, 838.2 Da and 936.1 Da by means of signal intensity. Figure 4 shows
that signal intensity distributions of ubiquitous masses are flat in contrast to those of E. coli peptide masses. In order to find ubiquitous masses automatically, a routine tests how even and spread out an intensity distribution is. It divides the membrane into 26 regions of equal size (8x8) and calculates the deviation between each region’s mean intensity $I_i$ and the overall mean intensity $I_{tot}$. If the sum over all regions of the relative deviations, $\sum_i |I_i - I_{tot}| / I_{tot}$, is smaller than a certain threshold (=20) and if the mass is detected on more than 72 sites, it is called ubiquitous (Table 1).

Table 1. Ubiquitous masses. Mass: Ubiquitous mass value. Since this value is not exactly the same in all spectra where the mass was found, the median value is displayed (after calibration, see below). Number of sites: The number of spectra where the mass was found (maximum 1536). Sometimes, masses were detected with a deviation of about +1 Da from a keratin/trypsin peptide mass. This might be due to the difficulty of detecting the monoisotopic mass for very small peptide signals. Alleged origin: If a mass matched a trypsin (Swiss-Prot entry: TRYP_PIG) or a keratin peptide, it is indicated in this field (one missed cleavage, maximal mass deviation 200 ppm). The following human keratins produced more than one match: 1) K1CM_HUMAN, 2) K2C1_HUMAN.

| Mass (Da) | Number of sites | Alleged origin |
|---|---|---|
| 804.5 | 1320 | keratin2 |
| 820.4 | 761 | |
| 823.4 | 112 | |
| 829.3 | 139 | |
| 832.5 | 377 | |
| 833.4 | 265 | |
| 834.4 | 451 | |
| 838.3 | 636 | |
| 839.3 | 315 | |
| 842.5 | 1251 | |
| 845.3 | 665 | |
| 859.5 | 202 | |
| 861.3 | 97 | keratin1 |
| 871.2 | 577 | keratin1 |
| 912.4 | 234 | |
| 913.5 | 103 | |
| 914.5 | 74 | |
| 926.4 | 868 | |
| 927.5 | 188 | |
| 936.2 | 355 | |
| 940.5 | 582 | |
| 1027.2 | 305 | |
| 1032.6 | 154 | |
| 1045.7 | 366 | |
| 1046.6 | 180 | keratin2 |
| 1060.3 | 105 | keratin1 |
| 1092.2 | 234 | |
| 1126.7 | 170 | |
| 1164.7 | 206 | |
| 1480.0 | 72 | |
| 1804.1 | 99 | |
| 1994.3 | 93 | keratin2 |
| 2118.4 | 136 | keratin1 |
| 2211.4 | 92 | trypsin |
| 2250.2 | 89 | |

Origin labels whose row could not be recovered, in order of appearance: keratin1,2; keratin2; trypsin; trypsin; keratin1.
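The ubiquity test described before Table 1 can be sketched as follows. This is only an illustration under stated assumptions: the function name is hypothetical, the intensity of a mass is taken as 0 at sites where it was not detected, and region means are taken over all sites of a region. The sketch simply tiles the grid into 8x8 blocks; for a 48x32 grid this gives 24 blocks rather than the 26 regions quoted in the text, so the exact partition used by the authors is not reproduced here.

```python
def is_ubiquitous(intensity, block=8, max_deviation=20.0, min_sites=72):
    """Return True if one mass is spread evenly over the membrane.

    intensity: list of rows with the signal intensity of ONE mass at each
    scan site (0 where the mass was not detected; this is an assumption).
    """
    rows, cols = len(intensity), len(intensity[0])
    n_sites = sum(1 for row in intensity for value in row if value > 0)
    overall_mean = sum(map(sum, intensity)) / (rows * cols)
    if n_sites <= min_sites or overall_mean == 0:
        return False
    # Sum of relative deviations |I_i - I_tot| / I_tot over all regions.
    deviation = 0.0
    for r0 in range(0, rows, block):
        for c0 in range(0, cols, block):
            cells = [
                intensity[r][c]
                for r in range(r0, min(r0 + block, rows))
                for c in range(c0, min(c0 + block, cols))
            ]
            region_mean = sum(cells) / len(cells)
            deviation += abs(region_mean - overall_mean) / overall_mean
    return deviation < max_deviation


# Example on a 32 x 48 grid: a flat background signal versus a 10x10 spot.
flat = [[1.0] * 48 for _ in range(32)]
spot = [[0.0] * 48 for _ in range(32)]
for r in range(10):
    for c in range(10):
        spot[r][c] = 100.0
```

With these inputs the flat signal passes the test, while the localized spot fails it even though it covers more than 72 sites, because every empty region contributes a relative deviation of 1.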
Since diffusion is limited in the molecular scanner technique (Binz et al., 1999), and since none of the ubiquitous masses (exception: 820.4 Da) could be associated with peptide masses of proteins annotated in the respective portion of the master SWISS-2DPAGE (Hoogland et al., 2000) gel (Swiss-Prot entries: IDH_ECOLI, METK_ECOLI, PGK_ECOLI, ACEA_ECOLI), these ubiquitous masses do not stem from proteins of the E. coli sample. However, some of these ubiquitous masses could be attributed to known impurities from tryptic autolysis and various forms of human keratin, whereas the remaining masses could stem from modified or unknown impurity peptides and matrix clusters. Matrix clusters form another source of chemical noise in the low mass range, especially if the amount of protein to be analyzed is low (Keller & Li, 2000; Land & Kinsel, 2001), but in contrast to contaminating peptides their mass and intensity are not reproducible, and it is not certain whether they could be detected over the entire membrane. In addition, the ubiquitous masses could not be explained by a formula for matrix cluster masses as described by Keller et al. (Keller & Li, 2000). Whatever the source of the masses listed in Table 1, it would be impossible to discern them from low intensity peptides of the E. coli sample without the knowledge of their spatial distribution provided by the molecular scanner data.
3.3. Calibration
Masses detected over the entire membrane could be used to investigate the calibration of the mass spectrometer. Figure 2a reveals that mass values were locally quite stable but varied significantly over the entire membrane: the difference between the minimal and maximal measured values of the trypsin peptide mass at 842.509 Da was about 1 Da, because the membrane was warped at its upper edge (high Mr values) and because physical conditions such as electric field strength depend on the position of the sampling plate (Egelhofer et al., 2000).
Therefore it was impossible to assign precise mass values valid for all spectra, and a large mass deviation of 700 ppm about the median values had to be taken into account. A re-calibration of the spectra would facilitate data handling, and we had to devise a method that does not rely on internal standard masses, since these were not used in the experiment described here. Since we had no information about flight times and how they had been converted into mass values, it was not possible to apply the method described in (Christian et al., 2000) to our problem, and we had to guess a function that calculates the corrected masses from the original masses. Egelhofer et al. (Egelhofer et al., 2000) used a linear relationship, which was a reasonably good approximation to their data and is easy to calculate with. We chose a different approach:
$$m_{\mathrm{corrected}}^{1/2} = a_1 m^{1/2} + a_2 m + a_3 m^{3/2} + a_4 m^{2} \qquad (1)$$
which took account of additional terms and fitted the observed data well (not shown). If the calibration correction is known for a set of masses $\{m_i, m_{i,\mathrm{corrected}}\}$, then the parameters $a_k$ can be calculated by a robust fit (Press, Teukolsky, Vetterling, & Flannery, 1995) of equation 1.

Figure 2. (a) Masses between 841 Da and 845 Da. The masses around 842.5 Da, which are detected over the entire membrane, correspond to a trypsin peptide, whereas the masses around 843.5 Da stem from Isocitrate lyase (Swiss-Prot entry ACEA_ECOLI) and are localized in the pI-Mr plane except for a few outliers. The scattering of mass values is due to calibration errors that become larger (0.7 Da) towards the edges of the membrane. For better visualization, the mass values are rendered as a surface plot. (b) As in (a), but after calibration using the algorithm described in the text.

None of the trypsin or keratin peptide masses could be detected in all spectra and therefore they could not be used as internal standards. However, many masses found in one spectrum could also be detected in the spectra of the neighboring scan points with a relatively small mass deviation (