Microcharacterization of Proteins


R. Kellner, F. Lottspeich, H. E. Meyer

Microcharacterization of Proteins

© VCH Verlagsgesellschaft mbH, D-69451 Weinheim (Federal Republic of Germany), 1994

Distribution:
VCH, P.O. Box 101161, D-69451 Weinheim, Federal Republic of Germany
Switzerland: VCH, P.O. Box, CH-4020 Basel, Switzerland
United Kingdom and Ireland: VCH, 8 Wellington Court, Cambridge CB1 1HZ, United Kingdom
USA and Canada: VCH, 220 East 23rd Street, New York, NY 10010-4606, USA
Japan: VCH, Eikow Building, 10-9 Hongo 1-chome, Bunkyo-ku, Tokyo 113, Japan
ISBN 3-527-30048-1


VCH

Weinheim - New York - Basel - Cambridge - Tokyo

Main authors:
Dr. Roland Kellner, until Oct. 31, 1994: European Molecular Biology Laboratory, Meyerhofstraße 1, D-69012 Heidelberg, Germany; presently: Institute for Physiological Chemistry and Pathobiochemistry, Johannes Gutenberg University, Duesbergweg 6, D-55099 Mainz, Germany

Priv.-Doz. Dr. Friedrich Lottspeich, Max Planck Institute for Biochemistry, Am Klopferspitz 18a, D-82152 Martinsried, Germany
Priv.-Doz. Dr. Helmut E. Meyer, Ruhr University Bochum, D-44780 Bochum, Germany

This book was carefully produced. Nevertheless, authors and publisher do not warrant the information contained therein to be free of errors. Readers are advised to keep in mind that statements, data, illustrations, procedural details or other items may inadvertently be inaccurate.

Published jointly by VCH Verlagsgesellschaft mbH, Weinheim (Federal Republic of Germany) and VCH Publishers Inc., New York, NY (USA)
Editorial Director: Dr. Hans-Joachim Kraus
Editorial Manager: Karen Weber
Production Manager: Dipl.-Wirt.-Ing. (FH) H.-J. Schmitt
Cover illustration: Anneke de Raadt-Schroth, letter arrangement with oil-marbled paper
Library of Congress Card No.: applied for
British Library Cataloguing-in-Publication Data: A catalogue record for this book is available from the British Library
Die Deutsche Bibliothek - CIP-Einheitsaufnahme: Microcharacterization of proteins / R. Kellner ... With contributions of U. Bahr ... - Weinheim ; New York ; Basel ; Cambridge ; Tokyo : VCH, 1994 ISBN 3-527-30048-1 NE: Kellner, Roland


Printed on acid-free and chlorine-free paper.
All rights reserved (including those of translation into other languages). No part of this book may be reproduced in any form, by photoprinting, microfilm, or any other means, nor transmitted or translated into a machine language without written permission from the publishers. Registered names, trademarks, etc. used in this book, even when not specifically marked as such, are not to be considered unprotected by law.
Printing: Druckhaus Diesbach, D-69469 Weinheim
Bookbinding: Großbuchbinderei J. Schäffer, D-67269 Grünstadt
Printed in the Federal Republic of Germany.

In June 1994 we organized the first meeting called "Mikromethoden in der Proteinchemie" (Micromethods in Protein Chemistry) at the Max Planck Institute for Biochemistry in Martinsried. Attendance was large, indicating a broad-based interest in the detailed description of aspects of protein chemistry, perhaps because proteins are becoming more and more important in a variety of disciplines. In selecting the presentations we focused on the most important methods, describing established ones as well as new ones. The speakers were experienced workers in the field, who made the conference attractive to interested learners and established 'protein people' alike. It was hoped that this would initiate lively exchange and dialogue between all participants. The chapters in this book are written versions of the oral presentations and try to combine basic explanation with detailed descriptions of optimized procedures for sample handling at the micro level. Thanks to the help of all the authors, we are now able to offer this book to a wider public, and we are very pleased to have the chance to pass on various tips and ideas to those who need them. We should like to see this book become a valuable tool for anyone approaching analytical problems in protein chemistry.

Heidelberg, Martinsried and Bochum, July 1994
Roland Kellner

Friedrich Lottspeich

Helmut E. Meyer

Contributors
Bahr, Dr. Ute; Institute of Medical Physics and Biophysics, University of Münster, D-48149 Münster, Germany
Eckerskorn, Dr. Christoph; Max Planck Institute for Biochemistry, D-82152 Martinsried, Germany
Eisermann, Bernd; Institute for Physiological Chemistry, Ruhr University, D-44780 Bochum, Germany
George, David G.; Max Planck Institute for Biochemistry, D-82152 Martinsried, Germany
Hillenkamp, Prof. Dr. Franz; Institute of Medical Physics and Biophysics, University of Münster, D-48149 Münster, Germany
Houthaeve, Tony; European Molecular Biology Laboratory, D-69012 Heidelberg, Germany
Karas, Dr. Michael; Institute of Medical Physics and Biophysics, University of Münster, D-48149 Münster, Germany
Mann, Dr. Matthias; European Molecular Biology Laboratory, D-69012 Heidelberg, Germany
Metzger, Dr. Jörg; Institute for Organic Chemistry, University of Tübingen, D-72076 Tübingen, Germany
Mewes, Dr. Hans-Werner; Max Planck Institute for Biochemistry, D-82152 Martinsried, Germany
Schägger, Dr. Hermann; Johann Wolfgang Goethe University, D-60590 Frankfurt am Main, Germany
Schwer, Dr. Christine; Max Planck Institute for Biochemistry, D-82152 Martinsried, Germany
Serwe, Maria; Institute for Physiological Chemistry, Ruhr University, D-44780 Bochum, Germany
Weigt, Dr. Christiane; Institute for Physiological Chemistry, Ruhr University, D-44780 Bochum, Germany

Contents

Section I: Overview

I.1 Microcharacterization of Proteins
Friedrich Lottspeich
1 General Aspects
2 From a Cell to a Protein Sequence
3 Future Trends

Section II: Sample Preparation

II.1 Chemical and Enzymatic Fragmentation of Proteins
Roland Kellner
1 Strategy
2 Denaturation, Reduction and Alkylation
3 Enzymatic Fragmentation
3.1 Enzymes
3.2 Practical Considerations
4 Chemical Fragmentation
4.1 Cyanogen Bromide Cleavage
4.2 Partial Acid Hydrolysis
4.3 Hydroxylamine Cleavage of Asn-Gly Bonds
4.4 Cleavage at Tryptophan
4.5 Cleavage at Cysteine
5 References

II.2 Microseparation Techniques I: High Performance Liquid Chromatography
Maria Serwe and Helmut E. Meyer
1 Introduction
2 Getting Started
2.1 Solvents
2.2 Pump
2.3 Pre-Column Split
2.4 Sample Preparation
2.5 Injector
2.6 Tubings
2.7 In-Line Filter, Guard Column
2.8 Analytical Column
2.9 Elution
2.10 Detection
2.11 Fractionation
3 Applications
4 References

II.3 Microseparation Techniques II: Analysis of Peptides and Proteins by Capillary Electrophoresis
Christine Schwer
1 Introduction
2 Theory
2.1 Capillary Isotachophoresis
2.2 Capillary Zone Electrophoresis
2.3 Electroosmotic Flow
3 Instrumentation
3.1 Injection
3.2 Detection
4 Applications
4.1 Peptide Separations
4.2 Protein Separations
5 References

II.4 Microseparation Techniques III: Gel Electrophoresis for Sample Preparation in Protein Chemistry
Hermann Schägger
1 Introduction
2 Denaturing Techniques
2.1 Commonly Used SDS-Polyacrylamide Gel Electrophoresis Techniques for Protein Separation
2.2 Blue-SDS-PAGE for Quantitative Protein Recovery from Gels
2.3 Electroelution of Proteins After Blue-SDS-PAGE
2.4 Electroblotting of Blue and Colourless SDS Gels
2.5 Isoelectric Focusing in the Presence of Urea
3 Native Techniques
3.1 Colourless-Native-PAGE
3.2 Blue-Native-PAGE
3.3 Native Isoelectric Focusing
4 References

II.5 Microseparation Techniques IV: Electroblotting
Christoph Eckerskorn
1 Introduction
2 Electroblotting
2.1 Polyacrylamide Gel Electrophoresis
2.2 Blot Systems
2.2.1 Tank Blotting
2.2.2 Semidry Blotting
2.3 Blotting Parameters
2.3.1 The Blotting Process
2.3.2 Transfer Buffers
2.3.3 Addition of SDS
2.3.4 Addition of Methanol
2.3.5 Influence of Protein Concentration
3 Blotting Membranes
4 References

Section III: Amino Acid Analysis

III.1 Amino Acid Analysis
Roland Kellner, Helmut E. Meyer and Friedrich Lottspeich
1 Introduction
2 Sample Preparation
2.1 Peptides and Proteins
2.1.1 Enzymatic Hydrolysis
2.1.2 Acid Hydrolysis
2.1.3 Alkaline Hydrolysis
2.2 Free Amino Acids
3 Derivatization
3.1 Post-Column Derivatization
3.1.1 Ninhydrin
3.1.2 Orthophthaldialdehyde
3.1.3 Fluorescamine
3.2 Pre-Column Derivatization
3.2.1 Phenylisothiocyanate
3.2.2 Orthophthaldialdehyde
3.2.3 Fluorenylmethyl Chloroformate
3.2.4 Dabsyl Chloride
3.2.5 Dansyl Chloride
3.2.6 Chiral Reagents
4 Data Evaluation
5 Instrumentation
6 Discussion
7 References

Section IV: Protein Sequence Analysis

IV.1 The Edman Degradation
Friedrich Lottspeich, Tony Houthaeve and Roland Kellner
1 The Edman Chemistry
1.1 Coupling, Cleavage and Conversion
1.2 Identification of the PTH Amino Acids
2 Instrumentation
2.1 The Liquid Phase Sequencer
2.2 The Solid Phase Sequencer
2.3 The Gas Phase Sequencer
2.4 The Pulsed Liquid Phase Sequencer
2.5 The Biphasic Column Sequencer
3 Difficulties of Amino Acid Sequence Analysis
3.1 The Sample and Sample Matrices
3.2 Difficulties with the Edman Chemistry
4 Protein Ladder Sequencing
5 State of the Art
6 References

IV.2 Analyzing Post-translational Protein Modifications
Helmut E. Meyer
1 Introduction
2 Classification of Post-translational Modifications according to their Behaviour during Purification and Edman Degradation
2.1 Modifications: Stable During Purification and Edman Degradation
2.2 Modifications: Stable During Purification but Unstable during Edman Degradation
2.3 Modifications: Unstable During Purification and Edman Degradation
3 Examples
3.1 1-Methyl-Histidine
3.2 Glyco-Asparagine, Glyco-Threonine
3.3 Phospho-Tyrosine
3.4 N-Pyruvyl or N-α-Oxo-butyric Acid
3.5 Gluco-Arginine
3.6 Farnesyl-Cysteine
3.7 Phospho-Serine
3.8 Phospho-Threonine
3.9 Screening for Phospho-Serine/Threonine Containing Peptides by HPLC/MS
3.10 Lanthionine, 3-Methyl-Lanthionine, Dehydroalanine, Dehydro-α-aminobutyric Acid
4 References

Section V: Bioanalytical Mass Spectrometry

V.1 Analysis of Biopolymers by Matrix-Assisted Laser Desorption/Ionization (MALDI) Mass Spectrometry
Ute Bahr, Michael Karas and Franz Hillenkamp
1 Introduction
2 Development of MALDI
2.1 Mechanism of Matrix-assisted Laser Desorption/Ionization
3 Instrumentation
3.1 Time-of-flight (TOF) Mass Spectrometers
3.2 Laser Desorption Ion Source
3.3 Ion Detection and Data Collection
4 Applications
4.1 Sample Preparation
4.2 Molecular Weight Determination of Proteins and Glycoproteins
4.2.1 Accuracy of Mass Determination
4.2.2 Sensitivity and Mass Range
4.3 Analysis of Oligonucleotides
4.4 Analysis of Glycans and Glycoconjugates
5 Combination of MALDI with Biochemical Methods
5.1 Peptide Mapping of Digested Proteins by MALDI
5.2 Combination of MALDI and Gel Electrophoresis
5.3 Combination of MALDI with Capillary Zone Electrophoresis
6 Future Developments
6.1 Peptide Sequencing
6.2 "Surface" MALDI
7 References

V.2 Electrospray Mass Spectrometry
Jörg Metzger and Christoph Eckerskorn
1 Introduction
2 Instrumentation
2.1 The Electrospray Source
2.2 The Mass Analyser
2.3 The Detector
3 The Ion Spectra
4 Coupling of Chromatographic Methods to the Mass Spectrometer
5 Off-line HPLC-MS
5.1 Sample Introduction with an Autosampler
5.2 Purity Control of Synthetic Peptides
5.3 Characterization of Synthetic Peptide Libraries
6 Structure Elucidation of Peptides and Proteins
6.1 HPLC Coupled to Mass Spectrometry
6.2 Capillary Electrophoresis Coupled to Mass Spectrometry
6.3 Microcapillary LC Coupled to Mass Spectrometry
6.4 Practical Aspects
7 References

V.4 Sequence Analysis of Proteins and Peptides by Mass Spectrometry
Christiane Weigt, Helmut E. Meyer and Roland Kellner
1 Introduction
2 Protein Sequencing by Mass Spectrometry
2.1 Tandem Mass Spectrometry
3 Strategy for Protein Sequencing with Electrospray Tandem Mass Spectrometry
4 Examples of Protein Sequencing Using Tandem Mass Spectrometry
4.1 Sequence Analysis of Peptides Presented to the Immune System by MHC Molecules
4.2 Partial Sequencing and Identification of a Phosphorylation Site of Recombinant Mitogen-Activated Protein Kinase p42mapk
4.3 Protein Sequence Analysis by Tandem Mass Spectrometry in Combination with Microcapillary HPLC
5 References

Section VI: Database Analysis

VI.1 Protein Sequences and Sequence Databases
Hans-Werner Mewes and David G. George
1 Introduction
2 Current Databases
3 Data Processing and Principles of Data Organisation
3.1 Data
3.2 Computer Science in the Development of Biological Databases
4 Access to Molecular Sequence Databases
4.1 The ATLAS Multidatabase Information Retrieval System
4.2 Searching Molecular Sequence Databases
5 Future Developments
6 References

VI.2 Mass Spectrometrical Data for Protein Sequence Analysis
Matthias Mann
1 Introduction
2 Program and Algorithm
2.1 Requirements for Installing Peptidesearch
2.2 Data Structures and Algorithm
2.3 Performance
2.4 Collaboration with Other Programs
3 Searching by Total Molecular Weight
3.1 Limitations of Searching by Total Molecular Weight
3.2 Recommendations and Prospects for Searching by Total Molecular Weight
4 Searching by the Molecular Weight of a Set of Peptides Generated by Sequence-Specific Cleavage of a Protein
4.1 Influence of Mass Accuracy
4.2 Influence of Target Protein Mass Range
4.3 Influence of Minimum Number of Peptide Matches
4.4 Influence of Partial Digestion Setting
4.5 Choice of Enzyme
4.6 Avoiding False Positives
4.7 Special Searches: Time Course Digestion, Parallel Digestion and Subdigestion
4.8 Special Searches: DNA Database Searching
4.9 Searching Incompletely Purified Proteins and Protein Mixtures
4.10 Limitations of Searching by Peptide Masses
4.11 Recommendations and Prospects of Searching by Peptide Masses
5 Searching by the Molecular Weight of a Peptide and Its Partial Sequence
5.1 Searching by Partial Sequence Obtained by Edman Degradation and the Peptide Molecular Weight
5.2 Searching by an MS/MS Pattern
5.3 Matching Peptides with Sequence Errors
5.4 Matching Peptides with Post-translational Modifications
5.5 Removing Contaminating Proteins
5.6 Searching DNA Sequence Libraries
5.7 Recommendations and Prospects
6 Conclusion
7 References

VI.3 Software Packages for Personal Computers
Bernd Eisermann and Helmut E. Meyer
1 Introduction
2 Overview of Protein and DNA Databanks
3 What a Program for Sequence Analysis Should Do
4 Available Program Packages
4.1 On-Line Services
4.2 Program Packages for Personal Computers
5 References

Index

Section I: Overview

Microcharacterization of Proteins by R. Kellner, F. Lottspeich & H. E. Meyer © VCH Verlagsgesellschaft mbH, 1994

I.1 Microcharacterization of Proteins
Friedrich Lottspeich

At the beginning of the 20th century it became apparent to biochemists that proteins are macromolecules composed of amino acids. Emil Fischer deduced from hydrolysis experiments and the very first peptide syntheses (1903) that proteins are built up of numerous linked α-amino acids. The nature of the different amino acids and of the peptide bond linkage became clear. However, small synthetic peptides did not exhibit the properties of proteins. It was not realized until 1950 that all molecules of a given protein have one unique amino acid sequence. Several key developments together revealed the molecular architecture of proteins, namely the development of chromatography by Martin (1941) and of amino acid analysis by Stein and Moore (1948). One milestone was the work of Frederick Sanger, who identified phenylalanine as an end group of insulin by its reaction with 2,4-dinitrofluorobenzene; the Sanger method forms yellow 2,4-dinitrophenyl (DNP) derivatives with amino groups. At the same time Pehr Edman established the stepwise degradation of proteins using phenylisothiocyanate as a reagent. Then, in 1953, Sanger and Tuppy were able to determine the amino acid sequence of insulin, the first protein characterized at the molecular level. Since then, the biosciences and biotechnology have undergone an unprecedented expansion. Several key areas have driven this evolution, e.g., gene technology, hybridoma and cell culture techniques and, especially, protein chemical separation and analysis methods, which are discussed in the following chapters of this book.

1 General Aspects

In the classical view, a protein purification starts from the observation of a biological phenomenon and the search for its molecular basis. Thus, one or a few proteins that correlate with the observed activity are isolated and characterized at the molecular level. Knowledge of the primary structure of these proteins, as well as of their modifications and processing sites, is of fundamental importance. Acquiring this knowledge may be difficult and laborious, particularly when only a small amount of protein is available, and all available techniques have to work in concert to elucidate the structure of the protein in a reasonable time. The entire amino acid sequence of a protein is now more easily deduced from its DNA sequence than determined by protein chemical sequence analysis of the whole protein. However, some protein chemical information is always required to isolate the DNA or to check the accuracy of the DNA sequence. Important information, like the actual N-terminus or C-terminus of a protein, can only be determined by protein chemical methods. Probably the most important task of protein chemistry is determining post-translational modifications, about which little can be learned from the analysis of DNA sequences, but which often regulate the activity of proteins. Immunological, crystallographic or NMR data provide additional information that is not available from the DNA or protein sequences alone. Modern protein chemistry is one of the important members in the orchestra of disciplines in the biosciences, providing the basis for a detailed understanding of structure-function relationships.

In recent years additional areas of responsibility have been allocated to protein chemistry, mainly due to the developments in recombinant DNA techniques. Projects to analyze the human genome or the genomes of other organisms at the DNA or RNA level are the focus of strong international research activities. However, it is already apparent that the value of genomic analyses will be limited unless more attention is paid at the same time to the function of the huge number of sequenced or characterized pieces of DNA. The link between the genome and the multiple cellular functions of an organism is formed by the proteins, the direct translation products of the genes. Thus, the analysis of proteins, which are the real players and tools in the cell, is a supplementary approach to understanding the biological events in a cell. Figure 1 outlines how protein analysis links genomic and protein information at the molecular level.

Figure 1. Protein analysis as an interface between genomic and protein information. (Diagram; recoverable labels: Genome, 50 000 to 100 000 human genes; mRNA/cDNA; -AAACAATGCCAA-; Identification of an unknown protein; Identification of a known protein.)
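Deducing a protein sequence from its DNA sequence, as described above, amounts to reading the coding sequence codon by codon with the genetic code. The following Python sketch is illustrative only: it assumes a clean coding sequence read in frame 0, the standard genetic code and no introns, and the name `translate` is hypothetical. It also makes the limitation in the text concrete: the translation says nothing about where the mature N- or C-terminus lies or which residues are modified.

```python
# Standard genetic code. Codons are enumerated in TCAG order for the first,
# second and third base; AMINO_ACIDS lists the encoded one-letter codes in that
# same order, with '*' marking stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(coding_dna: str) -> str:
    """Translate a coding DNA sequence in reading frame 0 into one-letter codes.

    Translation stops at the first stop codon; a trailing incomplete codon is
    ignored. Characters other than A, C, G, T (or U) raise a KeyError.
    """
    seq = coding_dna.upper().replace("U", "T")  # accept mRNA as well as cDNA
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        residue = CODON_TABLE[seq[pos:pos + 3]]
        if residue == "*":  # stop codon ends the open reading frame
            break
        protein.append(residue)
    return "".join(protein)


print(translate("AAACAATGCCAA"))  # KQCQ
```

Applied to the fragment -AAACAATGCCAA- shown in Figure 1, and assuming it is read in frame 0, the function returns `KQCQ`. A real analysis would also have to consider the other reading frames and both strands.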

2 From a Cell to a Protein Sequence

Nowadays it is possible to separate most cellular proteins by two-dimensional (2D) gel electrophoresis and to quantify them. In this way, a picture of protein expression at a certain cell stage can be obtained. In a kind of subtractive approach, different metabolic stages of cells can be compared with the help of computerized image analysis and data handling. The subtle protein changes that arise in all complex biological systems against a background of constitutive proteins have to be recognized and then analyzed further with modern protein chemical micro methods. Even the most abundant proteins in a 2D gel electrophoretic separation are present only at the very low picomole or femtomole level, corresponding to a few micrograms or even submicrogram amounts of material. Because the amounts of the interesting proteins are so small, it has been imperative to make protein analytical methods more sensitive and faster. Sequence analysis, for example, has reached a level where a few picomoles of a protein can be sequenced: that is roughly the amount of the 100-300 most abundant cellular proteins separated by 2D gel electrophoresis. All the other commonly used methods in protein chemistry, like separation techniques, amino acid analysis and mass spectrometry, today work in a similar sensitivity range. The urgent need to analyze proteins present in even lower quantities is a strong driving force for further developments in methodology and instrumentation. Improving the sensitivity of all the protein chemical methods mentioned to the femtomole level is within the reach of modern micro protein chemistry. As a consequence, it may become possible to analyze thousands of the proteins of a cell.
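The subtractive comparison of two metabolic stages can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the image analysis software referred to above: spots are assumed to be already detected and matched between the two gels (hypothetical spot IDs serve as keys), spot volumes are assumed positive, each volume is normalized to the total spot volume of its gel to compensate for loading differences, and a spot is flagged when its normalized volume changes by at least a chosen factor (2 is an arbitrary default here).

```python
def normalized_volumes(spots: dict[str, float]) -> dict[str, float]:
    """Express each spot volume as a percentage of the total spot volume of its gel."""
    total = sum(spots.values())
    return {spot_id: 100.0 * volume / total for spot_id, volume in spots.items()}


def changed_spots(gel_a: dict[str, float], gel_b: dict[str, float],
                  min_fold: float = 2.0) -> dict[str, float]:
    """Return matched spots whose normalized volume changes by at least min_fold.

    Keys are spot IDs present on both gels; values are fold changes (B relative to A).
    Spots found on only one gel are not handled here.
    """
    norm_a = normalized_volumes(gel_a)
    norm_b = normalized_volumes(gel_b)
    changes = {}
    for spot_id in norm_a.keys() & norm_b.keys():
        fold = norm_b[spot_id] / norm_a[spot_id]
        # Flag both increases and decreases of at least min_fold.
        if fold >= min_fold or fold <= 1.0 / min_fold:
            changes[spot_id] = fold
    return changes


gel_a = {"s1": 100, "s2": 100, "s3": 200}  # hypothetical spot volumes, stage A
gel_b = {"s1": 100, "s2": 300, "s3": 100}  # hypothetical spot volumes, stage B
print(changed_spots(gel_a, gel_b))
```

With these toy volumes, spot s2 (about 2.4-fold up) and spot s3 (0.4-fold) are flagged, while s1 (0.8-fold) is not. Spots that appear or disappear entirely would need to be reported separately.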


Manipulating these small quantities requires several special considerations. Proteins adsorb strongly to any kind of surface, and microgram amounts may be lost to the wall of a vessel within a few minutes. Consequently, several techniques commonly used on the macro scale, like ultrafiltration, dialysis or lyophilization, cannot be recommended when working with micro amounts. Sometimes adsorbed protein can be recovered by treatment with concentrated formic acid or by incubation with detergent solutions. In general, good recovery of peptide material is achieved by dissolving it in aqueous trifluoroacetic acid containing a few percent of acetonitrile. At the micro level, contamination becomes a major problem. Laboratory dust and impurities in solvents, reagents and equipment are almost inevitable sources of contamination. Automated separation and reaction devices which use minimal volumes of solvents and reagents and which keep the sample in a closed environment should therefore be preferred. However, common laboratory equipment so far seldom meets these requirements, and instrument development is called for. Because of these special conditions at the micro scale, careful planning of the purification strategy is imperative to minimize handling and transfer steps and to keep contamination as low as possible. Early fractionation steps such as multistep extraction or precipitation, commonly used in large-scale protein purification, often cannot be adopted at the micro scale. Even at the early stages, the most efficient separation techniques available have to be applied in an optimal sequence. Detection methods that yield multiple kinds of information, such as mass spectrometry or diode array spectroscopy, are strongly recommended. Figure 2 summarizes several strategies that have been developed and successfully applied in recent years.

Figure 2. Strategies for the microcharacterization of proteins. (Flow chart; recoverable labels: Cell; Fractionation; Cleavage in gel; Cleavage on membrane; Elution of peptides; RP-HPLC; Sequence analysis; Amino acid analysis; Mass spectrometry.)

Starting material, such as a cell or a complex protein mixture already enriched in the protein of interest by conventional purification steps, is applied to a high-resolution separation method. In Figure 3 the different fractionation methods are compared according to their capability to separate molecules of different sizes. It immediately becomes clear why gel electrophoresis is usually the method of choice for separating complex protein mixtures. For small molecules (e.g., amino acids or peptides) chromatography has the best separation power, which is reflected in the almost exclusive use of high performance liquid chromatography (HPLC) for amino acid separations, peptide fractionations and peptide maps. For proteins in the molecular weight range of 5000 to 200 000 Dalton, gel electrophoresis, particularly in its 2D mode, is capable of separating up to 10 000 proteins in a single analysis. Capillary electrophoresis is a fairly new method in protein chemistry that is gaining increasing attention. However, so far it has not been adopted for routine preparative work, owing to technical difficulties in loading large sample quantities and in collecting fractions. Nevertheless, capillary electrophoresis is in principle ideally suited for microscale separations: samples are eluted at high concentrations, and the method can be connected on-line to other analysis techniques like mass spectrometry. Furthermore, capillary electrophoresis, with its ionic separation mode, is ideally complementary to reversed phase micro HPLC, where separation is based on hydrophobic interactions of the solutes. Once the individual components of the complex sample are resolved, they are further characterized by three main methods: amino acid sequence analysis, amino acid composition analysis and mass spectrometry. Here, the immediate goal is to recognize whether the protein under investigation has already been sequenced or whether it is a new, as yet unknown compound.
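The statement that 2D gel electrophoresis can resolve up to 10 000 proteins can be rationalized with the notion of peak capacity plotted in Figure 3. Two common textbook approximations, not taken from this chapter, are assumed here: two fully orthogonal separation dimensions multiply their peak capacities (the product rule), and a single gradient separation offers roughly 1 + t_g/w peaks for gradient time t_g and average baseline peak width w. The per-dimension capacities of about 100 used below are illustrative; real 2D separations fall short of the ideal product because the two dimensions are never perfectly orthogonal.

```python
def peak_capacity_gradient(gradient_time: float, peak_width: float) -> float:
    """Approximate peak capacity of one gradient separation: 1 + t_g / w.

    gradient_time and peak_width (average baseline width) must use the same unit.
    """
    return 1.0 + gradient_time / peak_width


def peak_capacity_2d(n_first: float, n_second: float) -> float:
    """Ideal peak capacity of two fully orthogonal dimensions (product rule)."""
    return n_first * n_second


# Illustrative values only: ~100 resolvable zones in each gel dimension,
# and a 60 min HPLC gradient with 0.5 min wide peaks.
print(peak_capacity_2d(100, 100))       # 10000
print(peak_capacity_gradient(60, 0.5))  # 121.0
```

The contrast between the two numbers mirrors the argument of the text: a single chromatographic dimension resolves on the order of a hundred components, whereas two orthogonal electrophoretic dimensions can approach the 10 000 quoted for 2D gels.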
Figure 3. Separation capacity of different methods depending on the molecular mass of the substances. (Plot; axes: molecular mass, with the classes amino acids, peptides, proteins, organelles and cells, versus peak capacity, 10^1 to 10^4.)

If no N-terminal sequence can be obtained because the N-terminus is blocked, which happens in almost every second case, a database search with the amino acid composition may give a hint as to the nature of the protein. Usually, however, a blocked protein has to be cleaved with enzymes or chemicals. The internal fragments produced can be analyzed directly by matrix-assisted laser desorption/ionization (MALDI) mass spectrometry. With the help of computer programs, the fragment masses found can be compared with the fragment masses calculated for all the protein sequences stored in protein and DNA databases. Multiple matches provide significant evidence for the identity of the protein. If the protein is unknown, if the mass search is ambiguous, or if modifications are present, the fragments have to be separated by microbore reversed phase HPLC or capillary electrophoresis. Both techniques may be coupled on-line to electrospray mass spectrometric detection, and sequencing is then performed either by Edman degradation or by MS/MS. The sequence information obtained is again used for a database search for identities or homologies. Usually one or two short sequences of 6-8 amino acid residues are sufficient to recognize a homology or an identity; in fact, often no more fragments than this are analyzed at the protein level. The identified peptide fragments usually provide sufficient sequence information for isolating the DNA of the protein of interest or for producing monospecific antibodies. Once the cDNA is sequenced, the mass of the encoded protein sequence must be compared with the actual measured mass of the isolated protein. In case of discrepancies, the N- and C-termini need to be checked and/or modified amino acid residues must be identified. Determining the exact position of a protein modification is a tremendous challenge when sample amounts are small.
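The comparison of measured fragment masses with masses calculated from database sequences, described above, can be sketched in Python. This is a simplified illustration, not any of the search programs discussed in Chapter VI.2: it assumes cleavage by trypsin (after Lys or Arg, not before Pro) with no missed cleavages, monoisotopic residue masses, singly protonated [M+H]+ ions as typically observed in MALDI, no modifications, and a fixed absolute tolerance. The function names and the toy sequences are hypothetical.

```python
import re

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # added once per peptide for the free N- and C-termini
PROTON = 1.007276  # added to form a singly protonated [M+H]+ ion


def tryptic_peptides(protein: str) -> list[str]:
    """Cut after K or R unless the next residue is P; no missed cleavages."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def peptide_mass(peptide: str) -> float:
    """Monoisotopic neutral mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def count_matches(observed_mh: list[float], protein: str, tolerance: float = 0.5) -> int:
    """Number of observed [M+H]+ values explained by at least one tryptic peptide."""
    theoretical = [peptide_mass(p) + PROTON for p in tryptic_peptides(protein)]
    return sum(any(abs(obs - t) <= tolerance for t in theoretical) for obs in observed_mh)


def rank_database(observed_mh, database, tolerance=0.5):
    """Rank database entries (name -> sequence) by the number of matched masses."""
    scores = {name: count_matches(observed_mh, seq, tolerance) for name, seq in database.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


toy_database = {"protein_1": "MKWVTFISLLRPAKAER", "protein_2": "WWWK"}  # hypothetical entries
print(tryptic_peptides("MKWVTFISLLRPAKAER"))  # ['MK', 'WVTFISLLRPAK', 'AER']
print(rank_database([375.2], toy_database))   # [('protein_1', 1), ('protein_2', 0)]
```

The entry that explains the most observed masses is ranked first; as the text notes, several independent matches are needed before such an identification is credible. The same `peptide_mass` function applied to a complete deduced sequence gives a theoretical monoisotopic mass for comparison with a measured protein mass, although for larger proteins the measured value usually corresponds to the average rather than the monoisotopic mass.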

3 Future Trends

It can be expected that the sensitivity limits of all the analysis methods will be lowered further in the future. Even now, spectacular work is being done on single molecule detection, for example. Capillary chromatography and capillary electrophoresis in conjunction with mass spectrometry appear to be promising tools for the femtomole analysis of proteins and peptides. To minimize contamination and transfer problems, adsorptive and covalent attachment to solid supports will be developed further. These supports will function as a kind of "sample bus" on which the immobilized proteins can be washed extensively without loss. Additionally, reactions and sample transfers can be performed in closed automated devices in which modern analysis techniques work on the immobilized sample directly. The protein chemical methods will have to become faster to keep pace with the enormous data output of the automated DNA sequencing projects; here again, mass spectrometry is predicted to play a major role. In addition, the huge accumulation of DNA- and protein-derived data will certainly force the development of computational methods for data handling. Finally, in the near future the focus of research may shift away from single proteins towards more complex structures like multimers, complexes and organelles. Methods research will have to concentrate on how to analyze such structures and on macromolecular complex assembly and disassembly. So far, only rather primitive tools are available.

Section II: Sample Preparation


II.1 Chemical and Enzymatic Fragmentation of Proteins
Roland Kellner

1 Strategy

Chemical or enzymatic fragmentation is required to study the covalent structure of a protein. The resulting peptide fragments have to be isolated and analysed, and the information gained can be used to clone the structural gene, to identify an N-terminally blocked protein, or for structure/function studies. Protein amounts of less than 100 pmol (= 3 µg for a 30 kDa protein) are difficult to handle. Any desalting step, precipitation, lyophilization, etc., carries the risk of sample loss due to adsorption onto surfaces or of insolubility after drying. Therefore, on the one hand, the number of purification steps must be minimized. On the other hand, extremely high purity is required for microcharacterization techniques like Edman sequencing or mass spectrometry. In addition, a conflict may arise because buffers or detergents are incompatible with a subsequent fragmentation or separation step. Only a limited number of isolation techniques are useful for microscale sample preparation and fragmentation; reversed phase HPLC and gel electrophoresis combined with electroblotting are used in most applications. The quantitation of proteins is difficult but important for planning experiments and interpreting results. Classical techniques like the Lowry or Bradford assays require microgram amounts [Stoscheck 1990]. This is in the range of the protein amount available for the total analysis, and therefore these methods are not applicable. Generally, the quantities are estimated from the intensity of a Coomassie Blue stained gel spot. However, proteins are stained very differently by Coomassie Blue, and over- or underestimation can easily give an error of a factor of ten. Amino acid analysis is the method of choice for obtaining quantitative information on protein samples (see Chapter III). The chemical or enzymatic fragmentation of a native protein may be hindered by its secondary and tertiary structure, which is stabilized by disulfide bridges.
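The conversion between molar amount and mass used above (100 pmol of a 30 kDa protein = 3 µg) follows from mass = amount × molar mass: 1 pmol of a 1 kDa (1000 g/mol) protein weighs 10^-12 mol × 1000 g/mol = 1 ng. The small Python helpers below, with hypothetical function names, make the arithmetic explicit and show why colorimetric assays that consume micrograms are out of reach at this scale.

```python
def pmol_to_ug(amount_pmol: float, molar_mass_kda: float) -> float:
    """Mass in micrograms of amount_pmol of a protein of the given mass in kDa.

    1 pmol x 1 kDa = 1e-12 mol x 1000 g/mol = 1e-9 g = 1 ng = 0.001 ug.
    """
    return amount_pmol * molar_mass_kda / 1000.0


def ug_to_pmol(mass_ug: float, molar_mass_kda: float) -> float:
    """Inverse conversion: amount in pmol for a given mass in micrograms."""
    return mass_ug * 1000.0 / molar_mass_kda


print(pmol_to_ug(100, 30))  # 3.0  (the 100 pmol / 30 kDa example above)
print(ug_to_pmol(1, 50))    # 20.0
```

One microgram of a 50 kDa protein is thus only 20 pmol, so an assay consuming several micrograms would use up the whole sample.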
The reduction of cystines to cysteines allows the protein chain to unfold. Subsequent alkylation of the thiol groups yields stable cysteine derivatives that can be detected. In general, there are three ways in which a protein can be presented for a cleavage reaction: (1) in solution; (2) bound onto a membrane; (3) embedded in the polyacrylamide matrix (Figure 1). Proteins in solution would seem to be the simplest starting point for a digest; however, it is often difficult to get the protein into solution. A solubilization buffer may contain denaturing detergents and chaotropic salts, which might affect protease activity or interfere with a subsequent chromatographic separation. For such samples an extra purification step is required.

R. Kellner, F. Lottspeich, H. E. Meyer (1994) Microcharacterization of Proteins, VCH Weinheim

Figure 1. Strategy for characterizing proteins. [Diagram elements: protein in solution, on a membrane, or in a gel slice; chemical/enzymatic fragmentation; SDS-PAGE/electroblotting; RP-HPLC; characterization.]

Electrophoretic separation and electroblotting of a protein onto a membrane is the most flexible strategy for protein microisolation. Membrane supports used for protein characterization are made of polyvinylidene difluoride (PVDF) [Pluskal 1986, Matsudaira 1987], siliconized glass fibre [Eckerskorn 1988] or nitrocellulose [Aebersold 1987]. The protein band is visualized by staining, e.g. with Coomassie Blue, Amido Black or Ponceau S, and then excised for N-terminal sequencing, fragmentation, or amino acid analysis. PVDF membranes are inert and withstand the harsh conditions of Edman chemistry or acid hydrolysis. A protein bound onto PVDF can first be subjected to N-terminal sequencing and, if it proves to be blocked, the same piece of PVDF can still be used to check the quantity or to determine the composition by amino acid analysis. Glass fibre membranes are routinely used as a support in protein sequencers, but they are not recommended for amino acid analysis. Nitrocellulose binds proteins less strongly than PVDF does, and therefore more hydrophobic peptides are recovered from this membrane. However, nitrocellulose is compatible with neither Edman chemistry nor acid hydrolysis; it decomposes, e.g., at higher acetonitrile concentrations.

Proteins can also be cleaved directly within the polyacrylamide matrix. After separation by 1D or 2D SDS-PAGE, the protein is visualized by Coomassie staining and the band is cut out with a scalpel. The gel piece is washed and dried, and then the protease and buffer are added. As the matrix reswells it takes up the enzyme, and digestion starts. In the Cleveland method, the gel slice is placed in the sample well of a second gel and overlaid with protease; digestion proceeds directly in the stacking gel during the subsequent electrophoresis, and the proteolytic fragments are separated immediately [Cleveland 1977]. The resulting peptides can be recovered either by electroblotting [Kennedy 1988] or by elution from the gel matrix with organic solvents, followed by separation by RP-HPLC [Eckerskorn 1989]. The in-matrix protocol uses only two steps for protein isolation and digestion [(1) gel electrophoresis, (2) fragmentation of the gel-embedded protein], whereas the blotting strategy needs three [(1) gel electrophoresis, (2) electroblotting, (3) fragmentation of the membrane-bound protein].
Reduced handling of the sample improves the yield of the peptide fragments to be analysed, and therefore in-matrix digestion is advantageous for small sample amounts [Kurzchalia 1992, Fiedler 1994].
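The mass figure quoted at the start of this section (100 pmol = 3 µg for a 30 kDa protein) follows from mass = amount × molar mass, with 1 kDa = 1000 g/mol. A minimal Python sketch of that conversion, assuming the nominal protein mass in kDa is known; the function names are illustrative only:

```python
def pmol_to_ug(amount_pmol: float, mass_kda: float) -> float:
    """Convert a molar amount (pmol) into a mass (µg).

    pmol x kDa gives ng directly (1e-12 mol x 1e3 g/mol = 1e-9 g);
    dividing by 1000 converts ng to µg.
    """
    return amount_pmol * mass_kda / 1000.0


def ug_to_pmol(mass_ug: float, mass_kda: float) -> float:
    """Inverse conversion: mass (µg) into molar amount (pmol)."""
    return mass_ug * 1000.0 / mass_kda


if __name__ == "__main__":
    # 100 pmol of a 30 kDa protein, the example quoted in the text
    print(f"{pmol_to_ug(100, 30):.1f} ug")
    # how many pmol a 1 µg band of the same protein would represent
    print(f"{ug_to_pmol(1.0, 30):.1f} pmol")
```

The inverse direction is what one needs when only a stain-based estimate in µg is available, bearing in mind the large error of such estimates noted above.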

2 Denaturation, Reduction and Alkylation

Fragmentation reactions are facilitated by denaturing the protein under investigation, because in the native protein the cleavage sites are not equally accessible owing to its secondary and tertiary structure. Denaturation disrupts this structure, and the conformation of the chain approaches that of a random coil. Detergents, urea or guanidine hydrochloride are used as denaturants. Detergents are mainly used for the solubilization and disaggregation of membrane proteins [Neugebauer 1990]. Urea can be contaminated with cyanate ions, which carbamylate the amino groups of a protein [Cole 1966] and thereby block the N-terminus. Guanidine hydrochloride may therefore be preferred as a denaturing agent. For denaturation a 6 M guanidine hydrochloride solution is used; it is subsequently diluted to a 1-2 M concentration, which is compatible with the activity of many proteases [Riviere 1992].

Disulfide bonds can also hinder proteolysis. Furthermore, cleavage fragments are difficult to identify if they comprise two peptide chains still linked by an S-S bond. The reduction and alkylation of cystines therefore facilitates cleavage reactions. First, the S-S bridges are cleaved by reduction, yielding two cysteines (Figure 2). Dithiothreitol (DTT), 2-mercaptoethanol or tributylphosphine are used as reducing agents. DTT (Cleland's reagent) [Cleland 1964] has several advantages: it has a low redox potential, and at pH 8 the reduction of a cystine is complete within a few minutes; it is resistant to air oxidation; and it lacks the unpleasant odour of 2-mercaptoethanol. Tributylphosphine [Rüegg 1977] offers the advantage that the reaction can be carried out in the vapour phase [Amons 1987].

Figure 2. Reduction of disulfide bonds. [Reaction schemes: RS-SR + dithiothreitol → 2 RSH + oxidized (cyclic disulfide) dithiothreitol; RS-SR + 2-mercaptoethanol → RSH + mixed disulfide RS-S-CH2-CH2-OH, which is reduced further by excess reagent to give 2 RSH; RS-SR + tributylphosphine (+ H2O) → 2 RSH + tributylphosphine oxide.]

Secondly, the thiol groups are modified (Figure 3). Cysteine residues in a protein are modified either before gel electrophoresis or after the proteolytic digest, prior to HPLC [Tempst 1990]. Modification is necessary to stabilize the thiol groups, because unmodified cysteine is destroyed during Edman degradation and cannot be identified. The two most frequently used alkylating agents are 4-vinylpyridine [Raftery 1966], yielding S-(4-pyridylethyl)cysteine, and iodoacetic acid, yielding S-carboxymethylcysteine [Crestfield 1963] (iodoacetamide gives the corresponding S-carboxamidomethyl derivative). Pyridylethylation must be followed immediately by separation of the reaction mixture, because prolonged incubation causes side reactions of the excess reagent (e.g. His, Trp and Met may be modified).

Figure 3. Alkylation of cysteine residues in proteins. [Reaction schemes: RSH + CH2=CH-C5H4N (4-vinylpyridine) → RS-CH2-CH2-C5H4N; RSH + I-CH2-COOH (iodoacetic acid) → RS-CH2-COOH (+ HI).]

The reduction/alkylation should be fitted into the overall protocol so that no extra purification step is required, e.g. prior to the last electrophoretic separation. It should be remembered that residual acrylamide monomers in polyacrylamide gels are known to couple to thiol groups; this side reaction has also been suggested for use in S-alkylation [Brune 1992]. A variety of protein modifications is described in the literature [Glazer 1975, Darbre 1986, Allen 1989], but they play no significant role in protein microcharacterization. Nanomole amounts of protein are generally needed for a modification reaction because of inherent difficulties such as multiple product formation or side reactions, so applications in the microscale range are nearly impossible. The reduction/alkylation of cysteine is a rare exception.
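When alkylated peptides are analysed by mass spectrometry, each cysteine derivative discussed above adds a characteristic mass. The sketch below computes monoisotopic residue masses from elemental compositions using standard monoisotopic atomic masses; the compositions of the added groups are the net changes for substitution (iodo reagents, HI lost) or addition (4-vinylpyridine, acrylamide). The dictionary keys and function names are illustrative assumptions, not part of any library:

```python
# Monoisotopic masses of the most abundant isotopes (u)
ATOMIC_MASS = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307401,
    "O": 15.99491462,
    "S": 31.97207117,
}

# Cysteine residue within a peptide chain: C3H5NOS
CYS_RESIDUE = {"C": 3, "H": 5, "N": 1, "O": 1, "S": 1}

# Net atoms gained by the cysteine residue for each S-derivative
ALKYLATION_DELTA = {
    "carboxymethyl": {"C": 2, "H": 2, "O": 2},            # iodoacetic acid, HI lost
    "carbamidomethyl": {"C": 2, "H": 3, "N": 1, "O": 1},  # iodoacetamide, HI lost
    "pyridylethyl": {"C": 7, "H": 7, "N": 1},             # 4-vinylpyridine, addition
    "propionamide": {"C": 3, "H": 5, "N": 1, "O": 1},     # acrylamide, addition
}


def mono_mass(formula: dict) -> float:
    """Sum of monoisotopic atomic masses for an elemental composition."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())


def modified_cys_residue_mass(derivative: str) -> float:
    """Monoisotopic residue mass of cysteine carrying the given S-alkyl group."""
    return mono_mass(CYS_RESIDUE) + mono_mass(ALKYLATION_DELTA[derivative])


if __name__ == "__main__":
    print(f"Cys residue: {mono_mass(CYS_RESIDUE):.4f}")
    for name, delta in ALKYLATION_DELTA.items():
        print(f"{name:16s} +{mono_mass(delta):.4f} -> {modified_cys_residue_mass(name):.4f}")
```

The propionamide entry corresponds to the acrylamide side reaction mentioned above, so a Cys residue of unexpected mass in gel-derived samples may point to that adduct.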

3 Enzymatic Fragmentation

The enzymatic fragmentation of proteins has several advantages: only a catalytic amount of protease is needed; specificity is high; cleavage yields are high; side reactions are negligible; and proteases are commercially available in high quality. The choice of enzyme depends on the information available and on the purpose of the fragmentation. Knowledge of the amino acid composition, or even of the primary sequence, helps in selecting an enzyme that will give either extensive fragmentation (many small peptides) or more limited fragmentation (a few large peptides). Specific proteases can be used to cleave at basic, acidic or hydrophobic side chains. The composition of a hypothetical 300-residue protein can be calculated from a protein database (NBRF-PIR) from the total number of each amino acid residue relative to the number of database entries. This example demonstrates the number of specific fragments and their average length (Table 1).

Table 1. Calculated number and length of peptide fragments for a hypothetical 300-residue protein.

Amino acid          Specific cleavage   No. of fragments   Average length
Phe/Trp/Tyr/Leu     Chymotrypsin        54                 6
Lys/Arg             Trypsin             35                 9
Glu                 Glu-C               20                 15
Lys                 Lys-C               19                 16
Arg                 Arg-C               17                 18
Asp                 Asp-N               17                 18
Met                 CNBr                8                  38
Trp                 BNPS-skatole        5                  60
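Table 1 is derived from average compositions; for a known sequence the fragments can be enumerated directly. The following is a sketch of an idealized in silico digest, assuming textbook specificities only (cleavage after or before the listed residues, with a following proline blocking trypsin and chymotrypsin, as noted for proline in the text). The `RULES` table and function names are hypothetical; real digests show missed and additional cleavages:

```python
# Idealized cleavage rules (an assumption, not measured specificities):
# (recognized residues, cut side relative to that residue, blocked by following Pro)
RULES = {
    "trypsin": ("KR", "C", True),
    "chymotrypsin": ("FWYL", "C", True),
    "Lys-C": ("K", "C", False),
    "Arg-C": ("R", "C", False),
    "Glu-C": ("E", "C", False),
    "Asp-N": ("D", "N", False),
    "CNBr": ("M", "C", False),
    "BNPS-skatole": ("W", "C", False),
}


def digest(seq: str, agent: str) -> list:
    """Fragments of a one-letter-code sequence after complete, idealized cleavage."""
    residues, side, blocked_by_pro = RULES[agent]
    cuts = []
    for i, aa in enumerate(seq):
        if aa not in residues:
            continue
        pos = i + 1 if side == "C" else i  # index at which the next fragment starts
        if pos <= 0 or pos >= len(seq):
            continue  # a cut at either chain end creates no new fragment
        if blocked_by_pro and seq[pos] == "P":
            continue
        cuts.append(pos)
    bounds = [0] + cuts + [len(seq)]
    return [seq[start:end] for start, end in zip(bounds, bounds[1:])]


def fragment_stats(seq: str, agent: str) -> tuple:
    """Number of fragments and their average length, as tabulated in Table 1."""
    fragments = digest(seq, agent)
    return len(fragments), len(seq) / len(fragments)


if __name__ == "__main__":
    demo = "AAKAARPAMDE"  # arbitrary toy sequence
    for agent in RULES:
        print(agent, digest(demo, agent), fragment_stats(demo, agent))
```

In the toy sequence, the Arg-Pro bond is skipped by trypsin, illustrating why the yield of individual cleavages depends on the neighbouring residues.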

An enzymatic fragmentation generates a peptide mixture that is characteristic of the substrate; it is called a 'peptide map' or 'fingerprint' of a protein. The HPLC chromatogram or the gel pattern of the mixture can be used for comparative studies. Limited enzymatic fragmentation is used for protein chemical studies. The identification of post-translational modification sites is often achieved by combining two different fragmentations. A first cleavage may separate a discrete region of a protein; a second digest then generates smaller fragments, one of which contains the modified amino acid residue. This strategy helps to isolate a target peptide of reasonable size for analysis by Edman degradation and/or mass spectrometry. Edman degradation requires peptide fragments in which the modification is located close to the N-terminus; otherwise unequivocal identification becomes even more difficult.

For internal sequence analysis, ca. 5-10 times more starting material is needed than for a direct N-terminal approach, because both the fragmentation of the protein and the isolation of peptides cause sample loss. The yield of individual peptide bond cleavages varies, because the amino acid side chains neighbouring a cleavage site influence protease activity. In particular, a proline residue adjacent to the cleavage site hinders proteolysis; trypsin cleavage, for example, is impaired if Lys/Arg is followed by Glu. Side-chain modifications such as glycosylation can also prevent proteolytic attack. The recovery of fragments from a membrane or a polyacrylamide gel by solvent elution varies with the solubility and hydrophobicity of the peptides, and the same is true for separation by RP-HPLC. Modern sequencers routinely allow the identification of 10 pmol samples. Hence, to analyse a peptide fragment, the starting amount of protein must be large enough to compensate for sample loss: all steps and manipulations must ensure a final recovery of about 10 pmol of peptide to be loaded onto the sequencer. Starting amounts of at least 50-100 pmol of protein are required even for skilled operators.
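The argument above, that losses in every step multiply, can be made explicit: the overall recovery is the product of the individual step yields, and the required starting amount is the target amount divided by that product. A minimal sketch; the per-step yields in the example are hypothetical placeholders, not values from the text:

```python
from math import prod


def overall_yield(step_yields) -> float:
    """Fraction of material surviving a series of steps with independent yields."""
    return prod(step_yields)


def required_start_pmol(target_pmol: float, step_yields) -> float:
    """Starting amount needed so that `target_pmol` remains after all steps."""
    return target_pmol / overall_yield(step_yields)


if __name__ == "__main__":
    # Hypothetical yields: fragmentation, peptide extraction, RP-HPLC isolation
    steps = [0.5, 0.7, 0.6]
    print(f"overall yield: {overall_yield(steps):.2f}")
    print(f"start with: {required_start_pmol(10, steps):.0f} pmol for 10 pmol on the sequencer")
```

With these assumed yields the estimate lands at roughly 50 pmol, consistent in magnitude with the 50-100 pmol quoted above; every additional handling step lowers the product further, which is also the reasoning behind preferring the two-step in-matrix protocol.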

3.1 Enzymes

Proteases can be classified as endo- or exoproteases (Table 2) and by their ability to hydrolyze specific peptide bonds. Endoproteases attack the protein backbone internally, generating peptide fragments, and are mainly used for protein structure studies. Some are highly specific and cleave exclusively adjacent to charged residues, whereas proteases such as chymotrypsin, pepsin or thermolysin cleave adjacent to several different amino acid residues and yield a more extensive fragmentation. Exoproteases attack the N- or C-terminal amino acids. They are especially important for removing N-terminal blocking groups such as a pyroglutamate residue [Jansenss 1994] or an N-acetyl group [Farries 1991]. Their cleavage efficiency varies considerably with the neighbouring amino acid residues, and their use is largely restricted to peptides rather than proteins. Acylamino acid releasing enzyme (AARE) can be used to remove N-acetylated termini. However, this enzyme cleaves only peptides up to a maximum length of ca. 40 residues [Mitta 1989]. Therefore the protein must be fragmented first, and the acetylated N-terminal peptide must be isolated (!) before an AARE digestion can be applied. C-terminal degradation with carboxypeptidases provides information complementary to Edman degradation; however, the released amino acids have to be identified, which may be difficult for small amounts. Cathepsin C repetitively cleaves dipeptides from the amino terminus of a protein.

Enzymes are identified by EC code numbers based on the classification worked out by the Enzyme Commission. The code number contains four elements: the first figure defines one of six main divisions (e.g. EC 3. are hydrolases); the second indicates the nature of the bond hydrolysed (EC 3.4. are peptide hydrolases); the third specifies the catalytic mechanism of the active centre (EC 3.4.21. are serine proteinases); and the fourth is the serial number of the enzyme within its subclass (EC 3.4.21.4 designates trypsin). A list of the most frequently used enzymes is given in Table 2. These proteases are commercially available from several suppliers as high-quality products.
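The four-level EC code described above maps naturally onto a small record type. The sketch below parses a code and names only those hierarchy levels that are spelled out in the text (EC 3, EC 3.4, EC 3.4.21); `ECNumber`, `KNOWN_LEVELS` and the function names are illustrative, and a complete lookup would have to come from the official nomenclature lists:

```python
from typing import NamedTuple


class ECNumber(NamedTuple):
    ec_class: int       # main division, e.g. 3 = hydrolases
    subclass: int       # bond type, e.g. 3.4 = peptide hydrolases
    sub_subclass: int   # catalytic mechanism, e.g. 3.4.21 = serine proteinases
    serial: int         # enzyme within its sub-subclass, e.g. 3.4.21.4 = trypsin


# Only the levels named in the text; not a complete classification
KNOWN_LEVELS = {
    (3,): "hydrolases",
    (3, 4): "peptide hydrolases",
    (3, 4, 21): "serine proteinases",
}


def parse_ec(code: str) -> ECNumber:
    """Parse 'EC 3.4.21.4' or '3.4.21.4' into its four numeric fields."""
    text = code.strip()
    if text.upper().startswith("EC"):
        text = text[2:].strip()
    parts = text.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated fields, got {code!r}")
    return ECNumber(*(int(p) for p in parts))


def describe(code: str) -> list:
    """Names of the hierarchy levels of `code` that appear in KNOWN_LEVELS."""
    ec = parse_ec(code)
    return [KNOWN_LEVELS[tuple(ec[:depth])] for depth in (1, 2, 3)
            if tuple(ec[:depth]) in KNOWN_LEVELS]


if __name__ == "__main__":
    print(parse_ec("EC 3.4.21.4"))
    print(describe("EC 3.4.21.4"))  # trypsin
    print(describe("3.4.24.4"))     # thermolysin: third level not in the table
```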

3.2 Practical Considerations

The experimental parameters for enzymatic fragmentation are buffer, pH, temperature, time and enzyme/substrate ratio. The buffer is chosen according to the required pH range; volatile buffer systems are preferred. For most digestions, 0.1 M ammonium bicarbonate or N-ethylmorpholine buffers provide an optimal milieu of about pH 8 and can be removed by lyophilization. Incubation at 37 °C may last from 4 h to overnight. For digestions in solution, an enzyme/substrate ratio (w/w) of 1:10 to 1:100 is recommended. However, for digestions in which the protein is either bound onto a membrane or embedded in the polyacrylamide matrix, a ratio as high as 1:1 may be used. The proteases are dissolved at concentrations of 0.1-1 µg/µl immediately prior to use. Trypsin requires calcium ions (2 mM) to stabilize the active centre; pyroglutamate aminopeptidase requires a thiol compound for activation and EDTA to complex divalent metal ions; the other proteases need no special additives.

Table 2. List of the enzymes most commonly used for protein structure analysis.

Enzyme                                EC No.       pH        Accession No.
Endopeptidases
  Trypsin                             3.4.21.4     8.0-9.0   P00760
  Chymotrypsin                        3.4.21.1     7.5-8.5   P00766
  Endoproteinase Asp-N                -            6.0-8.0
  Endoproteinase Arg-C                3.4.22.8     8.0-8.5
  Endoproteinase Glu-C (2)            3.4.21.19    8.0
  Endoproteinase Lys-C                3.4.21.50    8.0-8.5
  Pepsin                              3.4.23.1     2.0-4.0
  Thermolysin                         3.4.24.4     7.0-9.0
Exopeptidases
  Carboxypeptidase A                  3.4.17.1     7.0-8.0
  Carboxypeptidase B                  3.4.17.2     7.0-9.0
  Carboxypeptidase P                  3.4.16.1     4.0-5.0
  Carboxypeptidase Y                  3.4.16.1     5.5-6.5
  Cathepsin C                         3.4.14.1     5.5
  Acylamino acid releasing enzyme     3.4.19.1     7.5       P19205 (3)
  Pyroglutamate aminopeptidase        3.4.19.3     7.0-9.0   P28618

Further accession numbers listed for the enzymes above (row assignment not recoverable from the source layout): X63673, P04188, P00791, A24924, P00730, P00732, P00729.
(1) Primary structure not yet defined.
(2) Specificity depends on the incubation buffer; Asp-Xaa bonds are additionally cleaved in phosphate buffer.
(3) P19205 refers to the pig enzyme; the commercial enzyme is from horse.

Autohydrolysis of the proteases is sometimes observed, and care must be taken that such autolytic peptides are recognized when sequence data are interpreted. A database search should routinely be performed with the identified amino acid sequences to avoid this pitfall. Attempts have been made to increase resistance to autohydrolysis, e.g. by modification of trypsin; however, autolytic fragments are still observed.

For digestions of proteins bound onto PVDF or nitrocellulose, a pretreatment is necessary to prevent nonspecific binding of the protease during digestion. The free surface of the membrane is masked by adsorption of polyvinylpyrrolidone (Mr 40 kDa; PVP-40) before the enzyme is added (0.5 % PVP-40 in methanol, 30 min, 37 °C). Unfortunately, PVP quenching can cause a huge artefact peak in the subsequent HPLC separation. Extensive washing with water (5-10 changes) is therefore necessary to remove as much PVP-40 as possible. However, PVP-40 can never be removed completely from the membrane, and this is an inherent drawback of the procedure. To overcome this disadvantage, the use of hydrogenated Triton X-100 (RTX-100) has been described [Fernandez 1994]. The membrane-bound protein is digested in a buffer containing 1 % RTX-100, which serves both to elute peptides from the membrane and to block adsorption of the enzyme to the membrane. The detergent Tween-20 can be used to recover hydrophobic peptide fragments by adding 0.02 % to the extraction solvents [Ward 1990; Kurzchalia 1992]. Importantly, Tween-20 does not interfere with the subsequent chromatographic separation.

With every digestion, a blank sample must be processed in parallel: a matrix of the same origin and with the same history as the protein under investigation. Depending on the format, the buffer used to dissolve the protein, a piece of membrane from the same blot, or a piece of the same gel, in each case without protein, should be processed identically. This helps to identify artefact peaks in the HPLC trace resulting from extraction solvents, detergents or dyes, and to detect autolytic fragments of the protease used for the digest.
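The enzyme/substrate ratios and stock concentrations given above translate directly into pipetting volumes. A minimal sketch, assuming the ratio is written as 1:N (enzyme:substrate, w/w) and that the substrate mass is known (for example from amino acid analysis); function names are illustrative:

```python
def enzyme_amount_ug(substrate_ug: float, substrate_per_enzyme: float) -> float:
    """Enzyme mass for a w/w ratio written as 1:N (enzyme:substrate); pass N."""
    return substrate_ug / substrate_per_enzyme


def stock_volume_ul(enzyme_ug: float, stock_ug_per_ul: float) -> float:
    """Volume of freshly prepared protease stock that delivers `enzyme_ug`."""
    return enzyme_ug / stock_ug_per_ul


if __name__ == "__main__":
    substrate = 3.0  # µg, e.g. 100 pmol of a 30 kDa protein
    for n in (10, 50, 100):  # range recommended for digests in solution
        enz = enzyme_amount_ug(substrate, n)
        vol = stock_volume_ul(enz, 0.1)  # 0.1 µg/µl stock, lower end of the range
        print(f"1:{n}: {enz:.3f} ug enzyme = {vol:.2f} ul of 0.1 ug/ul stock")
```

At the 1:100 end the required volume drops to a few tenths of a microlitre, which in practice argues for a more dilute working stock or for the higher ratios that the text allows for membrane-bound or gel-embedded proteins.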

4 Chemical Fragmentation

The chemical fragmentation of proteins is complementary to enzymatic digestion. Cleavage can be performed at amino acid residues for which no enzyme is available. The reagents are largely insensitive to salts or detergents, so fragmentation is possible in cases where the use of proteases is restricted. A variety of chemical cleavage methods have been reviewed [Kasper 1975; Fontana 1986]. However, as early as 1975 it was stated that ‘... these methods have not been developed to the stage where they may be routinely applied in a predictable fashion to the chemical fragmentation of proteins ...’. The disadvantages are ‘(a) low cleavage yields, (b) lack of specificity, (c) undesirable side reactions, and (d) wide variability in the reactivity of the sensitive bond’ [cited from Kasper 1975]. Since then, no major improvement has been achieved. Nevertheless, a few chemical fragmentation procedures are important alternatives to enzymatic cleavage and are valuable tools for the elucidation of protein structures.

4.1 Cyanogen Bromide Cleavage

Cyanogen bromide (CNBr) cleavage has become the most widely used chemical fragmentation method for proteins since its introduction by Gross & Witkop in 1961. CNBr offers the following advantages: Met-Xaa bonds are cleaved specifically and nearly quantitatively; methionine residues are usually rare in proteins, so generally a few large peptide fragments are obtained; the reaction is generally performed in 70 % formic acid, which is also a strong solvent; reagent and by-products are volatile; and no suitable enzyme is available to cleave at methionine positions.

CNBr is a highly toxic reagent and proper care has to be taken! It must be handled in a fume hood and stored under nitrogen. CNBr may decompose during storage and then turns yellow; only colourless stocks should be used.

A Met-Xaa bond is cleaved, converting methionine into a C-terminal homoserine residue and creating a new amino terminus, NH2-Xaa. The selectivity of CNBr can be explained by the reaction mechanism (Figure 4) [Inglis 1970]. The nucleophilic sulfur of the methionine side chain attacks CNBr (1), forming a cyanosulfonium ion (2). Methyl thiocyanate is released while an intermediate iminolactone ring is formed involving the carbonyl group of methionine (3). The iminolactone is then hydrolyzed and the Met-Xaa bond is cleaved (4). Finally, a peptide fragment with a new amino terminus is released (5), and homoserine results at the C-terminal position. Homoserine and homoserine lactone are interconvertible and form a mixture (6).

Figure 4. Cyanogen bromide cleavage.

A general procedure is to use a roughly 100-fold molar excess of CNBr over methionine residues. Either solid CNBr is added directly to the protein solution, or it is dissolved in acetonitrile. The reaction is performed under acidic conditions; generally 70 % formic acid is used. The mixture is flushed with nitrogen and incubated in the dark for 4-24 h. Then water is added (5-10 volumes) and the excess reagent is removed by lyophilization (but remember the toxicity of CNBr and use KOH to neutralize the vapour). This step is repeated twice. A 1000-fold excess, as described in several applications, should on the whole be avoided because of additional side reactions.
A short incubation time is also favourable, because additional cleavage at Trp residues occurs during prolonged incubation. Since each peptide ending in homoserine also appears in a homoserine lactone form, a CNBr digest can yield a rather complex fragmentation pattern in HPLC. Problems can arise with Met-Ser and Met-Thr bonds, where the hydroxyl group of the side chain may interfere with the ring opening of the iminolactone. The use of 70 % trifluoroacetic acid instead of formic acid has been reported to reduce these difficulties significantly [Titani 1972].

CNBr cleavage can also be performed on proteins found to be N-terminally blocked in automated Edman degradation. For this purpose the sequencer is stopped after a few cycles and the reaction cartridge is disassembled. An aliquot of CNBr (20 µl of 70 % formic acid containing a 20-fold excess of CNBr over protein, w/w) is spotted onto the glass filter carrying the blocked protein. The sample is then placed in a vacuum desiccator over a solution of CNBr in formic acid. After 16 h the membrane is air-dried and reassembled in the sequencer, and sequence analysis is continued [Simpson 1984]. This procedure does not separate the resulting fragments; it is therefore useful only if 1-3 methionines are present and/or the amino acid sequence just needs to be confirmed.

PVDF-blotted proteins and polyacrylamide-embedded samples may also be cleaved by CNBr. PVDF membranes are submerged in 70 % formic acid and the reagent is added; after incubation, the peptide fragments are extracted and separated [Yuen 1989]. Cleavage in the gel slice followed by separation of the fragments in a second SDS-PAGE [Nikodem 1979] or by RP-HPLC [Jahnen 1990] helps to map large fragments. CNBr fragmentation of blotted proteins followed by a tryptic digest is reported to be advantageous for the recovery of large fragments. The following strategy has been applied for internal sequence analysis of unknown proteins [Stone 1993]: (1) CNBr cleavage of PVDF-blotted proteins; (2) extraction of the fragments; (3) reduction/alkylation of the CNBr fragments; (4) tryptic digest; (5) separation and sequence analysis of the CNBr/tryptic peptides. As little as 50 pmol of PVDF-blotted protein could be handled, demonstrating the applicability of CNBr cleavage on the microscale.
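The roughly 100-fold molar excess over methionine recommended above can be converted into a mass of reagent. The sketch below assumes the number of Met residues per molecule is known (e.g. from the sequence or amino acid analysis) and uses a molar mass of 105.92 g/mol for BrCN; names are illustrative:

```python
CNBR_MOLAR_MASS = 105.92  # g/mol for BrCN (Br 79.90 + C 12.01 + N 14.01)


def cnbr_mass_ug(protein_pmol: float, met_per_molecule: int,
                 fold_excess: float = 100) -> float:
    """Mass of CNBr (µg) for a given molar excess over methionine residues.

    pmol x g/mol gives pg; dividing by 1e6 converts pg to µg.
    """
    cnbr_pmol = protein_pmol * met_per_molecule * fold_excess
    return cnbr_pmol * CNBR_MOLAR_MASS / 1e6


if __name__ == "__main__":
    # Hypothetical sample: 100 pmol of a protein with 5 Met residues
    print(f"100-fold:  {cnbr_mass_ug(100, 5):.2f} ug CNBr")
    print(f"1000-fold: {cnbr_mass_ug(100, 5, fold_excess=1000):.2f} ug CNBr")
```

For picomole samples the required mass is only a few micrograms, which is one practical reason to dispense CNBr from a solution in acetonitrile (one of the two options mentioned above) rather than weighing the solid.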

4.2 Partial Acid Hydrolysis

Partial acid hydrolysis is useful for particular samples where other fragmentation methods have failed. Cleavage is expected primarily at Asp-Pro bonds. The technique has not gained widespread application because of its lack of specificity: depending on the conditions, the result ranges from specific Asp-Pro cleavage to the complete hydrolysis used for amino acid analysis, and the extent of fragmentation is hard to predict. Three parameters can be varied: acid concentration, temperature and time. Some examples [Inglis 1983] are: (a) 11 M HCl, 37 °C, 4 days: Xaa-Ser and Xaa-Thr bonds are preferentially hydrolyzed; (b) 0.03 M HCl, 110 °C, 6-18 h: the bonds flanking Asp (Xaa-Asp-Xaa) are preferentially hydrolyzed; (c) 75 % formic acid, 37 °C, 48 h: Asp-Pro bonds are preferentially hydrolyzed. Cleavage varies considerably depending on the particular protein. Side reactions are observed only to a small extent: amide groups can be hydrolysed, and tryptophan is partially destroyed. SDS-PAGE has also been combined with partial acid hydrolysis: gel slices containing Ponceau S- or Coomassie-stained proteins were subjected to in-matrix cleavage with 20 % formic acid at 112 °C for 4 h [Vanfleteren 1992], and the fragments were separated by HPLC and microsequenced starting from as little as 100 pmol of protein.
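Before attempting Asp-Pro cleavage it is useful to know where, and how often, the target bond occurs in a known sequence. The sketch below lists the positions of a given dipeptide bond and the fragments that idealized, complete cleavage at that bond alone would produce; real partial acid hydrolysis is far less selective, as described above. The helper names and example sequences are hypothetical, and the same helper applies to the Asn-Gly bonds discussed in the next section:

```python
def bond_positions(seq: str, first: str, second: str) -> list:
    """1-based positions of `first` residues that are immediately followed by `second`."""
    return [i + 1 for i in range(len(seq) - 1)
            if seq[i] == first and seq[i + 1] == second]


def cleave_between(seq: str, first: str, second: str) -> list:
    """Fragments if every first-second bond, and no other bond, were cleaved."""
    # The 1-based position of `first` equals the 0-based index where `second` starts.
    cuts = bond_positions(seq, first, second)
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    print(bond_positions("GDPAKDPR", "D", "P"))   # Asp-Pro sites
    print(cleave_between("GDPAKDPR", "D", "P"))
    print(cleave_between("ANGKLNGW", "N", "G"))   # Asn-Gly sites
```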

4.3 Hydroxylamine Cleavage of Asn-Gly Bonds

Asparagine and aspartic acid residues are involved in several spontaneous processes of non-enzymatic protein modification and degradation in vivo. The intermediate formation of a five-membered succinimide ring plays a key role in all these reactions [Stephenson 1989; Geiger 1987]. The side-chain carbonyl group of Asn is attacked by the backbone (α-amino) nitrogen of the C-terminally adjacent residue (1), forming a succinimide intermediate (2) (Figure 5). Upon nucleophilic attack by hydroxylamine, the Asn-Gly peptide bond is cleaved, releasing a mixture of α- (3) and β-aspartyl hydroxamate (4) as well as a peptide with a new N-terminal Gly (5). Deamidation of Asn, yielding a β-carboxylic acid group, and isomerization and racemization at Asp are significant reactions in protein degradation. The succinimide intermediate may undergo ring opening to form either an aspartate (6) or an isoaspartate residue (7) (isomerization). Isoaspartate residues will block an ongoing Edman degradation. Furthermore, racemization at the α-carbon of asparagine is possible. Small hydrophilic residues C-terminally adjacent to Asn, especially Gly, favour this rearrangement because of the absence of steric hindrance.

Figure 5. Cleavage of Asn-Gly bonds using hydroxylamine. [Scheme: succinimide formation at Asn-Gly, followed by hydroxylamine (NH2OH) cleavage; species (1)-(7) as referred to in the text.]

Statistically, the Asn-Gly sequence occurs about once every 350 amino acids in a protein, so cleavage of this bond usually yields a few large fragments. Hydroxylamine has been shown to be a useful reagent for cleaving at the succinimide intermediate [Blumenfeld 1965; Bornstein 1977]. For this purpose, a 1-2 M NH2OH solution is used in an alkaline buffer (ca. pH 9-11) at 45-67 °C. For example, β-lactoglobulin contains one Asp-Gly bond, and a cleavage yield of only 5 % was reported at 45 °C, compared with 30 % at 67 °C [Saris 1983]; increasing the reaction time did not increase cleavage. Proteins blotted onto glass fibre sheets have been cleaved by dipping the blot for 48 h at 25 °C in a solution of 1 M NH2OH adjusted to pH 11 with K2CO3 [Bauw 1988].
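Preparing the 1-2 M hydroxylamine solutions described above is a simple molarity calculation (mol/L × mL = mmol; mmol × g/mol = mg). The sketch assumes hydroxylamine is weighed out as its hydrochloride salt (NH2OH·HCl, about 69.49 g/mol); this salt form is an assumption, not stated in the text. Dissolving the hydrochloride gives an acidic solution, so the pH still has to be brought into the alkaline range given above, e.g. with base as in the K2CO3 example:

```python
NH2OH_HCL_MOLAR_MASS = 69.49  # g/mol, hydroxylamine hydrochloride (assumed salt form)


def reagent_mass_mg(molarity_m: float, volume_ml: float, molar_mass: float) -> float:
    """Mass in mg: mol/L x mL gives mmol, and mmol x g/mol gives mg."""
    return molarity_m * volume_ml * molar_mass


if __name__ == "__main__":
    for molarity in (1.0, 2.0):  # concentration range given in the text
        mass = reagent_mass_mg(molarity, 1.0, NH2OH_HCL_MOLAR_MASS)
        print(f"{molarity:.0f} M, 1 ml: {mass:.1f} mg NH2OH.HCl (then adjust pH)")
```

Because the final volume changes when base is added for pH adjustment, the salt is best dissolved in less than the final volume and made up afterwards.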

4.4 Cleavage at Tryptophan

As a relatively rare amino acid, tryptophan is of interest for specific cleavage yielding large peptide fragments. Two reagents are recommended as quite specific and as producing almost no side reactions. BNPS-skatole (2-(2-nitrophenylsulfenyl)-3-methyl-3-bromoindolenine) [Omenn 1970] is applied in a 10-50-fold molar excess in 80 % acetic acid, and the reaction runs for 24 h at 37 °C. Yields may be up to 70 %. Most experiments have been performed with protein concentrations of 10 mg/ml, and applications on the microscale are rarely seen. In one example, Trp cleavage was carried out by dipping PVDF-blotted proteins in a BNPS-skatole solution (0.25 mg/ml in 80 % acetic acid) for 24 h at 25 °C [Bauw 1988]; the blot was then rinsed first with water and then with butyl chloride to remove excess reagent. For fragmentation with o-iodosobenzoic acid, the reagent is dissolved in 80 % acetic acid containing 4 M guanidine hydrochloride and first pretreated with p-cresol (20 µl per ml reaction mixture). This additive prevents the formation of o-iodoxybenzoic acid, which also cleaves at tyrosine residues and so reduces specificity. A 2-3-fold excess (w/w) of reagent over protein is then added, and the reaction proceeds at room temperature in the dark for 24 h. The cleavage yield is reported to be ca. 95 % [Mahoney 1981]. The fragmentations yield C-terminal lactones, which can be conveniently coupled to supports for solid-phase sequencing. Several other, less specific procedures for cleaving proteins at tryptophan have been described and reviewed by Fontana [Fontana 1986].

4.5 Cleavage at Cysteine

Even though cysteine is an attractive cleavage site because of its rarity, no really useful procedure is available. Cleavage via the thiocyanate derivative using 2-nitro-5-thiocyanobenzoic acid (NTCB) creates N-terminally blocked fragments and additionally requires a Raney nickel treatment [Jacobson 1973, Otieno 1978].


5 References

Aebersold, R.H., Leavitt, J., Saavedra, R.A., Hood, L.E., Kent, S.B.H. (1987) Proc. Natl. Acad. Sci. USA 84, 6970-6974. Internal amino acid sequence analysis of proteins separated by one- or two-dimensional gel electrophoresis after in situ protease digestion on nitrocellulose.
Allen, G. (1989) Sequencing of Proteins and Peptides. Elsevier, Amsterdam.
Amons, R. (1987) FEBS Letters 212, 68-72. Vapor-phase modification of sulfhydryl groups in proteins.
Bauw, G., Van den Bulcke, M., Van Damme, J., Puype, M., Van Montagu, M., Vandekerckhove, J. (1988) J. Prot. Chem. 7, 194-196. Protein-electroblotting on polybase-coated glass-fiber and polyvinylidene difluoride membranes: an evaluation.
Blumenfeld, O.O., Rojkind, M., Gallop, P.M. (1965) Biochemistry 4, 1780-1788. Subunits of hydroxylamine-treated tropocollagen.
Bornstein, P., Balian, G. (1977) Methods Enzymol. 47, 132-145. Cleavage at Asp-Gly bonds with hydroxylamine.
Brune, D.C. (1992) Anal. Biochem. 207, 285-290. Alkylation of cysteine with acrylamide for protein sequence analysis.
Cleland, W.W. (1964) Biochemistry 3, 480-482. Dithiothreitol, a new protective reagent for SH groups.
Cleveland, D.W., Fischer, S.G., Kirschner, M.K., Laemmli, U.K. (1977) J. Biol. Chem. 252, 1102-1106. Peptide mapping by limited proteolysis in sodium dodecyl sulfate and analysis by gel electrophoresis.
Cole, E.G., Mecham, D.K. (1966) Anal. Biochem. 14, 215-222. Cyanate formation and electrophoretic behavior of proteins in gels containing urea.
Crestfield, A.M., Moore, S., Stein, W.H. (1963) J. Biol. Chem. 238, 622-627. The preparation and enzymatic hydrolysis of reduced and S-carboxymethylated proteins.
Darbre, A. (1986) Practical Protein Chemistry. Wiley, Chichester.
Eckerskorn, C., Mewes, W., Goretzki, H., Lottspeich, F. (1988) Eur. J. Biochem. 176, 509-519. A new siliconized-glass fibre as support for protein-chemical analysis of electroblotted proteins.
Eckerskorn, C., Lottspeich, F. (1989) Chromatographia 28, 92-94. Internal amino acid sequence analysis of proteins separated by gel electrophoresis after tryptic digestion in polyacrylamide matrix.
Farries, T.C., Harris, A., Auffret, A.D., Aitken, A. (1991) Eur. J. Biochem. 196, 679-685. Removal of N-acetyl groups from blocked peptides with acylpeptide hydrolase.
Fernandez, J., Andrews, L., Mische, S.M. (1994) A one-step enzymatic digestion procedure for PVDF-bound proteins that does not require PVP-40. In: Techniques in Protein Chemistry V (Crabb, J.W., ed.), pp. 215-222, Academic Press, San Diego.
Fiedler, K., Parton, R.G., Kellner, R., Etzold, T., Simons, K. (1994) EMBO J. 13, 1729-1740. VIP36, a novel component of glycolipid rafts and exocytic carrier vesicles in epithelial cells.
Findlay, J.B., Geisow, M.J. (1989) Protein Sequencing. IRL Press, Oxford.
Fontana, A. (1972) Meth. Enzymol. 25, 419-423. Modification of tryptophan with BNPS-skatole (2-(2-nitrophenylsulfenyl)-3-methyl-3-bromoindolenine).


Fontana, A., Gross, E. (1986) Fragmentation of polypeptides by chemical methods. In: Practical Protein Chemistry (Darbre, A., ed.), pp. 67-120, Wiley, Chichester.
Geiger, T., Clarke, S. (1987) J. Biol. Chem. 262, 785-794. Deamidation, isomerization, and racemization at asparaginyl and aspartyl residues in peptides.
Glazer, A.N., DeLange, R.J., Sigman, D.S. (1975) Chemical Modification of Proteins. North-Holland, Amsterdam.
Gross, E., Witkop, B. (1961) J. Am. Chem. Soc. 83, 1510-1511. Selective cleavage of the methionyl peptide bonds in ribonuclease with cyanogen bromide.
Inglis, A.S. (1983) Meth. Enzymol. 91, 324-332. Cleavage at aspartic acid.
Inglis, A.S., Edman, P. (1970) Anal. Biochem. 37, 73-80. Mechanism of cyanogen bromide reaction with methionine in peptides and proteins.
Jacobson, G.R., Schaffer, M.H., Stark, G.R., Vanaman, T.C. (1973) J. Biol. Chem. 248, 6583-6591. Specific chemical cleavage in high yield at the amino peptide bonds of cysteine and cystine residues.
Jahnen, W., Ward, L.D., Reid, G.E., Moritz, R.L., Simpson, R.J. (1990) Biochem. Biophys. Res. Commun. 166, 139-145. Internal amino acid sequencing of proteins by in situ cyanogen bromide cleavage in polyacrylamide gels.
Jansenss, M.E., Kellner, R., Gäde, G. (1994) Biochem. J., in press. The damselflies Pseudagrion inconspicuum and Ischnura senegalensis contain a novel adipokinetic octapeptide.
Kasper, C.B. (1975) Fragmentation of proteins for sequence studies and separation of peptide mixtures. In: Protein Sequence Determination (Needleman, S.B., ed.), pp. 114-161, Springer, Heidelberg.
Kennedy, T.E., Gawinowicz, M.A., Barzilai, A., Kandel, E.R., Sweatt, J.D. (1988) Proc. Natl. Acad. Sci. USA 85, 7008-7012. Sequencing of proteins from two-dimensional gels by using in situ digestion and transfer of peptides to polyvinylidene difluoride membranes: application to proteins associated with sensitization in Aplysia.
Kurzchalia, T.V., Dupree, P., Parton, R., Kellner, R., Lehnert, M., Simons, K. (1992) J. Cell Biol. 118, 1003-1014. VIP21, a 21 kD membrane protein is an integral component of trans-Golgi-network-derived transport vesicles.
Kwong, M.Y., Harris, R.J. (1994) Protein Sci. 3, 147-149. Identification of succinimide sites in proteins by N-terminal sequence analysis after alkaline hydroxylamine cleavage.
Landon, M. (1977) Meth. Enzymol. 47, 145-149. Cleavage at aspartyl-prolyl bonds.
Mahoney, W.C., Smith, P.K., Hermodson, M.A. (1981) Biochemistry 20, 443-448. Fragmentation of proteins with o-iodosobenzoic acid: chemical mechanism and identification of o-iodoxybenzoic acid as a reactive contaminant that modifies tyrosyl residues.
Matsudaira, P.T. (1987) J. Biol. Chem. 262, 10035-10038. Sequence from picomole quantities of proteins electroblotted onto polyvinylidene difluoride membranes.
Matsudaira, P.T. (1993) A Practical Guide to Protein and Peptide Purification for Microsequencing. Academic Press, San Diego.
Mitta, M., Asada, K., Uchimura, Y., Kimizuka, F., Kato, I., Sakiyama, F., Tsunasawa, S. (1989) J. Biochem. 106, 548-551. The primary structure of porcine liver acylamino acid-releasing enzyme deduced from cDNA sequences.
Neugebauer, J.M. (1990) Detergents: an overview. In: Protein Purification (Deutscher, M.P., ed.), pp. 239-253, Academic Press, San Diego.

11.1 Chemical and Enzymatic Fragmentation


Nikodem, V., Fresco, J.R. (1979) Anal. Biochem. 97, 382-386. Protein fingerprinting by SDS-gel electrophoresis after partial fragmentation with CNBr.
Omenn, G.S., Fontana, A., Anfinsen, C.B. (1970) J. Biol. Chem. 245, 1895-1902. Modification of the single tryptophan residue of staphylococcal nuclease by a mild oxidizing agent.
Otieno, S. (1978) Biochemistry 17, 5468-5474. Generation of a free α-amino group by Raney nickel after 2-nitro-5-thiocyanobenzoic acid cleavage at cysteine residues: application to automated sequencing.
Pluskal, M.G., Przekop, M.B., Kavonian, M.R., Vecoli, C., Hicks, D.A. (1986) Biotechniques 4, 272-283. Immobilon PVDF transfer membrane: a new membrane substrate for western blotting of proteins.
Raftery, M.A., Cole, R.D. (1966) J. Biol. Chem. 241, 3457-3461. On the aminoethylation of proteins.
Riviere, L.R., Fleming, M., Elicone, C., Tempst, P. (1991) Study and applications of the effects of detergents and chaotropes on enzymatic proteolysis. In: Techniques in protein chemistry II (Villafranca, J.J.; ed.), pp. 171-179, Academic Press, San Diego.
Rüegg, U.T., Rudinger, J. (1977) Meth. Enzymol. 47, 111-116. Reductive cleavage of cystine disulfides with tributylphosphine.
Saris, C.J., Van Eenbergen, J., Jenks, B.G., Bloemers, H.P. (1983) Anal. Biochem. 132, 54-67. Hydroxylamine cleavage of proteins in polyacrylamide gels.
Simpson, R.J., Nice, E.C. (1984) Biochem. Int. 8, 787-791. In situ cyanogen bromide cleavage of N-terminally blocked proteins in a gas-phase sequencer.
Stephenson, R.C., Clarke, S. (1989) J. Biol. Chem. 264, 6164-6170. Succinimide formation from aspartyl and asparaginyl peptides as a model for the spontaneous degradation of proteins.
Stone, K.L., McNulty, D.E., Lopresti, M.L., Crawford, J.M., DeAngelis, R., Williams, K.R. (1992) Elution and internal amino acid sequencing of PVDF-blotted proteins. In: Techniques in protein chemistry III (Angeletti, R.H.; ed.), pp. 23-34, Academic Press, San Diego.
Stoscheck, C.M. (1990) Meth. Enzymol. 182, 50-68. Quantitation of protein.
Tempst, P., Link, A.J., Riviere, L.R., Fleming, M., Elicone, C. (1990) Electrophoresis 11, 537-553. Internal sequence analysis of proteins separated on polyacrylamide gels at the submicrogram level: improved methods, applications and gene cloning strategies.
Titani, K., Hermodson, M.A., Ericsson, L.H., Walsh, K.A., Neurath, H. (1972) Biochemistry 11, 2427-2435. Amino acid sequence of thermolysin. Isolation and characterization of the fragments obtained by cleavage with cyanogen bromide.
Vanfleteren, J.R., Raymackers, J.G., Vanbun, S.M., Meheus, L.A. (1992) Biotechniques 12, 550-557. Peptide mapping and microsequencing of proteins separated by SDS-PAGE after limited in situ acid hydrolysis.
Ward, L.D., Reid, G.E., Moritz, R.L., Simpson, R.J. (1990) Peptide mapping and internal sequencing of proteins from acrylamide gels. In: Current research in protein chemistry: techniques, structure, and function (Villafranca, J.J.; ed.), pp. 179-190, Academic Press, San Diego.
Yuen, S.W., Chui, A.H., Wilson, K.J., Yuan, P.M. (1989) Biotechniques 7, 74-83. Microanalysis of SDS-PAGE electroblotted proteins.


11.2 Microseparation Techniques I: High Performance Liquid Chromatography

Maria Serwe and Helmut E. Meyer

1 Introduction

Since the commercial introduction of high-performance liquid chromatography (HPLC) in the 1970s, this versatile method has gained wide popularity, resulting in a phenomenal increase in successful applications described in the literature. What sets HPLC apart from classical, low-pressure chromatographic systems using soft gels is the adoption of micron-sized particles of high mechanical strength as supports for column packing materials, which allows liquid to be pumped through the column at high pressure. As HPLC is not so much a new technique as an advance in the design of chromatographic systems, all common chromatographic modes by which peptides and proteins are fractionated on the analytical as well as the preparative scale are applicable.

In all modes except size-exclusion chromatography, peptides and proteins are resolved by a surface-mediated process. The differential adsorption of peptides and proteins on the surface of the packing material (i.e., the stationary phase) depends on the sample characteristics and the mobile phase strength. The various chromatographic modes differ in the way the sample is adsorbed.

Size-exclusion chromatography [Chicz 1990; Mant 1991b], also known as gel chromatography, discriminates between molecular species on the basis of size (more precisely, hydrodynamic volume) through differential permeation into matrices of controlled porosity. Large molecules that cannot enter the pores pass through the column unretained and are eluted first. Small molecules permeate the entire liquid volume within and between the particles and are eluted last. Mid-sized molecules selectively permeate the pores depending on their relative size and are eluted with retention times between these two extremes. Size-exclusion chromatography may be used with a variety of mobile phases under near-physiological conditions.
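The elution order described above can be expressed with the standard size-exclusion partition coefficient Kav = (Ve - V0)/(Vt - V0), where V0 is the void volume and Vt the total liquid volume of the column. This relation and the column volumes used below are assumptions added for illustration (they are not given in this chapter); the sketch only shows that an excluded molecule (Kav = 0) elutes at V0 and a fully permeating one (Kav = 1) at Vt.

```python
def sec_elution_volume(kav: float, v0_ml: float, vt_ml: float) -> float:
    """Elution volume Ve = V0 + Kav * (Vt - V0) in size-exclusion chromatography.

    kav   -- partition coefficient: 0 = totally excluded, 1 = totally permeating
    v0_ml -- void (interparticle) volume of the column, ml
    vt_ml -- total liquid volume of the column, ml
    """
    if not 0.0 <= kav <= 1.0:
        raise ValueError("Kav must lie between 0 and 1")
    return v0_ml + kav * (vt_ml - v0_ml)


# Hypothetical column (V0 = 8 ml, Vt = 24 ml); Kav values are illustrative only.
molecules = {"large (excluded)": 0.0, "mid-sized": 0.4, "small (permeating)": 1.0}
elution_order = sorted(molecules, key=lambda name: sec_elution_volume(molecules[name], 8.0, 24.0))

for name in elution_order:
    print(f"{name}: elutes at {sec_elution_volume(molecules[name], 8.0, 24.0):.1f} ml")
```

Sorting by elution volume reproduces the order in the text: large molecules first, small molecules last.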
R. Kellner, F. Lottspeich, H. E. Meyer (1994) Microcharacterization of Proteins, VCH Weinheim

Ion-exchange chromatography [Chicz 1990] resolves proteins according to their accessible surface charges and the corresponding electrostatic interactions with surface-bound negatively charged (cation exchange) or positively charged (anion exchange) groups. The protein is displaced from the stationary phase at constant pH by increasing the ionic strength of the mobile phase. Sodium chloride is the most commonly used displacing salt; the concentration required depends on the strength of the interaction between the protein and the stationary phase.

Affinity chromatography [Chicz 1990; Mant 1991b] is based on the bioaffinity of a protein for a specific ligand coupled to a solid support. The immobilized ligand retains only proteins that bind selectively to it, whereas all others are eluted unretained. The retained protein can then be released in purified form.

Partition chromatography [Chicz 1990; Mant 1991b] depends on the partitioning of the sample between the stationary phase on the solid support and the mobile phase flowing freely through the column. This mode is termed "normal phase" or hydrophilic interaction chromatography (HILIC) if the stationary phase is more polar than the starting mobile phase. The elution order then generally follows increasing hydrophilicity: the more soluble (i.e., the more hydrophilic) a sample is in water, the later it is eluted; conversely, the more hydrophobic the sample, the earlier it is eluted. Elution in order of increasing hydrophilicity is accomplished by decreasing the concentration of organic modifier in the mobile phase. The most popular support is microparticulate silica gel. In contrast, partition chromatography is termed "reversed-phase" if the starting mobile phase is more polar than the stationary phase. The support is again silica, but unlike in the normal-phase mode the silanol groups are chemically derivatized with organosilanes such as octadecylsilane. When peptides or proteins are eluted with increasing organic solvent in order of increasing relative surface hydrophobicity, the mode is called reversed-phase chromatography (RPC). By comparison, separations carried out with descending salt gradients at neutral pH are termed hydrophobic interaction chromatography (HIC).
RPC and HIC both involve hydrophobic interactions between the solute and the stationary phase. Apart from the eluting solvents, the two methods differ in the ligand density of the stationary phase (in HIC approximately one-tenth of that used in RPC) and in that HIC is used to separate proteins in their native state, whereas the harsher elution conditions of RPC tend to disrupt protein tertiary structure. This is especially true for complex, multi-interaction enzymes, whereas stabilized or crosslinked proteins as well as small peptides are less likely to lose biological activity. However, biological activity may be retained through proper chromatographic conditions or regained by postchromatographic treatment.

RPC, which has almost become synonymous with HPLC, is one of the most powerful separation techniques for peptides and proteins on the analytical as well as the preparative scale. Besides speed and efficiency, the wide choice of mobile and stationary phases and of column diameters accounts for this prominent position. In multidimensional chromatography, RPC is widely used as the last step before microsequencing or mass spectrometry; for example, two-dimensional HPLC purifications are common in which a crude protein extract or complex protein digest is fractionated on an ion-exchange column and each fraction is then further purified by injection and gradient elution on an RP column. The successive use of two RP columns with different packings (or the same packing with different solvents) is also widespread in the separation of complex peptide mixtures (e.g., enzymatic digests). RPC on micro-columns provides purified fractions (free of compounds that interfere with sequence analysis) in small volumes suitable for microsequencing. The following two sections give an overview of both theoretical and practical aspects of RPC (including applications leading to successful microsequencing).


2 Getting Started

Suggestions presented here derive from our own experience with RPC and from manufacturers' recommendations, as well as from several excellent articles dealing with practical and theoretical aspects of RPC [Chicz 1990; Dolan 1991; Mant 1991a,b; Nugent 1991a,b].

2.1 Solvents

Mobile phase purity is a particularly important consideration in the RPC of peptides and proteins: dust or other particulate matter plugs columns; air bubbles in the pump cause erratic flow rates and pressure fluctuations; impurities and air bubbles in the detector cell cause baseline noise or artifact peaks. Therefore, all solvents (water, organic modifier) and additives (e.g., buffer salts) should be of HPLC grade or of the highest quality available. All solutions should be filtered through a 0.2 µm membrane filter and thoroughly degassed before use (continuous helium sparging is recommended when using reciprocating pumps). An in-line solvent filter between the reservoir and the pump is mandatory for reliable solvent delivery. The mobile phase is often a good microbial growth medium, especially with acetate buffers (which provide a good carbon source) and/or a low organic solvent content; fresh mobile phase should therefore be prepared each day.

A major part of the excellent resolving power of RPC derives from the availability of ion-pairing reagents. Peptides are charged molecules at most pH values, and the presence of different counterions influences their chromatographic behaviour. TFA is the anionic ion-pairing reagent in the most popular mobile phase system:

A: 0.1 % (v/v) TFA in water
B: 0.085 % (v/v) TFA, 84 % (v/v) acetonitrile in water

The low pH (pH 2) of this unbuffered solution ensures protonation of carboxyl groups, thereby increasing the interaction of peptides with the reversed-phase sorbent. TFA, an excellent solvent for most peptides, is completely volatile and, owing to its high UV transparency, permits detection at wavelengths below 220 nm. As the dielectric constant of the solvent changes with increasing organic modifier concentration, the absorption spectrum of TFA shifts, resulting in an upward baseline drift. Most of this difference can be compensated for by adding 15 % more TFA to the water reservoir. If TFA alone does not resolve a particular peptide mixture efficiently, better results may be achieved with a more hydrophilic (e.g., orthophosphoric acid) or a more hydrophobic (e.g., heptafluorobutyric acid) anionic ion-pairing reagent.

The use of acetonitrile as organic modifier takes advantage of its low viscosity (resulting in low back pressure), high UV transparency at low wavelengths and high volatility (easy removal from peptide-containing fractions). Alcohols, particularly isopropanol and methanol, are occasionally used for the separation of very hydrophobic or very hydrophilic proteins, respectively. If gradient systems of higher pH are needed, the following buffered solutions are convenient:



A: 10 mM ammonium acetate, pH 6
B: 10 mM ammonium acetate, 84 % (v/v) acetonitrile

and

A: 0.2 % (v/v) hexafluoroacetone (HFA)/NH3, pH 6 or 8.6
B: 0.03 % (v/v) HFA/NH3, 84 % (v/v) acetonitrile

The pH of these solvent systems is above the pK of carboxyl groups; thus, peptides containing aspartate or glutamate residues elute earlier than in the TFA/acetonitrile system (Figure 3). In addition, the ammonium acetate/acetonitrile gradient system resolves peptide species that differ in their phosphorylation state (Figure 4).
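The effect of pH on carboxyl groups can be estimated with the Henderson-Hasselbalch equation. The sketch below is a simplification added for illustration: it assigns every carboxyl group the single pK of 4.5 quoted in the Applications section, whereas real side-chain and C-terminal pK values differ somewhat.

```python
def fraction_deprotonated(ph: float, pk: float) -> float:
    """Henderson-Hasselbalch: fraction of an acidic group present as the anion (COO-)."""
    return 1.0 / (1.0 + 10.0 ** (pk - ph))


# Approximate value used in this chapter; applying it to all carboxyls is an assumption.
CARBOXYL_PK = 4.5

for label, ph in [("TFA/acetonitrile", 2.0), ("ammonium acetate", 6.0), ("HFA/NH3", 8.6)]:
    ionized = fraction_deprotonated(ph, CARBOXYL_PK)
    print(f"{label} (pH {ph}): {ionized:.1%} of carboxyl groups ionized")
```

At pH 2 almost all carboxyl groups are uncharged, whereas at pH 6 and above they are largely ionized, which makes acidic peptides more hydrophilic and explains their earlier elution.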

2.2 Pump

Peptide separations on micro-columns require HPLC hardware that delivers reproducible gradients at low flow rates (50 µl/min or less). Without additional equipment, such gradients can be achieved only with HPLC systems that employ high-pressure micro-syringe pumps. These deliver almost pulseless flow and meet the dead-volume requirements of micro-columns. Solvent delivery must be pulse-free to limit detector baseline noise and column damage. Drawbacks of syringe pumps are the need for refilling, the compressibility of liquids, which disturbs the flow, and their large volume. Reciprocating pistons have a small stroke, making solvent changes rapid and accurate, and allow fast changes of flow rate. Their major drawback, of course, is pump pulsation, which rules out the constant low flow rates needed for micro-columns.

Wetted components of the pumping system must be chemically resistant to common mobile phases. Many older HPLC instruments have stainless steel pumping systems that corrode in the presence of halides under acidic conditions. Titanium and highly resistant plastics or ceramics have been used to fabricate pumping systems that are stable in both acid and base. These systems are generally more expensive than stainless steel ones but give an added measure of confidence. The valves and the pump seals are the parts of a pump most likely to cause problems, owing to bubbles, dirt and normal wear. The pump seal does not seal completely around the piston, so the piston is damp behind the seal. To prevent abrasive damage to the seal and the piston from crystallized buffer residues, the area behind the pump seal should be flushed. Many HPLC systems have flushing ports at the rear of the pump head for this purpose, allowing continuous flushing with water during the run. Convenient mixing of the solvents without disturbing the gradient is required at all flow rates.
Dynamic mixers are the most versatile but have the disadvantage of introducing additional dead volume into the pumping system. Static mixers can have a lower volume than dynamic mixers, but high-viscosity solutions are not mixed well. The volumetric flow rate and the volume of the mixer and of the tubing connecting the mixer to the column determine the delay time between the start of a gradient program and the actual solvent delivery to the head of the column. Replacing the mixer with a smaller one meets the needs of micro-HPLC, where long overall run times caused by low flow rates are common.
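The delay described above is the dwell volume (mixer plus connecting tubing) divided by the volumetric flow rate. The minimal sketch below uses hypothetical dwell volumes to show why a mixer that is harmless at conventional flow rates becomes a bottleneck in micro-HPLC.

```python
def gradient_delay_min(dwell_volume_ul: float, flow_ul_per_min: float) -> float:
    """Minutes before a composition change at the mixer reaches the column head."""
    if flow_ul_per_min <= 0:
        raise ValueError("flow rate must be positive")
    return dwell_volume_ul / flow_ul_per_min


# Hypothetical dwell volumes (mixer + mixer-to-column tubing), in µl.
large_mixer_ul = 600.0
small_mixer_ul = 100.0

for flow in (1000.0, 50.0):
    print(f"{flow:6.0f} µl/min: {gradient_delay_min(large_mixer_ul, flow):5.1f} min (large mixer), "
          f"{gradient_delay_min(small_mixer_ul, flow):5.1f} min (small mixer)")
```

With these assumed volumes, the delay grows from well under a minute at 1 ml/min to 12 min at 50 µl/min, and a smaller mixer cuts it to 2 min.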


Solvent gradients should begin with at least 3-5 % organic modifier and should not exceed 95 %. The extremes of solvent composition should be avoided if possible, as mobile phase mixing is most difficult in these ranges. In addition, it is difficult to remove organic solvent completely from the stationary phase when equilibrating with water, which results in very long equilibration times.

2.3 Pre-column Split

Capillary columns (e.g., 320 µm internal diameter, I.D.) require low flow rates (ca. 1-5 µl/min). Conventional HPLC hardware must therefore be supplemented with a pre-column split device to reduce the flow rate; e.g., a pre-injection solvent split is possible by diverting most of the solvent, via a tee, through a balance column [Moritz 1993]. A microflow processor (LC Packings, Amsterdam, The Netherlands) is now available that copes with changes in viscosity during gradient elution without the need for packed restrictor columns [Chervet 1992].
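A split can be sized from the ratio of the pump flow to the flow the capillary column needs. The sketch below assumes an ideal tee whose split ratio stays constant; it ignores the viscosity changes during gradient elution that, as noted above, a real split device has to cope with. The flow values are illustrative.

```python
def split_settings(pump_flow_ul_min: float, column_flow_ul_min: float) -> tuple[float, float]:
    """Return (split ratio, flow diverted to the balance/waste line) for an ideal tee split."""
    if not 0 < column_flow_ul_min < pump_flow_ul_min:
        raise ValueError("column flow must be positive and below the pump flow")
    ratio = pump_flow_ul_min / column_flow_ul_min
    diverted = pump_flow_ul_min - column_flow_ul_min
    return ratio, diverted


# Illustrative: a conventional pump at 200 µl/min feeding a 320 µm capillary that needs 4 µl/min.
ratio, diverted = split_settings(200.0, 4.0)
print(f"split ratio {ratio:.0f}:1, {diverted:.0f} µl/min diverted")
```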

2.4 Sample Preparation

All material to be injected should be in solution and free from particulate matter. Sample preparation techniques that remove as much interfering material as is practical should be applied. When capillary columns are used, filtering the sample through a 0.2 µm disposable membrane filter is recommended [Chervet 1992]. Surfactants are generally harmful to polypeptide separations and should be avoided: even traces of SDS in a peptide sample reduce separation efficiency and peptide recovery during RPC. Formic acid should be used cautiously because it shortens column lifetime and harms HPLC pumps. The sample solvent must be compatible with the mobile phase so that neither the sample nor buffer components precipitate in the pores of the column packing. To prevent further column problems, the sample concentration should be optimized for proper column loading. Sample components can gradually coat the window of the detector cell, decreasing sensitivity and increasing background noise.

2.5 Injector

Samples are most commonly introduced into the RPC column via a manual injector; autosamplers are not convenient for single injections of important samples. The loop should be made of an inert material such as polyetheretherketone (PEEK) or titanium. The loop volume should be matched to the sample volume but not completely filled, to prevent sample loss.



2.6 Tubing

When biocompatible conditions are essential, connecting tubing made of Teflon, PEEK, fused silica or titanium is used instead of stainless steel, and conventional compression fittings are replaced by finger-tight fittings. To reduce extracolumn dispersion, all tubing through which the sample passes should be of 0.1 mm I.D. and as short as is convenient. Larger I.D. tubing (e.g., 0.25 mm) is appropriate for other system connections (e.g., connecting the pump to the injector), but lengths should be kept as short as possible to minimize the gradient delay volume.
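Because tubing volume grows with the square of the internal diameter, the choice between 0.1 mm and 0.25 mm I.D. matters more than the numbers suggest. The minimal sketch below computes the internal volume of a cylindrical tube (1 mm³ = 1 µl); the 30 cm length is an arbitrary example.

```python
import math


def tubing_volume_ul(id_mm: float, length_cm: float) -> float:
    """Internal volume of a cylindrical tube in µl (1 mm^3 == 1 µl)."""
    radius_mm = id_mm / 2.0
    length_mm = length_cm * 10.0
    return math.pi * radius_mm ** 2 * length_mm


for id_mm in (0.1, 0.25):
    print(f"{id_mm} mm I.D., 30 cm: {tubing_volume_ul(id_mm, 30.0):.2f} µl")
```

The 0.25 mm tube holds 6.25 times the volume of a 0.1 mm tube of equal length, which is why the wider bore is reserved for connections the sample does not pass through.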

2.7 In-line Filter, Guard Column

An in-line filter (0.5 µm porosity) and/or a guard column between the injector and the main RPC column helps prevent blockage of the column inlet frit by particulates from the sample, the mobile phase or instrument wear (e.g., pump seals and injector rotors). A guard column is strongly recommended when working with complex samples, to remove materials that bind irreversibly to the column packing or are so large that they block the column frit or the packing bed. The best guard column is one prepacked with exactly the same material (same support, particle size, pore size and bonded phase) as the separation column. However, in-line filters and guard columns carry the risk of irreversible sample adsorption, which can lead to complete loss of the sample, especially with low amounts of protein. As samples are often the result of weeks or even months of work, the risk of losing a sample must be weighed carefully against the risk of damaging the analytical column by using it without an in-line filter or guard column.

2.8 Analytical Column

Alkylsilane-derivatized silica media, characterized by the type of hydrophobic ligand, have been the dominant choice for RPC of peptides and proteins. Typical ligands are n-octadecyl (C18), n-octyl (C8), n-butyl (C4) and phenyl. Several parameters must be considered in choosing the packing. Whereas peptides can intercalate into the stationary phase of the packing, proteins have restricted access owing to their large size and are assumed to interact with the outer surface. The alkyl chain length most effective for proteins and more hydrophobic samples is therefore in the shorter range (C4 to C8), while longer chains (C8 to C18) are normally used for small peptides and for more hydrophilic samples. In general, retention time increases with chain length. The C18 phase is especially recommended for mapping enzymatic digests. The diphenyl phase (similar in hydrophobicity to C4) offers an alternative selectivity for polypeptides, particularly those containing aromatic side chains.

To ensure adequate mass transfer, macroporous supports with a pore diameter between 300 and 1000 Å are recommended for protein separations. This allows total penetration into the pores, thus maximizing the sorbent surface area available for protein-support interaction. In the separation of peptides, a pore size of 700 Å has become most popular. Nonporous media are acceptable for analytical separations where speed is a major issue and small samples may be used. As with small molecules, theory predicts that performance should improve as particle size decreases. To achieve maximum resolution in analytical separations, particles of 5 µm or less are reasonable. Particles of 2-3 µm, however, produce higher column back pressure, shorten column life and increase susceptibility to plugging. For preparative applications, 10-20 µm media are often used because the resolution of 5 µm sorbents deteriorates quickly when they are overloaded.
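The back-pressure penalty of small particles follows from the Kozeny-Carman (Darcy) relation, in which the column pressure drop is proportional to column length, mobile-phase viscosity and linear velocity and inversely proportional to the square of the particle diameter. This is standard chromatography theory rather than something stated in this chapter; the sketch holds everything except particle size constant and ignores differences in bed porosity.

```python
def relative_back_pressure(particle_um: float, reference_um: float = 5.0) -> float:
    """Pressure drop relative to a reference packing, assuming dP ~ 1/dp^2.

    Column length, linear velocity, viscosity and bed porosity are assumed identical.
    """
    if particle_um <= 0 or reference_um <= 0:
        raise ValueError("particle diameters must be positive")
    return (reference_um / particle_um) ** 2


for dp in (20.0, 10.0, 5.0, 3.0, 2.0):
    print(f"{dp:4.0f} µm particles: {relative_back_pressure(dp):.2f} x the 5 µm pressure")
```

Under these assumptions, 2 µm particles need about six times the pressure of 5 µm particles, while 10 µm preparative media need only a quarter.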

When working with silica-based columns, most manufacturers recommend a mobile phase pH between 2 and 7.5, as silica slowly dissolves at higher pH, greatly shortening column life. Polymer-based columns or the newer stabilized silica columns offer superior pH stability. Peptide and protein retention times generally decrease with increasing temperature, owing to the increased solubility of the sample in the mobile phase. In addition, more rapid transfer of the sample between the stationary and mobile phases improves peptide resolution, and the reduced viscosity of the mobile phase lowers back pressure. In general, however, room temperature is adequate for peptide and protein separations. High temperatures are important when using new supports with smaller particle sizes (1-3 µm) for increased resolution and non-porous supports for ultrafast separations.

Flow-through or perfusive-particle chromatography is a powerful technique for the very rapid separation of peptides and proteins. Columns are packed with derivatized porous polystyrene-divinylbenzene and operate at very high flow rates (e.g., 5-10 ml/min for a 4.6 mm column; 40-100 µl/min for a 320 µm I.D. fused silica capillary). As in conventional columns, the sample is transported to the surface of the particles by convective flow. The particles, however, contain two types of pores: so-called throughpores, which allow convective flow, and diffusive pores, which provide a high adsorptive surface area. At high flow rates, convection dominates diffusion and gives direct access to the high surface area of the diffusive pores. Rapid perfusive transport thus becomes possible, and column resolution and capacity are maintained independently of flow rate [Kassel 1993].
Non-porous RP packings, made of monodisperse spherical silica particles with a mean diameter of 2 µm, eliminate the intraparticle diffusion that is responsible for band broadening and losses in efficiency and resolution. As the solid core of these micropellicular packings is impenetrable to fluid, the columns are generally more stable at high pressures and elevated temperatures than conventional packings. Since the loading capacity of a column is a function of the surface area of the packing, this approach is not advantageous for preparative applications [for references see Kitagawa 1989; Cox 1993].



Conventional 4.6 mm I.D. columns were quite adequate for most protein/peptide purification problems until the advent of microsequencing, which allowed primary sequence information to be obtained from as little as 5 pmol of polypeptide. It thus became worthwhile to use columns that require a smaller sample load while maintaining similar peak detection. Narrowbore (2 mm I.D.) and microbore (0.5-1 mm) columns are now commonly used for peptide separations. While most modern HPLC hardware is compatible with microbore columns, the use of fused silica columns for capillary (0.1-0.5 mm I.D.) or nano (50-100 µm I.D.) HPLC requires special devices to reduce the flow rate and enhance the detectability of low sample amounts. Table 1 provides an overview of the columns suitable for separating peptides and proteins for microsequencing, with typical flow rates for each column I.D. and sample load capacities.

Table 1.

Column type       I.D. (mm)    Flow rate (µl/min)    Capacity (mg)
Conventional LC   4.6          400-2000              1.4-10
Narrowbore LC     2            50-400                0.25-1.8
Microbore LC      1            20-100                0.04-0.3
Capillary LC      0.1-1        1-100                 < 0.04
Nano LC           0.05-0.1     0.1-1                 < 0.01
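The flow rates in Table 1 roughly follow the usual rule for transferring a method between column diameters: to keep the same linear velocity, volumetric flow scales with the square of the internal diameter, and for the same injected mass the peak volume shrinks (so peak concentration rises) by the same factor. This idealized scaling (same column length, packing and linear velocity) is a standard rule of thumb, not a formula given in the chapter; real gains are smaller when extracolumn volume dominates.

```python
def scaled_flow(flow_ref: float, id_ref_mm: float, id_new_mm: float) -> float:
    """Flow giving the same linear velocity on a column of different I.D."""
    return flow_ref * (id_new_mm / id_ref_mm) ** 2


def concentration_gain(id_ref_mm: float, id_new_mm: float) -> float:
    """Ideal increase in peak concentration for the same injected mass."""
    return (id_ref_mm / id_new_mm) ** 2


# Reference: 1 ml/min (1000 µl/min) on a conventional 4.6 mm I.D. column.
for id_mm in (2.0, 1.0, 0.32):
    print(f"{id_mm} mm I.D.: {scaled_flow(1000.0, 4.6, id_mm):7.1f} µl/min, "
          f"~{concentration_gain(4.6, id_mm):.0f}x peak concentration")
```

The scaled values (about 190, 47 and 5 µl/min) fall within the narrowbore, microbore and capillary ranges of Table 1, and the last one matches the 1-5 µl/min quoted for 320 µm capillaries.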

For the separation of proteins, the internal diameter is an important column parameter. Sensitivity increases (i.e., the minimum detectable quantity decreases) as column diameter decreases. The effect can be understood as a scaling down of volumes, particularly peak volumes, while the mass is held constant. Compounds are thus eluted in smaller volumes, increasing peak concentration and giving a greater detector response. This improves detection limits in sample-limited situations. The lower flow rates used with smaller I.D. columns consume less solvent (producing less waste). Together with the higher concentration, this avoids subsequent concentration steps when samples are prepared for further analysis, because fraction volumes are typically 60 µl or less.

For protein separations, flow rate and column length do not play an important role, because proteins adsorb to and are displaced from the RP surface by the solvent gradient once and do not interact appreciably with the surface after displacement. In fact, long columns combined with low flow rates often give poor protein recovery owing to the long residence time of the protein on the column. Short (20-50 mm) columns are therefore recommended for the separation and purification of proteins. In contrast, peptides chromatograph by a mixture of adsorption and partitioning effects, so that column length and flow rate do influence their resolution. In general, the lowest flow rate consistent with near-maximum resolution should be used. For peptide mixtures (e.g., fragments from enzymatic digests), a 10 cm column should therefore be used.

As RPC columns are expensive, precautions should be taken to extend their life. A proper application protocol (e.g., solvent used, pressure at the usual flow rate, notes about peak resolution) helps to notice column deterioration quickly. It is also useful to monitor column aging by separating a mixture of synthetic peptides under standard conditions and comparing the records with special regard to peak retention time, peak shape and height. Sudden deterioration caused by blockage of the frit at the head of the column (resulting in a significant rise in back pressure) or by a void in the packing bed (peak broadening or splitting) can be quickly repaired. When strongly retained compounds accumulate on the column (rise in back pressure, deterioration of sample resolution), an effective cleaning procedure (repetitive gradient elution from 0.1 % (v/v) aqueous TFA to 0.1 % (v/v) TFA in isopropanol) should be applied. If the deterioration is permanent and cannot be cured, the column must be replaced. Columns should be stored in a high concentration of organic solvent, which also cleans them after each day's use. For long-term storage, 100 % methanol is recommended by many manufacturers. Before the first run on a previously stored column, a gradient run up to 100 % B in the absence of sample (i.e., a "blank" run) should be carried out. In fact, even before the blank run, the column should be subjected to a rapid gradient wash (e.g., 100 % eluent A to 100 % B in 15 min). This removes any impurities that may have accumulated on the column during storage.
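The column-aging check described above amounts to comparing the retention times of a standard peptide mixture with a reference record. A minimal sketch follows; the peptide names, retention times and the 0.5 min tolerance are hypothetical and would have to be set from one's own records. Peak shape and height, which the text also mentions, are not covered here.

```python
def check_column(reference: dict[str, float], current: dict[str, float],
                 tolerance_min: float = 0.5) -> tuple[dict[str, float], list[str]]:
    """Compare retention times (min) of standard peptides with a reference run.

    Returns the peptides whose retention time shifted by more than tolerance_min
    (with the shift in min) and the peptides whose peaks are missing altogether.
    """
    shifted = {
        name: round(current[name] - t_ref, 2)
        for name, t_ref in reference.items()
        if name in current and abs(current[name] - t_ref) > tolerance_min
    }
    missing = [name for name in reference if name not in current]
    return shifted, missing


reference_run = {"std-1": 12.4, "std-2": 18.9, "std-3": 25.1}  # hypothetical records
todays_run = {"std-1": 12.5, "std-2": 19.8}

shifted, missing = check_column(reference_run, todays_run)
print("shifted:", shifted, "missing:", missing)
```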

2.9 Elution

Gradient elution is virtually mandatory when chromatographing peptides and proteins. The change in eluent strength may be continuous (linear gradients) or stepwise, depending on the degree of resolution required. The majority of peptides are eluted from reversed-phase columns below 50 % acetonitrile; optimum peptide resolution is generally obtained between 15 % and 40 % acetonitrile in the mobile phase.
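With the TFA/acetonitrile system of Section 2.1, where eluent B contains 84 % acetonitrile, the acetonitrile window quoted above translates into a %B window. The sketch below assumes ideal volumetric mixing of A and B and a single linear gradient segment; the gradient times and limits are illustrative.

```python
def percent_b(t_min: float, t_start: float, t_end: float, b_start: float, b_end: float) -> float:
    """%B delivered by a single linear gradient segment, held constant outside it."""
    if t_min <= t_start:
        return b_start
    if t_min >= t_end:
        return b_end
    return b_start + (b_end - b_start) * (t_min - t_start) / (t_end - t_start)


def acetonitrile_percent(b_percent: float, acn_in_b: float = 84.0) -> float:
    """Acetonitrile content (% v/v) of the mixed eluent, assuming ideal mixing."""
    return b_percent * acn_in_b / 100.0


def b_for_acetonitrile(acn_percent: float, acn_in_b: float = 84.0) -> float:
    """%B needed to reach a given acetonitrile content."""
    return acn_percent * 100.0 / acn_in_b


low, high = b_for_acetonitrile(15.0), b_for_acetonitrile(40.0)
print(f"15-40 % acetonitrile corresponds to {low:.1f}-{high:.1f} % B")
print(f"%B at 30 min of a 0-60 min, 5-65 % B gradient: {percent_b(30.0, 0.0, 60.0, 5.0, 65.0):.1f}")
```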

2.10 Detection

Detection of peptides during HPLC is generally based on peptide bond absorbance at low ultraviolet (UV) wavelengths. The absorption maximum of the peptide bond is actually ca. 187 nm, but detection below 200 nm can suffer from interference by impurities in buffers, solvents or sample. The common use of 214 nm as detection wavelength is therefore a good compromise between detection sensitivity and potential interference. In addition to the peptide bond, the aromatic side chains of tyrosine, phenylalanine and tryptophan absorb light in the 250-290 nm UV range. A diode array or multiwavelength detector therefore allows peptides containing these amino acids to be identified during peptide isolation. For these reasons, UV detectors are the most widely used LC detectors for protein and peptide samples. Certain components, including tryptophan-containing proteins, fluoresce under UV illumination, so fluorescence detection can provide higher sensitivity and selectivity than absorbance detection, particularly in the analysis of proteins from complex samples.

The detector cell volume must be minimized to prevent postcolumn mixing, band broadening and collection errors due to a delay of several seconds before the peak leaves the detector. To prevent the loss in sensitivity caused by a small cell volume (i.e., a short optical path length), ultrasensitive UV flow cells with a longer optical path, e.g., Z- or U-shaped flow cells, are necessary [Chervet 1989].
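Why a longer optical path helps follows from the Beer-Lambert law, A = ε·c·l: at a given peptide concentration the absorbance grows linearly with the path length l. The sketch is idealized (it ignores stray light and any band broadening added by a larger cell), and the molar absorptivity and path lengths below are placeholders, not specifications of any real flow cell.

```python
def absorbance(molar_absorptivity: float, conc_molar: float, path_cm: float) -> float:
    """Beer-Lambert law: A = epsilon * c * l (epsilon in l mol^-1 cm^-1)."""
    return molar_absorptivity * conc_molar * path_cm


EPSILON = 50_000.0  # placeholder absorptivity at the detection wavelength
conc = 1e-6         # 1 µM peptide in the flow cell (illustrative)

short_path = absorbance(EPSILON, conc, 0.02)  # hypothetical 0.2 mm path
long_path = absorbance(EPSILON, conc, 0.8)    # hypothetical 8 mm path of a Z- or U-shaped cell
print(f"A(short) = {short_path:.4f}, A(long) = {long_path:.4f}, gain {long_path / short_path:.0f}x")
```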

2.11 Fractionation

Automatic peak fractionation causes problems at low flow rates: when more than one peak elutes while the peak detector is counting down the transit time from the UV detector to the fraction collector, the peaks are mixed together [Stone 1993]. Fractions should therefore be hand-collected in small plastic cups to ensure optimal peak pooling. However, as this procedure is time-consuming, better automatic fractionation devices are urgently needed. The use of microbore or even smaller columns and low flow rates enables peptide peaks to be collected in a volume of several microliters. This minimizes the losses from adsorption to the collection tube that are observed when a small amount of sample is collected and stored in the larger volumes typical of conventional columns. After collection, if fractions are not applied directly to the sequencer, rapid freezing (-80 °C) is strongly recommended.
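The fraction-collector problem described above can be made concrete: the countdown equals the volume between the detector cell and the collector outlet divided by the flow rate, and any two peaks closer together than that countdown risk being pooled. This is a simplified model added for illustration; the 5 µl transfer volume and the peak times are hypothetical.

```python
def transit_delay_s(transfer_volume_ul: float, flow_ul_per_min: float) -> float:
    """Seconds for eluent to travel from the detector cell to the collector outlet."""
    return transfer_volume_ul / flow_ul_per_min * 60.0


def peaks_at_risk(peak_times_min: list[float], delay_s: float) -> list[tuple[float, float]]:
    """Consecutive peaks that elute within one transit delay of each other."""
    times = sorted(peak_times_min)
    return [(a, b) for a, b in zip(times, times[1:]) if (b - a) * 60.0 < delay_s]


peaks = [20.0, 20.5, 23.0]  # hypothetical apex times, min
for flow in (50.0, 5.0):
    delay = transit_delay_s(5.0, flow)
    print(f"{flow} µl/min: delay {delay:.0f} s, at risk: {peaks_at_risk(peaks, delay)}")
```

At 50 µl/min the assumed 5 µl transfer takes 6 s and no peaks collide; at 5 µl/min it takes a full minute, and the two peaks 30 s apart would be collected together.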

[Chromatogram: absorbance at 214 nm (0.1 AUFS) and solvent B (%) vs. time (min)]

Figure 1. Separation of peptides from myosin-light-chain kinase [Meyer 1987]. 1.5 nmol of the digested kinase was applied onto a 5 µm Vydac C18 RPC column (4.6 x 150 mm). Separation was performed using a gradient of 0.1 % TFA (solvent A) vs. 84 % acetonitrile/0.08 % TFA (solvent B) as indicated (solid line) at a flow rate of 1 ml/min.

11.2 HPLC


3 Applications

Unless otherwise stated, peptides were detected by their absorbance at 214 nm, with the sensitivity set in absorption units full scale (AUFS) as indicated. Solvent percentages are v/v. For further details on instrumentation, see the references. Figure 1 [Meyer 1987] shows the separation of peptides from a Staphylococcus aureus V8 proteinase digest of myosin-light-chain kinase from rabbit skeletal muscle. Some fragments were subjected to sequence analysis without further (re-)chromatography. If peptides from enzymatic digests are not well separated in the first chromatography, a second, tandem-arranged RPC can achieve optimum resolution. Figure 2 [Serwe 1993] depicts the elution of peptides obtained by Lys-C digestion of the regulatory light chain (LC25) of obliquely striated muscle myosin from the earthworm Lumbricus terrestris. Fraction 32 was subjected to a second RPC (Figure 3) in a different solvent system (HFA/NH3/acetonitrile) on the same column, which separated two pure peptides that were then analysed by sequencing. Because the pH of this solvent system (pH 6) lies above the pK of carboxyl groups (about pH 4.5), these groups are largely deprotonated, and peptides containing aspartate or glutamate residues elute earlier according to their acidic character. Fraction 33 in Figure 2 was injected onto a Vydac phenyl column (2.1 x 50 mm) and separated with a TFA/acetonitrile gradient to obtain better resolution for a tryptophan-containing peptide (data not shown). Diphenyl phases offer a selectivity different from that of aliphatic reversed phases, particularly for peptides with aromatic side chains. Rechromatography on a column of smaller I.D. is generally useful to minimize sample loss and increase sample concentration.
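The pH argument can be made quantitative with the Henderson-Hasselbalch equation. This sketch assumes an isolated carboxyl group with the pK of 4.5 quoted above and a pH of about 2 for TFA-containing eluents (an assumed value; the text gives none). Real side-chain pK values shift with the local peptide environment, so the numbers are indicative only.

```python
def fraction_deprotonated(ph: float, pk: float) -> float:
    """Fraction of a monoprotic acid group present as the anion
    (Henderson-Hasselbalch): 1 / (1 + 10**(pK - pH))."""
    return 1.0 / (1.0 + 10.0 ** (pk - ph))


PK_CARBOXYL = 4.5  # value quoted in the text
for ph in (2.0, 6.0, 8.6):  # pH 2.0 is an assumed value for TFA eluents
    print(f"pH {ph}: {100 * fraction_deprotonated(ph, PK_CARBOXYL):.1f} % ionized")
```

At low pH almost all carboxyl groups are neutral, whereas at pH 6 and pH 8.6 nearly all carry a negative charge, which makes Asp- and Glu-containing peptides more hydrophilic.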

[Chromatogram: absorbance at 214 nm, fluorescence and solvent B (%) vs. time (min)]

Figure 2. Separation of endoproteinase Lys-C peptides from 10 nmol LC25 [Serwe 1993]. Chromatography was carried out as described in the legend to Figure 1, using the gradient indicated (broken line). Fluorescence of tryptophan-containing peptides was measured at 285 nm excitation and 340 nm emission (dotted line).

M. Serwe and H.E. Meyer

[Chromatogram: absorbance at 214 nm (AUFS) and solvent B (%) vs. time (min)]

Figure 3. Re-chromatography of fraction 32 from the separation of Lys-C peptides from LC25 (Figure 2). 800 pmol of peptide was injected onto a 5 µm Vydac C18 RPC column (4.6 x 150 mm). Elution was performed with a gradient system of 0.2 % HFA/NH3, pH 6 vs. 0.03 % HFA/NH3, 84 % acetonitrile as indicated (broken line); the flow rate was 1 ml/min. The designated fractions were subjected to sequence analysis.

A convenient solvent of higher pH is the ammonium acetate/acetonitrile gradient system at pH 6. As shown in Figure 4, it resolves peptides that differ only in the phosphorylation state of a single serine residue. A synthetic peptide of 19 amino acid residues derived from the sequence of the rabbit skeletal muscle phosphorylase kinase α subunit (provided by Dr. Chen, Research Institute of Occupational Medicine at the Ruhr University of Bochum) was phosphorylated in vitro by a recombinant mitogen-activated protein kinase (provided by Dr. E. Mandelkow, Research Unit for Structural Molecular Biology, Max-Planck-Gesellschaft, Hamburg). The more hydrophilic phosphorylated peptide eluted about 2 min earlier than the unphosphorylated one.

[Chromatogram: absorbance and solvent B (%) vs. time (min)]

Figure 4. Separation of a synthetic peptide from its phosphorylated form. An aliquot of a phosphorylation mixture containing 1 nmol peptide was applied onto a 5 µm SGE C18 RPC column (2 x 250 mm). Separation was performed in a gradient system of 10 mM ammonium acetate, pH 6 (solvent A) and 10 mM ammonium acetate, 84 % acetonitrile (solvent B) as indicated (broken line). The flow rate was 80 µl/min. P designates the phosphorylated peptide.


A solvent system with an even higher pH is the HFA/NH3/acetonitrile system at pH 8.6. Figure 5 [Meyer 1990] illustrates the separation of endoproteinase Lys-C peptides of the self-phosphorylated phosphorylase kinase α subunit from rabbit skeletal muscle. Fractions containing phosphoserine were further purified on RP material using the system described in Figure 1. Elution conditions in RPC (e.g., low pH, high organic solvent content) are unsuitable for isolating proteins in their native state. This is especially true for complex enzymes with multiple interacting subunits. A positive effect of the tertiary-structure-disrupting conditions of RPC is shown in Figure 6 [Weber 1989]: native phosphorylase kinase from rabbit skeletal muscle is rapidly resolved into its individual subunits by RPC in TFA/acetonitrile. Figures 7 [Chervet 1989] and 8 [Chervet 1992] depict applications of 320 µm I.D. fused-silica columns to the separation of proteins and peptides, respectively. Microgradients were produced by a microflow processor, and detection was performed with a 20 mm Z-shaped flow cell. These special hardware devices (microflow processor, ultrasensitive flow cell and capillary columns from LC Packings, Amsterdam, The Netherlands) were also used for the separation of a tryptic digest of human growth hormone on a 75 µm I.D. column, shown in Figure 9 [LC Packings, supplemental material, 1994].
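The benefit of smaller column diameters can be estimated with the usual cross-section scaling rule: at constant linear velocity and comparable packing, the flow rate scales with the square of the internal diameter, and to a first approximation the peak concentration for a given injected amount scales inversely with it. Starting from the 4.6 mm column at 1 ml/min used for Figure 1, this sketch gives roughly 5 µl/min for a 320 µm column and roughly 0.27 µl/min for a 75 µm column, the same order of magnitude as the flow rates quoted for Figures 7 to 9. The simple scaling ignores differences in particle size and extra-column volume.

```python
def scaled_flow(flow_ref: float, id_ref_mm: float, id_new_mm: float) -> float:
    """Flow rate giving the same linear velocity on a column of another
    internal diameter: F_new = F_ref * (d_new / d_ref)**2."""
    return flow_ref * (id_new_mm / id_ref_mm) ** 2


def concentration_gain(id_ref_mm: float, id_new_mm: float) -> float:
    """First-order gain in peak concentration for the same injected amount,
    taken as the inverse ratio of the column cross-sections."""
    return (id_ref_mm / id_new_mm) ** 2


REF_ID_MM, REF_FLOW_UL_MIN = 4.6, 1000.0  # 4.6 mm column at 1 ml/min (Figure 1)
for new_id in (2.1, 0.32, 0.075):
    flow = scaled_flow(REF_FLOW_UL_MIN, REF_ID_MM, new_id)
    print(f"{new_id} mm I.D.: {flow:.3f} ul/min, "
          f"~{concentration_gain(REF_ID_MM, new_id):.0f}x concentration")
```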


Figure 5. Separation of endoproteinase Lys-C peptides of the phosphorylase kinase α subunit [Meyer 1990]. The Lys-C digest of 20 nmol isolated α subunit was injected onto a pH-stable RPC column (4.6 x 250 mm, Vydac 228TP104). A solvent system of 0.2 % HFA/NH3, pH 8.6 (solvent A) and 0.03 % HFA/NH3/84 % acetonitrile (solvent B) was employed as indicated at a flow rate of 1 ml/min. Aliquots of 100 µl were taken from each fraction to determine the S-ethylcysteine content after conversion of the phosphoserine residues. The open bars represent the amount of S-ethylcysteine found. The different species of a phosphoserine-containing peptide are numbered I-III according to their elution position.



Figure 6. RPC of holophosphorylase kinase [Weber 1989]. 1 mg of the protein was applied onto a 10 µm Vydac C4 RPC column (20 x 50 mm). Elution was carried out using a gradient system of 0.1 % TFA vs. 0.085 % TFA, 84 % acetonitrile as indicated (solid line). The flow rate was 7.8 ml/min.


Figure 7. Separation of proteins using a microgradient [Chervet 1989]. A mixture of 5 proteins (1 = ribonuclease; 2 = insulin; 3 = cytochrome c; 4 = lysozyme; 5 = bovine serum albumin; 60 ng each) was injected onto a 5 µm fused-silica capillary column (320 µm i.d. x 30 cm, C18, 300 Å; LC Packings). Separation was carried out with a gradient of solvent A: 0.1 % TFA and solvent B: 0.1 % TFA, 95 % acetonitrile (49 % to 70 % B in 30 min) at a total flow rate of 4 µl/min (generated with a microflow processor, LC Packings). The temperature was set to 20 °C. Detection was performed at 220 nm with a 20 mm Z-shaped flow cell (LC Packings), range 0.2 AUFS. The picture was kindly provided by Dr. J.P. Chervet, LC Packings.


Figure 8. Separation of a tryptic digest of β-lactoglobulin [Chervet 1992]. The digest was applied onto a 3 µm capillary column (5 cm x 320 µm, C18; LC Packings). Separation was performed using premixed mobile phases: (A) 0.1 % TFA in acetonitrile-water (5:95) vs. (B) 0.08 % TFA, 80 % acetonitrile. The linear gradient went from 5 to 65 % B in 45 min at a flow rate of 5 µl/min (generated by a microflow processor, LC Packings). Dual-channel UV detection at 214 and 280 nm with a 20 mm Z-shaped flow cell (LC Packings) was used. T1-T5 designate arbitrarily chosen peptides used to measure the reproducibility of retention times (data not shown). The illustration was made available by Dr. J.P. Chervet, LC Packings.


Figure 9. Separation of a tryptic digest of human growth hormone [LC Packings, supplemental material, 1994, kindly provided by Dr. J.P. Chervet]. 100 fmol were injected onto a 3 µm capillary column (75 µm I.D. x 30 cm, C18, LC Packings) and separated in a TFA/acetonitrile solvent system. The flow rate was 180 nl/min. Detection was performed at 214 nm with a Z-shaped flow cell (9 nl, LC Packings).



4 References

Chervet, J.P., Ursem, M., Salzmann, J.P., Vannoort, R.W. (1989) Ultra-sensitive UV detection in micro separation. J. High Resolution Chromatography 12(5), 278-281.

Chervet, J.P., Meijvogel, C.J., Ursem, M., Salzmann, J.P. (1992) Recent advances in capillary liquid chromatography: Delivery of highly reproducible micro flows. LC-GC 10(2), 140-148.

Chicz, R.M., Regnier, F.E. (1990) High-performance liquid chromatography: effective protein purification by various chromatographic methods. Methods Enzymol. 182, 392-421.

Cox, G.B. (1993) Preparative reversed-phase chromatography of proteins. In: Chromatography in Biotechnology (Horvath, C., Ettre, L.S.; eds.), pp. 165-182, American Chemical Society, Washington, DC.

Dolan, J.W. (1991) Preventive maintenance and troubleshooting LC instrumentation. In: High-Performance Liquid Chromatography of Peptides and Proteins: Separation, Analysis, and Conformation (Mant, C.T., Hodges, R.S.; eds.), pp. 23-29, CRC Press, Boca Raton.

Kassel, D.B., Luther, M.A., Willard, D.H., Fulton, S.P., Salzmann, J.P. (1993) Rapid purification, separation and identification of proteins and enzyme digests using packed capillary perfusion column LC and LC/MS. In: Techniques in Protein Chemistry IV (Hogue Angeletti, R.; ed.), pp. 55-64, Academic Press, San Diego.

Kitagawa, N. (1989) New hydrophilic polymer for protein separations by HPLC. In: Techniques in Protein Chemistry (Hugli, T.E.; ed.), pp. 348-356, Academic Press, San Diego.

Mant, C.T., Hodges, R.S. (1991a) Mobile phase preparation and column maintenance. In: High-Performance Liquid Chromatography of Peptides and Proteins: Separation, Analysis, and Conformation (Mant, C.T., Hodges, R.S.; eds.), pp. 37-45, CRC Press, Boca Raton.

Mant, C.T., Hodges, R.S. (1991b) Mobile phase preparation and column maintenance. In: High-Performance Liquid Chromatography of Peptides and Proteins: Separation, Analysis, and Conformation (Mant, C.T., Hodges, R.S.; eds.), pp. 69-94, CRC Press, Boca Raton.

Mant, C.T., Hodges, R.S. (1991c) The effects of anionic ion-pairing reagents on peptide retention in reversed-phase chromatography. In: High-Performance Liquid Chromatography of Peptides and Proteins: Separation, Analysis, and Conformation (Mant, C.T., Hodges, R.S.; eds.), pp. 327-341, CRC Press, Boca Raton.

Meyer, H.E., Mayr, G.W. (1987) Nπ-methylhistidine in myosin-light-chain kinase. Biol. Chem. Hoppe-Seyler 368, 1607-1611.

Meyer, H.E., Meyer, G.F., Dirks, H., Heilmeyer, L.M.G. Jr. (1990) Localization of phosphoserine residues in the α subunit of rabbit skeletal muscle phosphorylase kinase. Eur. J. Biochem. 188, 367-376.

Moritz, R.L., Simpson, R.J. (1993) Capillary liquid chromatography: a tool for protein structural analysis. In: Methods in Protein Sequence Analysis (Imahori, K., Sakiyama, F.; eds.), pp. 3-10, Plenum Press, New York.

Nugent, K.D. (1991) Commercially available columns and packings for reversed-phase HPLC of peptides and proteins. In: High-Performance Liquid Chromatography of Peptides and Proteins: Separation, Analysis, and Conformation (Mant, C.T., Hodges, R.S.; eds.), pp. 279-287, CRC Press, Boca Raton.

Nugent, K.D., Dolan, J.W. (1991) Tools and techniques to extend LC column lifetimes. In: High-Performance Liquid Chromatography of Peptides and Proteins: Separation, Analysis, and Conformation (Mant, C.T., Hodges, R.S.; eds.), pp. 31-35, CRC Press, Boca Raton.

Serwe, M., Meyer, H.E., Craig, A.G., Carlhoff, D., D'Haese, J. (1993) Complete amino acid sequence of the regulatory light chain of obliquely striated muscle myosin from earthworm, Lumbricus terrestris. Eur. J. Biochem. 211, 341-346.

Stone, K.L., Williams, K.R. (1993) Enzymatic digestion of proteins and high-performance liquid chromatography peptide isolation. In: A Practical Guide to Protein and Peptide Purification for Microsequencing (Matsudaira, P.; ed.), pp. 43-69, Academic Press, San Diego.

Weber, C. (1989) Entwicklung einer spezifischen Methode zur Lokalisierung von Phosphoserin [Development of a specific method for the localization of phosphoserine]. PhD Thesis, Institute of Physiological Chemistry, Ruhr-University of Bochum.


11.3 Microseparation Techniques II: Analysis of Peptides and Proteins by Capillary Electrophoresis

Christine Schwer

1 Introduction

High efficiency, ease of automation and short analysis times (from several tens of minutes down to less than one minute) have made capillary electrophoresis (CE) a powerful tool in peptide and protein analysis. Different separation modes, a wide range of buffer additives and the choice of buffer pH allow the selectivity to be adjusted to each individual separation problem. Coupling CE to mass spectrometry and micropreparative applications of CE permit further characterization of the separated compounds.

2 Theory

All electrophoretic techniques have in common that they apply high electric fields to separate charged species. To obtain maximum efficiency, peak dispersion due to convection caused by temperature gradients (from Joule heating) in the system has to be kept to a minimum. In classical electrophoresis, stabilizing media such as paper, cellulose acetate, agarose or polyacrylamide gels are used to reduce convection. The alternative approach used in capillary electrophoresis is to use tubing of very small internal diameter (less than 100 µm), as the temperature difference between the center of the capillary and the wall is proportional to the square of the capillary diameter [Knox 1987]. In this way, electric field strengths on the order of several hundred V/cm can be applied, resulting in very fast separations. The basic instrumental setup for CE, shown in Figure 1, is relatively simple, consisting mainly of a high-voltage power supply (±30 kV), a fused-silica separation capillary and an on-column UV detector. Depending on the type and arrangement of the buffer solutions in the capillary, different techniques can be distinguished.
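The quadratic dependence on diameter follows from the steady-state heat balance of a liquid cylinder with uniform Joule heating, ΔT = κE²r²/(4λ), where κ is the buffer conductivity, E the field strength, r the capillary radius and λ the thermal conductivity of the liquid. The sketch below uses assumed values (κ = 0.1 S/m, λ = 0.6 W/(m·K), 300 V/cm) and covers only the temperature difference inside the liquid, not heat transfer through the capillary wall.

```python
def radial_delta_t(kappa_s_per_m: float, field_v_per_m: float,
                   radius_m: float, k_thermal: float = 0.6) -> float:
    """Steady-state temperature difference (K) between capillary axis and wall
    for uniform Joule heating q = kappa * E**2 in a liquid cylinder:
    dT = q * r**2 / (4 * k_thermal). k_thermal ~0.6 W/(m K) is assumed (water)."""
    q = kappa_s_per_m * field_v_per_m ** 2  # heat generated per unit volume, W/m^3
    return q * radius_m ** 2 / (4.0 * k_thermal)


KAPPA = 0.1   # assumed buffer conductivity, S/m
E = 3.0e4     # 300 V/cm expressed in V/m
for d_um in (50.0, 100.0, 200.0):
    dt = radial_delta_t(KAPPA, E, d_um * 1e-6 / 2.0)
    print(f"{d_um:5.0f} um i.d.: delta T = {dt:.4f} K")
```

Each doubling of the inner diameter quadruples ΔT, which is why narrow capillaries tolerate much higher field strengths.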

R. Kellner, F. Lottspeich, H. E. Meyer (1994) Microcharacterization of Proteins, VCH Weinheim


C. Schwer


Figure 1. Schematic representation of a capillary electrophoresis instrument. (1) fused-silica separation capillary, (2) platinum electrodes, (3) high-voltage power supply, (4) electrolyte vessels, (5) on-column UV detector.

2.1 Capillary Isotachophoresis

In isotachophoresis (ITP), two electrolyte solutions are used in the separation system: the leading electrolyte and the terminating electrolyte. They are chosen such that the leading ions have the highest mobility of all ions of interest and the terminating ions the lowest. The sample is injected between these two electrolytes. During separation, the electric current is kept constant, and according to Ohm's law the electric field strength increases with decreasing mobility (conductivity) of the consecutive zones. Separation is achieved because of the different velocities $v_i$ of the analyte ions i in the mixed zone formed by the sample:

$$v_i = m_i E^{\mathrm{mix}} \qquad (1)$$

where $E^{\mathrm{mix}}$ is the electric field strength in the mixed zone and $m_i$ is the mobility of species i in this zone. After the sample ions have separated into individual zones, all zones migrate with the same velocity, each zone having its own electric field strength (depending on its mobility). The distribution of all ions in the separation system is described, for both isotachophoresis and zone electrophoresis, by the Kohlrausch regulating function [Kohlrausch 1897]:

$$\sum_i \frac{c_i}{m_i} = \text{const.} \qquad (2)$$

where $c_i$ is the concentration of species i. The concentration profile of the analyte zones in ITP is rectangular, and their concentration is adapted to that of the leading zone according to the Kohlrausch regulating function. For the concentration $c_A$ of analyte A it follows that:

$$c_A = c_L \, \frac{m_A \,(m_L + m_Q)}{m_L \,(m_A + m_Q)} \qquad (3)$$

where $c_L$ is the concentration of the leading ion L, and $m_A$, $m_L$ and $m_Q$ are the mobilities of analyte A, the leading ion and the counter ion Q in the steady state.

11.3 Capillary Electrophoresis


Sharp zone boundaries are maintained by the "self-sharpening" effect: ions that diffuse into the zone behind or in front of their own are accelerated or slowed down, respectively, by the step gradients in the electric field strength and therefore return to their own zone. For dilute samples, ITP concentrates the analytes and can therefore be used as a preconcentration technique for capillary zone electrophoresis.
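Equation 3 makes the preconcentration effect explicit: the steady-state zone concentration is set by the leading electrolyte and the mobilities, not by the original sample concentration. The sketch evaluates it with hypothetical mobilities (in units of 10⁻⁹ m²/(V·s), entered as absolute values) for an analyte injected at an assumed 0.01 mM.

```python
def kohlrausch_conc(c_leading: float, m_analyte: float,
                    m_leading: float, m_counter: float) -> float:
    """Steady-state ITP zone concentration from Equation 3:
    c_A = c_L * m_A * (m_L + m_Q) / (m_L * (m_A + m_Q)).
    Mobilities are entered as absolute values in any consistent unit."""
    return c_leading * m_analyte * (m_leading + m_counter) / (
        m_leading * (m_analyte + m_counter))


# Hypothetical values; mobilities in 1e-9 m^2/(V s), concentrations in mM.
C_L, M_L, M_Q = 10.0, 76.0, 42.0
M_A = 30.0
c_sample = 0.01  # dilute sample before ITP, assumed
c_zone = kohlrausch_conc(C_L, M_A, M_L, M_Q)
print(f"zone concentration {c_zone:.2f} mM, "
      f"enrichment ~{c_zone / c_sample:.0f}-fold")
```

A useful sanity check: an analyte with the same mobility as the leading ion is adjusted exactly to $c_L$.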

2.2 Capillary Zone Electrophoresis

In capillary zone electrophoresis (CZE), the separation system is filled with a single electrolyte of relatively high concentration, which provides a uniform field strength when an electric potential is applied to the capillary. Ions i are separated according to their different migration velocities $v_i$ in the uniform electric field strength $E$:

$$v_i = m_i^{\mathrm{ef}} E \qquad (4)$$

where $m_i^{\mathrm{ef}}$ is the effective mobility of an ion of a weak electrolyte, which is related to the mobility $m_i$ of the fully dissociated species i by the degree of dissociation $\alpha$:

$$m_i^{\mathrm{ef}} = m_i \, \alpha \qquad (5)$$
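Equations 4 and 5 combine into a simple migration-time estimate. The sketch assumes a uniform field E = U/L_total, a detector at distance L_det from the inlet and no electroosmotic flow (treated in Section 2.3); the capillary dimensions, voltage and mobility are hypothetical.

```python
def effective_mobility(m_full: float, alpha: float) -> float:
    """Equation 5: effective mobility of a weak electrolyte."""
    return m_full * alpha


def migration_time_s(m_eff: float, voltage_v: float,
                     length_total_m: float, length_detector_m: float) -> float:
    """Time to reach the detector in a uniform field E = U / L_total,
    using v = m_eff * E (Equation 4); electroosmotic flow is ignored."""
    field = voltage_v / length_total_m
    return length_detector_m / (m_eff * field)


# Hypothetical capillary and analyte: 57 cm total, 50 cm to detector, 25 kV,
# fully dissociated mobility 2e-8 m^2/(V s).
for alpha in (1.0, 0.5):
    t = migration_time_s(effective_mobility(2.0e-8, alpha), 25e3, 0.57, 0.50)
    print(f"alpha = {alpha}: t = {t / 60:.1f} min")
```

Halving the degree of dissociation doubles the migration time, which is why buffer pH is such an effective handle on selectivity for weak electrolytes.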

In contrast to ITP, band broadening caused by diffusion dilutes the injected sample zone. The concentration profile can be approximated by a Gaussian distribution in cases where the electric field is only slightly disturbed by the analyte. High concentrations of analytes whose mobilities differ strongly from that of the background electrolyte distort the electric field and thus produce asymmetrical peaks [Mikkers 1979]. Under ideal conditions (no contributions to peak dispersion from injection, detection, convection due to temperature gradients, distortion of the electric field, etc.), only longitudinal diffusion causes band broadening, and the width of the concentration distribution, expressed by the spatial variance $\sigma_z^2$, is given by:

$$\sigma_z^2 = 2Dt \qquad (6)$$

where D is the diffusion coefficient of the sample component, $\sigma_z$ is the standard deviation of the Gaussian peak and t is the migration time. Using the concept of theoretical plates to describe efficiency, an expression for the number of theoretical plates N can be obtained [Giddings 1969; Jorgenson 1981]:

$$N = \frac{mU}{2D} \qquad (7)$$

where U is the applied voltage. As mobility and diffusion coefficient are related by the Einstein-Nernst equation, $D/m = kT/(z e_0)$, Equation 7 can be rewritten as [Giddings 1989; Kenndler 1991]:

$$N = \frac{z e_0 U}{2kT} \qquad (8)$$


where k is the Boltzmann constant, T the absolute temperature, $e_0$ the elementary charge and z the charge number of the analyte. Thus, under these ideal conditions, the plate number depends only on the charge and not on the size of the sample ions. This explains the extremely high plate numbers that can be obtained for highly charged solutes such as proteins or oligonucleotides.
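Equations 7 and 8 can be checked numerically. The sketch uses the SI values of k and $e_0$, an assumed temperature of 298.15 K and an assumed 30 kV, and confirms that both forms agree when D is taken from the Einstein-Nernst relation. The resulting numbers are diffusion-limited upper bounds; injection, detection and Joule heating reduce real plate counts.

```python
K_B = 1.380649e-23       # Boltzmann constant, J/K
E_0 = 1.602176634e-19    # elementary charge, C


def plates_from_charge(z: int, voltage_v: float, temp_k: float = 298.15) -> float:
    """Equation 8: N = z * e0 * U / (2 k T) (diffusion-limited upper bound)."""
    return z * E_0 * voltage_v / (2.0 * K_B * temp_k)


def plates_from_mobility(m: float, diff: float, voltage_v: float) -> float:
    """Equation 7: N = m * U / (2 D)."""
    return m * voltage_v / (2.0 * diff)


def einstein_diffusion(m: float, z: int, temp_k: float = 298.15) -> float:
    """Einstein-Nernst relation: D = m * k T / (z * e0)."""
    return m * K_B * temp_k / (z * E_0)


U = 30e3  # applied voltage, V
for z in (1, 5, 20):
    print(f"z = {z:2d}: N = {plates_from_charge(z, U):.2e}")
```

For a singly charged ion at 30 kV this gives somewhat under 6·10⁵ plates; the value scales linearly with z.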

2.3 Electroosmotic Flow

The most frequently used separation capillaries are made of fused silica. The silanol groups of the inner surface dissociate in contact with an electrolyte solution, forming an electrical double layer characterized by the zeta potential $\zeta$. When an electric potential is applied to the capillary, a bulk flow of the liquid towards the cathode results, the electroosmotic flow (EOF), whose velocity $v_{EO}$ is related to the zeta potential as follows:

$$v_{EO} = \frac{\varepsilon \zeta E}{4\pi\eta} \qquad (9)$$

where $\varepsilon$ is the dielectric constant and $\eta$ the viscosity of the liquid in the electrical double layer. This electroosmotic velocity of the bulk liquid is superimposed on the electrophoretic velocity of the analytes, as shown in Figure 2. The net velocity of an analyte is the vector sum of its electroosmotic and electrophoretic velocities. At high pH, the electrophoretic mobility of most anions is lower than the electroosmotic mobility, so they can be detected on the cathode side.


Figure 2. Electroosmotic and electrophoretic velocity, $v_{EO}$ and $v_{EP}$. The surface silanol groups dissociate in contact with the buffer electrolyte and, together with the counter ions, form the electrical double layer. When an electric field is applied to the capillary, a flow of the bulk liquid towards the cathode results, which contributes to the overall velocities of all species: as the electroosmotic velocity at high pH is usually higher than the electrophoretic velocity, anions also move towards the cathode; neutral molecules are transported by the electroosmotic flow and can be detected on the cathode side.
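Equation 9 is written in Gaussian units; its SI equivalent (the Helmholtz-Smoluchowski equation) is $v_{EO} = \varepsilon_r \varepsilon_0 \zeta E/\eta$. The sketch computes an electroosmotic mobility from an assumed zeta potential magnitude of 50 mV and water-like permittivity and viscosity, then forms the vector sum with hypothetical electrophoretic mobilities (sign convention: positive means towards the cathode).

```python
EPS_0 = 8.8541878128e-12  # vacuum permittivity, F/m


def eo_mobility(zeta_v: float, eps_r: float = 78.5,
                viscosity_pa_s: float = 0.89e-3) -> float:
    """SI form of Equation 9 (Helmholtz-Smoluchowski): m_EO = eps_r * eps_0 * zeta / eta.
    zeta is entered as a magnitude; the result is the mobility towards the cathode."""
    return eps_r * EPS_0 * zeta_v / viscosity_pa_s


def net_velocity(m_ep: float, m_eo: float, field_v_per_m: float) -> float:
    """Vector sum of electrophoretic and electroosmotic velocity (m/s),
    positive = towards the cathode (detector side)."""
    return (m_ep + m_eo) * field_v_per_m


m_eo = eo_mobility(0.05)  # assumed |zeta| = 50 mV, water at about 25 C
E = 4.0e4                 # 400 V/cm
for name, m_ep in [("cation", 2.0e-8), ("neutral", 0.0), ("anion", -2.0e-8)]:
    v = net_velocity(m_ep, m_eo, E)
    direction = "to cathode" if v > 0 else "to anode"
    print(f"{name:8s}: v = {v * 1e3:.2f} mm/s ({direction})")
```

With these values, the anion, whose electrophoretic mobility is smaller in magnitude than $m_{EO}$, is still carried to the cathode, as described for Figure 2; a faster anion would migrate towards the anode instead.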


[Plot: pH dependence of the electroosmotic mobility $m_{EO}$ (y-axis) over pH 3 to 12 (x-axis)]

Figure 3. Dependence of the electroosmotic mobility, $m_{EO}$, on the pH of the buffer electrolyte. The electroosmotic velocities were determined using neutral marker substances in a 10⁻… M KCl/… M K-phosphate buffer.

The zeta potential, and thus the electroosmotic velocity, depends on the ionic strength of the electrolyte solution (it increases with decreasing ionic strength) and on the pH. The pH dependence, shown in Figure 3, resembles a titration curve with an inflection point at about pH 5.5 [Schwer 1991]. At pH values above 7, the EOF is usually higher than the migration velocity of the anions, so that both cations and anions can be detected on the cathode side. The EOF has a plug-like profile, in contrast to the parabolic profile of hydrodynamic flow. Its contribution to peak dispersion can therefore be neglected, which leads to the following expression for the number of theoretical plates in the presence of electroosmotic flow [Jorgenson 1981]:

$$N = \frac{(m + m_{EO})U}{2D} = \left(\frac{m_{EO}}{m} + 1\right)\frac{z e_0 U}{2kT} \qquad (10)$$

By substituting the Einstein-Nernst equation, it can be seen that the plate number depends on the charge number of the analyte and on the ratio of electroosmotic to electrophoretic mobility [Schwer 1992]. Although the EOF is advantageous in many applications, because anions and cations can be determined in a single run, adsorption of solutes onto the capillary wall can make a modification of the wall necessary. Strong electrostatic interactions between the charged capillary surface and analytes of high charge density, especially proteins, lead to a significant loss in efficiency and irreproducible migration times. Several approaches have been described to decrease solute-wall interactions. Buffer electrolytes of extreme pH result either in repulsion of anions from the highly negatively charged wall (at high pH) or in the absence of electrostatic interactions because the surface is uncharged (at low pH) [McCormick 1988; Lauer 1986; Walbrohl 1989; Lindner 1992]. To work at moderate pH, however, other approaches have to be used: permanent coatings of the inner wall by chemical modification of the silanol groups, or dynamic modification with buffer additives.
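Equation 10 can be evaluated with signed mobilities (positive towards the cathode), taking D from the Einstein-Nernst relation with the magnitudes of m and z. In this sketch, with an assumed $m_{EO}$ of 5·10⁻⁸ m²/(V·s) and hypothetical analytes of equal mobility magnitude, the plate number is expressed relative to the EOF-free value $|z| e_0 U/(2kT)$. The idealized assumptions of Equation 10 (plug flow, diffusion-only broadening) are kept.

```python
K_B = 1.380649e-23       # Boltzmann constant, J/K
E_0 = 1.602176634e-19    # elementary charge, C


def plates_with_eof(m: float, z: int, m_eo: float, voltage_v: float,
                    temp_k: float = 298.15) -> float:
    """Equation 10: N = (m + m_EO) U / (2D), with D = |m| k T / (|z| e0).
    Mobilities are signed: positive = towards the cathode."""
    diff = abs(m) * K_B * temp_k / (abs(z) * E_0)
    return (m + m_eo) * voltage_v / (2.0 * diff)


def plates_no_eof(z: int, voltage_v: float, temp_k: float = 298.15) -> float:
    """Equation 8 with |z|: plate number without electroosmotic flow."""
    return abs(z) * E_0 * voltage_v / (2.0 * K_B * temp_k)


U, M_EO = 30e3, 5.0e-8  # assumed voltage (V) and electroosmotic mobility
for label, m, z in [("cation", 2.0e-8, 1), ("anion", -2.0e-8, -1)]:
    n = plates_with_eof(m, z, M_EO, U)
    print(f"{label}: N = {n:.2e}, {n / plates_no_eof(z, U):.2f} x the value without EOF")
```

Here the cation moving with the EOF gains a factor of 3.5 and the anion a factor of 1.5, reflecting the dependence on the ratio $m_{EO}/m$.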


Modification of the capillary surface with polyacrylamide via different types of bonding is the most widely used permanent coating, with relatively good stability at higher pH [Hjertén 1985, 1991; Cobb 1990; Schmalzing 1993]. Other permanent coatings include methylcellulose, polyethylene glycol, various silanes, ion exchangers and polyvinyl alcohol, most of them with limited pH stability and/or limited effectiveness in suppressing solute-wall interactions [Hjertén 1985, 1993; Bruin 1989; Swedberg 1990; Jorgenson 1983; Gilges 1992]. Dynamic modifications include the use of high-ionic-strength buffers, zwitterionic salts, divalent amines, non-ionic surfactants, polymers or charge-reversal reagents [Chen 1992; Bushey 1989; Bullock 1991; Towns 1991; Mazzeo 1991; Hjertén 1989; Emmer 1991; Wiktorowicz 1990].

3 Instrumentation

3.1 Injection

Injection in CE is usually performed either hydrodynamically or electrokinetically. To obtain maximum efficiency, the variance of the injection profile must not contribute significantly to peak broadening. For pressure injection, this means that the injection volume must not exceed a few nanoliters for capillaries of 75-100 µm i.d. As the volume injected by hydrodynamic techniques depends on viscosity, temperature control is decisive for reproducible injection volumes. If injection is carried out electrokinetically (the only injection method for gel-filled capillaries), discrimination depending on the mobility of the analytes has to be considered. Furthermore, the amount actually injected depends strongly on the concentration of matrix components. These effects are more pronounced the lower the EOF is. To increase the injection volume, and thus the detection sensitivity, sample stacking can be performed. The simplest form of sample preconcentration is achieved when the sample is dissolved in water or very dilute buffer [Haglund 1950; Mikkers 1979; Burgi 1991]. The low conductivity of the sample plug results in a high electric field strength in this zone and consequently in a concentration of the analytes at the front of the zone. This type of sample stacking and its modifications have been treated extensively by Burgi [Burgi 1991, 1992, 1993; Chien 1991]. Another approach uses discontinuous buffer systems to achieve isotachophoretic sample concentration. This can be performed either in a dual-column mode [Kaniansky 1990; Foret 1990; Stegehuis 1991], where an enrichment factor of about 1000 can be reached, or in a single-column mode in commercial instruments, where the injection volume can be increased by a factor of 100 to about 1 µl [Schwer 1992, 1993; Hjertén 1987, 1990; Foret 1992]. Two types of discontinuous buffer systems have been developed for preconcentration.

The one-buffer system, applicable to the concentration of peptides and proteins, has the sample plug sandwiched between zones of extreme pH, i.e., a zone of OH⁻ and a zone of H⁺, as shown in Figure 4. As OH⁻ and H⁺ migrate towards each other, the analytes are concentrated between them. In Figure 5, this system was applied to the separation of tryptic peptides of β-lactoglobulin. The injection volume was increased to 750 nl while the high separation efficiency was maintained.
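The dependence of a pressure injection on viscosity and capillary diameter follows from the Hagen-Poiseuille equation, V = πΔP d⁴ t/(128 η L). The sketch estimates the injected volume, the resulting plug length and its spatial variance l²/12 (the variance of a rectangular plug), which can be compared with the diffusional variance 2Dt of Equation 6. Pressure, time, capillary length and viscosity are assumed values, not instrument specifications.

```python
import math


def injected_volume_m3(pressure_pa: float, inner_diam_m: float, time_s: float,
                       viscosity_pa_s: float, length_m: float) -> float:
    """Hagen-Poiseuille estimate of a pressure injection:
    V = pi * dP * d**4 * t / (128 * eta * L)."""
    return math.pi * pressure_pa * inner_diam_m ** 4 * time_s / (
        128.0 * viscosity_pa_s * length_m)


def plug_length_m(volume_m3: float, inner_diam_m: float) -> float:
    """Length of the injected plug inside the capillary."""
    return volume_m3 / (math.pi * (inner_diam_m / 2.0) ** 2)


def plug_variance_m2(plug_len_m: float) -> float:
    """Spatial variance of a rectangular injection plug: l**2 / 12."""
    return plug_len_m ** 2 / 12.0


# Assumed conditions: 3.45 kPa (0.5 psi) for 1 s, 75 um i.d., 57 cm, water-like buffer.
v = injected_volume_m3(3.45e3, 75e-6, 1.0, 1.0e-3, 0.57)
l = plug_length_m(v, 75e-6)
print(f"volume {v * 1e12:.2f} nl, plug {l * 1e3:.2f} mm, "
      f"variance {plug_variance_m2(l) * 1e6:.3f} mm^2")
```

Because V is inversely proportional to η, any temperature-induced change in viscosity changes the injected volume by the same factor, which is the reason for the temperature control mentioned above.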



One Buffer System

Figure 4. Arrangement of the different buffer solutions in the one-buffer stacking system. CE: carrier electrolyte; A, B: sample; H⁺, OH⁻: solution of an acid or base, respectively. For the composition of the buffer electrolytes see Table 1, system I.

Figure 5. Separation of tryptic peptides of β-lactoglobulin under stacking conditions in the one-buffer system (system I in Table 1). The concentration of the digested protein was 10 pmol/µl and the injection volume was 750 nl.

The two-buffer system, shown in Figure 6, consists of two electrolytes, a leading and a terminating electrolyte as in ITP, but the capillary is filled with terminating electrolyte and a zone of leading electrolyte is applied only in front of the sample. Both ends of the capillary are placed in terminating electrolyte. The presence of the leading electrolyte first creates isotachophoretic conditions for the sample ions and therefore adapts their concentration to that of the leading electrolyte. However, as the leading electrolyte is preceded by terminating electrolyte, the leading ions migrate zone-electrophoretically into the terminating electrolyte and thus away from the sample ions. As isotachophoretic conditions no longer hold for the sample ions, they are themselves separated by zone electrophoresis in the terminating electrolyte. As an example, the separation of 5 basic proteins is shown in Figure 7. The injection volume was increased to 500 nl. Adsorption of the proteins was suppressed by dynamic coating with polyethylene glycol as a buffer additive. The composition of these and other stacking systems is given in Table 1.


Two Buffer System

Figure 6. Arrangement of the different buffer solutions in the two-buffer stacking system. LE: leading electrolyte; TE: terminating electrolyte; A, B: sample. For the composition of the different buffer solutions see Table 1, systems IIa-c.


Figure 7. Separation of five basic proteins under stacking conditions in a two-buffer system at pH 4 (system IIa in Table 1). The capillary was dynamically coated with 0.02 % PEG as buffer additive. Injection volume: 500 nl. Protein concentration: 10 ng/µl. Peaks: (1) cytochrome c, (2) lysozyme, (3) ribonuclease A, (4) trypsinogen, (5) chymotrypsinogen A.


Table 1. Electrolyte systems.

One-buffer system I: CE: 20 mM Na-phosphate, pH 2.9; OH⁻: 100 mM NaOH; H⁺: 100 mM H3PO4

Two-buffer system IIa (pH 4.0): LE: 50 mM ammonium acetate; TE: 50 mM β-alanine/50 mM acetic acid

Two-buffer system IIb (pH 3.3): LE: 50 mM ammonium acetate; TE: 50 mM betaine/50 mM acetic acid

Two-buffer system IIc (pH 4.8): LE: 50 mM HCl/50 mM β-alanine; TE: 50 mM MES/β-alanine

CE: carrier electrolyte; LE: leading electrolyte; TE: terminating electrolyte; MES: 2-(N-morpholino)ethanesulfonic acid. System IIc is used for the concentration of anions, while the other systems are cationic systems.



3.2 Detection

Because of the short optical path length in on-column detection, relatively high sample concentrations are needed for UV detection. To increase sensitivity, either preconcentration techniques, as described above, can be applied or more sensitive detection methods used, e.g., laser-induced fluorescence [Swaile 1991; Cheng 1988]. For analytes with no or low UV absorbance, indirect UV or indirect fluorescence detection has proven to be a good alternative [Foret 1989; Kuhr 1988; Jandik 1991]. Other specific detection methods, such as electrochemical, radiometric or conductivity detection [Kaniansky 1983; Pentoney 1989; Wallingford 1987, 1988], have been described in the literature but are not yet commercially available. The coupling of CE to mass spectrometry [Olivares 1987; Smith 1988; Moseley 1989] has already proven to be a very powerful method, especially in peptide and protein chemistry, as it allows further characterization of the analytes. As an example, CE was coupled to electrospray mass spectrometry to identify tryptic peptides of β-casein, as shown in Figure 8. Good agreement was found between the determined molecular masses and the known fragments of β-casein. As an illustration, the mass spectra of three selected peaks are depicted in Figure 9.
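Assigning peaks in CE-ESI-MS relies on the relation between peptide mass and the m/z of the protonated ions, m/z = (M + z·m(H⁺))/z. The sketch computes monoisotopic masses of unmodified linear peptides (no phosphorylation or other modifications are handled) and the m/z of the first two charge states, using the peptide EAMAPK labelled in Figure 9 as an example. It illustrates the assignment principle and is not the software used for the published analysis.

```python
# Monoisotopic residue masses (Da) of the standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # monoisotopic mass of H2O
PROTON = 1.007276   # mass of a proton


def peptide_mass(sequence: str) -> float:
    """Neutral monoisotopic mass of an unmodified linear peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def mz(mass: float, charge: int) -> float:
    """m/z of the [M + zH]^z+ ion formed by electrospray ionization."""
    return (mass + charge * PROTON) / charge


m = peptide_mass("EAMAPK")
for z in (1, 2):
    print(f"EAMAPK [M+{z}H]{z}+: m/z {mz(m, z):.2f}")
```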

[Plot: total ion current (TIC, m/z 400 to 1800) vs. scan number/time (min)]

Figure 8. CE-MS analysis of tryptic peptides of β-casein in an ammonium acetate buffer at pH 3.5. A P/ACE 2100 CE instrument was coupled to a SCIEX API III with a coaxial interface arrangement. The sheath liquid was 0.1% acetic acid in 50% methanol at a flow rate of 5 µl/min. A potential of 25 kV was applied at the inlet side of the CE instrument, while the electrospray needle was set at a potential of 5 kV. A fused silica capillary with an internal diameter of 50 µm and a total length of 95 cm was used for separation. The orifice voltage was 75 V.

C. Schwer

[Figure 9: mass spectra (intensity versus m/z) of selected peaks; legible panel labels include scans 589-594 (EAMAPK, MW = 646.4, MH+) and scans 786-793]

Figure 9a-c. Mass spectra of 3 selected peaks from the CE-MS separation in Figure 8. Known sequences of tryptic fragments could be assigned to the corresponding m/z values of the mass spectra of the individual peaks.

Further characterization of the separated compounds can be achieved off-line after collecting fractions. Several techniques have been used for fraction collection. In one approach, the time window in which a compound reaches the end of the capillary is calculated, and fractions are then collected simply by changing the vial at the outlet [Guttman 1990; Banke 1991; Bergman 1991; Camilleri 1991; Altria 1993; Chen 1992]. This method requires a prerun to determine the migration velocities, and it also requires highly reproducible and constant migration times. Fractions are diluted with 10 µl buffer electrolyte. A non-uniform electric field strength along the capillary (e.g. under stacking conditions) makes the time window for fraction collection difficult to predict. A continuous approach uses a frit structure or a similar construction to establish electrical contact with the counter electrode [Huang 1990; Guzman 1991; Fujimoto 1991, 1992]. The analytes never come into contact with the electrode, so they cannot be decomposed by electrode reactions. The solutes are transported to the end of the capillary by electroosmotic flow, so highly reproducible electroosmotic flow rates are required to obtain fractions of high purity. With a moving membrane, the analytes can be collected continuously without loss of resolution. They can then be detected, e.g., by staining, immunoassays or radioactivity measurements [Eriksson 1992; Cheng 1992; Konse 1993].
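The time-window calculation for fraction collection can be sketched as follows. The sketch assumes a uniform field strength, so that each zone keeps the velocity measured at the detector in the prerun. This is exactly the assumption that fails under stacking conditions, as noted above. The capillary lengths in the example are hypothetical, and `outlet_window` and its `margin_min` parameter are illustrative names, not part of any instrument software.

```python
def outlet_window(t_start_min: float, t_end_min: float,
                  detector_length_cm: float, total_length_cm: float,
                  margin_min: float = 0.0) -> tuple[float, float]:
    """Predict when a zone seen at the detector in a prerun reaches the outlet.

    Assumes constant zone velocity v = detector_length / t (uniform field),
    so a zone edge detected at time t reaches the outlet at
    t * total_length / detector_length.
    """
    if not 0 < detector_length_cm <= total_length_cm:
        raise ValueError("detector must lie between inlet and outlet")
    if not 0 < t_start_min <= t_end_min:
        raise ValueError("need 0 < t_start <= t_end")
    scale = total_length_cm / detector_length_cm
    # Widen the window by a safety margin on both sides.
    return t_start_min * scale - margin_min, t_end_min * scale + margin_min


if __name__ == "__main__":
    # Hypothetical prerun: peak seen from 10.0 to 10.4 min, detector at 50 cm,
    # total capillary length 57 cm.
    start, end = outlet_window(10.0, 10.4, 50.0, 57.0, margin_min=0.1)
    print(f"change outlet vial from {start:.2f} to {end:.2f} min")
```

The scaling also widens the window by the factor total/detector length. This is why a small safety margin is usually worth adding when the migration times are not perfectly reproducible.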



With a make-up flow [Hjertén 1985; Schwer 1994], fractions can be collected at a constant flow rate, as in HPLC, with the possibility of automation. A schematic representation of the instrumental setup is shown in Figure 10. This method of fraction collection was applied to tryptic peptides of β-casein. The sample solution was applied to the micropreparative system under stacking conditions, which increased the injection volume to about 300 nl. Ten fractions were collected, and their purity was checked by reinjection under stacking conditions. Even closely migrating peptides were collected in high purity. Amino acid sequence analysis was performed on several fractions and showed that the fractions from a single run were sufficient for sequence determination. The initial yield was 90 pmol, which is in good agreement with the amount separated [Schwer 1994].
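The reason the roughly 300 nl injection relies on stacking becomes clear with a short calculation. In a capillary of 100 µm i.d. (Figure 10), that volume fills several centimetres of capillary. The sketch below is plain geometry: volume divided by cross-sectional area. The function name is hypothetical.

```python
import math


def plug_length_cm(volume_nl: float, inner_diameter_um: float) -> float:
    """Length of capillary (cm) filled by a sample plug of volume_nl."""
    radius_cm = inner_diameter_um * 1e-4 / 2  # 1 um = 1e-4 cm
    volume_cm3 = volume_nl * 1e-6             # 1 nl = 1e-6 cm3 (1 cm3 = 1 ml)
    return volume_cm3 / (math.pi * radius_cm ** 2)


if __name__ == "__main__":
    print(f"300 nl in a 100 um i.d. capillary: {plug_length_cm(300, 100):.2f} cm")
```

A 300 nl plug is about 3.8 cm long. Without stacking, this length would add directly to the width of every zone.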


Figure 10. Schematic representation of the arrangement for fraction collection using a make-up flow. (1) separation capillary with an i.d. of 100 µm and an o.d. of 190 µm; (2) make-up flow, delivered by a syringe pump at a flow rate of 5 µl/min; (3) outer fused silica capillary with an i.d. of 250 µm; (4) counter electrode connected to ground; (5) external UV detector with modified cell for on-column detection in the outer capillary; (6) buffer vial; (7) electrode at inlet.

4 Applications

4.1 Peptide Separations

Peptides are usually separated either at a low pH of about 2-3, where they all migrate as cations, or at high pH under conditions of EOF. The mobility μ of a peptide can be predicted by the following equation:

$$\mu = K\,\frac{z}{M^{2/3}}$$

where K is a constant, z is the valence and M the molecular mass [Offord 1966; Rickard 1991]. Migration times correlated well with molecular mass when the overall charge of each peptide was calculated from ionization constants typical of peptides rather than from the pKa values of the free amino acids. A semiempirical model based on charge, size and hydrophobicity has also been described for predicting peptide mobilities [Grossman 1989].
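Offord's relation, mobility proportional to z/M^(2/3) [Offord 1966], can be sketched in Python. The net charge z is computed with the Henderson-Hasselbalch equation, and M is the average peptide mass. The pKa values and average residue masses below are illustrative assumptions (typical textbook values, not the constants used by Offord or Rickard et al.). K is set to 1, so only relative mobilities are meaningful. All names are hypothetical.

```python
# Illustrative pKa values for ionizable groups in peptides (assumed values).
PKA_BASIC = {"N_TERM": 7.7, "H": 6.3, "K": 10.5, "R": 12.5}
PKA_ACIDIC = {"C_TERM": 3.3, "D": 3.9, "E": 4.3, "C": 8.3, "Y": 10.1}

# Average residue masses (Da) of the 20 standard amino acids.
AVG_RESIDUE_MASS = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
    "T": 101.10, "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10,
    "D": 115.09, "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19,
    "H": 137.14, "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}
WATER = 18.02  # added once for the free N- and C-termini


def _fraction_protonated(pka: float, ph: float) -> float:
    """Fraction of a basic group carrying +1 (Henderson-Hasselbalch)."""
    return 1.0 / (1.0 + 10 ** (ph - pka))


def _fraction_deprotonated(pka: float, ph: float) -> float:
    """Fraction of an acidic group carrying -1 (Henderson-Hasselbalch)."""
    return 1.0 / (1.0 + 10 ** (pka - ph))


def net_charge(seq: str, ph: float) -> float:
    """Net charge z of a linear peptide (one-letter code) at the given pH."""
    z = _fraction_protonated(PKA_BASIC["N_TERM"], ph)
    z -= _fraction_deprotonated(PKA_ACIDIC["C_TERM"], ph)
    for aa in seq.upper():
        if aa in PKA_BASIC:
            z += _fraction_protonated(PKA_BASIC[aa], ph)
        elif aa in PKA_ACIDIC:
            z -= _fraction_deprotonated(PKA_ACIDIC[aa], ph)
    return z


def peptide_mass(seq: str) -> float:
    """Average molecular mass M (Da) of the peptide."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in seq.upper()) + WATER


def relative_mobility(seq: str, ph: float, k: float = 1.0) -> float:
    """Offord-type estimate mu = K * z / M**(2/3); with K = 1 only ratios matter."""
    return k * net_charge(seq, ph) / peptide_mass(seq) ** (2.0 / 3.0)


if __name__ == "__main__":
    for pep in ("EAMAPK", "AK", "AAAAAK"):
        print(pep, round(net_charge(pep, 2.5), 2), round(relative_mobility(pep, 2.5), 4))
```

At pH 2.5, peptides with the same charge are ranked by size. For example, AK is predicted to migrate faster than AAAAAK. Swapping in peptide-specific pKa values changes z, and therefore the ranking of peptides whose charges differ only slightly.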


The selectivity of peptide separations can be influenced by the pH of the buffer electrolyte, micelle-forming reagents, ion-pairing reagents, complexing reagents, cyclodextrins and organic solvents. CE is routinely applied to the purity control of synthetic peptides or HPLC fractions, as shown in Figure 11. Because its separation principle is orthogonal to that of HPLC, CE can resolve one HPLC fraction into several peaks.

[Figure 11: electropherogram traces, absorbance (x 10^-3) versus migration time]

Figure 11. Purity control of an RP-HPLC fraction of tryptic peptides of fetuin. The two-buffer stacking system IIb (see Table 1) was used for separation, to increase the injection volume and the sensitivity. Because the separation mechanisms are orthogonal, CE can separate one HPLC peak into several components.

In combination with mass spectrometry, CE can provide further structural information on tryptic peptides. Micropreparative CE allows further characterization of peptides by determination of their amino acid composition or sequence, or by other off-line methods such as immunoassays.

4.2 Protein Separations

Proteins can be separated by different modes of CE, each of which uses a different sample property to achieve separation:
- capillary zone electrophoresis (CZE): mobility
- isoelectric focusing (IEF): isoelectric point (pI)
- capillary gel electrophoresis (CGE): molecular mass


[Figure 12: electropherogram of the protein mixture; time axis in minutes]

Figure 12. CZE separation of a protein mixture in a PVP-modified capillary: capillary, 110 cm of 52 µm i.d. fused silica derivatized with [(3-methacryloyloxy)propyl]trimethoxysilane and 1-vinyl-2-pyrrolidone, 75 cm separation distance; buffer, 38.5 mM H3PO4, 20 mM NaH2PO4, pH 2.0; detection, 190 nm; injection, 5 s at 5 kV; separation, 5 to 25 kV linear program in 150 s. Sample: (A) β-lactoglobulin B, (B) β-lactoglobulin A, (C) lysozyme (ovine egg), (D) albumin (human serum), (E) albumin (bovine serum), (F) cytochrome c (horse), (G) trypsinogen (bovine), (H) myoglobin (whale), (I) transferrin, (J) conalbumin, (K) myoglobin (horse), (L) carbonic anhydrase B (bovine), (M) carbonic anhydrase A (bovine), (N) hemoglobin (human), (O) paralbumin (rabbit). (Reproduced from [McCormick 1988] with permission of the American Chemical Society.)

CZE: Separation is based on differences in mobility, which depends on size and charge. Interactions between the solutes and the charged capillary wall can reduce efficiency, as discussed previously. Figure 12 shows the separation of a mixture of 15 proteins at pH 2.0. The capillary surface was modified with poly(vinyl pyrrolidone) to further reduce protein-wall interactions.

IEF: Proteins are separated according to their pI in a pH gradient formed by an ampholyte solution. Coated capillaries are usually used to suppress EOF and so obtain high resolution. When EOF is suppressed, the focused proteins must be mobilized so that they pass the detector. This can be done electrophoretically, by changing the composition of the inlet or outlet vial [Hjertén 1985, 1987; Zhu 1991]. It can also be done hydrodynamically by pressure mobilization, in which pressure is applied together with the high voltage to maintain the high resolution [Hjertén 1987].
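For capillary IEF, a linear-gradient approximation converts a difference in pI into a distance between focused zones. The sketch below uses the conditions given for Figure 13 (12 cm capillary, pH 3-10 ampholytes) as defaults. Real carrier-ampholyte gradients are not perfectly linear, so the numbers are only indicative. The function names are hypothetical.

```python
def focus_position_cm(pi: float, ph_low: float, ph_high: float,
                      length_cm: float) -> float:
    """Distance from the acidic (anodic) end at which a protein focuses,
    assuming a perfectly linear pH gradient along the capillary."""
    if not ph_low <= pi <= ph_high:
        raise ValueError("pI outside the ampholyte range")
    return (pi - ph_low) / (ph_high - ph_low) * length_cm


def zone_spacing_mm(pi_a: float, pi_b: float, ph_low: float = 3.0,
                    ph_high: float = 10.0, length_cm: float = 12.0) -> float:
    """Distance (mm) between the focused zones of two proteins."""
    pos_a = focus_position_cm(pi_a, ph_low, ph_high, length_cm)
    pos_b = focus_position_cm(pi_b, ph_low, ph_high, length_cm)
    return abs(pos_a - pos_b) * 10.0


if __name__ == "__main__":
    print(f"Hb A / Hb F (pI 7.10 / 7.15): {zone_spacing_mm(7.10, 7.15):.2f} mm")
    print(f"0.02 pI units: {zone_spacing_mm(7.00, 7.02):.2f} mm")
```

Under these assumptions, hemoglobins A and F focus about 0.86 mm apart, and a difference of 0.02 pI units corresponds to only about 0.34 mm. The calculation shows how narrow the focused zones must be to reach that resolution.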
[Figure 13: electropherogram of hemoglobin variants; time axis in minutes]

Figure 13. Separation of hemoglobin variants by capillary IEF in a 12 cm x 25 µm coated capillary using pH 3-10 ampholytes. Focusing and mobilization were carried out at 8 kV constant voltage. Protein concentration was approximately 250 µg/ml for each protein. Isoelectric points are: hemoglobin A, pI 7.10; hemoglobin F, pI 7.15; hemoglobin S, pI 7.25; hemoglobin C, pI 7.50. (Reproduced from [Zhu 1991] with permission of Elsevier Science Publishers.)

Figure 14. Electropherogram of an SDS-E. coli crude extract using a PEG polymer network. Conditions: buffer, 0.1 M TRIS-CHES, 0.1% SDS, pH 8.8; polymer, 3% w/v PEG 100 000; effective length, 40 cm; 100 µm i.d.; applied electric field, 300 V/cm; injection, pressure mode for 20 s; detection, 214 nm; OG: internal standard, Orange G. Electropherogram was redrawn using Lotus 1-2-3. (Reproduced from [Ganzler 1992] with permission of the American Chemical Society.)

When EOF is present, no mobilization is necessary, because the EOF carries the focused proteins past the detector [Mazzeo 1991, 1993]. Figure 13 shows the separation of hemoglobin variants by IEF with anodic mobilization. The resolution obtained was on the order of 0.02 pI units.

CGE: Proteins are separated as their SDS complexes according to their molecular mass. The sieving medium is either a cross-linked polyacrylamide gel [Cohen 1987] or a polymer solution such as polydextran, polyethylene glycol or linear polyacrylamide [Widhalm 1991; Ganzler 1992; Guttman 1993]. Polyacrylamide as a sieving medium has the disadvantage of high UV absorbance below 230 nm. This favours other polymers, such as polydextran or polyethylene glycol, which remain UV-transparent at 214 nm and therefore allow proteins to be detected with increased sensitivity. Coated capillaries are used to suppress EOF and electrostatic interactions with the wall. A linear correlation between migration time and molecular mass is obtained, which allows the molecular mass of unknown proteins to be determined. Figure 14 shows the separation of an SDS-E. coli crude extract with a PEG solution as the sieving medium.
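Molecular mass determination by CGE relies on a calibration with standards of known mass. The sketch below fits such a calibration by least squares. It assumes a log-linear form, log10(M) = a + b·t. This form is common for SDS sieving calibrations, but it is an assumption here, since the text above states only that the correlation is linear. Migration times could equally be replaced by times relative to an internal standard such as Orange G. The function names are hypothetical.

```python
import math


def fit_log_mass_calibration(times_min: list[float],
                             masses_kda: list[float]) -> tuple[float, float]:
    """Least-squares fit of log10(M) = intercept + slope * t to standards."""
    n = len(times_min)
    if n < 2 or n != len(masses_kda):
        raise ValueError("need at least two standards with matching masses")
    ys = [math.log10(m) for m in masses_kda]
    t_mean = sum(times_min) / n
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in times_min)
    if sxx == 0:
        raise ValueError("standards must have different migration times")
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times_min, ys))
    slope = sxy / sxx
    return y_mean - slope * t_mean, slope


def estimate_mass_kda(t_min: float, intercept: float, slope: float) -> float:
    """Molecular mass (kDa) of an unknown from its migration time."""
    return 10 ** (intercept + slope * t_min)
```

In use, run the standards and the unknown under identical conditions. Fit the standards with `fit_log_mass_calibration`, then pass the unknown's migration time to `estimate_mass_kda`. If a plot of your own standards looks linear in M rather than in log10(M), fit that form instead.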



5 References

Altria, KD; Dave, YK (1993) J. Chromatogr. 633, 221
Banke, N; Hansen, K; Diers, I (1991) J. Chromatogr. 559, 325
Bergman, T; Agerberth, B; Jörnvall, H (1991) FEBS 283, 100
Bruin, GJ; Chang, JP; Kuhlman, RH; Zegers, K; Kraak, JC; Poppe, H (1989) J. Chromatogr. 471, 429
Bullock, JA; Yuan, LC (1991) J. Microcol. Sep. 3, 241
Burgi, DS; Chien, RL (1991) Anal. Chem. 63, 2042
Burgi, DS; Chien, RL (1992) Anal. Biochem. 2, 306
Burgi, DS (1993) Anal. Chem. 65, 3726
Bushey, MM; Jorgenson, JW (1989) J. Chromatogr. 480, 301
Camilleri, P; Okafo, GN; Southan, C; Brown, R (1991) Anal. Biochem. 198, 36
Chen, FA; Kelly, L; Palmieri, R; Biehler, R; Schwartz, H (1992) J. Liq. Chromatogr. 15, 1143
Cheng, YF; Dovichi, NJ (1988) Science 242, 563
Cheng, YF; Fuchs, M; Andrews, D; Carson, W (1992) J. Chromatogr. 608, 109
Chien, RL; Burgi, DS (1991) J. Chromatogr. 559, 153
Cobb, KA; Dolnik, V; Novotny, M (1990) Anal. Chem. 62, 2478
Cohen, AS; Karger, BL (1987) J. Chromatogr. 397, 409
Emmer, A; Jansson, M; Roeraade, J (1991) HRC 14, 738
Eriksson, KO; Palm, A; Hjertén, S (1992) Anal. Biochem. 201, 211
Foret, F; Fanali, S; Ossicini, L; Bocek, P (1989) J. Chromatogr. 470, 299
Foret, F; Sustacek, V; Bocek, P (1990) J. Microcol. Sep. 2, 229
Foret, F; Szoko, E; Karger, BL (1992) J. Chromatogr. 608, 3
Fujimoto, C; Muramatsu, Y; Suzuki, M; Jinno, K (1991) HRC 14, 178
Fujimoto, C; Fujikawa, T; Jinno, K (1992) HRC 15, 201
Ganzler, K; Greve, KS; Cohen, AS; Karger, BL; Guttman, A; Cooke, NC (1992) Anal. Chem. 64, 2665
Giddings, JC (1969) Sep. Sci. 4, 181
Giddings, JC (1989) J. Chromatogr. 480, 21
Gilges, M; Husmann, H; Kleemiss, MH; Motsch, SR; Schomburg, G (1992) HRC 15, 452
Grossman, PD; Colburn, JC; Lauer, HH (1989) Anal. Biochem. 179, 28
Guttman, A; Cohen, AS; Karger, BL (1990) Anal. Chem. 62, 137
Guttman, A; Horvath, J; Cooke, N (1993) Anal. Chem. 65, 199
Guzman, NA; Trebilcock, MA; Advis, JP (1991) J. Liq. Chromatogr. 14, 997
Haglund, H; Tiselius, A (1950) Acta Chem. Scand. 4, 957
Hjertén, S; Zhu, MD (1985) J. Chromatogr. 346, 265
Hjertén, S (1985) J. Chromatogr. 347, 191
Hjertén, S; Zhu, MD (1985) J. Chromatogr. 327, 157
Hjertén, S; Liao, JL; Yao, K (1987) J. Chromatogr. 387, 127
Hjertén, S; Elenbring, K; Kilár, F; Liao, JL; Chen, AJ; Siebert, CJ; Zhu, MD (1987) J. Chromatogr. 403, 47
Hjertén, S; Valtcheva, L; Elenbring, K; Eaker, D (1989) J. Liq. Chromatogr. 12, 2471
Hjertén, S (1990) Electrophoresis 11, 665
Hjertén, S; Kiessling-Johansson, M (1991) J. Chromatogr. 550, 811
Hjertén, S; Kubo, K (1993) Electrophoresis 14, 390
Huang, X; Zare, RN (1990) Anal. Chem. 62, 443
Jandik, P; Jones, WR (1991) J. Chromatogr. 546, 431
Jorgenson, JW; Lukacs, KD (1981) Anal. Chem. 53, 1298
Jorgenson, JW; Lukacs, KD (1983) Science 222, 266
Kaniansky, D; Rajec, P; Svec, A; Havasi, P; Macasek, F (1983) J. Chromatogr. 258, 238
Kaniansky, D; Marak, J (1990) J. Chromatogr. 498, 191
Kenndler, E; Schwer, C (1991) Anal. Chem. 63, 2499
Knox, JH; Grant, IH (1987) Chromatographia 24, 135
Kohlrausch, F (1897) Ann. Phys. Chem. 62, 209
Konse, T; Takahashi, T; Nagoshima, H; Iwaoka, T (1993) Anal. Biochem. 214, 179
Kuhr, WG; Yeung, ES (1988) Anal. Chem. 60, 1832
Lauer, HH; McManigill, D (1986) Anal. Chem. 58, 166
Lindner, H; Helliger, W; Dirschlmayer, A; Jaquemar, M; Puschendorf, B (1992) Biochem. J. 283, 467
Mazzeo, JR; Krull, IS (1991) Anal. Chem. 63, 2852
Mazzeo, JR; Martineau, JA; Krull, IS (1993) Anal. Biochem. 208, 323
McCormick, RM (1988) Anal. Chem. 60, 2322
Mikkers, FE; Everaerts, FM; Verheggen, ThP (1979) J. Chromatogr. 169, 1
Mikkers, FE; Everaerts, FM; Verheggen, ThP (1979) J. Chromatogr. 169, 11
Moseley, MA; Deterding, LJ; Tomer, KB; Jorgenson, JW (1989) J. Chromatogr. 480, 197
Offord, RE (1966) Nature 211, 591
Olivares, JA; Nguyen, NT; Yonker, CR; Smith, RD (1987) Anal. Chem. 59, 1230
Pentoney, SL; Zare, RN; Quint, JF (1989) Anal. Chem. 61, 1642
Rickard, EC; Strohl, MM; Nielsen, RG (1991) Anal. Biochem. 197, 197
Swedberg, SA (1990) Anal. Biochem. 185, 51
Schmalzing, D; Piggee, CA; Foret, F; Carrilho, E; Karger, BL (1993) J. Chromatogr. 652, 149
Schwer, C; Kenndler, E (1991) Anal. Chem. 63, 1801
Schwer, C; Lottspeich, F (1992) J. Chromatogr. 623, 345
Schwer, C; Kenndler, E (1992) Chromatographia 33, 331
Schwer, C; Gao, B; Lottspeich, F; Kenndler, E (1993) Anal. Chem. 65, 2108
Schwer, C; Lottspeich, F (1994) Anal. Chem., submitted
Smith, RD; Barinaga, CJ; Udseth, HR (1988) Anal. Chem. 60, 1948
Stegehuis, DS; Irth, H; Tjaden, UR; Van der Greef, J (1991) J. Chromatogr. 538, 393
Swaile, DF; Sepaniak, MJ (1991) J. Liq. Chromatogr. 14, 869
Towns, JK; Regnier, FE (1991) Anal. Chem. 63, 1126
Walbroehl, Y; Jorgenson, JW (1989) J. Microcol. Sep. 1, 41
Wallingford, RA; Ewing, AG (1987) Anal. Chem. 59, 1762
Wallingford, RA; Ewing, AG (1988) Anal. Chem. 60, 1972
Widhalm, A; Schwer, C; Blaas, D; Kenndler, E (1991) J. Chromatogr. 546, 446
Wiktorowicz, JE; Colburn, JC (1990) Electrophoresis 11, 769
Zhu, M; Rodriguez, R; Wehr, T (1991) J. Chromatogr. 559, 479


11.4 Microseparation Techniques III: Gel Electrophoresis for Sample Preparation in Protein Chemistry

Hermann Schägger

1 Introduction

The basic principle of all electrophoretic protein separation techniques is the migration of charged molecules in an electric field. The many available techniques differ in how they use intrinsic or induced differences in the charge/mass ratio or size of proteins for separation. Let us first consider proteins in their native state in free solution, i.e. in a buffered solution without a gel matrix. The migration velocity of native proteins would then depend only on the charge/mass ratio at the given pH. Practical problems, especially convection and diffusion, however, led to the use of agarose or polyacrylamide gel matrices. With agarose gels or polyacrylamide gels of low acrylamide concentration, the molecular sieving effect is small, and the charge/mass ratio still dominates over size-dependent separation. One problem, however, is not solved by introducing a gel matrix. Proteins with isoelectric points (pI) above the buffer pH carry positive charges. If the protein is applied on the cathodic side of the gel, these proteins will not migrate into the gel and will be lost in the cathode buffer. Techniques that rely on the intrinsic protein charge are therefore rarely used. The exceptions are acrylamide gradient gels (Colourless-Native-PAGE) and, more commonly, isoelectric focusing (IEF). IEF requires extremes of pH in the electrode buffers combined with a pH gradient within the gel matrix. The extremes of pH in the electrode buffers prevent basic and acidic proteins from escaping from the gel into the electrode buffers. Within the gel, proteins migrate at a variable velocity that depends on their actual charge/mass ratio. The protein charge decreases during the run as the protein approaches gel areas where the pH is closer to its pI. At pH = pI, the charge/mass ratio and the mobility of the focused protein are zero.
R. Kellner, F. Lottspeich, H. E. Meyer (1994) Microcharacterization of Proteins, VCH Weinheim

H. Schagger

IEF is used both for the separation of native proteins (native IEF) and for the separation of proteins denatured by high urea concentrations. Unlike denaturation by SDS, denaturation by urea does not alter the intrinsic protein charge. The urea technique is less restricted by individual protein properties and is therefore the more commonly used one, so the term "IEF" usually means "denaturing IEF". As mentioned above, electrophoresis of native proteins in non-restrictive gel matrices at fixed pH is seldom performed. Separation according to charge/mass ratio is usually insufficient, but it changes to a much better separation according to size if acrylamide gradient gels are used. In Colourless-Native-PAGE, proteins initially migrate "fast" according to their charge/mass ratios. However, they decelerate considerably when they reach areas of smaller pore size. They finally stop completely at the pore-size limit that corresponds to their size. One disadvantage persists: only proteins with isoelectric points below the pH of the gel will be separated. More basic proteins will be lost in the cathode buffer. This disadvantage can be overcome by adding charged compounds that bind to the proteins and thereby shift their charge. The added compounds may either leave proteins in the native state (e.g. Coomassie dyes in Blue-Native-PAGE) or denature them, as in SDS-PAGE systems. The induced charge shift changes the situation completely: all proteins are now negatively charged, regardless of the originally basic or acidic pI of the native protein, and all proteins migrate to the anode. Apart from the essential difference between the native and the denatured state of the proteins, native and denaturing techniques differ as follows. In Blue-Native-PAGE, the amount of negatively charged dye bound to individual proteins varies, and so does the charge/mass ratio. In SDS-PAGE, SDS binding is generally 1.4 g SDS/g protein (with some exceptions), so the charge/mass ratio is the same for all proteins. This difference affects the choice of optimal gel types (in both cases, separation according to protein size is desired).
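The consequence of a constant SDS binding ratio can be checked numerically. The sketch makes four assumptions: 1.4 g SDS per g protein (from the text), a molar mass of 288.38 g/mol for sodium dodecyl sulfate, one negative charge per bound dodecyl sulfate ion, and no contribution from the intrinsic charge of the polypeptide. The function names are hypothetical.

```python
SDS_MOLAR_MASS = 288.38  # g/mol, sodium dodecyl sulfate
BINDING_RATIO = 1.4      # g SDS per g protein (text; exceptions exist)


def bound_sds(protein_mass_da: float) -> float:
    """Approximate number of SDS molecules bound to a denatured protein."""
    return BINDING_RATIO * protein_mass_da / SDS_MOLAR_MASS


def charge_per_kda(protein_mass_da: float) -> float:
    """Negative charges per kDa from bound dodecyl sulfate ions
    (intrinsic charges of the polypeptide are ignored)."""
    return bound_sds(protein_mass_da) / (protein_mass_da / 1000.0)


if __name__ == "__main__":
    for mass in (10_000, 50_000, 200_000):
        print(f"{mass / 1000:>5.0f} kDa: {bound_sds(mass):6.0f} SDS, "
              f"{charge_per_kda(mass):.2f} charges/kDa")
```

The number of bound SDS molecules grows in proportion to protein mass, so the charge per kDa (about 4.9) is the same for every protein. This is why, in SDS-PAGE, the pore size of the gel becomes the only factor that sorts proteins by size.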
In Blue-Native-PAGE, acrylamide gradient gels are required for two reasons. They make proteins stop completely at their individual pore-size limits, and they exclude any influence of the varying charge/mass ratio on migration velocity, migration distance and molecular mass calibration. In SDS-PAGE, the charge/mass ratio is constant. It is therefore sufficient to use less restrictive gels that merely modulate the migration velocity according to protein size, without stopping the proteins during the run. At the constant charge/mass ratio of proteins in SDS-PAGE, the pore size of the gel becomes the only factor that sorts proteins according to size, which allows the sizes (molecular masses) of the migrating proteins to be determined. Finally, if the electrophoretic mobilities of proteins fall within a relatively narrow range, conditions can be found that concentrate the applied proteins into extremely thin bands (protein stacks) before they reach the separating gel. This desired "stacking" effect [Hames 1990] can be obtained only in discontinuous electrophoresis systems, i.e. with different buffer systems in the cathode buffer and the gel buffer, as used in all techniques described below. For sample preparation in protein chemistry, denaturing electrophoretic techniques are much more common than native techniques. They are mostly performed on an analytical scale that is just sufficient for N-terminal protein sequencing. However, electrophoretic techniques also have potential for isolating proteins on a preparative scale, in the SDS-denatured as well as in the native state, and this deserves much more attention than it has received to date. Besides giving a survey of standard techniques, this chapter therefore focuses on the preparative use of recently developed techniques. The methods that we have found most useful and reproducible are discussed (Table 1).

11.4 Gel Electrophoresis


Table 1. Selection of electrophoretic techniques.

Denaturing techniques: (Blue)-Laemmli-SDS-PAGE; (Blue)-Tricine-SDS-PAGE; electroelution; electroblotting; isoelectric focusing (IEF); 2D: IEF / SDS-PAGE; 2D: BN-PAGE / SDS-PAGE

Native techniques: Laemmli-PAGE without SDS; Blue-Native-PAGE (BN-PAGE); native electroelution; native electroblotting; native IEF (without urea)

2 Denaturing Techniques

2.1 Commonly Used SDS-Polyacrylamide Gel Electrophoresis Techniques for Protein Separation

Two of the many available SDS techniques [Hames 1990] were selected. Laemmli-SDS-PAGE [Laemmli 1970] has advantages for the separation of large proteins. Tricine-SDS-PAGE [Schägger 1987] has advantages for small proteins and peptides. The two are compared in Figure 1 using identical gel types and identical protein samples. Together, these two techniques cover the whole molecular mass range of proteins. Table 2 is a guide for selecting the appropriate electrophoresis system and the optimal gel type for particular applications.

Figure 1. Comparison of the resolution of tricine-SDS-PAGE and Laemmli-SDS-PAGE. Identical polyacrylamide gel types (10% T, 3% C) were used for tricine-SDS-PAGE (lane 1) and Laemmli-SDS-PAGE (lane 2), and identical samples (cyanogen bromide fragments of myoglobin) were applied. The advantage of tricine-SDS-PAGE for the resolution of small proteins is obvious. Proteins with molecular masses above 30 kDa would be resolved better by Laemmli-SDS-PAGE. For %T and %C, see Figure 2. Reprinted from [Schägger 1987], with permission from Analytical Biochemistry.


Figure 2. Resolution of tricine-SDS-PAGE using different gel types. Lane 1, 10% T, 3% C; lane 2, 16.5% T, 3% C; lane 3, 16.5% T, 6% C; lane 4, 16.5% T, 6% C plus 6 M urea. %T, total concentration of both monomers (acrylamide and bisacrylamide); %C, percentage of cross-linker relative to the total concentration. Reprinted from [Schägger 1987], with permission from Analytical Biochemistry.
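The %T and %C definitions in the Figure 2 caption translate directly into monomer amounts. The sketch assumes %T is expressed as g per 100 ml (w/v), the usual convention. Converting the masses into volumes of a particular stock solution is left out, and the function name is hypothetical.

```python
def monomer_amounts(total_pct_t: float, crosslinker_pct_c: float,
                    volume_ml: float) -> tuple[float, float]:
    """Grams of acrylamide and bisacrylamide for a gel of given %T and %C.

    %T: total monomer (acrylamide + bisacrylamide), g per 100 ml (assumed w/v).
    %C: bisacrylamide as a percentage of the total monomer.
    """
    total_g = total_pct_t / 100.0 * volume_ml
    bis_g = total_g * crosslinker_pct_c / 100.0
    return total_g - bis_g, bis_g


if __name__ == "__main__":
    for t, c in ((10.0, 3.0), (16.5, 3.0), (16.5, 6.0)):
        acr, bis = monomer_amounts(t, c, 10.0)
        print(f"{t}% T, {c}% C, 10 ml: {acr:.3f} g acrylamide, {bis:.3f} g bis")
```

At the same %T, raising %C from 3% to 6% only shifts monomer from acrylamide to bisacrylamide. The total amount of monomer stays constant.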

Table 2. Choice of gel type and separation system.

Gel type          System     Total range (kDa)
8-16% gradient    Laemmli    6 to >250
10% uniform       Schägger   2 to 100
16.5%             Schägger   1 to 70

Gel type          System     Optimal range (kDa)
8%                Laemmli    50 to 100
13%               Laemmli    20 to 60
10%               Schägger   5 to 50
16.5%             Schägger   2 to 30
16.5% (H)         Schägger   1 to 20

(H) High crosslinker concentration is used.



Gradient gels are required only if a wide range of molecular masses has to be covered. If there is no protein of interest above 100 kDa or 70 kDa, tricine-SDS-PAGE with uniform gels (Figure 2) is recommended. These gels are easier to cast, they cover the low molecular mass range, and they offer advantages in electroblotting large membrane proteins. Uniform acrylamide gels are also preferred for separating proteins of similar mass. Besides choosing the optimal gel type and electrophoresis system, there are several other ways to achieve separation in problematic cases, as described in [Schägger 1994a].
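Table 2 can be encoded as a small lookup that returns the gel types whose optimal range covers the masses of interest. If no optimal range fits, it falls back to the total ranges. This is a minimal sketch with hypothetical names, and it represents the open-ended ">250 kDa" entry as infinity.

```python
import math

# Table 2, optimal ranges: (gel type, system, lower kDa, upper kDa)
OPTIMAL_RANGES = [
    ("8%", "Laemmli", 50, 100),
    ("13%", "Laemmli", 20, 60),
    ("10%", "Schägger", 5, 50),
    ("16.5%", "Schägger", 2, 30),
    ("16.5% (H)", "Schägger", 1, 20),  # (H): high crosslinker concentration
]

# Table 2, total ranges; ">250 kDa" is treated as open-ended.
TOTAL_RANGES = [
    ("8-16% gradient", "Laemmli", 6, math.inf),
    ("10% uniform", "Schägger", 2, 100),
    ("16.5%", "Schägger", 1, 70),
]


def choose_gels(low_kda: float, high_kda: float) -> list[tuple[str, str]]:
    """Return (gel type, system) pairs whose range covers low_kda..high_kda.

    Optimal ranges are preferred; total ranges are only a fallback.
    An empty list means no single gel in Table 2 covers the whole range.
    """
    if low_kda > high_kda:
        raise ValueError("low_kda must not exceed high_kda")
    for table in (OPTIMAL_RANGES, TOTAL_RANGES):
        hits = [(gel, system) for gel, system, lo, hi in table
                if lo <= low_kda and high_kda <= hi]
        if hits:
            return hits
    return []


if __name__ == "__main__":
    print(choose_gels(3, 25))    # small proteins and peptides
    print(choose_gels(10, 200))  # wide mass range
```

The fallback reflects the advice in the text: a gradient gel is chosen only when a wide mass range has to be covered and no uniform gel suffices.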

2.2 Blue-SDS-PAGE for Quantitative Protein Recovery from Gels

After conventional, colourless SDS-PAGE, proteins are usually fixed and stained within the gel or electroblotted onto inert membranes. In other cases, especially on a preparative scale, it would be advantageous to recover the purified proteins in soluble form, e.g. for immunization or for protein fragmentation followed by N-terminal sequencing of the fragments. However, recovering proteins from conventional SDS-PAGE, especially membrane proteins, can be problematic. Once fixed within the gel, membrane proteins can hardly be resolubilized, and the yield after electroelution may be close to zero. But how can proteins be visualized without prior fixation? One quite simple possibility is to stain the proteins with Coomassie dyes (Serva Blue R, G or W) during electrophoresis, as shown in Figure 3. The basic electrophoretic method is either Laemmli-SDS-PAGE [Laemmli 1970] or tricine-SDS-PAGE [Schägger 1987]. In the cathode buffers of both systems, the SDS concentration is halved and 25 mg Serva Blue G is added per liter. Under these conditions, the Coomassie dye can compete with SDS for binding sites on the protein, and the proteins migrate through the gel as blue bands. The migration behaviour in Blue-Laemmli-SDS-PAGE and Blue-tricine-SDS-PAGE [Schägger 1988] is almost identical to that in the conventional colourless techniques. More than 100 µg per protein band can be resolved on a preparative gel and recovered quantitatively. The minimal detectable protein quantity is in the range of 0.2-1 µg per protein band. Blue-tricine-SDS-PAGE can even detect 1-3 kDa peptides. Such peptides escape detection after conventional fixation/staining procedures and also pass through PVDF membranes during electroblotting. Alternative methods for staining peptides within gels and retaining them on PVDF membranes during electroblotting are described in [Schägger 1994a].
Guidelines for Blue-SDS-PAGE: Protein bands are detected best after fast runs and short migration distances. Staining and detection are better in blue Laemmli gels than in blue tricine gels. Protein spots should be excised directly after the run, because the spots disappear after prolonged standing due to diffusion.
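To prepare a blue cathode buffer, the SDS is halved and 25 mg Serva Blue G is added per liter, and both amounts can be scaled to any volume as below. The default of 0.1% (w/v) SDS in the standard cathode buffer is an assumption. It is the concentration commonly used in both the Laemmli and the tricine cathode buffers, but it should be checked against your own recipe. The function name is hypothetical.

```python
def blue_cathode_additions(volume_l: float,
                           standard_sds_pct: float = 0.1) -> dict[str, float]:
    """SDS and dye amounts for a blue cathode buffer of volume_l liters.

    standard_sds_pct: SDS in the colourless cathode buffer, % (w/v); assumed 0.1.
    """
    if volume_l <= 0:
        raise ValueError("volume must be positive")
    blue_sds_pct = standard_sds_pct / 2.0  # SDS concentration is halved
    sds_g = blue_sds_pct * 10.0 * volume_l  # 1% (w/v) = 10 g per liter
    dye_mg = 25.0 * volume_l                # 25 mg Serva Blue G per liter
    return {"SDS_g": sds_g, "Serva_Blue_G_mg": dye_mg}


if __name__ == "__main__":
    print(blue_cathode_additions(0.5))
```

All other cathode buffer components stay as in the colourless recipe. Only the SDS and dye amounts change.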


Figure 3. Comparison of protein staining intensity during Blue-SDS-PAGE (right) and after conventional restaining of the same gel (left). A: Proteins in the molecular mass range from 17-68 kDa were resolved by Blue-Laemmli-SDS-PAGE using a 12% T, 3% C gel. B: Proteins in the molecular mass range from 1.45-17 kDa were resolved by Blue-tricine-SDS-PAGE using a 16.5% T, 6% C gel. The minimal load was 0.27 µg of each protein per mm². The highest load was 2.2 µg/mm² in A and 0.82 µg/mm² in B. Reprinted from [Schägger 1988], with permission from Analytical Biochemistry.

2.3 Electroelution of Proteins after Blue-SDS-PAGE

Electroelution [Schägger 1994a, 1988] was performed with an electroelutor/concentrator built according to Hunkapiller [1983]. A similar apparatus is commercially available from C.B.S. Scientific Company, distributed by ITC Biotechnology, Heidelberg, Germany. The H-shaped electroelutor vessel consists of two vertical tubes and a connecting horizontal tube. The lower ends of the two vertical arms are sealed by dialysis membranes with a cut-off limit of 2 kDa (Reichelt, Heidelberg). The vessel is placed on the barrier separating the anodic and cathodic compartments, which are filled with electrode buffer (0.1% Na-SDS, 100 mM NH4HCO3). The blue protein bands are then squeezed through a syringe directly into the cathodic arm and incubated for 10 min in electrode buffer. The rest of the vessel is then filled with 50 mM NH4HCO3 (without SDS). Electroelution is performed at 40 V overnight or at a maximum of 70 V (with considerable heating) for about 5 h. Recovery is nearly quantitative for large membrane proteins as well as for small peptides in the 1-2 kDa range. The proteins collect as a blue solution at the anodic membrane. Only the SDS present within the elutor vessel (from the gel and the incubation solution) accumulates at the dialysis membrane, so the proteins can be used directly for immunization or protein fragmentation [Schägger 1994a]. Fragmentation can be done with proteases that tolerate low SDS concentrations or by chemical cleavage (e.g. with CNBr). Before either, NH4HCO3 is removed by lyophilization, because excess NH4HCO3 would prevent further resolution of the fragments by SDS-PAGE.



2.4 Electroblotting of Blue and Colourless SDS Gels

Electroblotting techniques are discussed in Chapter 11.5. One further technique is worth mentioning because it helps to retain many peptides in the 1-2 kDa range on PVDF membranes. It has two essential prerequisites: Blue-tricine-SDS-PAGE is used for separation instead of colourless SDS-PAGE, and 20% methanol is added to the anode buffer (Table 3). Apart from the electroblotting of peptides, electroblotting of Blue-tricine-SDS gels offers no advantage. On the contrary, the dye itself occupies binding sites on the PVDF membrane and, at high load, causes more protein to pass through. Adding methanol is not recommended for the transfer of large proteins.

Table 3. Buffers and electroblotting conditions.

Anode buffer: 300 mM Tris, 100 mM tricine (+20% methanol; only for transfer of 1-2 kDa peptides)
Cathode buffer: 300 mM aminocaproic acid, 30 mM Tris
Electrotransfer of proteins from 0.7 mm tricine-SDS gels:
10% acrylamide: 1 mA/cm², 3 h
16.5% acrylamide: 1 mA/cm², 5 h
Voltage: 5-10 V
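The currents in Table 3 are given as current densities, so the actual current setting depends on the gel area. The sketch below converts the Table 3 values into a current and a transfer time for a given gel size, and it selects the anode buffer with or without methanol. It covers only the two gel types listed in the table. The function and parameter names are hypothetical.

```python
# Transfer conditions from Table 3 (0.7 mm tricine-SDS gels).
TRANSFER_HOURS = {10.0: 3.0, 16.5: 5.0}  # % acrylamide -> transfer time (h)
CURRENT_DENSITY = 1.0                     # mA per cm^2 of gel area


def blot_settings(acrylamide_pct: float, width_cm: float,
                  height_cm: float) -> tuple[float, float]:
    """Return (current in mA, transfer time in h) for a gel of the given size."""
    if acrylamide_pct not in TRANSFER_HOURS:
        raise ValueError("Table 3 lists only 10% and 16.5% acrylamide gels")
    area_cm2 = width_cm * height_cm
    return CURRENT_DENSITY * area_cm2, TRANSFER_HOURS[acrylamide_pct]


def anode_buffer(peptides_1_to_2_kda: bool) -> str:
    """Methanol is added only when 1-2 kDa peptides are to be retained."""
    base = "300 mM Tris, 100 mM tricine"
    return base + " + 20% methanol" if peptides_1_to_2_kda else base


if __name__ == "__main__":
    current, hours = blot_settings(16.5, 8.0, 10.0)
    print(f"{current:.0f} mA for {hours:.0f} h; anode buffer: {anode_buffer(True)}")
```

The voltage is not set directly. With these current settings it should fall in the 5-10 V range given in Table 3.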

2.5 Isoelectric Focusing in the Presence of Urea

SDS electrophoresis separates proteins according to size, whereas isoelectric focusing (IEF) separates them according to their isoelectric points [Righetti 1990]. In IEF with soluble carrier ampholytes, a pH gradient develops within the gel during the run. In IEF with ampholytes covalently attached to the gel matrix (IPG-Dalt; Pharmacia), an immobilized pH gradient is present before the run starts. Besides this technical classification, IEF techniques can be divided into native (without urea) and denaturing variants. The most commonly applied technique uses urea at a concentration of 8 M or more together with neutral or zwitterionic detergents. The detergents do not necessarily denature proteins, but 8 M urea does, so IEF in the presence of urea is a denaturing technique. It is used almost exclusively as the first dimension of 2D electrophoresis [O'Farrell 1975, Rickwood 1990, Anderson 1978, Görg 1988]. Advantages of 2D electrophoresis (IEF/SDS-PAGE): starting from cell homogenates, thousands of protein spots can be identified, which makes it a very valuable technique for comparative analytical studies.


Disadvantages of 2D electrophoresis: Preparative application can be tedious, because spots may have to be collected from many gels to obtain enough protein for N-terminal sequencing. It is not known what fraction of membrane proteins fails to enter the IEF gel because of aggregation. Individual spots in the 2D gel cannot be assigned directly to the physiological protein assemblies. The situation is quite different for 2D gels that use native techniques for the first dimension, as described below.

3 Native Techniques

Native electrophoretic techniques are used to purify proteins instead of, or in addition to, chromatographic procedures. They serve as pre-purification steps before the denaturing techniques described above are finally applied.

3.1 Colourless-Native-PAGE

Native electrophoresis of water-soluble proteins and protein complexes is now preferably attempted with Laemmli-PAGE [Laemmli 1970], with SDS omitted completely. All proteins with isoelectric points (pI) below 9.5 (the running pH in the gel at room temperature) will migrate to the anode. However, the technique is applicable only to fairly pH-insensitive proteins. Another colourless native technique [Schägger 1994] separates proteins at pH 7.5 and is therefore restricted to acidic proteins. Steep acrylamide gradient gels, e.g. from 4 to 20% acrylamide, are used. Protein migration should cease when the appropriate pore size of the gel is reached. In theory, migration distances should then be determined by protein size rather than by intrinsic protein charge. Under experimental conditions (short running times), however, these end points are not reached, except by proteins with pI values below 5.3, which have sufficiently high charge/mass ratios. As a result, the same multiprotein complexes migrate different distances in Colourless-Native-PAGE (CN-PAGE) and BN-PAGE (see below), and the order of migration distances is sometimes even inverted (Figure 4).
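The rules that decide whether a protein enters a Colourless-Native gel can be written down explicitly. The sketch encodes the criteria given in the text. A protein migrates only if its pI is below the running pH. In the pH 7.5 system, only proteins with a pI below 5.3 reached their pore-size end points in short runs. The sketch assumes sample application on the cathodic side, and the function name and return strings are hypothetical.

```python
LAEMMLI_NATIVE_PH = 9.5  # running pH of Laemmli-PAGE without SDS (room temperature)
CN_PAGE_PH = 7.5         # running pH of the colourless native system [Schägger 1994]
END_POINT_PI = 5.3       # below this pI, end points were reached in short runs at pH 7.5


def cn_page_fate(pi: float, running_ph: float) -> str:
    """Predicted behaviour of a protein applied on the cathodic side of a CN gel."""
    if pi > running_ph:
        return "positively charged: lost to the cathode buffer"
    if pi == running_ph:
        return "no net charge: does not enter the gel"
    if running_ph == CN_PAGE_PH and pi < END_POINT_PI:
        return "migrates to the anode and can reach its pore-size limit"
    return "migrates to the anode, but may not reach its pore-size limit"


if __name__ == "__main__":
    for pi in (4.8, 6.5, 8.0):
        print(pi, "|", cn_page_fate(pi, CN_PAGE_PH), "|", cn_page_fate(pi, LAEMMLI_NATIVE_PH))
```

The third outcome is the one that makes CN-PAGE migration distances depend on charge as well as size. BN-PAGE avoids this by giving all Coomassie-binding proteins a negative charge.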

3.2 Blue-Native-PAGE

Blue-Native-PAGE (BN-PAGE) is a technique that was initially developed for the isolation of enzymatically active membrane proteins at pH 7.5 [Schagger 1991]. One essential component of the electrophoretic system is the dye Coomassie Blue G-250 (Serva Blue G), which is added to the cathode buffer. It competes with the neutral detergents required for membrane protein solubilization for binding sites on the protein surface. Since Coomassie is negatively charged, four main effects are observed:



All proteins that bind Coomassie dye (all membrane proteins and many water-soluble proteins) migrate to the anode at the running pH of 7.5, even basic proteins. The application range is therefore extended compared to CN-PAGE.

Negatively charged protein surfaces repel each other. Therefore the aggregation problem usually observed with membrane proteins is minimized.

The negatively charged membrane proteins are soluble in detergent-free solution. Therefore detergent can be omitted from the gel entirely, and the risk of denaturing detergent-sensitive membrane proteins is reduced.

Proteins migrate as blue bands through the gel. This facilitates the detection and recovery of native proteins.

Figure 4. Comparison of the resolution of Blue-native-PAGE and colourless-native-PAGE using solubilized bovine heart mitochondria. One hundred micrograms of total protein was applied, corresponding to about 2-10 µg of the individual membrane protein complexes (complexes I-V), which comprise 4 (complex II) to 41 (complex I) protein subunits. The complexes can be identified by enzymatic assays after native electroelution or by the characteristic polypeptide patterns observed after a second-dimension SDS-PAGE of each lane (see Figure 5). Reprinted from [Schagger 1994], with permission from Analytical Biochemistry.

The main fields of application of BN-PAGE are:

The native proteins and protein complexes are separated according to size. BN-PAGE can therefore be used to determine the molecular masses and oligomeric states of microgram amounts of partially purified proteins [Schagger 1994]. The high resolution allows not only discrimination between monomeric and dimeric forms, but also detection of subcomplexes in addition to holocomplexes. In this respect BN-PAGE is superior even to analytical ultracentrifugation.


H. Schagger

Final purification of mg amounts of partially purified membrane proteins from a single preparative gel. The highly pure proteins recovered by native electroelution are suitable, e.g., for immunization, for functional studies, and for protein chemical work.

Purification of membrane proteins directly from biological membranes, e.g. from cell organelles or even from homogenized animal tissue. About 1 µg per protein band is required for detection during BN-PAGE. Proteins are recovered from the gel by native electroelution or native electroblotting [Schagger 1991, 1994b].

Combined with tricine-SDS-PAGE in the second dimension, BN-PAGE allows the subunit composition of multiprotein complexes to be studied, as shown in Figure 5. The minimal protein load for the first-dimension BN-PAGE then depends only on the detection limits of Coomassie or silver staining procedures. This technique is currently used for localizing respiratory chain defects in human diseases, starting from 10 mg of skeletal muscle. Since the membrane protein complexes resolved in the native state in the first dimension are denatured only in the second-dimension SDS-PAGE, the subunits of the multiprotein complexes can be identified by characteristic polypeptide patterns arranged in a row (Figure 5).
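The first application listed above, estimating native molecular masses, is usually done by interpolating an unknown band against complexes or standards of known mass run on the same gel. The Python sketch below assumes that log10(mass) is approximately linear in migration distance over the calibrated range; this is an approximation for gradient gels that should be checked with the markers actually used. The marker distances and masses are hypothetical, not data from Figure 4, and the function names are illustrative.

```python
import math


def fit_log_mass_calibration(distances_cm, masses_kda):
    """Least-squares fit of log10(mass) = slope * distance + intercept.

    Assumes log10(mass) is roughly linear in migration distance over the
    range covered by the markers (an approximation for gradient gels).
    """
    n = len(distances_cm)
    if n < 2:
        raise ValueError("need at least two calibration points")
    ys = [math.log10(m) for m in masses_kda]
    mean_x = sum(distances_cm) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in distances_cm)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(distances_cm, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_mass_kda(distance_cm, slope, intercept):
    """Interpolate the native mass of an unknown band from its migration distance."""
    return 10 ** (slope * distance_cm + intercept)


# Hypothetical markers, for illustration only.
marker_distances = [1.0, 2.0, 3.0, 4.0]
marker_masses = [1000.0, 500.0, 250.0, 125.0]
slope, intercept = fit_log_mass_calibration(marker_distances, marker_masses)
unknown = estimate_mass_kda(2.5, slope, intercept)  # about 354 kDa
```

Estimates outside the range spanned by the markers are extrapolations and should be treated with extra caution.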

[Figure 5: (A) first dimension: Blue-native PAGE; (B) first dimension: colourless-native PAGE; complexes I-V marked along the first-dimension lanes; second dimension: tricine-SDS-PAGE with molecular mass markers at 75, 30 and 13 kDa]

Figure 5. Two-dimensional resolution of the protein subunits of the complexes from bovine heart mitochondria. After first-dimension native separation by BN-PAGE (A) or CN-PAGE (B) as shown in Figure 4, complete lanes were resolved by tricine-SDS-PAGE in the denaturing second dimension. Reprinted from [Schagger 1994], with permission from Analytical Biochemistry.



3.3 Native Isoelectric Focusing

The technique of native IEF is similar to the commonly used denaturing IEF, but urea is completely omitted. Without urea, proteins are more prone to aggregation, especially as they approach the gel pH corresponding to their isoelectric points. Resolving membrane proteins is particularly problematic in this respect. The neutral or zwitterionic detergents required for solubilization are usually not very effective in keeping proteins in solution, and they carry the risk of dissociating protein complexes and of irreversible denaturation. Nevertheless, native IEF has been used with great success for the final purification of special membrane protein complexes on the preparative scale [Tsiotis 1993]. Only after this final purification were the isolated membrane protein complexes suitable for crystallisation. The purification used mobile carrier ampholytes, with Ultrodex (Pharmacia) as the gel matrix.

4 References

Anderson, N.G., and Anderson, N.L. (1978) Anal. Biochem. 85, 331-354. Two-Dimensional Analysis of Serum and Tissue Proteins: Multiple Isoelectric Focusing.
Görg, A., Postel, W., and Günther, S. (1988) Electrophoresis 9, 531-546. The Current State of Two-Dimensional Electrophoresis with Immobilized pH Gradients.
Hames, B.D. (1990) One-dimensional polyacrylamide gel electrophoresis. In: Gel Electrophoresis of Proteins (Hames, B.D., Rickwood, D.; eds.), pp. 1-139, IRL Press, Oxford.
Hunkapiller, M.W., Lujan, E., Ostrander, F., and Hood, L.E. (1983) Isolation of Microgram Quantities of Protein from Polyacrylamide Gels for Amino Acid Sequence Analysis. In: Methods in Enzymology (Hirs, C.H.W., Timasheff, S.N.; eds.), Vol. 91, pp. 227-236, Academic Press, New York.
Laemmli, U.K. (1970) Nature 227, 680-685. Cleavage of Structural Proteins during the Assembly of the Head of Bacteriophage T4.
O'Farrell, P.H. (1975) J. Biol. Chem. 250, 4007-4021. High Resolution Two-Dimensional Electrophoresis.
Rickwood, D., Chambers, J.A.A., and Spragg, S.P. (1990) Two-Dimensional Gel Electrophoresis. In: Gel Electrophoresis of Proteins (Hames, B.D., Rickwood, D.; eds.), pp. 217-272, IRL Press, Oxford.
Righetti, P.G., Gianazza, E., Gelfi, C., and Chiari, M. (1990) Isoelectric Focusing. In: Gel Electrophoresis of Proteins (Hames, B.D., Rickwood, D.; eds.), pp. 149-216, IRL Press, Oxford.
Schagger, H., and von Jagow, G. (1987) Anal. Biochem. 166, 368-379. Tricine-Sodium Dodecyl Sulfate-Polyacrylamide Gel Electrophoresis for the Separation of Proteins in the Range from 1 to 100 kDa.
Schagger, H., Aquila, H., and von Jagow, G. (1988) Anal. Biochem. 173, 201-205. Coomassie Blue-Sodium Dodecyl Sulfate-Polyacrylamide Gel Electrophoresis for Direct Visualization of Polypeptides during Electrophoresis.
Schagger, H., and von Jagow, G. (1991) Anal. Biochem. 199, 223-231. Blue Native Electrophoresis for Isolation of Membrane Protein Complexes in Enzymatically Active Form.



Schagger, H., Cramer, W.A., and von Jagow, G. (1994) Anal. Biochem. 217, 220-230. Analysis of Molecular Masses and Oligomeric States of Protein Complexes by Blue Native Electrophoresis and Isolation of Membrane Protein Complexes by Two-Dimensional Native Electrophoresis.
Schagger, H. (1994a) Chapter 3: Denaturing Electrophoretic Techniques. In: A Practical Guide to Membrane Protein Purification (von Jagow, G., Schagger, H.; eds.), in press, Academic Press, New York.
Schagger, H. (1994b) Chapter 4: Native Electrophoresis. In: A Practical Guide to Membrane Protein Purification (von Jagow, G., Schagger, H.; eds.), in press, Academic Press, New York.
Tsiotis, G., Nitschke, W., Haase, W., and Michel, H. (1993) Photosynth. Res. 35, 285-297. Purification and Crystallisation of Photosystem I Complex from a Phycobilisome-less Mutant of the Cyanobacterium Synechococcus PCC 7002.


11.5 Microseparation Techniques IV: Blotting Membranes as the Interface Between Electrophoresis and Protein Chemistry

Christoph Eckerskorn

1 Introduction

The combination of sodium dodecylsulphate polyacrylamide gel electrophoresis (SDS-PAGE) with electroblotting is one of the most versatile methods for the isolation of proteins at the microgram and submicrogram level for further protein chemical analysis (Figure 1). This strategy has several advantages compared to other protein isolation procedures. First, the separation method offers the high resolution potential of one- and two-dimensional (1D and 2D) PAGE. This is advantageous for complex protein mixtures, and especially for the separation of membrane proteins. Second, sample handling steps are minimized owing to the direct transfer of the separated protein onto a membrane. The immobilized protein can then be either directly sequenced [Vandekerckhove 1985; Aebersold 1986; Matsudaira 1987; Eckerskorn 1988a] or subjected to amino acid composition analysis [Eckerskorn 1988a,b; Ploug 1989; Tous 1989; Nakagawa 1989], mass spectrometry [Eckerskorn 1992; Strupat 1994] and chemical or proteolytic cleavage [Aebersold 1987; Scott 1988; Eckerskorn 1989; Jahnen 1990; Patterson 1992].

Third, proteins purified by this technique can be prepared in a short time relatively free from contamination by other proteins, amino acids and salts. Fourth, very small amounts of proteins can be handled with high yields. The key step in this technique is the quantitative transfer and immobilization of the proteins onto a suitable membrane.

R. Kellner, F. Lottspeich, H.E. Meyer (1994) Microcharacterization of Proteins, VCH Weinheim


C. Eckerskorn

2 Electroblotting

The term "electroblotting" means the transfer of electrophoretically separated proteins from the polyacrylamide matrix onto a "protein-adsorbing" membrane in an electric field. This technique, described by Renart [1979] and at the same time by Towbin [1979], was first established as "Western blotting" to detect electrophoretically separated proteins with specific binding properties (antigens, glycoproteins, enzymes, etc.) directly on membranes with antibodies, lectins, substrates, etc. [for review see Beisiegel 1986]. With the development and introduction of sufficiently chemically inert membranes, blotted proteins are now directly amenable to protein chemical analysis [for review see Aebersold 1991; Eckerskorn 1990].

2.1 Polyacrylamide Gel Electrophoresis

Proteins that are to be transferred onto a membrane for further analysis can be separated with the most common gel systems according to published protocols [Laemmli 1970; O'Farrell 1975; Görg 1988; Swank 1971; Schagger 1987], chosen for optimal separation of the protein(s) of interest. The concentration of the protein in the gel matrix should be as high as possible (concentrated, salt-reduced sample; small sample slots; thin gels). The concentration of the polyacrylamide and/or the cross-linker should be as low as possible without disturbing the separation of the desired protein. This facilitates the elution of the proteins from the gel matrix during electrotransfer in the subsequent blotting step. For any procedure used to enrich or isolate proteins, prevention of chemical modification of the α-amino group (sequence analysis) and of the reactive side groups (mass spectrometry, amino acid composition analysis) is a major concern. Sources of N-terminal blocking common to any isolation technique involving polyacrylamide separation include unidentified impurities contained in some batches of buffers or acrylamide and, possibly, amino-reactive moieties generated during polymerization of the acrylamide. For this reason, extensively polymerized gels should be used, and each batch of chemicals should be tested with a standardized protein preparation to ensure the absence of blocking impurities. This can be done by comparing the sequencing results of aliquots of (radioiodinated) proteins (β-lactoglobulin, trypsin inhibitor) either subjected directly to Edman degradation or sequenced after gel electrophoresis and electroblotting onto a membrane. We found no statistically significant N-terminal blockage caused by electroblotting when high-purity reagents (such as electrophoretic-grade reagents) were used and the gel was allowed to stand for several hours at room temperature to complete polymerization.
Precautions such as preelectrophoresis prior to sample loading are usually not recommended.

11.5 Electroblotting

[Figure 1 flow chart: ELECTROPHORESIS (1D-, 2D-PAGE), then ELECTROBLOTTING from the polyacrylamide gel, then PROTEIN CHEMISTRY (amino acid sequence analysis, amino acid composition analysis, mass spectrometry, cleavage on the membrane)]

Figure 1. Strategy for obtaining protein chemical information from electrophoretically separated proteins. 1D and 2D gel electrophoresis are the most versatile and highly resolving methods for the separation of proteins. The subsequent isolation of the separated proteins from the gel matrix in a form suitable for protein chemical analysis is a key step in the characterization of these proteins. With the development of efficient blotting techniques and the introduction of chemically inert membranes, it is now possible to recover proteins present in low quantities from the polyacrylamide gel with high yields. The immobilized proteins are suitable for direct sequence analysis, amino acid composition analysis, mass spectrometry and proteolysis. This combination of protein chemical and electrophoretic techniques makes it possible to obtain chemical information from subpicomole quantities of protein, giving access to a new set of biologically important proteins.

2.2 Blot Systems

2.2.1 Tank Blotting

The standard system for tank blotting is designed according to a construction of Bittner [1980] as a vertical buffer tank with platinum wires as electrodes on the side walls. Gel and membrane are assembled between stacks of filter papers and mounted into a grid cassette. Constant slight pressure has to be applied to the blot sandwich to avoid displacement (shifting) of the gel and membrane. This entire "packing procedure" has to be performed in a tube under enough buffer to avoid air bubbles between the sandwich layers. The assembled cassettes are fixed vertically in the buffer tank. Between 2 and 4 l of buffer is necessary, depending on the design of the system. Most experiments have been run at a constant voltage of about 50 V to exert uniform electrical power on the charged particles. The initial current is typically 500 mA or higher, depending on the size of the tank and the molarity of the buffer used. During transfer, the current increases because the electrical resistance of the solution continuously decreases. Under these conditions very efficient cooling is necessary, which is achieved through a vertical cooling insert and sufficient buffer circulation.
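The need for cooling follows from Joule heating. As a worked example, assuming that essentially all electrical power is dissipated as heat in the buffer, the initial power at the typical values quoted above is

$$P = U \cdot I = 50\ \mathrm{V} \times 0.5\ \mathrm{A} = 25\ \mathrm{W},$$

so that over a hypothetical transfer time of one hour the heat released is at least

$$E = P \cdot t = 25\ \mathrm{W} \times 3600\ \mathrm{s} = 9.0 \times 10^{4}\ \mathrm{J} = 90\ \mathrm{kJ}.$$

Because the current rises during a constant-voltage run, the actual heat load is higher than this initial estimate.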



2.2.2 Semidry Blotting

The semidry blotting system, first described by Kyhse-Andersen [1984], consists of two plate electrodes between which a blot sandwich of filter papers, gel and membrane is mounted horizontally. Compared to tank blotting, this assembly is much simpler because no cassettes are required. The filter papers are soaked in buffer and stacked on the anode together with the blotting membrane and the gel in the order shown in Figure 2. If necessary, air bubbles can easily be removed by gently rolling a glass rod over each layer. The amount of buffer required depends on the size of the blot sandwich and is typically less than 100 ml. Because of the considerably smaller amount of buffer compared to tank blotting systems and the shorter transfer times, the proteins are exposed to fewer impurities from the buffer during electrotransfer. Several companies offer different materials for the electrode plates (e.g. graphite, sintered glass carbon, metal covered with platinum, graphite dispersed in a polymeric matrix, conductive polymeric matrices). These differ in their conductivity and in their stability against oxidation at the anode and at extreme pH values (see below). In most cases, semidry blotting experiments are performed at constant current (e.g. 1 mA/cm² of blot sandwich surface), resulting in low voltages (< 5 V) at the beginning. During transfer, the voltage increases with the electrical resistance, depending on the buffer system used, the absolute amount of buffer (i.e. the number of filter papers and their degree of saturation), the thickness of the gel, and the electrode material. After about 3 h, the voltage typically reaches values between 20 V and 50 V. Because of the high electrical conductivity of the electrodes and the relatively low electric power, cooling of the semidry apparatus is not necessary.
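The constant-current setting scales with the area of the blot sandwich. The Python sketch below converts the 1 mA/cm² guideline into a current for a hypothetical 10 cm × 8 cm gel and computes the electrical power at the start (5 V, taken as an upper bound) and end (20-50 V) voltages quoted above. The gel size and the function names are illustrative assumptions.

```python
def semidry_current_a(width_cm: float, height_cm: float, ma_per_cm2: float = 1.0) -> float:
    """Constant current (A) for a blot sandwich of the given size."""
    return ma_per_cm2 * width_cm * height_cm / 1000.0


def electrical_power_w(voltage_v: float, current_a: float) -> float:
    """Electrical power P = U * I, dissipated essentially as heat."""
    return voltage_v * current_a


current = semidry_current_a(10.0, 8.0)          # 80 cm^2 at 1 mA/cm^2 -> 0.08 A
start_power = electrical_power_w(5.0, current)  # upper bound at the start: 0.4 W
end_powers = [electrical_power_w(v, current) for v in (20.0, 50.0)]  # 1.6 W to 4 W
```

Even at 50 V this amounts to only a few watts, several-fold less than the 25 W (50 V × 0.5 A) at the start of a typical tank blot, which is consistent with the statement that the semidry apparatus needs no cooling.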

[Figure 2 diagram: blot sandwich from top to bottom: cathode, cathodic filter papers, gel, membrane, anodic filter papers, anode]

Figure 2. Schematic diagram showing the assembly of the blotting sandwich. Electroblotting can be performed essentially as follows: Filter papers and the blotting membrane should be trimmed exactly to the size of the separation gel. The filter papers should be washed three times for 15 min in transfer buffer (e.g. 50 mM boric acid, adjusted to pH 9.0 with 2 M sodium hydroxide, containing 20% methanol for the filter papers on the anodic side and 5% methanol for those on the cathodic side). The blotting membrane should be washed three times in 70% methanol. The blot sandwich should be assembled immediately after electrophoresis. The electrodes should be rinsed extensively in water before use, and the sandwich is built from the lower anode plate to the upper cathode plate as shown in the schematic diagram above. Each filter paper stack should be at least 3 mm thick. Air bubbles between layers can easily be avoided by gently rolling a glass rod over each layer. Electrotransfer is typically performed at constant current (1 mA/cm²) for 3-5 h.
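The transfer buffer in the protocol above can be prepared from boric acid and sodium hydroxide. The Python sketch below estimates the quantities for a given volume, assuming a boric acid molar mass of 61.83 g/mol and an aqueous pKa of about 9.24 at 25 °C. The Henderson-Hasselbalch estimate of the NaOH volume ignores the effect of methanol and ionic strength on the pKa, so the final pH should still be adjusted with a pH meter. Function and key names are illustrative.

```python
BORIC_ACID_G_PER_MOL = 61.83
PKA_BORIC_ACID = 9.24  # assumed aqueous value at 25 C; shifts in methanol


def borate_transfer_buffer(volume_l: float, conc_mm: float = 50.0, ph: float = 9.0,
                           naoh_stock_m: float = 2.0, methanol_pct: float = 20.0) -> dict:
    """Approximate amounts for a boric acid/NaOH transfer buffer with methanol (v/v).

    The mixture is made up to the final volume with water.
    """
    boric_mol = conc_mm / 1000.0 * volume_l
    # Fraction of boric acid present as borate at the target pH (Henderson-Hasselbalch);
    # for a weak acid titrated with NaOH this is the mol NaOH needed per mol acid.
    base_fraction = 1.0 / (1.0 + 10 ** (PKA_BORIC_ACID - ph))
    naoh_mol = base_fraction * boric_mol
    return {
        "boric_acid_g": boric_mol * BORIC_ACID_G_PER_MOL,
        "naoh_stock_ml": naoh_mol / naoh_stock_m * 1000.0,
        "methanol_ml": methanol_pct / 100.0 * volume_l * 1000.0,
    }


anodic = borate_transfer_buffer(1.0)                      # 20% methanol
cathodic = borate_transfer_buffer(1.0, methanol_pct=5.0)  # 5% methanol
```

For 1 l this gives about 3.09 g of boric acid, roughly 9 ml of 2 M NaOH, and 200 ml (anodic) or 50 ml (cathodic) of methanol.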



The electrochemical reaction of water generates a pH gradient (Figure 3) of ca. pH 12 at the cathode ($4\,\mathrm{H_2O} + 4\,e^- \rightarrow 2\,\mathrm{H_2} + 4\,\mathrm{OH^-}$) and ca. pH 2 at the anode ($6\,\mathrm{H_2O} \rightarrow \mathrm{O_2} + 4\,\mathrm{H_3O^+} + 4\,e^-$), together with continuous gas evolution. The evolving gas expands the blot sandwich and thereby increases the electrical resistance. This results in a rapid and non-reproducible increase of the electric field (Figure 4). This effect can be reduced by placing a top weight of approximately 2 kg onto the semidry apparatus (Figure 2). The stability of the electrodes varies greatly: all pure graphite electrodes and electrodes with graphite dispersed in a polymeric matrix are oxidized to CO₂ by the oxygen generated at the anode. Some of the polymeric matrices dissolve at the surface because of the high pH at the cathode. Platinum electrodes and electrodes covered with platinum are almost completely stable. Over the last few years, semidry blotting has become more popular than tank blotting for several reasons: reduced buffer consumption and minimization of potentially reactive impurities, fewer air bubbles owing to the straightforward assembly, and a significant reduction of transfer time and heat development. In addition to the much greater ease of handling the semidry apparatus, a systematic comparison of the two systems by Tovey and Baldo [1987] showed more efficient protein transfer in a more homogeneous voltage field with semidry blotting. The authors also described slightly more sensitive staining of proteins on the membranes after semidry blotting, apparently because the proteins stack more at the membrane surface under these transfer conditions. The following experiments all refer to semidry blotting.
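The amounts of gas and of OH⁻ and H₃O⁺ produced at the electrodes follow from Faraday's law and the electrode reactions given above: four electrons yield two H₂ and four OH⁻ at the cathode, and one O₂ and four H₃O⁺ at the anode. The Python sketch below assumes that the whole current is consumed by water electrolysis at the electrodes and uses an ideal-gas molar volume of about 24.5 l/mol near 25 °C; the current and run time in the example are hypothetical.

```python
FARADAY_C_PER_MOL = 96485.0
MOLAR_GAS_VOLUME_L = 24.5  # ideal gas near 25 C and 1 atm (approximation)


def electrolysis_products(current_a: float, hours: float) -> dict:
    """Electrons, OH- and gas volumes from water electrolysis (100% current efficiency assumed)."""
    charge_c = current_a * hours * 3600.0
    electrons_mol = charge_c / FARADAY_C_PER_MOL
    h2_mol = electrons_mol / 2.0  # cathode: 4 H2O + 4 e- -> 2 H2 + 4 OH-
    o2_mol = electrons_mol / 4.0  # anode: 6 H2O -> O2 + 4 H3O+ + 4 e-
    return {
        "electrons_mol": electrons_mol,
        "oh_mol": electrons_mol,  # one OH- per electron (and one H3O+ per electron at the anode)
        "h2_ml": h2_mol * MOLAR_GAS_VOLUME_L * 1000.0,
        "o2_ml": o2_mol * MOLAR_GAS_VOLUME_L * 1000.0,
    }


run = electrolysis_products(0.08, 3.0)  # hypothetical: 80 mA for 3 h
```

For 80 mA over 3 h this corresponds to roughly 9 mmol each of OH⁻ and H₃O⁺, about 110 ml of hydrogen and 55 ml of oxygen, which illustrates both the pH shifts in the filter paper stacks and the expansion of the sandwich by gas.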

[Figure 3 plot: pH (axis 0-12) of the individual layers between cathode and anode: cathodic filter papers, gel, membrane, anodic filter papers]

Figure 3. pH values of filter papers in a semidry blotting experiment. The pH values of individual filter papers (thickness 1 mm) were determined after 1 h transfer time. The thickness of the polyacrylamide gel was 1.5 mm. The transfer buffers were (+): 10 mM 3-cyclohexylamino-1-propanesulphonic acid (CAPS), pH 11, 10% methanol; (I): 50 mM Na-borate, pH 9, 20% methanol; (0): bidistilled water.



[Figure 4 plot: voltage versus transfer time [h]]

Figure 4. Progression of the voltage during a semidry blotting experiment. The voltage of a standard blotting assembly between pure graphite electrodes (compare Figure 2) was measured without a top weight (upper panel) and with a 2 kg top weight (lower panel) placed onto the semidry apparatus. Blotting was performed at constant current (1 mA/cm²).

2.3 Blotting Parameters

2.3.1 The Blotting Process

During SDS-PAGE, proteins migrate in the polyacrylamide matrix as protein-SDS complexes at a velocity of several centimeters per hour. Precipitation is prevented by the SDS bound to the proteins. SDS, through its strongly acidic sulphate groups, also provides the main driving force for protein movement through the gel matrix in an electric field. The required constant reloading of SDS molecules onto the proteins is ensured by a sufficient concentration of SDS in all electrophoresis buffers. Unlike electrophoresis buffers, electroblotting buffers cannot be supplemented with significant amounts of SDS, because even low concentrations of this detergent prevent protein adsorption to the membranes. Therefore, in the blotting process, the elution of the protein from the gel matrix towards the membrane surface is driven by the SDS bound to the proteins during electrophoresis and by free SDS molecules from the electrophoresis buffer remaining in the gel volume. During blotting, the protein-SDS complexes migrate towards the anode in a medium of continuously decreasing free SDS concentration. In protein-SDS complexes, the amount of bound detergent depends on the SDS-to-protein ratio [Putnam 1945]. A decreasing concentration of free SDS therefore reduces the amount of SDS on the protein surface, exposing hydrophobic sites that may interact with the hydrophobic membrane. Low molecular weight proteins bind to a lesser degree because their protein-SDS complexes migrate faster, so that the amount of SDS in these complexes is not sufficiently decreased. As a result of the dissociation of SDS, the migration of the protein-SDS complexes slows down, and some proteins, especially high molecular weight proteins, may even become insoluble and precipitate in the gel. The physical properties of the proteins have an additional important influence on blotting efficiency. The migration velocity depends on the size of the proteins. The strength of the interaction between proteins and SDS, polyacrylamide, and membrane surface depends on the amino acid composition and sequence. A detailed analysis of the blotting parameters and a postulated blotting mechanism are described by Jungblut [1990].

2.3.2 Transfer Buffers

A series of blotting buffers has been described in the literature:

25 mM N-ethylmorpholine, pH 8.3, 0.5 mM dithiothreitol [Aebersold 1988].
25 mM Tris/HCl, 10 mM glycine, pH 8.3, 0.5 mM dithiothreitol [Aebersold 1986].
50 mM Na-borate, pH 8.0, 0.02% β-mercaptoethanol, 20% methanol [Vandekerckhove 1985].
10 mM 3-cyclohexylamino-1-propanesulphonic acid (CAPS), pH 11, 10% methanol [Matsudaira 1987].
50 mM Na-borate, pH 9, 20% methanol [Eckerskorn 1988].

The choice of blotting buffer is not critical if buffers of similar ionic strength are used. The influence of the molarity of the transfer buffer was analysed in detail for Na-borate buffers by Jungblut [1990]. The authors found a significantly decreased transfer yield if the Na-borate concentration was reduced below 10 mM or increased above 100 mM. As described above, with increasing transfer time the pH of the transfer buffer is determined mainly by the electrochemical reactions of water at the electrodes. The pH between gel and membrane was maintained around 8.3 owing to the (limited) buffer capacity of the electrophoresis running buffer remaining in the gel matrix. The effect of the extreme pH values could be minimized for a 1.5 mm thick gel if the filter stacks at both electrodes were at least 3 mm thick and the pH of the Na-borate buffer was ≥ 9. Another way to reduce the influence of the electrochemical reactions on the transfer pH is to use discontinuous buffer systems, e.g. 300 mM Tris/20% methanol, pH 10.4 [Kyhse-Andersen 1984], for the filter papers directly contacting the anode, to neutralize the protons produced during blotting.
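Buffers can be compared by their ionic strength, I = ½ Σ cᵢzᵢ². The Python sketch below applies this to sodium borate at pH 9 to show how the 10 mM and 100 mM limits reported by Jungblut translate into ionic strength. It assumes a boric acid pKa of about 9.24, counts only the borate anion and its Na⁺ counter-ion, and ignores polyborate formation and activity corrections; the function names are illustrative.

```python
def ionic_strength_mm(ions):
    """I = 1/2 * sum(c_i * z_i**2) for (concentration_mM, charge) pairs."""
    return 0.5 * sum(conc * charge ** 2 for conc, charge in ions)


def sodium_borate_ionic_strength_mm(total_borate_mm: float, ph: float,
                                    pka: float = 9.24) -> float:
    """Ionic strength of a boric acid/NaOH buffer (only Na+ and borate- counted)."""
    borate_anion_mm = total_borate_mm / (1.0 + 10 ** (pka - ph))
    # Uncharged boric acid does not contribute; Na+ balances the borate anion.
    return ionic_strength_mm([(borate_anion_mm, +1), (borate_anion_mm, -1)])


for total in (10.0, 50.0, 100.0):
    value = sodium_borate_ionic_strength_mm(total, 9.0)
    print(f"{total:5.0f} mM Na-borate, pH 9: I = {value:.1f} mM")
```

Under these assumptions, 10, 50 and 100 mM Na-borate at pH 9 correspond to ionic strengths of roughly 4, 18 and 37 mM.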

2.3.3 Addition of SDS

In SDS-free transfer buffer, hydrophobic proteins, especially membrane proteins with reduced solubility, and high molecular weight proteins elute from the gel matrix in only small amounts. Most of these proteins remain as precipitates in the gel. Adding small amounts of SDS (up to 0.01%) to the transfer buffer on the cathodic side provides a continuous subsequent supply of SDS. In this way most of these hydrophobic proteins are kept sufficiently in solution to achieve nearly quantitative elution from the gel. However, for the reason discussed above, SDS generally prevents protein adsorption to the membrane, and very low transfer yields were obtained, especially for hydrophilic and small proteins.



2.3.4 Addition of Methanol

Adding methanol significantly increased the transfer yields to the membranes. Methanol influences the stability of the SDS-protein complexes, which dissociate more easily at increasing methanol concentrations. The dissociated SDS molecules are drawn towards the anode, which increases the interaction between the transferring protein and the membrane as well as between protein and polyacrylamide. With the decrease of free SDS molecules during the blotting process, both the solubility and the migration velocity of the protein-SDS complexes in the electric field are affected. Because of this "retarding" effect of methanol, small proteins and peptides can be transferred more efficiently if methanol (up to 40%) is added to the cathodic transfer buffer, whereas for hydrophobic or large proteins methanol on the cathodic side should be avoided. The hydrophobic, uncharged blotting membranes require the presence of methanol (10-20%) to make the interaction between membrane surface and protein possible.

2.3.5 Influence of Protein Concentration

Besides the physico-chemical properties of individual proteins (molecular weight, charge, charge distribution, etc.), the influence of the blotting parameters also depends on the concentration of the proteins in a given volume of gel. A low protein concentration in a gel band leads to a high ratio of the constituents of the electrophoretic running buffer (SDS, Tris, glycine) to the protein, and vice versa. A (nearly) quantitative transfer can only be expected if the blotting conditions are optimized for the particular protein concentration. In a 2D separation of complex protein mixtures derived from cell lysates, for instance, the ratio of protein to gel matrix is determined by the abundance of the corresponding protein in the cell. To obtain an overview of the transfer yields of proteins with different sizes, charges and concentrations, ¹⁴C-labelled protein lysates of liver cells were separated in a 2D gel and blotted [Jungblut 1990]. The transfer yields of 50 arbitrarily selected proteins, quantified by autoradiography, were between 60% and 100%. About 80% of the evaluated proteins gave transfer yields between 75% and 85%.

3 Blotting Membranes

A variety of membranes for blotting have been introduced (Table 1). These supports differ in composition and texture: they are either glass fibre based, modified with positively charged organic groups [Vandekerckhove 1985; Aebersold 1986] or hydrophobic and uncharged [Eckerskorn 1988a], or pure organic polymers such as polyvinylidene fluoride [Pluskal 1986; Matsudaira 1987] or polypropylene [Lottspeich 1989]. With the increasing number of membranes available, determining which membrane is most suitable for further protein chemical analysis becomes increasingly important. Many investigations have been performed to elucidate the parameters for high protein transfer onto membranes and to improve the initial and repetitive yields in protein sequence analysis of electroblotted proteins [Xu 1988; Eckerskorn 1988a, 1990, 1991; Moos 1988; Walsh 1988; Lottspeich 1989, 1990; Jungblut 1990; Jacobsen 1990; Lissilour 1990; LeGendre 1990; Baker 1991; Aebersold 1991; Mozdzanowski 1992; Reim 1992]. The key to understanding these divergent results is an understanding of the nature of the membranes and of the parameters that influence successful blotting and protein analysis. Structural parameters, including specific surface area (Table 2), pore size distribution and pore volumes (Table 3), and permeabilities for different solvents, have been analysed and allow membranes to be discriminated by their accessible surfaces and membrane densities [Eckerskorn 1993]. Protein binding capacities as well as protein recoveries in electroblotting correlate with these membrane properties. Almost quantitative retention of proteins during electroblotting from gels was obtained for membranes with a high specific surface area and narrow pores (Trans-Blot, Immobilon PSQ, Fluorotrans), whereas membranes with a relatively low specific surface area (Immobilon P, Glassybond) showed reduced recoveries of about 10-20% for the tested proteins [Eckerskorn 1993].

Table 1. Hydrophobic blotting membranes suitable for the isolation of proteins from polyacrylamide gels for subsequent protein chemical analysis.

Name | Description | Manufacturer / Distributor
QA-GF | glass fibre membranes covalently modified with silanes |
PGCMl | glass fibre membranes coated noncovalently with polybases | Life Science Products, Janssen, Belgium
Glassybond | glass fibre membranes covalently | Biometra, Göttingen, Germany
Immobilon P | polyvinylidene fluoride | Millipore, Bedford, USA
Immobilon PSQ | polyvinylidene fluoride | Millipore, Bedford, USA
Fluorotrans | | Pall, Dreieich, Germany
Problott | | PE-ABI, Foster City, USA
Trans Blot | | Bio-Rad, Munich, Germany
Westran | polyvinylidene fluoride | Schleicher & Schuell, Dassel, Germany
Selex 20 | polypropylene | Schleicher & Schuell, Dassel, Germany
SM 17558 | polypropylene | Sartorius, Göttingen, Germany
SM 17507 | polypropylene |
SM 17506 | polypropylene |



Table 2. Specific surface areas and thicknesses of blotting membranes. The specific surface area is given relative to 1 m² geometric surface area of the corresponding membrane [Eckerskorn 1993].

Membrane | Specific surface area (m²) | Thickness (µm)
Trans-Blot | 2900 | 140
Immobilon PSQ | 1900 | 195
Fluorotrans | 1600 | 140
Selex 20 | 900 | 45
SM 17507 | 880 | 50
SM 17506 | 810 | 60
SM 17558 | 610 | 100
Westran | 570 | 150
Immobilon P | 380 | 130
Glassybond | 130 | 315

In the same study, standardized, radioiodinated protein samples were used to quantitatively assess the initial and repetitive sequencing yields of either electroblotted proteins or proteins loaded by direct adsorption. The results showed that, for the tested membranes, their different permeabilities for the solutions of the Edman chemistry have a major influence on initial yields. The glass fibre based membranes, with an extremely low flow restriction, produced consistently higher initial yields irrespective of the mode of protein application (spotted or electroblotted) or the way the membranes were placed in the cartridge (discs or small pieces). In contrast, the polymeric membranes showed decreasing initial yields with increasing membrane density for both spotted and electroblotted proteins. Yields varied considerably when these membranes were placed in the cartridge as discs. This effect could be minimized by cutting the membranes into pieces as small as possible, as demonstrated for electroblotted proteins.
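Initial and repetitive yields are commonly derived by fitting the logarithm of the PTH-amino acid recovery against the cycle number. The Python sketch below assumes the model recovered(n) = IY · RY^(n-1), expresses the initial yield relative to the amount of protein loaded, and uses hypothetical recoveries. The exact fitting convention of [Eckerskorn 1993] is not stated here, and in practice one would typically restrict the fit to residues whose PTH derivatives are recovered reliably.

```python
import math


def edman_yields(cycles, recovered_pmol, loaded_pmol):
    """Return (initial yield %, repetitive yield %) from a log-linear fit.

    Model (an assumption): recovered(n) = IY * RY ** (n - 1), fitted as
    ln(recovered) = ln(IY) + (n - 1) * ln(RY) by ordinary least squares.
    """
    xs = [n - 1 for n in cycles]
    ys = [math.log(p) for p in recovered_pmol]
    k = len(xs)
    mean_x, mean_y = sum(xs) / k, sum(ys) / k
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    initial_pmol = math.exp(intercept)
    return 100.0 * initial_pmol / loaded_pmol, 100.0 * math.exp(slope)


# Hypothetical run: 100 pmol loaded, PTH recoveries (pmol) for cycles 1-5.
iy, ry = edman_yields([1, 2, 3, 4, 5], [50.0, 45.0, 40.5, 36.45, 32.805], 100.0)
```

Here the fit returns an initial yield of 50% and a repetitive yield of 90%; computing such values for spotted and electroblotted aliquots on each membrane gives the kind of comparison described above.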



Table 3. Pore size distribution of blotting membranes [Eckerskorn 1993].

Membrane | Pore size (distributor) [µm] | Minimum pore size [µm] | Maximum pore size [µm] | Mean pore size [µm] | Pore volume [%]
SM 17558 | 0.10 | 0.107 | 0.223 | 0.136 | 62.1
Immobilon PSQ | 0.10 | 0.169 | 0.447 | 0.248 | 78.1
Selex 20 | 0.20 | 0.190 | 0.533 | 0.269 | 72.0
SM 17507 | 0.20 | 0.203 | 0.655 | 0.278 | 70.4
Trans-Blot | 0.20 | 0.225 | 0.532 | 0.315 | 75.2
Fluorotrans | 0.20 | 0.232 | 0.572 | 0.333 | 77.1
SM 17506 | 0.45 | 0.323 | 0.880 | 0.393 | 80.1
Immobilon P | 0.45 | 0.517 | 1.136 | 0.692 | 68.4
Westran | 0.45 | 0.571 | 1.361 | 0.779 | 62.7
Glassybond | - | 1.921 | 7.263 | 2.296 | 93.3

No significant influence of the membrane was observed on amino acid composition analysis. All hydrophobic polymer membranes are directly compatible with acid hydrolysis of the immobilized protein and subsequent quantification of the liberated amino acids [Lottspeich 1994]. A special application is shown in Figure 4. Electroblotted proteins were identified by comparing the experimentally determined amino acid composition with a data set derived from a protein database (Figure 5). In MALDI-MS (matrix-assisted laser desorption/ionization mass spectrometry) of electroblotted proteins directly from the membrane surface, the choice of membrane plays an important role [Strupat 1994]. The authors obtained better-quality mass spectra from membranes with high specific surface areas and low mean pore sizes. The spectra of the tested proteins show that signal intensity, shot-to-shot reproducibility and signal-to-noise ratio were comparable across membranes. However, the peak width was considerably smaller for a membrane with a high specific surface area.
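The pairing of high specific surface area with low mean pore size can be checked against Tables 2 and 3. The Python sketch below transcribes the two columns from those tables and computes a Spearman rank correlation. Rank correlation is our choice here, not an analysis from the study, and with only ten membranes the result is descriptive only.

```python
# Specific surface area (m², Table 2) and mean pore size (µm, Table 3).
SPECIFIC_SURFACE_M2 = {
    "Trans-Blot": 2900, "Immobilon PSQ": 1900, "Fluorotrans": 1600,
    "Selex 20": 900, "SM 17507": 880, "SM 17506": 810, "SM 17558": 610,
    "Westran": 570, "Immobilon P": 380, "Glassybond": 130,
}
MEAN_PORE_UM = {
    "SM 17558": 0.136, "Immobilon PSQ": 0.248, "Selex 20": 0.269,
    "SM 17507": 0.278, "Trans-Blot": 0.315, "Fluorotrans": 0.333,
    "SM 17506": 0.393, "Immobilon P": 0.692, "Westran": 0.779,
    "Glassybond": 2.296,
}

def ranks(values):
    """Rank values from 1 (smallest) upward; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman_no_ties(xs, ys):
    """Spearman's rho via 1 - 6 * sum(d^2) / (n (n^2 - 1)), valid without ties."""
    if len(set(xs)) != len(xs) or len(set(ys)) != len(ys):
        raise ValueError("this simple formula assumes no tied values")
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n * n - 1))

membranes = sorted(SPECIFIC_SURFACE_M2)
rho = spearman_no_ties([SPECIFIC_SURFACE_M2[m] for m in membranes],
                       [MEAN_PORE_UM[m] for m in membranes])
print(f"Spearman rho = {rho:.2f}")
```

A negative rho means that membranes with larger specific surfaces tend to have smaller mean pores.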


Figure 5. Identification of proteins after 2D-PAGE and electroblotting by amino acid composition analysis [Eckerskorn 1988b]. Patterns of mouse brain proteins obtained by (A) 2D-PAGE and (B) electroblotting onto siliconized glass fibre membranes. The immobilized proteins were stained with Coomassie Blue, and twelve protein spots were then subjected to both Edman degradation and amino acid analysis. Proteins were identified by comparing the experimentally determined amino acid composition with a dataset derived from the Protein Identification Resource (PIR) protein database. Eight out of twelve proteins tested were identified by amino acid composition analysis and confirmed by N-terminal sequence analysis. [Spot 1: serum albumin; Spot 2: hemoglobin α-chain; Spot 3: hemoglobin β-chain; Spot 4: no homology found; Spot 5: glyceraldehyde-3-phosphate dehydrogenase (GAPDH); Spot 6: no homology found; Spot 7: creatine kinase; Spot 8: no homology found; Spots 9 and 10: triosephosphate isomerase; Spots 11 and 12: no homologies found.]
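Identification by amino acid composition compares a measured composition with the composition of each database entry and ranks the entries by similarity. The chapter does not state which similarity score was used, so the Python sketch below assumes a simple sum of squared differences in mol%. Asp/Asn and Glu/Gln are pooled as Asx and Glx because acid hydrolysis deamidates the amides. The residue counts are hypothetical placeholders, not real database entries.

```python
AMINO_ACIDS = ("Asx", "Glx", "Gly", "Ala", "Val", "Leu", "Lys", "Arg")

# Hypothetical residue counts per molecule; illustrative only, NOT real protein data.
HYPOTHETICAL_DB = {
    "protein_A": {"Asx": 40, "Glx": 55, "Gly": 30, "Ala": 45, "Val": 25, "Leu": 50, "Lys": 42, "Arg": 20},
    "protein_B": {"Asx": 25, "Glx": 20, "Gly": 35, "Ala": 30, "Val": 28, "Leu": 22, "Lys": 26, "Arg": 10},
    "protein_C": {"Asx": 12, "Glx": 10, "Gly": 9, "Ala": 14, "Val": 8, "Leu": 12, "Lys": 15, "Arg": 5},
}

def mol_percent(amounts):
    """Normalise amounts (pmol or residue counts) to mol% over AMINO_ACIDS."""
    total = sum(amounts.get(aa, 0.0) for aa in AMINO_ACIDS)
    if total <= 0:
        raise ValueError("no amino acids to normalise")
    return {aa: 100.0 * amounts.get(aa, 0.0) / total for aa in AMINO_ACIDS}

def distance(a, b):
    """Assumed score: sum of squared mol% differences (smaller = more similar)."""
    return sum((a[aa] - b[aa]) ** 2 for aa in AMINO_ACIDS)

def best_matches(measured_pmol, database, top=3):
    """Return (score, name) pairs for the closest database entries."""
    sample = mol_percent(measured_pmol)
    scores = [(distance(sample, mol_percent(counts)), name)
              for name, counts in database.items()]
    scores.sort()
    return scores[:top]

# Example: a measurement proportional to protein_B (2.5 pmol per residue).
measured = {aa: 2.5 * n for aa, n in HYPOTHETICAL_DB["protein_B"].items()}
print(best_matches(measured, HYPOTHETICAL_DB)[0][1])
```

Composition alone is often ambiguous, which is why the identifications above were confirmed by N-terminal sequence analysis.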



4 References

Aebersold, R., Teplow, D. B., Hood, L. E., Kent, S. B. (1986) J. Biol. Chem. 261, 4229-4239. Electroblotting onto activated glass: High efficiency preparation of proteins from analytical sodium dodecyl sulfate-polyacrylamide gels for direct sequence analysis.

Aebersold, R. H., Leavitt, J., Hood, L. E., Kent, S. H. (1987) Proc. Natl. Acad. Sci. USA 84, 6970-6974. Internal amino acid sequence analysis of proteins separated by one- or two-dimensional gel electrophoresis after in situ protease digestion on nitrocellulose.

Aebersold, R. (1991) In: Advances in Electrophoresis (Chrambach, A., Dunn, M. J., Radola, B. J., eds.), pp. 81-168, VCH Verlag, Weinheim.

Baker, S. C., Dunn, M., Yacoub, M. H. (1991) Electrophoresis 12, 342-348. Evaluation of membranes used for electroblotting for direct automated microsequencing.

Beisiegel, U. (1986) Electrophoresis 7, 1-18. Protein blotting.

Bittner, M., Kupferer, P., Morris, C. F. (1980) Anal. Biochem. 102, 459-471. Electrophoretic transfer of proteins and nucleic acids from slab gels to diazobenzyloxymethyl cellulose or nitrocellulose sheets.

Eckerskorn, C., Mewes, W., Goretzki, H. W., Lottspeich, F. (1988a) Eur. J. Biochem. 176, 509-519. A new siliconized-glass fiber as support for protein chemical analysis of electroblotted proteins.

Eckerskorn, C., Jungblut, P., Mewes, W., Klose, J., Lottspeich, F. (1988b) Electrophoresis 9, 830-838. Identification of mouse brain proteins after two-dimensional electrophoresis and electroblotting by microsequence analysis and amino acid composition.

Eckerskorn, C., Lottspeich, F. (1989) Chromatographia 28, 92-94. Internal amino acid sequence analysis of proteins separated by gel electrophoresis after tryptic digestion in the polyacrylamide matrix.

Eckerskorn, C., Lottspeich, F. (1990a) Electrophoresis 11, 554-561. Combination of two-dimensional gel electrophoresis with microsequencing and amino acid composition analysis: Improvement of speed and sensitivity in protein characterization.

Eckerskorn, C., Lottspeich, F. (1990b) J. Prot. Chem. 9, 272-273. The initial yield in automated Edman degradation depends on the choice of the membrane and the mode of protein application.

Eckerskorn, C., Lottspeich, F. (1991) The initial yield in automated Edman degradation depends on the choice of the membrane and the mode of protein application. In: 2-D PAGE '91 (Dunn, M. J., ed.), pp. 111-115, Zebra Printing, London.

Eckerskorn, C., Strupat, K., Karas, M., Hillenkamp, F., Lottspeich, F. (1992) Electrophoresis 13, 664-665. Matrix-assisted laser desorption/ionisation mass spectrometry of proteins electroblotted after polyacrylamide gel electrophoresis.

Görg, A., Postel, W., Günther, S. (1988) Electrophoresis 9, 531-546. The current state of two-dimensional electrophoresis with immobilized pH gradients.

Gültekin, H., Heermann, K. H. (1988) Anal. Biochem. 172, 320-329. The use of polyvinylidenefluoride membranes as a general blotting membrane.

Jacobson, G., Kårsnäs, P. (1990) Electrophoresis 11, 46-52. Important parameters in semidry electrophoretic transfer.

Jahnen, W., Ward, L. D., Reid, G. E., Moritz, R. L., Simpson, R. J. (1990) Biochem. Biophys. Res. Commun. 166, 139-145. Internal amino acid sequencing of proteins by in situ cyanogen bromide cleavage in polyacrylamide gels.

Jungblut, P., Eckerskorn, C., Lottspeich, F., Klose, J. (1990) Electrophoresis 11, 581-588. Blotting efficiency investigated by using two-dimensional electrophoresis, hydrophobic membranes and proteins from different sources.

Kyhse-Andersen, J. (1984) J. Biochem. Biophys. Methods 10, 203-209. Electroblotting of multiple gels: a simple apparatus without buffer tank for rapid transfer of proteins from polyacrylamide to nitrocellulose.

Laemmli, U. K. (1970) Nature 227, 680-685. Cleavage of structural proteins during the assembly of the head of bacteriophage T4.

LeGendre, N. (1990) BioTechniques 9, 788-805. Immobilon-P transfer membrane: Applications and utility in protein biochemical analysis.

Lissilour, S., Godinot, C. (1990) BioTechniques 9, 397-401. Influence of SDS and methanol on protein electrotransfer to Immobilon P membranes in semidry blot systems.

Lottspeich, F., Eckerskorn, C., Grimm, R. (1994) Amino acid analysis on microscale from electroblotted proteins. In: Cell Biology: A Laboratory Handbook (Celis, J., ed.), in press.

Lottspeich, F., Eckerskorn, C. (1989) Two-dimensionally separated proteins are available for protein chemical analysis on microscale. In: Electrophoresis Forum '89 (Radola, B., ed.), pp. 72-83, Technical University Munich, Munich.

Lottspeich, F. (1990) J. Prot. Chem. 9, 268-269. Initial yield in amino acid sequence analysis is a surface-dependent phenomenon.

Matsudaira, P. (1987) J. Biol. Chem. 262, 10035-10038. Sequence from picomole quantities of proteins electroblotted onto polyvinylidene fluoride membranes.

Moos, M., Nguyen, N. Y., Liu, T.-Y. (1988) J. Biol. Chem. 263, 6005-6008. Reproducible high yield sequencing of proteins electrophoretically separated and transferred to an inert support.

Mozdzanowski, J., Speicher, D. (1992) Anal. Biochem. 207, 11-18. Microsequence analysis of electroblotted proteins: Comparison of electroblotting recoveries using different types of PVDF membranes.

Nakagawa, S., Fukuda, T. (1989) Anal. Biochem. 181, 75-78. Direct amino acid analysis of proteins electroblotted onto polyvinylidene fluoride membranes from sodium dodecyl sulfate-polyacrylamide gel.

Patterson, S. D., Hess, D., Yungwirth, T., Aebersold, R. (1992) Anal. Biochem. 202, 193-203. High-yield recovery of electroblotted proteins and cleavage fragments from a cationic polyvinylidene fluoride-based membrane.

O'Farrell, P. H. (1975) J. Biol. Chem. 250, 4007-4021. High resolution two-dimensional electrophoresis of proteins.

Ploug, M., Jensen, A. L., Barkholt, V. (1989) Anal. Biochem. 181, 33-39. Determination of amino acid compositions and NH2-terminal sequences of peptides electroblotted onto PVDF membranes from tricine-sodium dodecyl sulfate-polyacrylamide gel electrophoresis: Application to peptide mapping of human complement component C3.

Pluskal, M. F., Przekop, M. B., Kavonian, M. R., Vecoli, C., Hicks, D. A. (1986) BioTechniques 4, 272-282. A new membrane substrate for western blotting of proteins.

Putnam, F. W., Neurath, H. (1945) J. Biol. Chem. 159, 195-209. Interaction between proteins and synthetic detergents II. Electrophoretic analysis of serum albumin-sodium dodecyl sulfate mixtures.

Renart, J., Reiser, J., Stark, G. R. (1979) Proc. Natl. Acad. Sci. USA 76, 3116-3120. Transfer of proteins from gels to diazobenzyloxymethyl-paper and detection with antisera: A method for studying antibody specificity and antigen structure.

Scott, M. G., Crimmins, D. L., McCourt, D. W., Tarrand, J. J., Eyerman, M. C., Nahm, M. H. (1988) Biochem. Biophys. Res. Commun. 155, 1353-1359. A simple in situ cyanogen bromide cleavage method to obtain internal amino acid sequence of proteins electroblotted to polyvinylidene fluoride membranes.

Schägger, H., von Jagow, G. (1987) Anal. Biochem. 166, 368-379. Tricine-sodium dodecyl sulfate-polyacrylamide gel electrophoresis for the separation of proteins in the range from 1 to 100 kDa.

Strupat, K., Karas, M., Hillenkamp, F., Eckerskorn, C., Lottspeich, F. (1994) Anal. Chem. 66, 464-470. Matrix-assisted laser desorption/ionisation mass spectrometry of proteins electroblotted after polyacrylamide gel electrophoresis.

Swank, R., Munkres, K. (1971) Anal. Biochem. 39, 462-477. Molecular weight analysis of oligopeptides by electrophoresis in polyacrylamide gel with sodium dodecyl sulfate.

Tous, G. I., Fausnaugh, J. L., Akinyosoye, O., Lackland, H., Winter-Cash, P., Vitoria, F. J., Stein, S. (1989) Anal. Biochem. 179, 50-55. Amino acid analysis on polyvinylidene fluoride membranes.

Tovey, E. R., Baldo, B. A. (1987) Electrophoresis 8, 384-387. Comparison of semi-dry and conventional tank-buffer electrotransfer of proteins from polyacrylamide gels to nitrocellulose membranes.

Towbin, H., Staehelin, T., Gordon, J. (1979) Proc. Natl. Acad. Sci. USA 76, 4350-4354. Electrophoretic transfer of proteins from polyacrylamide gels to nitrocellulose sheets: Procedure and some applications.

Vandekerckhove, J., Bauw, G., Puype, M., Van Damme, J., Van Montagu, M. (1985) Eur. J. Biochem. 152, 9-19. Protein-blotting on polybrene-coated glass-fiber sheets: A basis for acid hydrolysis and gas-phase sequencing of picomole quantities of protein previously separated on sodium dodecyl sulfate-polyacrylamide gel.

Walsh, M. J., McDougall, J., Wittmann-Liebold, B. (1988) Biochemistry 27, 6867-6876. Extended N-terminal sequencing of proteins of archaebacterial ribosomes blotted from two-dimensional gels onto glass fiber and polyvinylidenefluoride membrane.

Xu, Q.-Y., Shively, J. E. (1988) Anal. Biochem. 170, 19-30. Improved electroblotting of proteins onto membranes and derivatized glass-fiber sheets.

Section III: Amino Acid Analysis


III.1 Amino Acid Analysis

Roland Kellner, Helmut E. Meyer and Friedrich Lottspeich

1 Introduction

Amino acid analysis is used routinely to estimate the amount and to determine the composition of proteins, peptides or free amino acids. It involves two stages: complete hydrolysis of proteins and peptides, followed by quantitation of the released amino acids. Both steps are laborious and time-consuming, and there is a continuing need to improve the technique. In the past, nanomole amounts (10⁻⁹ mol) of sample were available for amino acid analysis. Today, researchers frequently have only picomole amounts (10⁻¹² mol). At this low level, contaminants present in chromatographic buffers, on glass surfaces and in hydrolysis acids make accurate analysis difficult. Today's demands on the precision and sensitivity of amino acid analysis are a challenge to automation and instrument design.

The first part of the story of amino acid analysis was written by W. H. Stein and S. Moore. In the 1940s they worked out amino acid separations on starch columns [Moore 1948]. Starch columns had two disadvantages: samples with a high salt content required desalting prior to chromatography, and the column capacity was relatively low [Stein 1951]. They therefore turned to the sulphonated polystyrene resin Dowex-50 for analytical experiments (0.9 × 100 cm column). Stein and Moore reported that an amino acid analysis of a protein hydrolysate could be completed in five days. Elution from the synthetic resin took about half the time required with starch [Moore 1951]. As early as 1951 they discussed lithium and potassium citrate buffers and the effects of variations in resin and eluent. Only the amount of amino acid they analysed (3-6 mg) differed notably from today's practice. Just three years later, a synthetic mixture of 50 components (0.05-0.20 mg) was chromatographed using a resin with lower cross-linking (Dowex 50-X4, 4% cross-linking), a pH and ionic strength gradient, a longer column (0.9 × 150 cm) and a modified photometric ninhydrin detection method [Moore 1954 a,b].
Applications to blood plasma, tissue extracts and protein hydrolysates were successful [Stein 1954; Tallan 1954; Hirs 1954]. In 1958 Moore, Spackman and Stein reported chromatographic improvements that achieved a separation within 24 h [Moore 1958]. Even more important was the introduction of the first "instrument for automatically recording the ninhydrin color value of the effluent from ion exchange columns" [Spackman 1958]. This instrument quantified amino acids down to 100 nmol (ca. 12 µg) with a precision of 3%. It became the classical system for amino acid analysis. In 1972 Stein and Moore were awarded the Nobel Prize for Chemistry for their work [Manning 1993].

Amino acids are small, polar compounds that are difficult to handle with most separation techniques except ion exchange chromatography. Furthermore, most of them have almost no UV absorbance or fluorescence that would allow their detection. Specific derivatization of amino acids substantially changes both their chromatographic behaviour and their detectability. Throughout the 1970s, developments in chromatography and new derivatization chemistries provided alternative methods of amino acid analysis. High performance liquid chromatography gained wide popularity for the analysis of complex organic, synthetic and biological compounds. In particular, the reversed-phase mode initiated work with new amino acid derivatives and increased both speed and sensitivity. Reversed-phase columns significantly reduced analysis times, because the rigidity of the packing materials and the high resolution of the columns allowed higher flow rates and efficient separations. In addition, the detection characteristics of several new amino acid derivatives made far higher sensitivity possible. Nevertheless, both techniques are important for amino acid analysis today: ion exchange chromatography with ninhydrin detection, and pre-column derivatization with reversed-phase HPLC. Both are discussed in the following pages.

2 Sample Preparation

In Edman degradation, unwanted contaminants are lost in the first cycles. Amino acid analysis, by contrast, has to contend with their presence in every sample. When analysing in the low picomole range, sample contamination is therefore of primary concern. The complete work-up procedure must be coordinated, and both handling and chemicals must be optimized. All glassware must be baked in a muffle furnace for at least 12 h at 400 °C to destroy any amino acid contaminants. All chemicals for hydrolysis, derivatization and chromatography must be tested for their suitability for high-sensitivity amino acid analysis.
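Because contaminants turn up in every sample, a procedural blank is commonly analysed alongside the samples and subtracted from them. A procedural blank is a tube carried through hydrolysis and derivatization without sample. The Python sketch below assumes one blank per batch. The function name, the sample-to-blank ratio threshold of 3 and the example amounts are illustrative assumptions, not values from this chapter.

```python
def blank_corrected(sample_pmol, blank_pmol, min_ratio=3.0):
    """Subtract a procedural blank and flag residues not clearly above background.

    sample_pmol -- amino acid amounts found in the sample (pmol)
    blank_pmol  -- amounts found in the procedural blank (pmol)
    min_ratio   -- assumed minimum sample/blank ratio for a trustworthy value
    Returns (corrected amounts, list of flagged amino acids).
    """
    corrected, flagged = {}, []
    for aa, amount in sample_pmol.items():
        background = blank_pmol.get(aa, 0.0)
        corrected[aa] = max(amount - background, 0.0)   # never report negative amounts
        if background > 0 and amount / background < min_ratio:
            flagged.append(aa)                          # value dominated by background
    return corrected, flagged

# Hypothetical low-picomole example.
corrected, flagged = blank_corrected({"Gly": 12.0, "Ser": 6.0, "Leu": 8.0},
                                     {"Gly": 5.0, "Ser": 3.0})
print(corrected, flagged)
```

Flagged residues are not necessarily wrong. Their corrected values simply rest on a small difference between two similar numbers, which is exactly the situation this section warns about at picomole levels.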

2.1 Peptides and Proteins

The first step in determining the amino acid composition of a protein or peptide is to release the individual amino acid residues by cleaving the peptide bonds. Peptide bonds may be hydrolysed either enzymatically or chemically.

2.1.1 Enzymatic Hydrolysis

Complete enzymatic digestion requires both endo- and exoproteases; for example, incubation with subtilisin is followed by leucine aminopeptidase, prolidase and carboxypeptidase. The proteases used should have broad specificity and be resistant to autolysis (for details see [Kay 1976]). Complete enzymatic hydrolysis is performed only in special cases and has not gained widespread application.


2.1.2 Acid Hydrolysis

The standard hydrolysis procedure, using 6 N HCl at 110 °C for 24 h [Moore 1963], is used for most purposes. The procedure can be varied in a number of parameters: acid, temperature, time, gas- or liquid-phase mode, and choice of scavenger. Modified hydrolysis methods are of interest to overcome problems arising from the different properties of individual amino acids. Under standard conditions, losses of 5-10% are expected for serine, threonine and tyrosine. Losses of 50-100% are expected for cysteine/cystine, tryptophan, amino sugars and phosphorylated amino acids. Asparagine and glutamine are deamidated during hydrolysis: the side-chain amide is converted to a carboxyl group, yielding aspartic acid and glutamic acid, respectively. Hydrolysis is the most crucial step and the one most susceptible to contamination or sample loss. Digesting a sample in liquid acid introduces a serious amount of background contamination from surfaces and from the chemicals themselves. The choice between gas- and liquid-phase hydrolysis therefore depends on the required sensitivity, because the ratio between sample amount and background contamination is limiting: the higher the sensitivity, the less contamination can be tolerated. Gas-phase hydrolysis is the method of choice for minute amounts of sample. Even then, the HCl used for hydrolysis must be of the highest purity available. Standard hydrolysis conditions are a compromise in terms of time and temperature, for two reasons: the rate of peptide bond cleavage depends on the amino acids involved, and individual amino acids differ in stability. Hydrolysis times at 110 °C range from 20 h, giving minimal loss of sensitive residues, to 96 h, giving nearly complete release of amino acids from hydrophobic linkages. Hydrolysis time may be shortened by raising the temperature.
High-temperature short-time hydrolysis is performed at 145 °C for 4 h [Gehrke 1985] or at 165 °C for 1.5 h [Dupont 1988]. In combination with acid mixtures, even a 15 min hydrolysis (160 °C, propionic acid/HCl 1:1 [Westall 1974]) and a 25 min hydrolysis (166 °C, TFA/HCl 1:2 [Tsugita 1982]) have been described. To determine the exact composition of a protein, a time study with different hydrolysis times is needed. For this purpose, gas-phase hydrolysis is performed on triplicate samples at 110 °C for 24, 48 and 72 h. Values for serine, threonine and other labile amino acids are then extrapolated to 0 h hydrolysis time. Values for slowly released hydrophobic residues such as valine, isoleucine or leucine are extrapolated to 100 h. Hydrolysis with organic acids provides a suitable method for recovering tryptophan. Mercaptoethanesulphonic acid [Penke 1974; Maeda 1984], methanesulphonic acid [Simpson 1976; Inglis 1983] and toluenesulphonic acid [Liu 1971] have been used, giving tryptophan yields of up to 93%. Hydrolysis is again performed at about 110 °C, and the recoveries of all other amino acids are good. When organic acids are used, evaporation is neither feasible nor necessary; hydrolysates are simply neutralized with sodium hydroxide and applied directly to the column. For tryptophan in particular, oxidants in the hydrolysis acid must be minimized, especially dissolved oxygen, since tryptophan is decomposed by a free-radical auto-oxidation mechanism [Hunt 1986]. Oxygen can be removed from the hydrolysis vial by evacuation before the tube is sealed and/or by saturating the acid and the hydrolysis vial with inert gas before closing. Adding anti-oxidants to the hydrolysis acid also helps. Scavengers such as phenol (1%) [Moore 1963; Muramoto 1990], thioglycolic acid (0.1-1.0%, v/v) [Matsubara 1969; Yano 1990], 2-mercaptoethanol (0.1%) [Ng 1987], tryptamine (3-(2-aminoethyl)indole) [Simpson 1983] or sodium sulphite [Swadesh 1984] are added to the 6 N HCl.

Because cysteine, asparagine and glutamine are incompatible with acid hydrolysis, determining these residues requires special pretreatment. Cysteine and cystine can be oxidized with performic acid to cysteic acid [Hirs 1963]. Alternatively, the protein is reduced with a thiol before hydrolysis and then alkylated with iodoacetic acid [Hirs 1963] or 4-vinylpyridine [Raftery 1966]. The side-chain amides of asparagine and glutamine can be converted with I,I-bis(trifluoroacetoxy)iodobenzene to the corresponding 2,3-diaminopropionic and 2,4-diaminobutyric acid residues [Soby 1981]. The numbers of asparagine and glutamine residues are then given by the differences in aspartic and glutamic acid determined with and without pretreatment.

Proteins may be isolated by blotting onto a membrane, and the blotted material used for further characterization, such as N-terminal sequencing and/or amino acid composition analysis. Suitable matrices are PVDF (polyvinylidene fluoride) or modified glass fibre filters. Membrane-bound proteins are hydrolysed under conditions quite similar to those described above [Nakagawa 1989; Eckerskorn 1988].
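The time study and the difference method described above reduce to simple arithmetic. The Python sketch below makes two assumptions. First, it uses a straight-line least-squares fit for extrapolating to 0 h or 100 h; a first-order (log-linear) decay model is another common choice for labile residues. Second, amounts are expressed as residues per molecule. All function names and example numbers are hypothetical.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("hydrolysis times must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def extrapolate(times_h, amounts, target_h):
    """Linearly extrapolate a hydrolysis time study to target_h hours."""
    slope, intercept = linear_fit(times_h, amounts)
    return intercept + slope * target_h

def amide_by_difference(acid_without, acid_with_pretreatment):
    """Asn (or Gln) = Asp (or Glu) without pretreatment minus Asp (or Glu) after it."""
    return acid_without - acid_with_pretreatment

times = [24, 48, 72]                                  # h at 110 °C, as in the text
ser_0h = extrapolate(times, [9.0, 8.0, 7.0], 0)       # labile residue: back to 0 h
val_100h = extrapolate(times, [4.6, 4.8, 5.0], 100)   # slowly released: out to 100 h
asn = amide_by_difference(20.0, 14.0)                 # Asx before/after amide conversion
print(f"Ser {ser_0h:.2f}, Val {val_100h:.2f}, Asn {asn:.1f} residues/molecule")
```

With triplicate hydrolysates, each time point would normally be the mean of the three replicates before the fit.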

2.1.3 Alkaline Hydrolysis

Alkaline hydrolysis is performed to facilitate the recovery of tryptophan. It is applied especially to complex samples such as food, for which hydrolysis with methanesulphonic acid is not suitable. The reagents usually employed are barium, sodium or lithium hydroxide at a concentration of 4 M [Hugli 1972; Knox 1970; Spiess 1981]. Once more, hydrolysis is done at 110 °C, for between 18 and 70 h depending on the sample. However, this technique presents technical and manipulative difficulties. Special reaction vessels must be used with alkali, since ordinary glass tubes are etched, releasing silicates and promoting side reactions. After hydrolysis, the reagent itself must be neutralized, and barium ions have to be removed before analysis by precipitation as the carbonate or the sulphate. Some amino acids are inevitably lost by adsorption to the precipitated barium salt. Sodium or lithium hydroxide may simply be neutralized with hydrochloric acid. However, this increases the volume of the amino acid solution, which can make application to the analyser column difficult. The method also causes partial or complete destruction of tryptophan and other amino acids.

Base hydrolysis is also used in the study of phosphoproteins. The phosphorylated forms of serine, threonine and tyrosine differ in their stability under alkaline conditions: phosphoserine is mostly destroyed, whereas phosphothreonine and phosphotyrosine are relatively stable. Alkaline stability is therefore used as a criterion for identifying phosphoproteins. ³²P-labelled proteins, fixed either in polyacrylamide gels [Cooper 1981] or bound to PVDF [Kamps 1989], are incubated in 1 M KOH at 55 °C for 2 h. Autoradiographs taken before and after this treatment are then compared. Acid hydrolysis may be performed afterwards to detect the phosphoamino acids.

2.2 Free Amino Acids

Determination of the free amino acid content is of interest for complex physiological samples (see Figure 1), for foods, and for bioresearch and bioindustrial use. Such samples are usually complex mixtures containing proteins, lipids and other substances. These compounds interfere with the analysis, since they may bind to the stationary phase, thereby limiting capacity, or may even block the column. These analyses are among the most demanding, since physiological samples in particular can contain up to 50 components that must be separated and quantitated. To improve chromatographic separation, especially of glutamine, asparagine and glutamic acid, the eluting buffer is changed from a sodium to a lithium salt. Complex matrices such as plasma, foodstuffs or fermentation media can contain high salt concentrations or enzymes. These are removed before amino acid analysis by methods such as precipitation, filtration and centrifugation. The most common method is precipitation of protein with 5-sulphosalicylic acid followed by centrifugation [Godel 1984]. Alternatively, high molecular weight compounds may be removed by ultrafiltration, a very gentle sample preparation technique.

3 Derivatization

Amino acids require analytical derivatization to improve their chromatographic behaviour and/or detectability. There are two approaches. In the post-column technique, amino acids are first separated and then derivatized and detected. In the pre-column technique, derivatization is done prior to chromatography. An ideal derivatization reagent has the following characteristics:

- It reacts with all primary and secondary amino acids.
- It gives a quantitative and reproducible reaction.
- It yields a single derivative of each amino acid.
- It needs mild reaction conditions.
- It gives stable derivatization products.
- It gives highly UV-absorbing or fluorescent derivatives.
- By-products and excess reagent do not interfere with chromatography.

An important characteristic of a derivatization technique is the amount of sample required for detection. It must be stressed that the detection limit of a derivatization method is an optimized result. It should be distinguished from the amount that is routinely detected with unknown samples. The structures of the derivatization reagents used in amino acid analysis are shown in Figure 2.

[Figure: chromatogram of free amino acids (y-axis: relative fluorescence); recoverable peak labels include TAU, 3-ME-HIS, ALA, CIT, LEU and GLU.]



3.1 Post-Column Derivatization

In the post-column technique, free amino acids are first separated by ion exchange chromatography, and the reagent is then introduced into the effluent stream from the column. This mixture passes through a reaction coil, after which the derivatives are detected. Ninhydrin is the classical reagent for post-column derivatization. Alternatively, orthophthaldialdehyde (OPA) and fluorescamine can be used.

3.1.1 Ninhydrin

In the 1950s, Stein and Moore developed ion exchange separation with post-column ninhydrin derivatization. Since then, remarkable improvements in speed, sensitivity and automation have been achieved, but the fundamental technique has remained the same. Ninhydrin reacts quantitatively with primary and secondary amino acids as the separated amino acids and the reagent pass through the post-column reaction coil, which is maintained at 100-135 °C. There are no interfering by-products or multiple derivatization products. The derivatives formed have absorption maxima at 570 nm (primary amino acids) and 440 nm (secondary amino acids). Samples are applied in citrate buffer at pH 2.2 to the ion exchange column (spherical polystyrene resin, 10% DVB cross-linked, 4 × 125 mm). Separation is achieved with a multi-step pH and ionic strength gradient. The detection limit is ca. 50 pmol (ca. 6 ng) of amino acid derivative.
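The paired figures in this section, such as 50 pmol (ca. 6 ng) and, for OPA, 10 pmol (ca. 1.2 ng), imply an average amino acid mass of about 120 g/mol. The Python sketch below makes that conversion explicit. The 120 g/mol default is an assumption inferred from those figures; a specific amino acid's molar mass can be passed instead.

```python
MEAN_MASS_G_PER_MOL = 120.0  # assumed average, consistent with "50 pmol (ca. 6 ng)"

def pmol_to_ng(pmol, molar_mass=MEAN_MASS_G_PER_MOL):
    """Convert picomoles to nanograms.

    1 pmol = 1e-12 mol and 1 ng = 1e-9 g, so ng = pmol * (g/mol) * 1e-3.
    """
    return pmol * molar_mass * 1e-3

def ng_to_pmol(ng, molar_mass=MEAN_MASS_G_PER_MOL):
    """Convert nanograms to picomoles (inverse of pmol_to_ng)."""
    return ng / (molar_mass * 1e-3)

for limit_pmol in (50, 10, 1):   # ninhydrin, post-column OPA, PTC limits in this section
    print(f"{limit_pmol} pmol ~ {pmol_to_ng(limit_pmol):.2f} ng")
```

The same factor converts a femtomole quantity to picograms: ng per pmol equals pg per fmol.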

3.1.2 Orthophthaldialdehyde

When orthophthaldialdehyde (OPA) was introduced for amino acid analysis in 1971, it was used exclusively in the post-column mode [Roth 1971]. Today, most applications use the pre-column technique. In the presence of a thiol, OPA reacts with primary amino acids to give highly fluorescent 1-alkylthio-2-alkyl-substituted isoindoles [Simmons 1978; Alvarez 1989]. The derivatization is fast (1-3 min) and is performed at room temperature in alkaline buffer at pH 9.5. However, OPA amino acid derivatives are not stable. They can be detected either by UV absorption or by fluorescence emission; the reagent itself is not fluorescent. The UV absorption and fluorescence excitation spectra coincide, with two maxima: one at 230 nm, about five times more intense than the second at 335 nm. Detection at 230 nm can, however, cause problems because of UV-absorbing contaminants or the intrinsic UV absorption of the buffers. Fluorescence emission occurs at 455 nm and is generally free of background problems. Secondary amino acids do not react with OPA and must be converted to primary amines prior to derivatization, e.g. with NaOCl or chloramine T [Böhlen 1979]. These reagents may be added either continuously or only while the secondary amino acid is eluting from the column. Continuous addition gives a more stable baseline. The post-column technique gives a detection limit for OPA amino acids of about 10 pmol (ca. 1.2 ng).



[Structures not recoverable from the extracted text; detection wavelengths given in the figure:]
- Ninhydrin: absorbance at 570 nm (primary amino acids) and 440 nm (secondary amino acids)
- Phenylisothiocyanate: absorbance at 245 nm
- Orthophthaldialdehyde: absorbance at 338 nm; fluorescence excitation at 230 nm or 335 nm, emission at 455 nm
- Fluorescamine: fluorescence excitation at 390 nm, emission at 475 nm
- Fluorenylmethyl chloroformate: absorbance at 260 nm; fluorescence excitation at 266 nm, emission at 305 nm
- Dansyl chloride: fluorescence excitation at 310 nm, emission at 540 nm

Figure 2. Structures of reagents and derivatives used for amino acid analyses.



3.1.3 Fluorescamine

Fluorescamine was the first reagent tested as a replacement for ninhydrin in order to improve sensitivity. In alkaline media it forms a fluorescent derivative with primary amines, but not with secondary amines [Udenfriend 1972; Weigele 1972; Castell 1979]. The reaction is very fast (seconds), and fluorescamine itself is not fluorescent. Fluorescence is recorded at 475 nm after excitation at 390 nm. The reagent has several drawbacks. Fluorescamine is not stable in aqueous solution, and the amino acid derivatives have a fluorescence optimum at pH 9, which is difficult to reconcile with ion exchange chromatography. These problems have prevented fluorescamine from gaining significance in amino acid analysis.

3.2 Pre-Column Derivatization

Developments in high performance liquid chromatography (HPLC), namely the reversed-phase method, have enabled the use of a number of derivatizing reagents that alter the chromatographic behaviour of amino acids prior to separation. The polar amino acids become more hydrophobic after coupling to an aromatic compound, and reversed-phase HPLC is then ideally suited to separating the derivatized mixture. High-efficiency chromatographic packings, together with improved LC equipment, make it feasible to separate derivatized amino acid mixtures in as little as 10 min. Some pre-column derivatizing reagents can be detected at very low concentrations (femtomoles, 10⁻¹⁵ mol; ca. 100 pg) by either UV or fluorescence detection. Pre-column fluorescent derivatives provide the highest sensitivity, down to about 50 fmol, more or less on a routine basis. It must be stressed that contaminating background levels of amino acids are a severe problem at this level of sensitivity. The presence of only a few bacteria in a sample can make the analysis useless. In other words, the limiting factor is sample preparation, not detection sensitivity.

3.2.1 Phenylisothiocyanate
The reaction of phenylisothiocyanate (PITC) with amino acids is well understood, since it is the first step of protein sequencing by Edman chemistry [Edman 1950]. PITC reacts with primary and secondary amines under alkaline conditions within about 20 min [Heinrikson 1984; Bidlingmeyer 1984]. The resulting phenylthiocarbamyl (PTC) derivatives are moderately stable, and almost no interfering by-products are formed. Numerous examples of the separation of PTC amino acids by RP-HPLC have been published [Ebert 1986]. The derivatives have a UV absorbance maximum at 245 nm but are not fluorescent. The detection limit of this method is about 1 pmol (ca. 0.1 ng). PTC amino acid analysis is the most widely used pre-column derivatization method.
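The conversion between amount and mass quoted above (1 pmol, ca. 0.1 ng) depends on the molar mass of the amino acid. A minimal sketch, assuming an average amino acid molar mass of about 120 g/mol (an illustrative round value, not taken from the text); the function name `pmol_to_ng` is hypothetical:

```python
# Hypothetical average amino acid molar mass (g/mol); individual amino acids
# range from about 75 g/mol (Gly) to about 204 g/mol (Trp).
AVERAGE_AA_MOLAR_MASS = 120.0


def pmol_to_ng(amount_pmol: float, molar_mass: float = AVERAGE_AA_MOLAR_MASS) -> float:
    """Mass in ng of `amount_pmol` picomoles of a compound (molar mass in g/mol)."""
    # 1 pmol x M g/mol = M pg = M * 1e-3 ng
    return amount_pmol * molar_mass * 1e-3


for name, molar_mass in (("Gly", 75.07), ("average", AVERAGE_AA_MOLAR_MASS), ("Trp", 204.23)):
    print(f"1 pmol {name}: {pmol_to_ng(1.0, molar_mass):.3f} ng")
```

Depending on the amino acid, 1 pmol therefore corresponds to roughly 0.08 to 0.2 ng, consistent with the "ca. 0.1 ng" given above.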



3.2.2 Orthophthaldialdehyde
Orthophthaldialdehyde (OPA) may be used for either the pre- or the post-column technique; the chemical reaction has already been described above. Different thiols may be used for derivatization (2-mercaptoethanol, ethanethiol, 3-mercaptopropionic acid). They affect the hydrophobicity and stability of the resulting OPA derivatives, so the chromatographic parameters (stationary phase, eluting buffer) vary accordingly. In general, the separation is achieved with an acetonitrile gradient in 12.5 mM sodium phosphate buffer, pH 7.2. Pre-column derivatization with OPA followed by separation on microbore columns provides a sensitivity in the low picomole range (UV detection) or in the 100 fmol range (fluorescence detection).
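As a worked example for the phosphate buffer mentioned above, the Henderson-Hasselbalch equation gives the ratio of the two phosphate species needed for 12.5 mM at pH 7.2. This sketch assumes pKa2 = 7.21 for H2PO4-/HPO4(2-) at 25 °C and ignores ionic-strength and temperature corrections; `phosphate_buffer` is a hypothetical helper name:

```python
# Assumed pKa2 of phosphoric acid (H2PO4- <-> HPO4^2-) at 25 °C;
# activity (ionic-strength) corrections are ignored in this sketch.
PKA2_PHOSPHATE = 7.21


def phosphate_buffer(total_mM: float, ph: float, pka: float = PKA2_PHOSPHATE):
    """Split a total phosphate concentration into NaH2PO4 and Na2HPO4 (mM)."""
    ratio = 10 ** (ph - pka)          # [HPO4^2-] / [H2PO4-] from Henderson-Hasselbalch
    acid = total_mM / (1 + ratio)     # NaH2PO4 (acid form)
    base = total_mM - acid            # Na2HPO4 (base form)
    return acid, base


acid, base = phosphate_buffer(12.5, 7.2)
print(f"NaH2PO4 {acid:.2f} mM, Na2HPO4 {base:.2f} mM")
```

Because pH 7.2 lies almost exactly at pKa2, the two forms are present in nearly equal amounts (about 6.3 and 6.2 mM).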

3.2.3 Fluorenylmethyl Chloroformate
After its introduction as a base-sensitive amino acid protecting group [Carpino 1970] and its use in solid-phase peptide synthesis, the 9-fluorenylmethyloxycarbonyl (FMOC) group has been used as a derivatization reagent for amino acid analysis since 1983 [Einarsson 1983; Miller 1990]. Derivatization is fast ("

[Figure 5: daughter ion spectrum, m/z axis; labelled signal [M-Glucosyl+2H]2+ at m/z 807.5]

Figure 5. Daughter ion spectrum of the gluco-arginine-containing peptide of corn starch. The mass spectrum was obtained by collisional activation of the (M+2H)2+ ion (m/z 889). About 50 pmol of the peptide was consumed in the tandem mass spectrometric experiment. The spectrum was recorded on a Sciex (Toronto) API III triple-quadrupole mass spectrometer with a mass range of 2400 Da, equipped with an ion spray source. The mass spectrometer was operated at unit-mass resolution. The ion spray voltage was 5 kV. All other details are as given in [Heilmeyer 1992].

IV.2 Post-translational Modifications


3.6 Farnesyl-Cysteine
This kind of post-translational modification was first discovered in fungal and yeast mating factors [Anderegg 1988]. A C-terminal CAAX motif (C is cysteine, A an aliphatic amino acid and X any uncharged amino acid) is the recognition signal for a protein polyisoprenyltransferase. After we had determined the primary structures of the α and β subunits of phosphorylase kinase from rabbit skeletal muscle, we identified the C-terminal Lys-C peptides of both subunits as post-translationally modified peptides [Heilmeyer 1992]. Since both carry the farnesylation motif, we suspected that they might be farnesylated at the single cysteine in the fourth-to-last position. Sequence analysis of these peptides revealed a gap in both cases, most probably because the very hydrophobic farnesyl-cysteine residue prevents its PTH derivative from eluting within the usual chromatography time. We therefore applied mass spectrometric techniques in this case, too. Figure 6a shows the ion spray mass spectrum of the C-terminal Lys-C peptide of the α subunit, recorded with 100 pmol of the purified peptide. The doubly protonated molecule yields the most intense signal; its m/z value of 1092.5 demonstrates the presence of a peptide with a molecular mass of 2184 Da. This is 204 Da higher than the mass calculated for the unmodified peptide and corresponds to the residual mass of a farnesyl group (204 Da). To prove that the farnesyl group is located at the cysteine residue, we performed tandem MS/MS spectrometry on the doubly charged molecular ion. Figure 6b shows the resulting spectrum, obtained with 50 pmol of the peptide. The fragments Y3 and Y4 comprise the last three and the last four amino acids, respectively. Their mass difference (656.0 - 349.0 = 307 Da, i.e. 103 Da for cysteine plus 204 Da for the farnesyl group) demonstrates that the cysteine carries the farnesyl group.
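The argument from the Y-ion ladder can be checked numerically: the difference between consecutive Y fragments equals the residue mass of the amino acid that separates them. A small sketch, assuming standard monoisotopic masses for the cysteine residue (C3H5NOS) and for the C15H24 increment added when a farnesyl group replaces the thiol hydrogen; `residue_from_ladder` is a hypothetical helper:

```python
# Standard monoisotopic masses (Da)
CYS_RESIDUE = 103.00919        # cysteine residue, C3H5NOS
FARNESYL_INCREMENT = 204.18780  # C15H24: farnesyl (C15H25) replacing the S-H hydrogen


def residue_from_ladder(y_larger: float, y_smaller: float) -> float:
    """Residue mass separating two consecutive singly charged Y-series fragments."""
    return y_larger - y_smaller


observed = residue_from_ladder(656.0, 349.0)       # Y4 - Y3 from Figure 6b
farnesyl_cys = CYS_RESIDUE + FARNESYL_INCREMENT    # expected farnesyl-cysteine residue
print(f"observed {observed:.1f} Da, farnesyl-Cys {farnesyl_cys:.1f} Da, "
      f"unmodified Cys {CYS_RESIDUE:.1f} Da")
```

The observed 307 Da lies within about 0.2 Da of farnesyl-cysteine and far from unmodified cysteine, which is the reasoning used in the text.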


Figure 6. (a) Ion spray mass spectrum of the C-terminal peptide from the α subunit. An aliquot of the purified C-terminal Lys-C peptide of the α subunit of phosphorylase kinase was flow-injected (5 µl/min) into the ion spray mass spectrometer. About 100 pmol of the peptide was used to record the spectrum. The total number of ions obtained in the spectrum is given in the upper right corner.


H.E. Meyer


Figure 6. (b) Daughter ion spectrum of the C-terminal peptide from the α subunit. The spectrum was obtained by collisional activation. Approximately 50 pmol of the peptide was used to carry out the tandem MS/MS experiment on the (M+2H)2+ ion at m/z 1092.5, as described in [Heilmeyer 1992]. N- and C-terminal fragments formed during collision-induced dissociation are designated A, B, C and X, Y, Z, respectively; two primes (″) denote the addition of two hydrogen atoms. The main fragments belong to the B and Y″ series, reflecting cleavage of the peptide bond. Taken from [Heilmeyer 1992] with permission.

3.7 Phospho-Serine
Phospho-serine is one of those amino acid derivatives that are stable during peptide purification but are destroyed during Edman degradation [Meyer 1991b]. However, as shown in Figure 7, phospho-serine yields a characteristic PTH derivative that is detectable in the reversed-phase separation of the PTH amino acids. The detailed mechanism by which this product, the PTH derivative of the dithiothreitol adduct of dehydroalanine, is formed during Edman degradation is described in [Meyer 1988]. Figure 7 shows, on the left, the Edman degradation of the phosphopeptide and, on the right, that of the dephospho form of the same peptide. Characteristically, phospho-serine gives exclusively the PTH derivative of the dithiothreitol adduct of dehydroalanine, whereas unmodified serine yields a mixture of this derivative and PTH-serine.



[Figure 7, panels A and B: PTH chromatograms; x-axis: Time [min], 0 to 24]

Figure 7. Sequence analyses of the phospho- and dephosphopeptide EP2 from the β subunit of phosphorylase kinase without S-ethyl-cysteine modification. In A, about 100 pmol of the phosphopeptide EP2 (E refers to glutamic acid residue number 2 of the β subunit sequence) was applied to the gas-phase sequencer and sequenced without S-ethyl-cysteine modification. The PTH chromatograms of degradation cycles 1 to 4 are shown. The chromatogram of cycle 2 demonstrates that the phosphoserine present in this position is quantitatively transformed into the dithiothreitol adduct of dehydroalanine (DTTS). In B, 60 pmol of the same peptide in its nonphosphorylated form was analyzed. A clear difference between the phosphoserine in A and the serine in B can be seen, demonstrating that in such a case phosphoserine can be located unambiguously in the gas-phase sequencer without S-ethyl-cysteine modification. Taken from [Meyer 1991b] with permission.


3.8 Phospho-Threonine
Alternatively, phospho-serine and phospho-threonine can be chemically converted into stable derivatives that are detectable during sequence analysis [Meyer 1993a]. As an example, Figure 8 shows the sequence analysis of a phospho-threonine-containing peptide after conversion of the phospho-threonine into β-methyl-S-ethyl-cysteine. This reaction proceeds quantitatively when the phosphopeptide is treated with an alkaline ethanethiol reagent at 50 °C for 1 h; details of the procedure are given in [Meyer 1993a]. As can be seen in Figure 8, the phospho-threonine in position 5 of the peptide is converted into β-methyl-S-ethyl-cysteine, which elutes from the reversed-phase column as its PTH derivative. The signal doublet is to be expected: β-methyl-S-ethyl-cysteine contains two asymmetric carbon atoms, and the reaction gives a racemic (D/L) mixture at both of them. The resulting four stereoisomers form two pairs of enantiomers; the enantiomers within each pair co-elute on the achiral reversed-phase column, whereas the two pairs are diastereomers of each other and are separated, giving two peaks.
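The origin of the signal doublet can be made explicit by enumerating the stereoisomers. The sketch below labels the two stereocentres generically as R/S (rather than D/L) and assumes that the two centres are constitutionally different, so that no meso form exists; it groups the four configurations into enantiomeric pairs. On an achiral reversed-phase column each pair co-elutes, so the number of pairs equals the number of expected peaks. Function names are illustrative:

```python
from itertools import product


def stereoisomers(n_centres: int) -> list[str]:
    """All R/S configurations for n independent stereocentres."""
    return ["".join(config) for config in product("RS", repeat=n_centres)]


def mirror_image(config: str) -> str:
    """The enantiomer: every stereocentre inverted."""
    return config.translate(str.maketrans("RS", "SR"))


def enantiomer_pairs(n_centres: int) -> list[tuple[str, str]]:
    """Group configurations into unordered enantiomeric pairs."""
    pairs = {tuple(sorted((c, mirror_image(c)))) for c in stereoisomers(n_centres)}
    return sorted(pairs)


pairs = enantiomer_pairs(2)
print(pairs)
print(f"{len(stereoisomers(2))} stereoisomers -> {len(pairs)} peaks on an achiral column")
```

With two stereocentres this yields the pairs (RR, SS) and (RS, SR), i.e. the two diastereomeric signals seen in cycle 5.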

[Figure 8: PTH chromatograms; x-axis: Time (min)]

Figure 8. Sequence analysis of the phosphopeptide LRRATpLG after modification with ethanethiol. 2 nmol of the peptide was used for the modification and was applied directly onto the Polybrene-treated glass-fibre disk of the gas-phase sequencer. Ethyl acetate was used as transfer solvent instead of the butyl chloride used in the older model 470 gas-phase sequencer. The chromatograms of the on-line PTH amino acid analyses are shown. PTH-β-methyl-S-ethylcysteine (βMe-SEC) is clearly visible in cycle 5 as a signal doublet. All other details are as given in [Meyer 1993a].



3.9 Screening for Phospho-Serine/Threonine-Containing Peptides by HPLC-MS
A powerful new technique for finding phosphopeptides is HPLC coupled on-line to an electrospray mass spectrometer, which permits the direct identification of peptides containing phospho-serine or phospho-threonine [Meyer 1993b]. Figure 9 gives an example of this strategy. Purified insertin, an actin-inserting domain of the protein tensin from chicken gizzard muscle [Weigt 1992], was fragmented by tryptic digestion. The fragments were separated by reversed-phase HPLC connected on-line to an electrospray triple-quadrupole mass spectrometer. The mass spectrometer acts as a highly sophisticated detector, recording a complete mass spectrum every few seconds. Summing all detected ions in each spectrum yields a reconstructed total ion chromatogram (Figure 9b), in which all expected tryptic peptides that were identified are labelled. Two phosphopeptides, marked P1 and P2, show the characteristic behaviour of phosphoserine/threonine-containing peptides. P2 is the phospho form of the tryptic peptide T27: its calculated molecular mass of 2592 Da is 80 Da higher than that of T27, owing to the additional phosphate group. To demonstrate the presence of the phospho (P2) and dephospho forms of T27, we extracted the signals of their doubly protonated molecular ions, with expected m/z values of 1296 and 1257, from the total ion chromatogram (Figure 9c). In the lower trace of Figure 9c (m/z 1257), two signals separated by 0.2 min are visible. The earlier-eluting component is the doubly protonated molecular ion of the phosphopeptide P2 (m/z 1296, upper trace) that has lost its phosphate group by collision with nitrogen before entering the mass spectrometer.
The later-eluting component demonstrates the presence of the unphosphorylated peptide T27, which is more hydrophobic and therefore elutes from the reversed-phase column after its phosphorylated counterpart. The mass spectrum of the phosphopeptide P2 (Figure 9d) shows the doubly protonated molecular ion [M+2H]2+ at m/z 1296.3 and the triply protonated molecular ion [M+3H]3+ at m/z 864.0. The molecular mass calculated from these m/z values is 2592 Da, which corresponds to the tryptic peptide T27 carrying one additional phosphate group (+80 Da). This phosphopeptide partially loses its phosphate moiety during electrospray ionization through collisions with the drying gas before entering the mass spectrometer, generating additional molecular ions at m/z 1257.0 and 837.8, respectively. This neutral loss of the phosphate group is very characteristic of phosphoserine- and phosphothreonine-containing peptides. With this technique, phosphopeptides are already identified during peptide separation, and in some cases direct sequencing by tandem mass spectrometry allows the phosphoamino acid to be localized. In most cases, however, electrospray mass spectrometry or tandem mass spectrometry must be complemented by the protein-chemical techniques described above to localize the phosphoamino acid in the primary structure of a phosphopeptide.
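The charge-state arithmetic behind these statements can be sketched as follows, assuming [M+zH]z+ ions, a proton mass of 1.00728 Da and the monoisotopic mass of HPO3 (79.966 Da) for the lost phosphate group; `neutral_mass` is a hypothetical helper. With the m/z values quoted in the text, both charge states give neutral masses that agree with each other, and with the value given above, to within a few daltons, and both show a loss of about 79 Da:

```python
PROTON = 1.00728   # proton mass (Da)
HPO3 = 79.96633    # monoisotopic mass of HPO3 (Da)


def neutral_mass(mz: float, z: int) -> float:
    """Neutral molecular mass M from the m/z of an [M+zH]z+ ion."""
    return z * (mz - PROTON)


# Observed m/z values for P2 from the text: (phospho form, in-source dephospho form)
ions = {2: (1296.3, 1257.0), 3: (864.0, 837.8)}
for z, (phospho_mz, dephospho_mz) in ions.items():
    m_phospho = neutral_mass(phospho_mz, z)
    loss = m_phospho - neutral_mass(dephospho_mz, z)
    print(f"z={z}: M = {m_phospho:.1f} Da, neutral loss = {loss:.1f} Da "
          f"(HPO3 = {HPO3:.1f} Da)")
```

The same helper, run in reverse, gives the expected m/z values to extract from the total ion chromatogram when searching for a given phosphopeptide.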

[Figure 9, panels A to D]

3.10 Lanthionine, 3-Methyl-Lanthionine, Dehydroalanine, Dehydro-α-aminobutyric Acid
As already shown in Figure 4, the lantibiotic peptides contain, in addition to N-α-ketoacyl groups, some unusual amino acids such as lanthionine, 3-methyl-lanthionine, dehydroalanine and dehydro-α-aminobutyric acid. When Edman degradation is applied directly to these peptides, lanthionine and 3-methyl-lanthionine prevent a normal course of sequencing, presumably because of the strain and poor accessibility of the ring structures formed by these amino acids. A heavy drop in yield and a high lag occur during sequence analysis, and sequencing finally stops at dehydroalanine or dehydro-α-aminobutyric acid [Kellner 1988]. As shown in Figure 4, all these difficulties are overcome by alkaline ethanethiol treatment, trifluoroperacetic acid oxidation and a second treatment with alkaline ethanethiol. Lanthionine and 3-methyl-lanthionine are thereby transformed into a mixture of cysteine/S-ethylcysteine or cysteine/β-methyl-S-ethylcysteine, respectively. In this way, Edman degradation takes its normal course, and most of the primary structure of these lantibiotics can be elucidated in a single sequence analysis [Meyer 1994].

Figure 9. On-line HPLC electrospray analysis of a tryptic digest of the phosphoprotein insertin. A: Amino acid sequence of insertin. Phosphopeptides P1 and P2 are underlined; identified phosphoserine residues are printed in boldface. B: Total ion current chromatogram of the separated peptides from the insertin molecule. The peptides identified are labelled with their tryptic fragment numbers; identified phosphopeptides are marked P1 and P2. C: Single-ion chromatograms of the doubly protonated ion of peptide T27 (P2) in the phosphorylated form (upper trace) and the dephospho form (lower trace), extracted from the total ion current chromatogram B. Note the signal doublet in the lower trace: the m/z 1256 ion detected at 25.63 min is due to the neutral loss of the phosphate group from the phosphorylated peptide. D: Single-scan spectrum of the phosphopeptide P2, taken at 25.63 min, the point of highest signal intensity for this ion. Note the presence of both the phosphorylated form (m/z 1296.3 and 864.0) and the dephosphorylated form (m/z 1256.9 and 837.8) of the doubly and triply protonated molecular ions, respectively, which is due to the neutral loss of the phosphate group from the phosphorylated peptide. Taken from [Meyer 1993b] with permission.

4 References
Anderegg, R.J., Betz, R., Carr, S.A., Crabb, J.W., Duntze, W. (1988) J. Biol. Chem. 263, 18236-18240. Structure of Saccharomyces cerevisiae mating hormone a-factor.
Bunton, C.A. (1949) Nature 163, 444. Oxidation of α-Diketones and α-Keto-Acids by Hydrogen Peroxide.
Edman, P. (1949) Arch. Biochem. 22, 475-476. A Method for the Determination of the Amino Acid Sequence in Peptides.
Gooley, A.A., Classon, B.J., Marschalek, R., Williams, K.L. (1991) Biochem. Biophys. Res. Commun. 178, 1194-1201. Glycosylation sites identified by detection of glycosylated amino acids released from Edman degradation: the identification of Xaa-Pro-Xaa-Xaa as a motif for Thr-O-glycosylation.
Heilmeyer, Jr., L.M.G., Serwe, M., Weber, C., Metzger, J., Hoffmann-Posorske, E., Meyer, H.E. (1992) Proc. Natl. Acad. Sci. USA 89, 9554-9558. Farnesylcysteine, a constituent of the α and β subunits of rabbit skeletal muscle phosphorylase kinase: localization by conversion to S-ethylcysteine and by tandem mass spectrometry.
Kellner, R., Jung, G., Horner, T., Zahner, H., Schnell, N., Entian, K.D., Gotz, F. (1988) Eur. J. Biochem. 177, 53-59. Gallidermin: a new lanthionine-containing polypeptide antibiotic.
Kellner, R., Jung, G., Sahl, H.G. (1991) Structure elucidation of the tricyclic lantibiotic Pep5 containing eight positively charged amino acids. In: Nisin and Novel Lantibiotics (Jung, G., Sahl, H.G., eds.), pp. 141-158, Escom, Leiden.
Krishna, R.G., Wold, F. (1992) Post-translational modifications: unique amino acids in proteins. In: Frontiers and New Horizons in Amino Acid Research (Takai, K., ed.), Elsevier, Amsterdam.
McGregor, W.H., Carpenter, F.H. (1962) Biochemistry 1, 53-60. Alkaline Bromine Oxidation of Amino Acids and Peptides: Formation of α-Ketoacyl Peptides and their Cleavage by Hydrogen Peroxide.
Meyer, H.E., Mayr, G.W. (1987) Biol. Chem. Hoppe-Seyler 368, 1607-1611. Nπ-Methylhistidine in Myosin-Light-Chain Kinase.
Meyer, H.E., Hoffmann-Posorske, E., Kuhn, C.C., Heilmeyer, Jr., L.M.G. (1988) Microcharacterization of Phosphoserine-Containing Proteins. Localization of the Autophosphorylation Sites of Skeletal Muscle Phosphorylase Kinase. In: Modern Methods in Protein Chemistry, Vol. 3 (Tschesche, H., ed.), pp. 185-212, de Gruyter, Berlin.
Meyer, H.E., Hoffmann-Posorske, E., Donella-Deana, A., Korte, H. (1991a) Sequence Analysis of Phosphotyrosine-Containing Peptides. In: Methods in Enzymology, Vol. 201, Protein Phosphorylation (Hunter, T., Sefton, B.M., eds.), pp. 206-224, Academic Press, San Diego.
Meyer, H.E., Hoffmann-Posorske, E., Heilmeyer, Jr., L.M.G. (1991b) Determination and Location of Phosphoserine in Proteins and Peptides by Conversion to S-Ethyl-Cysteine. In: Methods in Enzymology, Vol. 201, Protein Phosphorylation (Hunter, T., Sefton, B.M., eds.), pp. 169-185, Academic Press, San Diego.
Meyer, H.E., Eisermann, B., Donella-Deana, A., Perich, J.W., Hoffmann-Posorske, E., Korte, H. (1993a) Protein Sequences & Data Analysis 5, 197-200. Sequence analysis of phosphothreonine-containing peptides by modification to β-methyl-S-ethylcysteine.
Meyer, H.E., Eisermann, B., Heber, M., Hoffmann-Posorske, E., Korte, H., Weigt, C., Wegner, A., Donella-Deana, A., Perich, J.W. (1993b) FASEB J. 7, 776-782. Strategies for non-radioactive methods in the localization of phosphorylated amino acids in proteins.
Meyer, H.E., Heber, M., Eisermann, B., Korte, H., Metzger, J.W., Jung, G. (1994) Anal. Biochem., submitted. Sequence analysis of lantibiotics: novel derivatization procedures allow a fast access to complete Edman degradation.
Stocker, G., Meyer, H.E., Wagener, C., Greiling, H. (1990) Biochem. J. 274, 415-420. Purification and N-terminal amino acid sequence of a chondroitin sulphate/dermatan sulphate proteoglycan isolated from intima/media preparations of human aorta.
Tsunasawa, S., Hirano, H. (1993) Deblocking and subsequent microsequence analysis of N-terminally blocked proteins immobilized on PVDF membrane. In: Methods in Protein Sequence Analysis (Imahori, K., Sakiyama, F., eds.), pp. 45-53, Plenum Press, New York.
Weigt, C., Gaertner, A., Wegner, A., Korte, H., Meyer, H.E. (1992) J. Mol. Biol. 227, 593-595. Occurrence of an actin-inserting domain in tensin.

Section V: Bioanalytical Mass Spectrometry

Microcharacterization of Proteins by R. Kellner, F. Lottspeich & H. E. Meyer © VCH Verlagsgesellschaft mbH, 1994

V.1

Analysis of Biopolymers by Matrix-Assisted Laser Desorption/Ionization (MALDI) Mass Spectrometry* Ute Bahr, Michael Karas and Franz Hillenkamp

1 Introduction
During the past two decades, important advances in bioorganic mass spectrometry have been made through the development of new ionization techniques for the analysis of biopolymers such as proteins and carbohydrates. Traditional mass spectrometric methods, which proved so useful for analyzing compounds of low molecular mass, were of little use for measuring underivatized compounds of high molecular mass. The general problem to be solved was to convert the polar, nonvolatile biopolymer macromolecules into intact, isolated, ionized molecules in the gas phase. The so-called desorption/ionization techniques use different physical approaches for this conversion: field desorption [Beckey 1977] applies a high electric field to the sample; in fast atom bombardment [Barber 1981] and 252Cf plasma desorption [Macfarlane 1976] the sample is bombarded by highly energetic ions or atoms; thermospray ionization [Blakely 1980] and electrospray ionization [Fenn 1990] form ions directly from small, charged liquid droplets. Laser desorption [Hillenkamp 1992; Cotter 1987] and its recently developed variant, matrix-assisted laser desorption/ionization (MALDI) [Hillenkamp 1991], use short, intense pulses of laser light to induce the formation of intact gaseous ions. Electrospray ionization and matrix-assisted laser desorption/ionization have already demonstrated their capabilities for the mass spectrometric analysis of biopolymers in the molecular mass range from a thousand to a few hundred thousand daltons. In this article, the authors provide an overview of MALDI mass spectrometry, in particular its principles, instrumentation and application to biopolymer analysis.

* Reprinted from Fresenius J. Anal. Chem. (1994) 348, 783-791, with the kind permission of Springer-Verlag, Berlin, Heidelberg, New York. R. Kellner, F. Lottspeich, H. E. Meyer (1994) Microcharacterization of Proteins, VCH Weinheim


U.Bahr, M. Karas and F. Hillenkamp

2 Development of MALDI
First attempts to use laser light as a mass spectrometric ionization method for organic molecules date back to the 1970s [Vastola 1970; Posthumus 1978; Stoll 1979; Cotter 1981; Hardin 1981]. Of the variety of lasers tested for laser desorption, two types proved successful: CO2 lasers with a wavelength of 10.6 µm in the infrared (IR) and lasers emitting in the far ultraviolet (UV). With both laser types, efficient and controllable energy transfer to the sample is possible through resonant absorption by the sample at the irradiation wavelength: in the IR by excitation of rovibrational states, in the UV by electronic excitation. To avoid thermal decomposition of labile organic molecules, lasers with pulse widths on the nanosecond timescale, such as Q-switched Nd:YAG, excimer or TEA-CO2 lasers, are used to deposit the energy within a very short time. The pulsed desorption of ions favours combining the laser desorption ion source with a time-of-flight (TOF) mass analyzer [Hillenkamp 1975; Van Breemen 1983] or a Fourier transform ion cyclotron resonance (FT-ICR) mass analyzer [Weller 1990; Koster 1992], both of which can record a complete mass spectrum for each laser shot. All early experiments on laser desorption of organic ions were restricted to molecular masses below 2000 Da. Results obtained in UV laser desorption revealed a 'soft' desorption of molecular ions only from highly absorbing molecules. The limitation in mass range was believed to result from energy transfer also into photodissociation channels. Samples that cannot be resonantly excited at the laser wavelength used require very high irradiances (laser power per area, W/cm²) for ion production, which inevitably destroy large organic molecules. These limitations, which prohibited the general use of laser desorption in organic mass spectrometry, prompted the search for a matrix enabling the analysis of high-mass biopolymers [Karas 1987].
The high-mass molecules are finely dispersed in a matrix consisting of small, highly absorbing species. In this way they can be desorbed and ionized irrespective of their individual absorption characteristics [Tanaka 1988; Karas 1988, 1989a].

2.1 Mechanisms of Matrix-Assisted Laser Desorption/Ionization
The matrix is believed to serve three major functions:
1. Absorption of energy from the laser light. The matrix molecules absorb the energy of the laser light and convert it into excitation energy of the solid system. This induces an instantaneous phase transition of a small volume of the sample (a few molecular layers) into gaseous species. In this way the analyte molecules are desorbed together with matrix molecules, with limited internal excitation. Different desorption models are discussed in the literature [Vertes 1990; Johnson 1991].
2. Isolation of the biomolecules from each other. The biomolecules are incorporated in a large excess of matrix molecules, which reduces the strong intermolecular forces between them (matrix isolation). Incorporation of the analyte into the matrix crystals formed as the solvent evaporates is an essential prerequisite for successful MALDI analysis. It also provides in-situ cleaning of the sample and accounts for the high tolerance towards contaminants.

V.1 MALDI


3. Ionization of the biomolecules. An active role of the matrix in ionizing the analyte molecules, by photoexcitation or photoionization of matrix molecules followed by proton transfer to the analyte molecules, is likely, though it has not yet been proven unequivocally.
Much work has been done, and is still under way, to find substances useful as MALDI matrices. Depending on the laser wavelength used, the solubility of the analyte and the class of compound, different matrix compounds or mixtures of matrix compounds are used. The most commonly used substances are listed in Table 1.

Table 1. Most commonly used matrices for MALDI MS.

Matrix | Wavelength | Comments
2,5-Dihydroxybenzoic acid (DHB) | 337, 355 nm |
DHB + 10% 5-methoxysalicylic acid | 337, 355 nm | used for masses >20 000 Da
Sinapinic acid | 337, 355 nm |
α-Cyano-4-hydroxycinnamic acid | 337, 355 nm | produces more highly charged molecular ions
Nicotinic acid | 266 nm |
4-Hydroxypicolinic acid | 337, 355 nm | used for oligonucleotides
Succinic acid | 2.94, 10.6 µm |
Glycerol | 2.94, 10.6 µm | liquid matrix

3 Instrumentation
3.1 Time-of-Flight (TOF) Mass Spectrometers
MALDI of large molecules is usually coupled to TOF mass analysis, although several applications have been performed on FT-ICR [Buchanan 1993; Castoro 1993] or magnetic sector analyzers [Hill 1991; Annan 1992], and first results with quadrupole ion-trap mass spectrometers have recently been reported [Schwartz 1993; Chambers 1993; Jonscher 1993]. In TOF analyzers the mass-to-charge ratio of an ion is determined by measuring its flight time. After acceleration in the ion source to a fixed kinetic energy, the ions pass a field-free drift tube with a velocity proportional to $(m_i/z_i)^{-1/2}$, where $m_i/z_i$ is the mass-to-charge ratio of a particular ion species. Because of their mass-dependent velocities, the ions are separated during their flight. A detector at the end of the flight tube produces a signal for each ion species. Typical flight times range from a few microseconds to several hundred microseconds. Figure 1a shows a diagram of a linear TOF mass spectrometer. Acceleration voltages are typically 1–30 kV, and flight path lengths range from 0.5 to 3 m. One problem in TOF mass analysis is the energy distribution of the ions resulting from the desorption/ionization process. This initial energy spread leads to peak broadening at the detector.
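The proportionality between drift velocity and $(m_i/z_i)^{-1/2}$ follows from energy conservation, $z e U = m v^2/2$, which gives a flight time $t = L\sqrt{m/(2 z e U)}$ over a drift length $L$. A minimal sketch that neglects the time spent in the acceleration region and any initial energy spread; the 30 kV and 1 m values are illustrative choices within the ranges quoted above, and `flight_time` is a hypothetical helper:

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # C
DALTON = 1.66053906660e-27           # kg


def flight_time(mz: float, voltage: float, length: float) -> float:
    """Field-free drift time (s) of an ion with the given m/z (Da per charge).

    Uses z*e*U = m*v**2/2; the time in the acceleration region and the
    initial energy spread of the ions are neglected."""
    velocity = math.sqrt(2 * ELEMENTARY_CHARGE * voltage / (mz * DALTON))
    return length / velocity


for mz in (1000.0, 29024.0, 100000.0):
    print(f"m/z {mz:>8.0f}: {flight_time(mz, 30e3, 1.0) * 1e6:6.1f} µs")
```

For carbonic anhydrase (m/z about 29 000) this gives roughly 71 µs, in line with the flight times quoted above; quadrupling m/z doubles the flight time.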

[Figure 1: a. linear TOF; b. reflectron TOF; labels: detector, ion mirror]

Figure 1. Schematic diagram of time-of-flight (TOF) mass analyzers. a. Linear TOF: ions are separated according to their mass-dependent velocities. b. Reflectron TOF: the flight-time spread caused by the initial velocity distribution of ions of the same mass is largely corrected.

This peak broadening limits the mass resolution m/Δm, which measures an instrument's ability to produce separate signals from ions of similar mass and is thus a measure of its performance. Peak broadening can be reduced by using a reflectron (ion mirror) TOF mass analyzer, as shown in Figure 1b. The ions are decelerated in the reflectron and turn around at different locations in the reflecting electric potential gradient: ions of higher kinetic energy penetrate deeper and therefore spend a longer time in the ion mirror, compensating for their shorter time in the drift region. If the geometry and voltages of the reflectron are arranged properly, the arrival-time spread is largely corrected at the plane of the detector, and higher mass resolution is obtained. Resolutions of up to 6000 (FWHM, full width at half maximum) have been reported for reflectron TOF mass analyzers combined with MALDI ion sources, whereas linear TOF analyzers are limited to a mass resolution of about 500. Figure 2 shows the molecular ion region of the peptide melittin at a mass resolution of 6000 (FWHM).
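The practical meaning of these resolution figures can be illustrated by converting m/Δm into a peak width (FWHM) at the m/z of melittin. The sketch assumes an m/z of about 2848 (the middle of the window shown in Figure 2) and uses the rough criterion that isotope peaks 1 m/z unit apart appear separated when the FWHM is below 1; the helper name is hypothetical:

```python
def fwhm_peak_width(mz: float, resolution: float) -> float:
    """Peak width (FWHM, in m/z units) for a given resolution m/Δm."""
    return mz / resolution


MELITTIN_MZ = 2848.0  # assumed value, mid-window of Figure 2

for analyzer, resolution in (("reflectron", 6000), ("linear", 500)):
    width = fwhm_peak_width(MELITTIN_MZ, resolution)
    # Isotope peaks of a singly charged ion are about 1 m/z unit apart.
    print(f"{analyzer}: FWHM = {width:.2f} m/z, isotopes separated: {width < 1.0}")
```

At a resolution of 6000 the peak width is about 0.5 m/z unit, so the isotope pattern of Figure 2 becomes visible; at 500 it is about 6 m/z units, and the isotopes merge into a single peak.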


[Figure 2: relative intensity (0 to 100) vs. m/z, 2846 to 2854]

Figure 2. Section of the MALDI mass spectrum of melittin from a reflectron TOF instrument, showing the isotopic distribution of the molecular ion at a mass resolution of 6000 (FWHM).

3.2 Laser Desorption Ion Source
Laser ion sources used for MALDI may differ in technical details, but all share some common features [Feigl 1983; Beavis 1989a; Salehpour 1989; Spengler 1990]. They all use pulsed lasers with pulse durations of 1–200 ns. Most commonly used are nitrogen (N2) lasers emitting at 337 nm, or Nd:YAG lasers whose 1064 nm emission is converted to 355 nm or 266 nm by frequency tripling or quadrupling in nonlinear optical crystals. Excimer lasers (193, 248, 308 and 351 nm) and frequency-doubled, excimer-pumped dye lasers (220–300 nm) have also been used for MALDI, as have dye lasers (wavelengths in the visible) and infrared lasers (TEA-CO2: 10.6 µm; Er:YAG: 2.94 µm). To achieve the required irradiances of 10^6 to 10^7 W/cm², the laser beams are focused to spot sizes between 30 and 500 µm by suitable lenses. The angle of incidence of the laser beam on the sample surface varies between 15° and 70°. The irradiance at the sample surface is a critical parameter: the minimum (threshold) irradiance for ion production is well defined, and the best results are obtained at laser irradiances no more than about 20% above threshold. The intensity of the laser beam on the sample therefore has to be adjusted carefully, which can be done with neutral density filters, angle-dependent reflection attenuators or polarizers. The position of the laser focus on the sample surface can be changed either by moving the sample relative to a fixed laser spot or by steering the laser beam. It should be emphasized that optical inspection of the sample is a valuable means of obtaining optimal MALDI results (see Section 4.1, Sample Preparation). Figure 3 shows the diagram of a laser ion source with a movable sample stage and a video microscope for observation.
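Two numbers in this paragraph can be reproduced with a short calculation: the harmonic wavelengths of the Nd:YAG laser (1064 nm divided by 3 and by 4) and the pulse energy needed to reach a given irradiance. The sketch assumes a flat-top circular spot whose quoted size is its diameter and a 3 ns pulse (an illustrative value within the 1–200 ns range); helper names are hypothetical:

```python
import math

PLANCK = 6.62607015e-34          # J s
SPEED_OF_LIGHT = 2.99792458e8    # m/s
ELECTRON_VOLT = 1.602176634e-19  # J


def photon_energy_ev(wavelength_nm: float) -> float:
    """Photon energy E = h*c/lambda, in eV."""
    return PLANCK * SPEED_OF_LIGHT / (wavelength_nm * 1e-9) / ELECTRON_VOLT


def pulse_energy_uj(irradiance_w_per_cm2: float, spot_diameter_um: float, pulse_ns: float) -> float:
    """Pulse energy (µJ) giving the stated irradiance on a flat-top circular spot."""
    radius_cm = spot_diameter_um * 1e-4 / 2   # 1 µm = 1e-4 cm
    area_cm2 = math.pi * radius_cm ** 2
    power_w = irradiance_w_per_cm2 * area_cm2
    return power_w * pulse_ns * 1e-9 * 1e6    # J -> µJ


ND_YAG_NM = 1064.0
lasers = {"N2": 337.0, "Nd:YAG x3": ND_YAG_NM / 3, "Nd:YAG x4": ND_YAG_NM / 4}
for name, wavelength in lasers.items():
    print(f"{name}: {wavelength:.0f} nm, {photon_energy_ev(wavelength):.2f} eV")

# Hypothetical example: 10^6 W/cm^2 on a 100 µm spot with a 3 ns pulse
print(f"pulse energy: {pulse_energy_uj(1e6, 100.0, 3.0):.2f} µJ")
```

Under these assumptions only about 0.24 µJ per pulse has to reach the sample at threshold-level irradiance.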



[Figure 3 labels: laser optics, attenuator, movable sample probe, ion optics, microscope, monitor]

Figure 3. Schematic drawing of the ion source of a laser desorption mass spectrometer (Finnigan MAT, Bremen, Vision 2000).

3.3 Ion Detection and Data Collection
Secondary electron multipliers are used to detect the ions produced by matrix-assisted laser desorption/ionization. The high-mass ions release either electrons or low-mass ions at the conversion dynode of the multiplier; these particles then start the multiplication cascade in the electron multiplier. The conversion surfaces are either copper-beryllium or the lead-glass inner surface of a microchannel plate. The yield of secondary electrons and ions from the conversion dynode depends on the velocity of the incoming ions. MALDI instruments using low ion acceleration energies therefore need postacceleration of high-mass ions to compensate for the lower detection efficiency at lower ion velocities. This is achieved by a separate dynode, typically held at a potential of 20 kV, placed in front of the multiplier. The detector signal is either amplified with a fast linear amplifier or digitized directly by a digital oscilloscope (transient recorder) with a sampling rate of ≥100 × 10^6 samples/s. The data are then transferred to a PC for spectrum averaging, mass calibration and storage. Typically 10–50 single-shot spectra are summed to improve the signal-to-noise ratio and to allow accurate mass determination.
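The benefit of summing single-shot spectra can be illustrated with a simulation: if each shot contributes the same signal plus independent random noise, the summed signal grows with the number of shots n while the noise grows only with the square root of n, so the signal-to-noise ratio improves by about √n. The sketch assumes Gaussian noise and a perfectly reproducible signal, which is idealized (real MALDI signals vary from shot to shot); all names and parameters are hypothetical:

```python
import random
import statistics


def summed_spectrum_snr(n_shots: int, n_bins: int = 2000, peak_bins: int = 500,
                        amplitude: float = 1.0, noise_sd: float = 1.0, seed: int = 1) -> float:
    """Sum n_shots simulated single-shot spectra and return the signal-to-noise ratio.

    Each simulated shot: a flat 'peak' of height `amplitude` in the first
    `peak_bins` bins, plus independent Gaussian noise in every bin."""
    rng = random.Random(seed)
    summed = [0.0] * n_bins
    for _ in range(n_shots):
        for i in range(n_bins):
            signal = amplitude if i < peak_bins else 0.0
            summed[i] += signal + rng.gauss(0.0, noise_sd)
    peak_height = statistics.fmean(summed[:peak_bins])
    baseline_noise = statistics.stdev(summed[peak_bins:])
    return peak_height / baseline_noise


single_shot = summed_spectrum_snr(1)
for n in (10, 25, 50):
    gain = summed_spectrum_snr(n) / single_shot
    print(f"{n:2d} shots: S/N gain {gain:.1f} (sqrt(n) = {n ** 0.5:.1f})")
```

Summing 25 shots therefore improves the signal-to-noise ratio roughly fivefold in this idealized model.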

V.l MALDI


4 Applications

4.1 Sample Preparation

The preparation of samples for MALDI analysis is simple and fast. A 5 to 10 g/L solution of the matrix is prepared either in pure water or in a mixture of water and an organic solvent (acetonitrile, ethanol); a 2:1 mixture of water acidified with 0.1% trifluoroacetic acid and acetonitrile is well suited. Analyte solutions (… M) are prepared in the same solvent as the matrix. Small volumes of both solutions (0.5 to 10 µL) are then mixed on the metal sample support (usually stainless steel) to give a final analyte concentration of 0.005 to 0.05 µg/µL. The solvent is evaporated and the sample is transferred into the vacuum chamber of the mass spectrometer. Depending on the matrix, microscopic inspection reveals either finely dispersed small crystallites or extended crystalline areas at the rim of the dried droplet. The most intense ion signals usually come from these well-developed crystalline regions [Strupat 1991; Bridson 1993], indicating that incorporation of the analyte into the crystalline matrix is essential. One analysis, including sample preparation and transfer to the mass spectrometer, takes only a few minutes; mass calibration requires a few more minutes.
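The concentrations above can be checked with a short calculation. The sketch assumes 2,5-dihydroxybenzoic acid (DHB, molar mass 154.12 g/mol) as matrix, equal 1 µL volumes of a 10 g/L matrix solution and a hypothetical 1 µM analyte solution, and borrows the 29 024 Da of carbonic anhydrase (Figure 4) purely as an example analyte mass.

```python
DHB_MOLAR_MASS = 154.12  # g/mol, 2,5-dihydroxybenzoic acid


def mix_on_target(matrix_g_per_l, matrix_ul, analyte_mol_per_l, analyte_ul,
                  analyte_molar_mass, matrix_molar_mass=DHB_MOLAR_MASS):
    """Return (final analyte concentration in ug/uL, matrix:analyte molar ratio)."""
    total_l = (matrix_ul + analyte_ul) * 1e-6
    analyte_mol = analyte_mol_per_l * analyte_ul * 1e-6
    matrix_mol = matrix_g_per_l * matrix_ul * 1e-6 / matrix_molar_mass
    analyte_g_per_l = analyte_mol * analyte_molar_mass / total_l  # 1 g/L == 1 ug/uL
    return analyte_g_per_l, matrix_mol / analyte_mol


conc, ratio = mix_on_target(10.0, 1.0, 1e-6, 1.0, 29024.0)
print(f"{conc:.4f} ug/uL, matrix:analyte = {ratio:.0f}:1")
```

With these assumptions the final concentration lies inside the 0.005 to 0.05 µg/µL window given in the text, and the matrix is present in a large molar excess over the analyte.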

4.2 Molecular Weight Determination of Proteins and Glycoproteins

In MALDI mass spectra the most intense signal is generally the singly charged molecular ion. Doubly and triply charged molecular ions, as well as singly and multiply charged cluster ions, also appear. Figure 4 shows a typical MALDI spectrum of bovine carbonic anhydrase, with a molecular weight of 29 024 Da.

Figure 4. MALDI spectrum of carbonic anhydrase with a molecular weight of 29 024 Da (peak labelled M+; x-axis m/z, 0 to 60 000). DHB was used as matrix; the laser wavelength was 337 nm.



Figure 5. MALDI spectra of Penicillium lipase (x-axis m/z). a. Spectrum of the native lipase; cytochrome c (CC) was added as mass calibrant. b. Spectrum of the lipase deglycosylated with endoglycosidase H. The carbohydrate content of the native substance can be calculated from the mass shift of 2000 Da.

Figure 6. Schematic structure of a monoclonal antibody of IgG class (labels: heavy chain, carbohydrate, S-S bridges, COOH termini).


In the low-mass range (< 500 Da), ion signals from the matrix appear. The base peak is the singly charged molecular ion; the doubly and triply charged molecular ions and the singly charged dimer are recorded at lower intensity. The relative abundances of these ions depend on the matrix and on the analyte concentration. Proteins are detected as protonated species in positive-ion mode and as deprotonated species in negative-ion mode, and positive- and negative-ion spectra can usually be obtained with comparable intensities. Fragment ions arising from the loss of small neutral molecules such as H2O, NH3 or HCOOH from the protonated molecular ions are of low intensity, and fragmentation by cleavage of covalent bonds in the protein backbone is not prominent under standard conditions. Hundreds of different proteins have since been measured by MALDI mass spectrometry, and no limitation of the method caused by primary, secondary or tertiary structure has yet been found. Proteins with different solution-phase properties, including proteins that are insoluble in ordinary aqueous solutions and glycoproteins containing a large proportion of carbohydrate, can be analyzed. Figure 5a shows the MALDI spectrum of a glycoprotein, native Penicillium lipase, with its singly and doubly protonated molecular ions. Comparing the width of the molecular ion peaks of this substance with that of the protein cytochrome c (CC, added to the sample as calibrant) shows that the glycoprotein peak is very broad because of heterogeneity in the carbohydrate part of the molecule. In this mass range the resolution of TOF instruments is not sufficient to resolve the individual glycoforms; the peak width, however, gives a measure of the heterogeneity of the compound. Two approaches can be used to determine the carbohydrate content of a glycoprotein.
If the amino acid sequence is known, the difference between the mass calculated for the polypeptide backbone and the average mass of the molecule, determined from the peak centroid, gives the average mass of the carbohydrate attached to the protein. Alternatively, the compound can be treated with glycosidases, enzymes that cleave off the carbohydrate, and its molecular mass measured again. Figure 5b shows the mass spectrum of the lipase after deglycosylation with endoglycosidase H: the molecular ion peak has shifted considerably to lower mass. From the mass difference of 2000 Da, a carbohydrate content of 6.3% can be calculated; a detailed description is given in [Hedrich 1993]. High-mass molecules such as monoclonal antibodies (MoAb), with molecular masses up to 150 000 Da, can readily be analyzed by MALDI [Siegel 1991]. Figure 6 shows a schematic drawing of an IgG monoclonal antibody. It is Y-shaped and consists of two identical heavy chains (HC) of about 50 000 Da each and two identical light chains (LC) of about 24 000 Da each; the heavy chains carry a small amount of carbohydrate. MoAb are currently being investigated for targeting anticancer drugs or imaging reagents to tumor sites. The imaging reagents (e.g. radioactive metal ions) bind to chelating agents that are covalently attached either to lysine residues or to the carbohydrate of the MoAb. These chelators typically have masses of 300 to 1500 Da. MALDI has been used to determine the molecular masses of the pure and the conjugated MoAb, as well as the average mass of the carbohydrate moiety and the average number of attached chelator molecules. Figure 7 shows the MALDI spectra of the pure chimeric B MoAb (a), of the deglycosylated molecule (b) and of the MoAb conjugated with DTPA (diethylenetriaminepentaacetic acid, mw 375.3 Da) [Siegel 1993].
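Both routes reduce to simple mass differences. The sketch below is illustrative only: it assumes the carbohydrate content is expressed as a fraction of the native glycoprotein mass, and it shows that the quoted 2000 Da shift and 6.3% content together imply a native mass of roughly 31.7 kDa. Note that endoglycosidase H cleaves between the two core GlcNAc residues and leaves one GlcNAc on each N-glycosylation site, so the measured shift slightly underestimates the total carbohydrate.

```python
def carbohydrate_by_sequence(observed_average_mass, backbone_mass):
    """Route 1: average glycan mass = observed centroid mass - calculated backbone mass."""
    return observed_average_mass - backbone_mass


def carbohydrate_percent(native_mass, deglycosylated_mass):
    """Route 2: carbohydrate content (%) from the mass shift after glycosidase treatment."""
    return 100.0 * (native_mass - deglycosylated_mass) / native_mass


def implied_native_mass(mass_shift, percent):
    """Native mass consistent with a given shift and carbohydrate percentage."""
    return mass_shift / (percent / 100.0)


native = implied_native_mass(2000.0, 6.3)  # about 31 746 Da
print(round(native), round(carbohydrate_percent(native, native - 2000.0), 1))
```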



Figure 7. MALDI spectra of the … (remainder of caption not preserved; panels A and B, y-axis relative intensity in %, one trace labelled "conjugated").

From the measured mass differences, an average carbohydrate mass of 3574 Da and an average loading of 5.9 DTPA molecules per antibody were obtained. An important feature of the MALDI technique is its high tolerance of inorganic and organic contaminants: some buffers and salts commonly used in biochemical procedures need not be removed before analysis. With sinapinic acid as matrix, inorganic salts and denaturing agents such as urea and guanidine hydrochloride at up to 2 mM in the protein solution, and buffering agents such as citrate, glycine, HEPES, Tris, ammonium bicarbonate and ammonium acetate at up to 200 mM, do not strongly affect the analysis [Beavis 1989, 1990a]. With DHB as matrix, up to 10% sodium dodecyl sulfate (SDS) in the protein solution is tolerated [Strupat 1991].
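The loading value is the mass increase on conjugation divided by the mass added per chelator. As a sketch, assuming that the 375.3 Da quoted for DTPA is the net increment per attached chelator and using hypothetical antibody masses (the measured values are not given here), a shift of about 2214 Da corresponds to the reported loading of 5.9. The second function, also an illustration, shows how an assumed uncertainty on each of the two large masses propagates into the loading.

```python
DTPA_INCREMENT = 375.3  # Da per attached chelator, as quoted in the text (assumed net increment)


def chelator_loading(mass_conjugate, mass_unconjugated, increment=DTPA_INCREMENT):
    """Average number of chelator molecules per antibody."""
    return (mass_conjugate - mass_unconjugated) / increment


def loading_uncertainty(mass_sd, increment=DTPA_INCREMENT):
    """Propagated standard uncertainty of the loading if both masses carry mass_sd."""
    return (2 * mass_sd ** 2) ** 0.5 / increment


# Hypothetical masses for illustration only.
print(round(chelator_loading(152_214.0, 150_000.0), 1))
print(round(loading_uncertainty(15.0), 3))  # assumed 15 Da uncertainty per mass
```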

4.2.1 Accuracy of Mass Determination

For mass determination the mass scale has to be calibrated. This is usually done with well-defined reference compounds, either internally, by adding them to the analyte-matrix mixture, or externally, by measuring them in a separate analysis, preferably from a second spot on a support that holds multiple samples. A mass accuracy of 0.01% (one part in ten thousand) can be obtained for proteins up to 30 000 Da [Beavis 1989b, 1990b]. The decrease in mass accuracy at higher masses is thought to be due to unresolved adducts between analyte and matrix; heterogeneity of the protein under investigation, if it cannot be resolved into individual signals, is probably another cause. Improving the achievable mass resolution in the high-mass range is therefore an important goal of current research.
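In a linear time-of-flight instrument the flight time t grows with the square root of m/z, so that sqrt(m/z) is, to a first approximation, a linear function of t. The sketch below assumes exactly this two-parameter relation and two reference peaks; real calibration software may use additional terms. The "instrument" (`flight_time`) and the reference masses are synthetic. The last line expresses the 0.01% accuracy quoted above in absolute terms: 100 ppm of 29 024 Da is about 2.9 Da.

```python
import math


def two_point_calibration(t1, mz1, t2, mz2):
    """Fit sqrt(m/z) = k * t + c through two reference peaks."""
    k = (math.sqrt(mz2) - math.sqrt(mz1)) / (t2 - t1)
    c = math.sqrt(mz1) - k * t1
    return k, c


def time_to_mz(t, k, c):
    """Convert a flight time into m/z with the fitted calibration."""
    return (k * t + c) ** 2


def ppm_error(measured, true):
    return (measured - true) / true * 1e6


def flight_time(mz):
    """Synthetic instrument: sqrt(m/z) = 4.0 * t - 0.5 (t in microseconds, illustrative)."""
    return (math.sqrt(mz) + 0.5) / 4.0


k, c = two_point_calibration(flight_time(12000.0), 12000.0,
                             flight_time(66000.0), 66000.0)
print(round(time_to_mz(flight_time(29024.0), k, c), 3))  # recovers the test mass
print(round(29024.0 * 100e-6, 2))                        # 100 ppm (0.01 %) in Da
```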

4.2.2 Sensitivity and Mass Range

The outstanding sensitivity achieved under standard preparation conditions is a strength of the MALDI technique. Typically about 1 pmol of analyte is loaded for analysis, and 1 to 10 fmol of protein have been shown to suffice [Strupat 1991; Karas 1989b]. The amount of material consumed in the analysis is much smaller than the total amount loaded onto the sample support, so nearly the entire sample can be recovered afterwards. The mass range accessible to MALDI with a time-of-flight instrument extends beyond 300 000 Da. The largest functional protein entity detected so far is the urease trimer at 272 000 Da [Hillenkamp 1989].

4.3 Analysis of Oligonucleotides

Compared with proteins, only a few MALDI results have been reported for oligonucleotides. Oligo(deoxy)ribonucleotides and their mixtures [Karas 1991; Bornsen 1990; Huth-Fehre 1991; Parr 1992; Nordhoff 1992, 1993; Keough 1993; Tang 1993; Wu 1993] have been analyzed, as well as small RNA samples [Nordhoff 1992; Hillenkamp 1990]. The highest mass detected for nucleic acids so far is about 40 000 Da, from an rRNA [Nordhoff 1992]. Oligo(deoxy)ribonucleotides yield stronger signals and better mass resolution as negative ions than as positive ions; only for the higher-mass tRNAs and rRNAs is the signal intensity higher for positive ions. New matrices were required to obtain high-quality MALDI results for DNA and RNA; 3-hydroxypicolinic acid in particular has proved very useful [Wu 1993]. Figure 8 shows the negative-ion UV-MALDI spectrum of a 22mer (5'-CTC TCA CTA CAG GCA AGC TAC C-3') obtained with this matrix; the deprotonated (M-H)- and doubly deprotonated (M-2H)2- ions are recorded. Good results have also been obtained with an infrared wavelength (2.94 µm) and succinic acid as matrix [Nordhoff 1993]. In MALDI mass spectrometry, nucleic acids readily form metal-attached molecular ions, mainly from alkali salts present in the solution or from metal contaminants on the sample holder. Multiple salt formation leads to a broad distribution of pseudomolecular ions. Figure 9a shows the negative-ion IR-MALDI spectrum of d(5'-CGC GAT ATC GCG-3') with 20 mM KBr in the analyte solution. Deprotonated, potassium-attached molecular ions [M-(n+1)H+nK]- are formed with n ranging from 0 to 10, together with small signals from sodium instead of potassium attachment. In this mass range the signals are well resolved, but with increasing oligonucleotide mass this effect produces a broad unresolved peak, and the molecular weight cannot be determined correctly.
This can be avoided by exchanging the metal cations for ammonium ions [Nordhoff 1992]. A very efficient and simple way to do this is to add suitable cation-exchange polymer beads to the sample solution or directly to the sample droplet on the support. Figure 9b shows the spectrum of the same sample after this procedure: only the deprotonated molecular ions (besides some small fragments) are to be seen.
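The adduct series [M-(n+1)H+nK]- described above can be predicted from the neutral mass M. The sketch uses average atomic masses and treats M as a hypothetical input, since the masses of the oligonucleotides in Figures 8 and 9 are not given in the text. Each additional potassium shifts the peak by about 38.09 Da and each sodium by about 21.98 Da; the doubly deprotonated ion (M-2H)2- is included for comparison with Figure 8.

```python
# Average atomic masses (Da) and the electron mass.
H = 1.00794
NA = 22.98977
K = 39.0983
ELECTRON = 0.000549


def adduct_series(neutral_mass, cation_mass=K, n_max=10):
    """m/z of [M-(n+1)H+nC]- for n = 0..n_max (singly charged negative ions)."""
    return [
        (n, neutral_mass - (n + 1) * H + n * cation_mass + ELECTRON)
        for n in range(n_max + 1)
    ]


def multiply_deprotonated(neutral_mass, z):
    """m/z of (M-zH)z-: z protons removed, z net negative charges."""
    return (neutral_mass - z * H + z * ELECTRON) / z


M = 3600.0  # hypothetical neutral mass of an oligonucleotide
for n, mz in adduct_series(M)[:3]:
    print(n, round(mz, 2))
print(round(multiply_deprotonated(M, 2), 2))
```

Exchanging the alkali ions for ammonium, as described above, collapses this series back towards the n = 0 peak.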



Figure 8. Negative-ion UV-MALDI spectrum of a DNA 22mer, 5'-CTC TCA CTA CAG GCA AGC TAC C-3', with 3-hydroxypicolinic acid as matrix (peak labelled (M-H)-; x-axis m/z, 1000 to 8000).

Figure 9. (Caption not preserved. Surviving labels: panel A, peak series n = 1, 2 ... 10; annotation "loss of G, A, and C".)


4.4 Analysis of Glycans and Glycoconjugates

Successful ionization of underivatized oligosaccharides by MALDI has been demonstrated with different matrices [Mock 1991; Stahl 1991; Harvey 1993], and spectra of oligomeric distributions of dextrins and dextrans with molecular weights up to 15 000 Da have been obtained. Native and permethylated glycosphingolipids [Egge 1991] and gangliosides [Juhasz 1992] have been analyzed with detection limits of about 10 fmol. Figure 10 shows MALDI spectra of a native glycosphingolipid (GSL nat., from rabbit erythrocytes) (a) and of a mixture of permethylated glycosphingolipids (b); the general structure of these compounds is shown in Figure 10c. All glycans show mainly sodium- or potassium-attached molecular ions. The sample amounts required and the accuracy of mass determination are of the same order as for peptides and proteins. Larger oligosaccharides appear to be more difficult to ionize than smaller ones, and the accessible mass range has so far been extended to only about 15 000 Da. Further work is needed to exploit the potential of MALDI for these compounds.
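For oligomeric distributions such as dextrins, the expected peaks of the cation-attached molecular ions follow directly from the repeat unit. The sketch assumes linear hexose oligomers with free ends (n anhydrohexose units plus one water) and average masses; it also counts how many units fit below the 15 000 Da limit mentioned above. The constants are standard average masses, not values taken from this chapter.

```python
HEXOSE_UNIT = 162.1406   # average mass of an anhydrohexose residue, Da
WATER = 18.0153
CATION = {"Na": 22.98977, "K": 39.0983}
ELECTRON = 0.000549


def hexose_oligomer_mz(n, cation="Na"):
    """m/z of [M+Na]+ or [M+K]+ for a linear hexose oligomer of n units."""
    return n * HEXOSE_UNIT + WATER + CATION[cation] - ELECTRON


def max_units_below(limit_mz, cation="Na"):
    """Largest degree of polymerisation whose cationized ion stays at or below limit_mz."""
    n = 1
    while hexose_oligomer_mz(n + 1, cation) <= limit_mz:
        n += 1
    return n


print(round(hexose_oligomer_mz(1), 2), round(hexose_oligomer_mz(1, "K"), 2))
print(max_units_below(15000.0))
```

The [M+K]+ peak of each oligomer lies about 16.1 Da above its [M+Na]+ peak, so the two cation series are easy to tell apart at this resolution.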

Figure 10. (Caption not preserved. Surviving labels: "GSL nat."; panel B; peaks labelled n = 1, 5 and 6; m/z axes; structure panel with "Ceramide" and "GlcNAc".)


5 Combination of MALDI with Biochemical Methods

5.1 Peptide Mapping of Digested Proteins by MALDI

MALDI mass analysis of peptide mixtures produced by enzymatic or chemical digestion of proteins is a new strategy for elucidating protein structure. The absence of fragment ions and the dominance of singly protonated molecular ions in MALDI spectra make the interpretation of spectra from a peptide mixture straightforward. MALDI analyses of tryptic and carboxypeptidase digests and of peptides from cyanogen bromide (BrCN) cleavage have been reported [Schar 1991; Billeci 1993]; with different matrices or matrix mixtures, all peptides in a mixture could be detected. The high sensitivity of MALDI, its accuracy of mass determination and its tolerance of impurities and contaminants make the technique very well suited to the fast analysis of peptide mixtures. A new strategy for protein identification is currently being developed in several laboratories: protein sequences derived from cDNA sequence databases are used to predict the peptide products of different enzymatic and chemical digests, and the peptide map determined by MALDI for the protein under investigation is compared with these predictions by a computer search [Mann 1992].
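The database comparison described above can be sketched in a few lines. This is not the search program of [Mann 1992]; it is a minimal illustration assuming trypsin specificity (cleavage after Lys or Arg, not before Pro), no missed cleavages or modifications, monoisotopic [M+H]+ values and a fixed tolerance in Da. The protein sequence and the observed peak list are hypothetical. For larger peptides, where isotopes are not resolved, average masses would be more appropriate.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def tryptic_peptides(sequence):
    """Cleave after K or R unless the next residue is P; no missed cleavages."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def mh_plus(peptide):
    """Monoisotopic m/z of the singly protonated peptide [M+H]+."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def match_peaks(observed_mz, sequence, tolerance=0.5):
    """Assign observed [M+H]+ values to predicted tryptic peptides within tolerance (Da)."""
    predicted = [(pep, mh_plus(pep)) for pep in tryptic_peptides(sequence)]
    return [
        (mz, pep, round(calc, 2))
        for mz in observed_mz
        for pep, calc in predicted
        if abs(mz - calc) <= tolerance
    ]


protein = "GASPKLLPRAVDEFKR"        # hypothetical sequence
print(tryptic_peptides(protein))    # ['GASPK', 'LLPR', 'AVDEFK', 'R']
print(match_peaks([459.3, 708.4, 1000.0], protein))
```

A real search would score every database entry by the number of matched peaks; the peak at 1000.0 here stays unassigned, as unexplained peaks would in practice.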

5.2 Combination of MALDI and Gel Electrophoresis

Sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) and high-resolution two-dimensional polyacrylamide gel electrophoresis (2-DE) are widely used methods for separating small quantities of proteins. For further protein chemical analysis, such as sequencing or amino acid composition analysis, the most rapid and efficient method is electrotransfer of these proteins onto a suitable membrane (electroblotting). For accurate mass determination of these proteins, MALDI analysis has been performed directly from the blot membrane [Eckerskorn 1992]. Gel-electrophoretically separated proteins were electroblotted onto polyvinylidene difluoride (PVDF) or polyamide membranes, and pieces of membrane containing the protein of interest were soaked in matrix solution and transferred to the mass spectrometer for MALDI analysis. Figure 11 shows a MALDI spectrum of trypsin inhibitor (molecular mass 19 978 Da) blotted onto a PVDF membrane.

Figure 11. IR-MALDI spectrum of trypsin inhibitor obtained directly from a PVDF blot membrane (peak labelled M+). Succinic acid was used as matrix.


Best results were obtained with the infrared wavelength of an Er:YAG laser; the spectrum is of nearly the same quality as MALDI spectra from a standard preparation. Staining proteins in the gel can sometimes broaden peaks because of multiple attachment of dye molecules to the analyte, but staining with colloidal pigments was found to be compatible with subsequent MALDI analysis [Strupat 1994].

5.3 Combination of MALDI with Capillary Zone Electrophoresis

Mass spectrometry can serve as an ideal detector for capillary zone electrophoresis (CZE) [Keough 1992; Van Veelen 1993; Castoro 1992]. Proteins in the sub-pmol range were collected in 1 to 2 µL volumes in microcentrifuge tubes, then mixed with matrix solution and transferred to the sample probe of the mass spectrometer [66]. In another approach, the effluent from the capillary is deposited on a moving sample stage together with a sheath flow of matrix solution [Van Veelen 1993]; different peptides and proteins in amounts in the low femtomole range could be analyzed in this way.

6 Future Developments

Besides instrumental developments that enhance mass resolution and thereby mass accuracy, and the optimization of sample preparation techniques for some classes of biopolymers, both currently being investigated by several groups, two very promising developments are described below.

6.1 Peptide Sequencing

A promising approach to peptide sequencing has recently been developed by Spengler and Kaufmann [Spengler 1991, 1992] using MALDI with a reflectron time-of-flight mass spectrometer. Although little fragmentation occurs in the ion source under MALDI conditions, intense metastable decay can be observed in the field-free drift region. Both unimolecular and bimolecular (collision with residual gas molecules) decay of laser-desorbed peptide and protein ions take place, depending on various instrumental parameters (residual gas pressure, acceleration voltage, matrix-to-analyte ratio, laser irradiance). By adjusting the potentials of the ion reflectron to the kinetic energy of the fragment ions, a series of spectra showing the full fragmentation pattern of the peptide can be recorded. Figure 12 shows a section of the MALDI spectrum of substance P containing sequence-specific ions: besides the molecular ion peak, a range of metastable fragment ions is recorded. They are labelled according to the nomenclature of [Roepstorff 1985; Johnson 1988].
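The fragment ladders used for such assignments can be computed from the sequence. As a sketch, assuming monoisotopic masses, singly charged b and y ions in the nomenclature cited above, and the C-terminally amidated sequence of substance P (RPKPQQFFGLM-NH2), the code lists the expected values. Which ion types actually appear in Figure 12 is not specified in the text.

```python
# Monoisotopic residue masses (Da) for the residues of substance P.
RESIDUE_MASS = {
    "R": 156.10111, "P": 97.05276, "K": 128.09496, "Q": 128.05858,
    "F": 147.06841, "G": 57.02146, "L": 113.08406, "M": 131.04049,
}
PROTON = 1.00728
WATER = 18.01056    # free C-terminus: H + OH
AMMONIA = 17.02655  # amidated C-terminus: H + NH2


def fragment_ladders(sequence, c_terminal_amide=False):
    """Singly charged b and y ion m/z values and [M+H]+ for a peptide."""
    residues = [RESIDUE_MASS[aa] for aa in sequence]
    c_term = AMMONIA if c_terminal_amide else WATER
    b_ions = [sum(residues[:i]) + PROTON for i in range(1, len(residues))]
    y_ions = [sum(residues[-i:]) + c_term + PROTON for i in range(1, len(residues))]
    mh = sum(residues) + c_term + PROTON
    return b_ions, y_ions, mh


b, y, mh = fragment_ladders("RPKPQQFFGLM", c_terminal_amide=True)
print(f"[M+H]+ {mh:.2f}")
for i, (bi, yi) in enumerate(zip(b, y), start=1):
    print(f"b{i} {bi:9.2f}   y{i} {yi:9.2f}")
```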

Figure 12. MALDI spectrum of metastable ions of substance P (x-axis m/z, approx. 1000 to 1300; with permission of Spengler et al.).

6.2 "Surface" MALDI

Very recently, the use of modified sample support surfaces has been reported that facilitate the MALDI analysis of specific macromolecules captured directly from unfractionated biological fluids and extracts [Hutchens 1993]. For this purpose the matrix is covalently bound to a substrate that also carries chemically defined solid-phase reaction centers. These surfaces facilitate protein discovery through molecular recognition (affinity capture) or structure analysis through sequential chemical and/or enzymatic modification of the adsorbed molecules in situ.

7 References

Annan RS, Kochling HJ, Hill JA, Biemann K (1992) Rapid Commun. Mass Spectrom. 6, 298-302
Barber M, Bordoli RS, Sedgwick RS, Tyler AN (1981) J. Chem. Soc. Chem. Commun. 7, 325
Beavis RC, Chait BT (1989a) Rapid Commun. Mass Spectrom. 3, 233
Beavis RC, Chait BT (1989b) Rapid Commun. Mass Spectrom. 3, 436-439
Beavis RC, Chait BT (1990a) Proc. Natl. Acad. Sci. USA 87, 6873
Beavis RC, Chait BT (1990b) Anal. Chem. 62, 1836
Beckey HD (1977) Principles of field desorption mass spectrometry. Pergamon Press, Oxford.
Billeci TM, Stults JT (1993) Anal. Chem. 65, 1709-1716
Blakely CR, Carmody JJ, Vestal ML (1980) Anal. Chem. 52, 1636
Bornsen KO, Schar M, Widmer HM (1990) Chimia 44, 412
Bridson JN, Beavis RC (1993) J. Phys. D: Appl. Phys. 26, 442
Buchanan MV, Hettich RL (1993) Anal. Chem. 65, 245A-259A
Caprioli RM, Whaley B, Mock KK, Cottrell JS (1991) In: Techniques in Protein Chemistry II (Villafranca JJ, ed), pp. 479-510, Academic Press, San Diego.


Castoro JA, Chiu RW, Monnig CA, Wilkins CL (1992) J. Am. Chem. Soc. 114, 7571-7572
Castoro JA, Koster C, Wilkins CL (1993) Anal. Chem. 65, 784-788
Chambers DM, Goeringer DE, McLuckey SA, Glish GL (1993) Anal. Chem. 65, 14-20
Cotter RJ (1981) Anal. Chem. 53, 1306-1307
Cotter RJ (1987) Anal. Chim. Acta 195, 45-59
Eckerskorn C, Strupat K, Karas M, Hillenkamp F, Lottspeich F (1992) Electrophoresis 13, 664-665
Egge H, Peter-Katalinic J, Karas M, Stahl B (1991) Pure Appl. Chem. 63, 491-498
Feigl P, Schueler B, Hillenkamp F (1983) Int. J. Mass Spectrom. Ion Phys. 47, 15
Fenn JB, Mann M, Meng CK, Wong SF, Whitehouse CM (1990) Mass Spectrom. Rev. 9, 37-70
Hardin ED, Vestal ML (1981) Anal. Chem. 53, 1492-1497
Harvey DJ (1993) Rapid Commun. Mass Spectrom. 7, 614-619
Hedrich HC, Isobe K, Stahl B, Nokihara K, Kordel M, Schmid RD, Karas M, Hillenkamp F, Spener F (1993) Anal. Biochem. 211, 288-292
Hill JA, Annan RS, Biemann K (1991) Rapid Commun. Mass Spectrom. 5, 395-399
Hillenkamp F, Unsold E, Kaufmann R, Nitsche R (1975) Appl. Phys. 8, 341-348
Hillenkamp F, Karas M (1989) Proceedings of the 37th Annual Conference of the ASMS, Miami Beach, May 21-26, pp. 1168-1169
Hillenkamp F, Karas M, Ingendoh A, Stahl B (1990) In: Biological Mass Spectrometry (Burlingame AL, McCloskey JA, eds), pp. 49-60, Elsevier, Amsterdam.
Hillenkamp F, Karas M, Beavis RC, Chait BT (1991) Anal. Chem. 63, 1196A-1203A
Hillenkamp F, Ehring H (1992) In: Mass Spectrometry in the Biological Sciences: A Tutorial (Gross ML, ed), pp. 165-179, Kluwer Academic Publishers, London.
Hutchens WT, Yip T-T (1993) Rapid Commun. Mass Spectrom. 7, 576
Huth-Fehre T, Gosine JN, Wu KJ, Becker CH (1991) Rapid Commun. Mass Spectrom. 5, 378
Johnson RS, Martin SA, Biemann K (1988) Int. J. Mass Spectrom. 86, 137
Johnson RE, Banerjee S, Hedin A, Fenyo D, Sundquist BUR (1991) In: Methods and Mechanisms for Producing Ions from Large Molecules (Standing KG, Ens W, eds), pp. 89-99, Plenum Press, New York.
Jonscher K, Currie G, McCormack AL, Yates JR (1993) Rapid Commun. Mass Spectrom. 7, 20-26
Juhasz P, Costello CE (1992) J. Am. Soc. Mass Spectrom. 3, 785-796
Karas M, Bachmann D, Bahr U, Hillenkamp F (1987) Int. J. Mass Spectrom. Ion Proc. 78, 53-68
Karas M, Hillenkamp F (1988) Anal. Chem. 60, 2299-2301
Karas M, Bahr U, Hillenkamp F (1989a) Int. J. Mass Spectrom. Ion Proc. 92, 231-242
Karas M, Ingendoh A, Bahr U, Hillenkamp F (1989b) Biomed. Environ. Mass Spectrom. 18, 841-843
Karas M, Bahr U, Giessmann U (1991) Mass Spectrom. Rev. 10, 335-357
Keough T, Takigiku R, Lacey MP, Purdon M (1992) Anal. Chem. 64, 1594-1600
Keough T, Baker TR, Dobsen RLM, Lacey MP, Riley TA, Hasselfield JA, Hesselberth RE (1993) Rapid Commun. Mass Spectrom. 7, 195-200



Koster C, Kahr MS, Castoro JA, Wilkins CL (1992) Mass Spectrom. Rev. 11, 495-512
Macfarlane RD, Torgerson FD (1976) Science 109, 920
Mann M, Hojrup P, Roepstorff P (1992) In: Proc. 40th Conf. Mass Spectrom. Allied Topics, Washington DC, p. 957.
Mock KK, Davey M, Cottrell JS (1991) Biochem. Biophys. Res. Commun. 177, 644
Nordhoff E, Ingendoh A, Cramer R, Overberg A, Stahl B, Karas M, Hillenkamp F, Crain PF (1992) Rapid Commun. Mass Spectrom. 6, 771-776
Nordhoff E, Cramer R, Karas M, Hillenkamp F, Kirpekar F, Kristiansen K, Roepstorff P (1993) Nucleic Acids Res. 21, 3347-3357
Parr GR, Fitzgerald MC, Smith LM (1992) Rapid Commun. Mass Spectrom. 6, 369
Posthumus MA, Kistemaker PG, Meuzelaar HLC, deBrauw MC (1978) Anal. Chem. 50, 985
Roepstorff P, Fohlman J (1985) Biomed. Mass Spectrom. 12, 631
Salehpour M, Perera I, Kjellberg J, Hedin A, Islamian MA, Hakansson P, Sundquist BUR (1989) Rapid Commun. Mass Spectrom. 3, 259
Schar M, Bornsen KO, Gassmann E (1991) Rapid Commun. Mass Spectrom. 5, 319-326
Schwartz JC, Bier ME (1993) Rapid Commun. Mass Spectrom. 7, 27-32
Siegel MM, Hollander IJ, Hamann PR, James JP, Karas M, Ingendoh A, Hillenkamp F (1991) Anal. Chem. 63, 2470
Siegel MM, Hollander IJ, Phipps A, Ingendoh A (1993) Finnigan MAT, Vision 2000 Appl. Data Sheet No. 1
Spengler B, Cotter RJ (1990) Anal. Chem. 62, 7932
Spengler B, Kirsch D, Kaufmann R (1991) Rapid Commun. Mass Spectrom. 5, 198
Spengler B, Kirsch D, Kaufmann R (1992) Rapid Commun. Mass Spectrom. 6, 105
Stahl B, Steup M, Karas M, Hillenkamp F (1991) Anal. Chem. 63, 1463
Stoll R, Rollgen FW (1979) Org. Mass Spectrom. 14, 642-645
Strupat K, Karas M, Hillenkamp F (1991) Int. J. Mass Spectrom. Ion Processes 111, 89-101
Strupat K, Karas M, Hillenkamp F, Eckerskorn C, Lottspeich F (1994) Anal. Chem. 66, 464-470
Tanaka K, Waki H, Ido Y, Akita S, Yoshida Y, Yoshida T (1988) Rapid Commun. Mass Spectrom. 2, 151-153
Tang K, Allman SL, Jones RB, Chen CH, Araghi S (1993) Rapid Commun. Mass Spectrom. 7, 63-66
Van Breemen RB, Snow M, Cotter RJ (1983) Int. J. Mass Spectrom. Ion Phys. 49, 35-50
Van Veelen PA, Tjaden UR, van der Greef J, Ingendoh A, Hillenkamp F (1993) J. Chromatogr. 647, 367-374
Vastola FJ, Mumma RO, Pirone AJ (1970) Org. Mass Spectrom. 3, 101
Vertes A, Gijbels R, Levine RD (1990) Rapid Commun. Mass Spectrom. 4, 228
Weller RR, MacMahon TJ, Freiser BS (1990) In: Lasers and Mass Spectrometry (Lubman DM, ed), pp. 249-270, Oxford University Press, New York.
Wu KJ, Steding A, Becker CH (1993) Rapid Commun. Mass Spectrom. 7, 142-146


V.2 Electrospray Mass Spectrometry

Jörg W. Metzger and Christoph Eckerskorn

1 Introduction

Estimating molecular weights has always been an important aspect of protein and peptide characterization. Molecular weight measurements have been used to prove the homogeneity of a sample, establish identity, analyse quaternary structure (e.g. the presence of subunits) and detect modifications such as glycosylation or proteolysis. Most molecular mass estimates are made by sodium dodecyl sulphate polyacrylamide gel electrophoresis (SDS-PAGE) and size exclusion chromatography, calibrated with known standards. These techniques give useful indications of purity, relative molecular mass and the approximate amount of material present. However, they cannot provide the mass accuracy and resolution needed to detect small mass changes, e.g. those due to post-translational modifications (Table 1) and amino acid exchanges (Table 2). Two relatively recent developments in mass spectrometry, the so-called "soft" ionization processes electrospray ionization (ESI) [Yamashita 1984a,b] and matrix-assisted laser desorption/ionization (MALDI) [Karas 1988], have extended the application of mass spectrometry to proteins and other biopolymers. The accuracy, sensitivity and resolving power of these techniques permit the detection of minor but biologically significant protein modifications. This account describes the essential features of ESI mass spectrometry, in an attempt to provide some perspective on how it works and how it can be used.

2 Instrumentation

ESI-MS, like all other mass spectrometric techniques, is based on producing molecular ions for subsequent separation and analysis. For ESI measurements the sample is dissolved in a suitable solvent, e.g. a mixture of methanol or acetonitrile and water. The sample solution is infused at a constant flow rate through a fused-silica capillary into a "source", whose purpose is to produce intact ionized molecules in the gas phase, free of solvent and other solute molecules. In the mass analyser the molecular ions are selected and separated according to their mass-to-charge ratio (m/z). The ions then reach the detector, which is connected to a data acquisition system that records the ion abundance at each m/z value.



Table 1. Mass changes due to some post-translational modifications of proteins

Modification                                        Mass change (Da)
Deamidation of Asn or Gln                           +1.0
Disulphide bond formation                           -2.0
Methylation                                         +14.0
Hydroxylation                                       +16.0
Oxidation of Met                                    +16.0
Formylation                                         +28.0
Acetylation                                         +42.0
Phosphorylation                                     +79.9
Sulphation                                          +80.0
Cysteinylation                                      +119.1
Pentoses (Ara, Rib, Xyl)                            +132.1
Deoxyhexoses (Fuc, Rha)                             +146.1
Hexosamines (GalN, GlcN)                            +161.2
Hexoses (Fru, Gal, Glc, Man)                        +162.1
Lipoic acid (amide bond to Lys)                     +188.3
N-Acetylhexosamines (GalNAc, GlcNAc)                +203.2
Farnesylation                                       +204.4
Myristoylation                                      +210.4
Biotinylation (amide bond to Lys)                   +226.3
Pyridoxal phosphate (Schiff base formed to Lys)     +231.1
Palmitoylation                                      +238.4
Stearoylation                                       +266.5
Geranylgeranylation                                 +272.5
N-Acetylneuraminic acid                             +291.3
Glutathionylation                                   +305.3
N-Glycolylneuraminic acid                           +307.3
5'-Adenosylation                                    +329.2
4'-Phosphopantetheine                               +339.3
ADP-ribosylation                                    +541.3
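Table 1 can serve as a lookup list when interpreting a measured mass difference. The sketch below uses a subset of its entries; the 0.5 Da tolerance is a hypothetical value that would have to match the accuracy actually achieved. Entries that lie closer together than the tolerance, such as phosphorylation (+79.9) and sulphation (+80.0), cannot be distinguished by mass alone.

```python
# A subset of the average mass changes (Da) listed in Table 1.
PTM_MASS_CHANGES = {
    "Deamidation of Asn or Gln": 1.0,
    "Disulphide bond formation": -2.0,
    "Methylation": 14.0,
    "Hydroxylation": 16.0,
    "Oxidation of Met": 16.0,
    "Formylation": 28.0,
    "Acetylation": 42.0,
    "Phosphorylation": 79.9,
    "Sulphation": 80.0,
    "Hexose": 162.1,
    "N-Acetylhexosamine": 203.2,
    "Myristoylation": 210.4,
    "Palmitoylation": 238.4,
    "N-Acetylneuraminic acid": 291.3,
    "ADP-ribosylation": 541.3,
}


def candidate_modifications(observed_shift, tolerance=0.5):
    """Table entries whose mass change lies within tolerance (Da) of the observed shift."""
    hits = [
        (name, delta)
        for name, delta in PTM_MASS_CHANGES.items()
        if abs(observed_shift - delta) <= tolerance
    ]
    return sorted(hits, key=lambda hit: abs(observed_shift - hit[1]))


print(candidate_modifications(42.1))
print(candidate_modifications(80.0))  # two candidates: ambiguous at this tolerance
```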

2.1 The Electrospray Source

Electrospray ionization produces naked, intact molecules in ionized form from an analyte solution. A spray of fine, highly charged droplets is created at atmospheric pressure in the presence of a strong electric field. The ESI source may simply be a metal capillary (capillary tip) held at an elevated voltage relative to a counter electrode (interface plate) with an orifice through which ions, entrained in a gas flow, enter the mass spectrometer (Figure 1). Liquid flow is supplied by infusion syringes, separation devices (HPLC, CE) or other liquid sources, at flow rates usually between 1 and a few microlitres per minute. The field between the capillary tip and the interface plate charges the surface of the emerging liquid, which is dispersed by Coulomb forces into a fine spray of charged droplets. These droplets carry an excess of charge and are attracted to the inlet of the mass spectrometer, which is held at a lower potential. A countercurrent flow of dry gas evaporates solvent from each droplet, decreasing its diameter. As a result, the surface charge density increases until the so-called Rayleigh limit is reached, at which the Coulomb repulsion becomes comparable to the surface tension. The resulting instability, sometimes called a "Coulomb explosion", tears the droplet apart into charged daughter droplets that also evaporate. This sequence repeats until the droplets are so small that the combination of charge density and radius of curvature at the droplet surface produces an electric field strong enough to desorb ions from the droplets into the ambient gas. This ion desorption ("ion evaporation") mechanism, first proposed by Iribarne and Thomson [Iribarne 1976; Thomson 1979; Fenn 1990], together with other competing and partly unknown processes, ultimately produces the so-called "quasi-molecular" ions that are mass-analysed. On the basis of energy calculations, however, this mechanism has been criticized [Rollgen 1987, 1989; Schmelzeisen-Redeker 1989]. Like Dole and coworkers [Dole 1968], who performed the first ESI experiments more than 25 years ago, Rollgen assumed that the droplets keep disintegrating until a single analyte ion remains. The formation of gas-phase ions seems to occur at a very early stage of the ESI process, and it has recently been questioned whether droplet formation is a prerequisite for the formation of gas-phase ions at all [Siu 1993].
According to this study, the majority of ions most likely desorb already at the so-called "Taylor cone", directly at the solution/air interface at the end of the capillary. Despite a number of systematic investigations, in particular by the groups of Kebarle [Kebarle 1993] and Smith [Smith 1993], there is so far no satisfactory quantitative description of how gas-phase ions form from solute species in charged liquid droplets. Even so, the ion evaporation concept described above is widely accepted and has been a useful working hypothesis, although many questions about the details remain. Several variants of the electrospray experiment exist, but all contain the essential elements described above. An advantageous improvement was introduced by Bruins [Bruins 1987] and named "ion spray" by its originators to distinguish it from electrospray: a nebulizer gas (compressed air) applied concentrically at the capillary tip assists the formation of suitably fine droplets (Figure 1). The name ion spray is somewhat misleading because it implies a new ionization mechanism. The real difference between the two techniques lies in the mechanism of droplet formation: electrospray forms droplets by charge-shearing of a liquid column as it exits a narrow tube, whereas in ion spray a jet of air shears the liquid, producing extremely fine droplets. With this pneumatic nebulization, ion spray produces stable ion currents over a wide range of flow rates, from 1 µL/min to 200 µL/min, and during gradients from 100% water to 100% organic modifier. This feature facilitates the direct coupling of liquid separation techniques (e.g. HPLC) to mass spectrometry. A schematic drawing of the nebulizer-assisted electrospray ion source (PE Sciex) used in these studies is shown in Figure 1.
Other improvements that support droplet formation in ESI use ultrasound ("ultraspray" [Whitehouse 1992]) or heating ("thermospray" [Lee 1992]).
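The Rayleigh limit mentioned above has a standard closed form, q_R = 8π (ε0 γ r³)^1/2, where γ is the surface tension and r the droplet radius. The sketch evaluates it for water droplets; the surface tension (about 0.072 N/m near room temperature) and the radii are assumed example values, not data from this study. Because q_R falls as r^(3/2), a droplet that loses solvent while keeping its charge must eventually reach this limit, which is the origin of the repeated fission described in the text.

```python
import math

EPSILON_0 = 8.8541878128e-12         # vacuum permittivity, F/m
ELEMENTARY_CHARGE = 1.602176634e-19  # C


def rayleigh_charge(radius_m, surface_tension_n_per_m):
    """Maximum charge (C) a droplet can carry before Coulomb fission (Rayleigh limit)."""
    return 8 * math.pi * math.sqrt(EPSILON_0 * surface_tension_n_per_m * radius_m ** 3)


# Assumed example: water droplets, surface tension about 0.072 N/m.
for radius_um in (10.0, 1.0, 0.1):
    q = rayleigh_charge(radius_um * 1e-6, 0.072)
    print(f"r = {radius_um} um: about {q / ELEMENTARY_CHARGE:.3g} elementary charges")
```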



Figure 1. Nebulizer-assisted electrospray ion source (ion spray; PE Sciex); orifice voltage 0 to ±250 V, interface voltage ±650 V. The ion spray voltage is ca. +4.9 kV in the positive ion mode and ca. -3 kV in the negative ion mode.

2.2 The Mass Analyser

The most common analysers used routinely in ESI mass spectrometers are quadrupole mass filters. A quadrupole consists of four parallel rods, in which opposite electrodes are electrically connected. A potential of U + V cos ωt is applied to one pair of rods, where U is a DC voltage and V is the peak amplitude of an RF (radio-frequency) voltage of angular frequency ω = 2πf. A potential of the same amplitude is applied to the other pair of rods, but with the polarity of the DC voltage reversed and the RF voltage shifted in phase by 180°. Ions injected parallel to the rods undergo transverse oscillations caused by the DC and RF fields acting perpendicular to the rod axis. For a proper selection of U and V, ions of a given mass-to-charge (m/z) ratio have stable trajectories and ultimately emerge from the quadrupole towards the detector. Ions with other m/z values have unstable oscillations whose amplitude increases until they collide with the rods, so they are not transmitted.
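The stability condition described above is usually expressed through the dimensionless Mathieu parameters a = 8eU/(m r0² ω²) and q = 4eV/(m r0² ω²) for a singly charged ion, where r0 is the radius of the field between the rods, a quantity the text does not give. The following minimal Python sketch computes a and q and the U, V pair that places a chosen m/z at the apex of the first stability region (a ≈ 0.237, q ≈ 0.706). The rod radius (4 mm) and RF frequency (1 MHz) in the example are hypothetical values, not the parameters of the API III.

```python
import math

E_CHARGE = 1.602176634e-19  # elementary charge in C
AMU = 1.66053906660e-27     # unified atomic mass unit in kg

# Approximate apex of the first stability region in the (q, a) diagram.
A_APEX = 0.23699
Q_APEX = 0.70600


def _scale(mz, r0, f):
    """Common factor (m/z in kg per elementary charge) * r0^2 * omega^2."""
    omega = 2.0 * math.pi * f
    return mz * AMU * r0 ** 2 * omega ** 2


def mathieu_a_q(mz, U, V, r0, f):
    """Mathieu parameters (a, q) for an ion of the given m/z.

    U: DC voltage (V); V: zero-to-peak RF amplitude (V);
    r0: field radius (m); f: RF frequency (Hz).
    """
    s = _scale(mz, r0, f)
    return 8.0 * E_CHARGE * U / s, 4.0 * E_CHARGE * V / s


def voltages_for_apex(mz, r0, f):
    """DC and RF voltages that put the given m/z at the stability apex."""
    s = _scale(mz, r0, f)
    U = A_APEX * s / (8.0 * E_CHARGE)
    V = Q_APEX * s / (4.0 * E_CHARGE)
    return U, V


if __name__ == "__main__":
    # Hypothetical geometry: r0 = 4 mm, f = 1 MHz.
    for mz in (500, 1000, 2000):
        U, V = voltages_for_apex(mz, r0=4e-3, f=1e6)
        print(f"m/z {mz}: U = {U:7.1f} V, V = {V:7.1f} V, U/V = {U / V:.4f}")
```

Both voltages scale linearly with m/z while their ratio stays at a/(2q) ≈ 0.168, which is why a spectrum can be scanned by ramping U and V together at a fixed ratio. For the hypothetical geometry above, m/z 1000 requires roughly V ≈ 1.16 kV and U ≈ 0.19 kV.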



The mass spectrometer API III (PE-Sciex, Toronto, Canada) used in this work is a triple quadrupole mass spectrometer with an m/z range of 10-2400. A schematic drawing of this instrument is shown in Figure 2.

2.3 The Detector

Electron multipliers and channel electron multipliers are used universally for detecting the ions. Transmitted ions of the selected mass-to-charge ratio are deflected into the collector of the detector. An ion striking the collector causes an electron cascade, and the resulting signal is amplified and passed to the computer's data acquisition system.

3 The Ion Spectra

In the electrospray ionization process, multiply protonated molecules $(\mathrm{M} + n\mathrm{H})^{n+}$ are formed, which give rise to a series of consecutive peaks at $(M + n)/n$ along the m/z scale of the ion spectra. An example is shown in Figure 3, which demonstrates the major advantage of ESI: because a protein can be multiply protonated, the m/z ratios of the resulting ions appear as a fraction of its molecular weight. A quadrupole mass analyser of limited range (up to m/z 2400 for the instrument used here, i.e. a mass of 2400 for z = 1) therefore suffices to measure molecular weights up to about 50 times larger. For a single compound, each adjacent pair of peaks differs by one charge. Because all peaks derive from a single molecular mass, the unknown charge state can be derived from any such pair of adjacent peaks. If two adjacent peaks represent adduct ions of the neutral molecule plus protons (the proton mass being approximated as 1):

$$m_1 = \frac{M + n_1}{n_1} \tag{1}$$

$$m_2 = \frac{M + n_2}{n_2} \tag{2}$$

with $m_2 > m_1$ and $n_2 < n_1$, where $M$ is the molecular weight of the protein, $m_1$ and $m_2$ are the measured mass-to-charge ratios, and $n_1$ and $n_2$ are the respective numbers of protons. Then, if $n_2 = n_1 - 1$,

$$m_1 = \frac{M + (n_2 + 1)}{n_2 + 1} \tag{3}$$

and $n_2$ follows from equations (2) and (3):

$$n_2 = \frac{m_1 - 1}{m_2 - m_1} \tag{4}$$

so that the molecular weight can be determined:

$$M = n_2\,(m_2 - 1)$$

This calculation, called deconvolution, is usually performed by a computer. The relatively high precision (0.01%) of the mass determination results chiefly from averaging the individual measurements in the same spectrum: in the example of Figure 3, each of the mass peaks contributes to the molecular weight. Careful calibration of the mass scale and the stability of that calibration also contribute.
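Equation (4) and the relation M = n2(m2 - 1) translate directly into a small routine. The minimal Python sketch below assumes that the peaks belong to one compound and are consecutive charge states; it assigns a charge to every peak from its neighbour and averages the resulting mass estimates, as described in the text. The `proton` parameter defaults to 1 to match the equations, and the example spectrum is synthetic, not the myoglobin data of Figure 3.

```python
def assign_charges(peaks, proton=1.0):
    """Assign charge states to consecutive peaks of one compound.

    peaks: m/z values of adjacent charge states (any order).
    Returns a list of (m/z, charge) pairs sorted by increasing m/z.
    """
    mz = sorted(peaks)
    if len(mz) < 2:
        raise ValueError("need at least two adjacent peaks")
    # Equation (4): n2 = (m1 - 1) / (m2 - m1) for adjacent m1 < m2;
    # this is the charge of the higher-m/z peak of each pair.
    charges = [round((m1 - proton) / (m2 - m1)) for m1, m2 in zip(mz, mz[1:])]
    # The lowest-m/z peak carries one proton more than its neighbour.
    charges.insert(0, charges[0] + 1)
    return list(zip(mz, charges))


def deconvolute(peaks, proton=1.0):
    """Return (mean M, per-peak M estimates) using M = n * (m - proton)."""
    estimates = [n * (m - proton) for m, n in assign_charges(peaks, proton)]
    return sum(estimates) / len(estimates), estimates


if __name__ == "__main__":
    # Synthetic spectrum of a hypothetical protein, M = 16951.5, charges 9+ to 20+.
    M_true = 16951.5
    spectrum = [(M_true + n) / n for n in range(9, 21)]
    mean_M, _ = deconvolute(spectrum)
    print(f"M = {mean_M:.2f}")
```

With measured data one would normally pass the proton mass (about 1.00728 u) instead of 1; the spread of the per-peak estimates then indicates the precision of the result.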

Figure 2. Triple-quadrupole ion spray mass spectrometer API III (PE-Sciex). Labels in the schematic include: charged droplets, heater, 1st and 2nd CID zones, RF-only quadrupole, quadrupole 1, collision quadrupole, quadrupole 3 and multiplier.



The net number of charged sites of a protein or peptide under the solvation conditions is an important factor determining the maximum extent of charging obtained in ESI mass spectra. For many proteins in aqueous solution (pH < 4), an approximately linear correlation is observed between the maximum number of charges and the number of basic amino acid residues (e.g. arginine, lysine, histidine) plus the N-terminal amino group, provided the latter is not post-translationally modified [Smith 1990b]. Apart from the pH, the accessibility of these basic sites and the distribution of charge states depend on the temperature and on any denaturing agents present in the solution. This pH effect can be used to probe conformational changes in proteins. For example, the most abundant ion of bovine cytochrome c carries 10 positive charges when a solution at pH 5.2 is electrosprayed, but 16 charges at pH 2.6, with an intermediate bimodal distribution at pH 3.0 [Chowdhury 1990]. A similar effect is observed upon reduction of disulphide bonds: lysozyme, with four disulphide bonds, shows a charge distribution centred at 12 charges, and after reduction with DTT a new cluster centred around 15 charges appears [Loo 1990].
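As a rough illustration of this correlation, the minimal Python sketch below counts the basic sites of a sequence (Arg, Lys, His plus a free N-terminal amine) and computes the m/z at which an ion with that many protons would appear. It assumes every basic site is protonated, so it gives an upper limit rather than a prediction: for the nonapeptide SNKLYLKNI of Section 5.2 the count is 3, whereas the spectrum there shows ions up to 2+.

```python
def basic_sites(sequence, free_n_terminus=True):
    """Number of basic sites: Arg, Lys and His residues plus the N-terminal amine."""
    n = sum(sequence.count(residue) for residue in "RKH")
    return n + (1 if free_n_terminus else 0)


def mz_at_charge(mass, charge, proton=1.0):
    """m/z of the (M + nH)n+ ion, with the proton mass approximated as 1."""
    return (mass + charge * proton) / charge


if __name__ == "__main__":
    seq = "SNKLYLKNI"  # nonapeptide discussed in Section 5.2, RMM 1091.6
    n_max = basic_sites(seq)
    print(seq, "basic sites:", n_max,
          "m/z at that charge:", round(mz_at_charge(1091.6, n_max), 1))
```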

Figure 3. Electrospray mass spectrum of equine myoglobin, M = 16 951.6 (peaks labelled M+nH; x-axis m/z). The ion at m/z 1884.49 corresponds to the molecule with 9 protons attached. From the 12 ions ranging from 9+ (m/z 1884.49) to 20+ (m/z 848.72), a molecular weight of 16 951.34 ± 1.5 is calculated.



4 Coupling of Chromatographic Methods to the Mass Spectrometer

A mass spectrometer constitutes a very specific (but admittedly expensive) detector for HPLC, capillary electrophoresis (CE) and other separation techniques. Ionization methods based on spraying the analyte, such as thermospray (TSP) and ESI, are very suitable for such couplings [Covey 1986a,b; Lee 1989]. The widespread and older TSP interface, however, is generally less sensitive than ESI by a factor of 10-100, is more difficult to optimize, requires a special modifier to allow ionization (which may cause problems with gradient HPLC) and can cause undesired fragmentation of thermolabile compounds [Covey 1988b]. Flow rate and composition of the eluent have to be compatible with both components, the HPLC and the MS interface [Niessen 1992]. Whereas pure ESI gives best results in the low µl/min range [Wong 1988], variants in which droplet formation is assisted by compressed air [Bruins 1987; Hopfgartner 1993], by heat or by ultrasound [Whitehouse 1992] can handle flow rates up to 1-2 ml/min. Thus, the choice of HPLC column depends very much on the kind of interface used, and the range of options is enlarged by a pre- or post-column split. Since ESI and nebulizer-assisted ESI are concentration-sensitive rather than mass-flow-sensitive detectors, the sensitivity is not affected by splitting the eluent [Hopfgartner 1993]. Because the optimal conditions for chromatography and ionization are very often not the same, it is sometimes advantageous to modify the mobile phase after it has passed the column, e.g. to reduce the surface tension, to change the pH [Smith 1988] or to adjust the flow rate to the requirements of the interface [Smith 1988a,b; 1990]. For the coupling of HPLC with the mass spectrometer, the fused-silica capillary that is part of the ESI interface (cf. Figure 1) is simply connected to the end of the HPLC column (Figure 4a).
Figure 4. a Setup of the HPLC-ESI-MS equipment (HPLC pump, injection valve, flow rate 200 µl/min, collector). b Sample introduction with an autosampler (injection volume 5 µl; flow rate 50-150 µl/min).

The components are separated by chromatography and reach the ion source at different times, where they are ionized in succession at atmospheric pressure (API, atmospheric pressure ionization). Their masses are registered in the form of a total ion chromatogram (TIC = total ion current versus time), which looks similar to the UV/VIS trace obtained with photometric detection. The mass information of a component eluting at a certain time (indicated as a peak in the TIC) can be obtained during the chromatography or after the run. API techniques have the following features, which are advantageous for easy coupling to HPLC: The main portion of the eluent is removed at atmospheric pressure (so no vacuum problems occur and only minimal cleaning of the ion source is necessary). Solvents, gradients and volatile buffers commonly used for reversed-phase HPLC can be used over the whole concentration range. Since no heat is applied, the ionization is very soft (only the orifice plate is kept at ca. 50-60 °C to prevent it from being blocked by freezing). Optimization and operation of the ESI source are easy. No back pressure is built up (in contrast, for example, to thermospray MS). The occurrence of multiply charged ions in ESI and its variants allows the determination of biomolecules up to the 100 kDa range [Hemling 1990], yet ESI is also suitable for smaller molecules (plant pigments, etc.) [Glassgen 1992]. Thus, a very large mass window is available, so that the danger of overlooking a compound (because of its thermolability or size) is low. In general, the mass spectrometer "sees" more than a UV/VIS detector or even a diode array detector. It is also useful to connect further detectors, such as photometric, fluorescence, RI or radioactivity detectors, in series or in parallel with the mass spectrometer (Figure 4a) [Heath 1993]. Simultaneous detection with a second detector provides high specificity; in addition, individual components can be quantified via the UV/VIS trace. The software usually allows a direct comparison of the UV/VIS signal with the TIC.
The advantage of mass spectrometric detection for HPLC over simple UV/VIS detection is obvious when two or more compounds cannot be separated and coelute. Peak purity can be determined very easily, since the mass spectrum contains the ions of all components forming the peak. Conversely, for mixtures containing compounds with identical nominal mass (so-called isobaric compounds), HPLC-MS is helpful because the mass spectrum of such a mixture without separation does not allow the individual isobaric components to be distinguished. Structural differences between isobaric peptides can be determined by on-line HPLC-MS-MS using collision-induced dissociation (CID).

5 Off-Line HPLC-MS

Although on-line HPLC-MS using an electrospray interface is a relatively easy and fast technique, it can be advantageous to analyse routine samples such as synthetic peptides off-line. For this purpose, fractions of an analytical HPLC run on a routinely used analytical column (e.g. 4.6 mm I.D.) are first collected and then investigated by ESI-MS. ESI is a highly sensitive method that allows the molecular mass of peptides to be determined in the picomole to low femtomole range (Figure 5). Thus, the concentration of such a sample is generally high enough to obtain a mass spectrum with a reasonable signal/noise ratio. For the investigation of labile compounds, if only small sample amounts are available, or if the sample is contaminated with salts or other impurities, however, on-line HPLC is essential.

Figure 5. Sensitivity of ESI mass spectrometry. Successive injections of neat solvent (blank) and different amounts (128 fmol, 3.2, 16, 80 and 400 pmol; each in triplicate) of the nonapeptide TYQRTRALV (RMM 1107) into a flow of 40 µl/min methanol/0.1% formic acid (1:1); x-axis time (min). The injection of 640 fmol (= 0.7 ng) gives a spectrum that shows the doubly and triply charged molecular ions of the peptide with a signal/noise ratio better than 5:1 (see inset; m/z 250-1000).
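The conversion between amount and mass quoted in the Figure 5 caption is simple arithmetic: mass = amount × molar mass. The minimal Python sketch below reproduces the 640 fmol ≈ 0.7 ng figure for TYQRTRALV (RMM 1107) and shows that the injected amounts are consistent with successive five-fold dilutions starting from 400 pmol. The function names are illustrative only.

```python
def fmol_to_ng(amount_fmol, molar_mass):
    """Mass in ng of `amount_fmol` femtomoles of a compound (molar_mass in g/mol)."""
    return amount_fmol * 1e-15 * molar_mass * 1e9


def dilution_series(start, factor, steps):
    """Amounts obtained by repeated dilution by `factor`, starting at `start`."""
    return [start / factor ** k for k in range(steps)]


if __name__ == "__main__":
    print(f"{fmol_to_ng(640, 1107):.3f} ng")  # 640 fmol of the nonapeptide
    print([round(x, 3) for x in dilution_series(400.0, 5, 6)])  # amounts in pmol
```

The six-step series runs 400, 80, 16, 3.2, 0.64 and 0.128 pmol, so both the 640 fmol and the 128 fmol injections fall on the same five-fold dilution scheme.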

5.1 Sample Introduction with an Autosampler

Even without separation by HPLC, an ESI mass spectrum allows the purity of crude synthetic peptides to be checked and by-products to be identified [Metzger 1991, 1993]. A high sample throughput is often desired for such routine ESI-MS measurements. This can be achieved with an autosampler in combination with an HPLC pump, using the same set-up as for HPLC-MS but without a column (Figure 4b). In this way, ca. 50-60 samples per hour can be investigated. The individual samples (injection volume ca. 5 µl) are injected into a continuous flow of solvent (e.g. methanol/0.1% formic acid; flow rate ca. 80-100 µl/min). The time required for the analysis of the ESI mass spectra of crude synthetic peptides is about 2 min per sample.



5.2 Purity Control of Synthetic Peptides

The nonapeptide SNKLYLKNI, with a monoisotopic relative molecular mass (RMM) of 1091.6, was obtained by solid-phase peptide synthesis using the Fmoc/tBu strategy. The side-chain protecting group for serine and tyrosine was tert-butyl and for lysine tert-butyloxycarbonyl; the asparagine side chain was not protected. After cleavage of the peptide from the resin with trifluoroacetic acid, the crude peptide was precipitated from ether and lyophilized, and an ESI mass spectrum was recorded (Figure 6a). The spectrum showed the expected singly and doubly protonated ions of the peptide at m/z 1092.5 and 546.8. However, a closer look at the [M+H]+ range revealed additional ions at m/z 1149, 1074, 980, 965, 1073 and 930, indicating the presence of by-products (Figure 6b). These masses are in agreement with a tert-butylated peptide ([M+H+56]+), a dehydrated peptide ([M+H-18]+) and truncated peptides lacking leucine or isoleucine ([M+H-113]+), lysine ([M+H-128]+) or tyrosine ([M+H-163]+), respectively. The formation of these by-products can be explained by incomplete deprotection of the Ser and Tyr side chains, dehydration of the side-chain amide group of Asn to form a nitrile [Metzger 1993] and incomplete coupling steps [Metzger 1994].
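These assignments rest on residue-mass differences. The minimal Python sketch below uses standard monoisotopic residue masses (values supplied here, not taken from the text) to compute [M+H]+ of SNKLYLKNI and of the proposed by-products. The computed values may differ by about one unit from the rounded m/z values read from the spectrum.

```python
# Monoisotopic residue masses (Da) for the residues needed here.
RESIDUE = {
    "S": 87.03203, "N": 114.04293, "K": 128.09496, "L": 113.08406,
    "I": 113.08406, "Y": 163.06333,
}
WATER = 18.01056
PROTON = 1.00728
TBU = 56.06260  # C4H8 added by a remaining tert-butyl group


def monoisotopic_mass(sequence):
    """Neutral monoisotopic mass of a linear peptide."""
    return sum(RESIDUE[aa] for aa in sequence) + WATER


def mh(mass, charge=1):
    """m/z of the [M+nH]n+ ion."""
    return (mass + charge * PROTON) / charge


def by_products(sequence):
    """[M+H]+ of the intact peptide and of the by-products discussed in the text."""
    m = monoisotopic_mass(sequence)
    return {
        "product": mh(m),
        "+tBu": mh(m + TBU),
        "-H2O (Asn -> nitrile)": mh(m - WATER),
        "-Leu/Ile": mh(m - RESIDUE["L"]),
        "-Lys": mh(m - RESIDUE["K"]),
        "-Tyr": mh(m - RESIDUE["Y"]),
    }


if __name__ == "__main__":
    for label, value in by_products("SNKLYLKNI").items():
        print(f"{label:24s} m/z {value:8.2f}")
```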

Figure 6. a ESI mass spectrum of crude SNKLYLKNI (RMM 1091.6). b Enlarged [M+H]+ range (m/z 900-1200) indicating the presence of by-products.



Figure 7. HPLC-ESI-MS of crude SNKLYLKNI. Total ion current chromatogram (TIC; m/z 400-1400).

Figure 8. HPLC-ESI-MS of crude SNKLYLKNI. Extracted ion current chromatogram (XIC) of m/z 1092, 928, 963, 978, 1073 and 1148, corresponding to the nonapeptide SNKLYLKNI and by-products formed during synthesis (x-axis time (min)/scan).

To confirm that these additional ions corresponded to contaminants in the crude peptide, HPLC-MS was performed. A 2 mm I.D. column filled with RP-18 and a gradient of 0-100% B in 15 min were used, with 0.1% formic acid (A) and acetonitrile (B) as eluents. The TIC shows a broad peak at a retention time of ca. 6.1 min and additional smaller peaks (Figure 7). Analysis of this chromatogram showed that the masses observed in the ESI mass spectrum of the crude mixture were also present in the TIC. Searching the TIC for these ions yielded the extracted ion chromatogram (XIC) shown in Figure 8. The XIC shows that the truncated peptides lacking Leu/Ile and Tyr elute before the intact nonapeptide, since they are less hydrophobic, whereas the more hydrophobic des-Lys peptide, the peptide with the modified Asn residue and the tert-butylated peptide elute after the nonapeptide. The mass spectra obtained at the apices of the peaks in the XIC (Figure 9a-f) show that the individual compounds of the crude mixture were successfully separated: the spectra almost exclusively contain the singly and doubly protonated ions of these compounds (note that the des-Lys peptide, which contains one basic centre fewer than the other peptides, forms only a singly charged ion; Figure 9b).
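The relationship between TIC and XIC can be made explicit: the TIC sums all ion intensities in each scan, whereas an XIC sums only the intensity within a narrow m/z window around a chosen ion, and its maxima give the retention times of the individual components. The minimal Python sketch below assumes each scan is a list of (m/z, intensity) pairs; the window half-width of 0.5 and the toy data are hypothetical, and real instrument software uses its own data formats and processing.

```python
from typing import List, Sequence, Tuple

Scan = Sequence[Tuple[float, float]]  # (m/z, intensity) pairs of one scan


def tic(scans: Sequence[Scan]) -> List[float]:
    """Total ion current: summed intensity of every scan."""
    return [sum(intensity for _, intensity in scan) for scan in scans]


def xic(scans: Sequence[Scan], target_mz: float, half_width: float = 0.5) -> List[float]:
    """Extracted ion current of target_mz +/- half_width for every scan."""
    return [
        sum(i for mz, i in scan if abs(mz - target_mz) <= half_width)
        for scan in scans
    ]


def apex_scan(trace: Sequence[float]) -> int:
    """Index of the scan with the highest signal (peak apex)."""
    return max(range(len(trace)), key=trace.__getitem__)


if __name__ == "__main__":
    # Toy data: the product (m/z 1092) elutes before a +tBu by-product (m/z 1148).
    scans = [
        [(1092.0, 10.0), (1148.0, 0.0)],
        [(1092.0, 80.0), (1148.0, 5.0)],
        [(1092.0, 20.0), (1148.0, 40.0)],
        [(1092.0, 2.0), (1148.0, 10.0)],
    ]
    print("TIC:", tic(scans))
    print("apex of m/z 1092:", apex_scan(xic(scans, 1092)))
    print("apex of m/z 1148:", apex_scan(xic(scans, 1148)))
```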



Figure 9. ESI mass spectra obtained at the apices of the peaks in the XIC displayed in Figure 8 (panels a-f; x-axis m/z).

5.3 Characterization of Synthetic Peptide Libraries

Peptide libraries consist of equimolar mixtures of free peptides of defined length, which can be used directly in assays for studying antigen/antibody, receptor/ligand or enzyme/substrate interactions or for the detection of antibiotic activities [Geysen 1993]. A library of peptides n amino acids long consists of 20^n individual peptides if all 20 protein amino acids are represented at every position. Thus, a complete hexapeptide library, for example, contains 20^6 = 64 million peptides. Since the binding affinities of individual compounds in the mixture can differ by many orders of magnitude in biological assays, the purity of the mixture is of great importance. Pool sequencing [Stevanovic 1993; Metzger 1993] and ESI mass spectrometry [Stevanovic 1993; Metzger 1993, 1994a,b] are well suited for determining the composition and purity of peptide libraries. These techniques allow fast optimization of the synthesis of even complex mixtures. The ESI mass spectrum of a mixture containing each peptide in equimolar amounts shows the protonated molecular ions of all peptides present in the mixture. The ion intensities reflect the mass distribution, which can easily be calculated by a computer program. The lightest and heaviest peptides of a mixture mark the range within which the protonated molecular ions should be found; mass peaks at lower or higher m/z values indicate the presence of by-products (e.g. deletion peptides or peptides still carrying protecting groups). Peptide libraries usually contain many peptides with the same mass, so that not one but several isobaric molecular ions give rise to a particular mass peak in the ESI mass spectrum. It is often possible to differentiate the isobaric peptides of peptide libraries by HPLC-MS-MS. Isobaric peptides in less complex peptide mixtures can be differentiated by HPLC-MS, as shown for the octapeptide mixture LNYRF(S,T,I,E)(N,K,Q)(V,L,I,M). All 48 peptides of this mixture have the same N-terminal sequence but differ in the three C-terminal residues: position 6 contains Ser, Thr, Ile or Glu, position 7 Asn, Lys or Gln, and position 8 Val, Leu, Ile or Met. In the ESI mass spectrum, the [M+H]+ ions of the peptides of this mixture are expected at m/z 1013 (1x), 1027 (5x), 1039 (1x), 1041 (8x), 1045 (1x), 1053 (4x), 1055 (5x), 1059 (3x), 1067 (4x), 1069 (4x), 1071 (1x), 1073 (2x), 1083 (4x), 1085 (2x), 1087 (1x) and 1100 (2x).
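A program that calculates this mass distribution can be as simple as enumerating all sequences and grouping them by mass. The minimal Python sketch below does this for LNYRF(S,T,I,E)(N,K,Q)(V,L,I,M) using nominal (integer) residue masses supplied here as an assumption, and also evaluates the 20^n library size. It reproduces the 16 mass groups and their multiplicities listed above (1, 5, 1, 8, ..., 2). With nominal masses the groups fall at m/z 1012 to 1100, whereas most values in the list above are one unit higher, presumably because exact rather than nominal masses were rounded; the multiplicities are unaffected.

```python
from collections import Counter
from itertools import product

# Nominal (integer) residue masses for the residues occurring in this mixture.
NOMINAL = {
    "L": 113, "N": 114, "Y": 163, "R": 156, "F": 147,
    "S": 87, "T": 101, "I": 113, "E": 129,
    "K": 128, "Q": 128, "V": 99, "M": 131,
}


def library_size(length, building_blocks=20):
    """Number of peptides in a complete library of the given length."""
    return building_blocks ** length


def mh_distribution(fixed, *variable_positions):
    """Count peptides per nominal [M+H]+ value for a mixture such as LNYRF(X)(X)(X)."""
    counts = Counter()
    for residues in product(*variable_positions):
        sequence = fixed + "".join(residues)
        mass = sum(NOMINAL[aa] for aa in sequence) + 18  # residues + H2O
        counts[mass + 1] += 1                            # + H+
    return counts


if __name__ == "__main__":
    print("hexapeptide library:", library_size(6))
    dist = mh_distribution("LNYRF", "STIE", "NKQ", "VLIM")
    for mz in sorted(dist):
        print(f"m/z {mz}: {dist[mz]}x")
```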
HPLC-MS of this mixture was performed using a C-18 column at a flow rate of 200 µl/min and a linear gradient (5-20% B in 20 min; A = 0.1% aqueous trifluoroacetic acid; B = 0.1% trifluoroacetic acid in acetonitrile). A post-column split (1:5) allowed ca. 40 µl/min to reach the ESI interface. The TIC showed four distinct peak groups with maxima at retention times of 15.2, 16.1, 17.2 and 18.2 min (Figure 10). Analysis of this TIC is facilitated by a two-dimensional display of the mass-analysed chromatogram, in which the observed mass signals are plotted against retention time or scan number (Figure 11, p. 182). It can be seen that most of the isobaric peptides of the mixture were separated from each other. However, owing to the complexity of the mixture, most of the components coelute with other peptides of different mass. The two-dimensional display also shows that peak groups III and IV (cf. Figure 10) clearly correspond to six and four different peptides, respectively. The mixture contains five peptides with a monoisotopic mass of 1026. The extracted ion chromatogram (XIC) of m/z 1027, corresponding to the [M+H]+ ions of these peptides, allows their retention times to be determined (Figure 12). From the observed intensity distribution it can be concluded that one of these isobaric peptides (eluting after 14.3 min) is under-represented in comparison with the other four, which suggests that this peptide is present in a non-equimolar amount. The mass spectra obtained at the apices of the peaks in the XIC are shown in Figure 13.


Figure 10. HPLC-ESI-MS of the 48-component octapeptide mixture LNYRF(S,T,I,E)(N,K,Q)(V,L,I,M). Reconstructed total ion current chromatogram (m/z 800-1200) showing four distinct peak groups I-IV.

Figure 11. HPLC-ESI-MS of the 48-component octapeptide mixture LNYRF(S,T,I,E)(N,K,Q)(V,L,I,M). Two-dimensional display of the singly protonated molecular ions (m/z 1000-1120).

Figure 12. HPLC-ESI-MS of the 48-component octapeptide mixture LNYRF(S,T,I,E)(N,K,Q)(V,L,I,M). Extracted ion chromatogram of m/z 1027, corresponding to the protonated molecular ions of five structurally different isobaric peptides.



Figure 13. ESI mass spectra obtained at the apices of the peaks in the XIC displayed in Figure 12 (retention times 14.3, 14.9, 15.1, 15.7 and 16.0 min; m/z 900-1200).

The XICs of m/z 1012, 1044, 1038 and 1070 are shown in Figure 14. These values correspond to the [M+H]+ ions of the four peptides LNYRFSNV, LNYRFSNM, LNYRFINV and LNYRFINM, respectively.

Figure 14. Extracted ion chromatograms of m/z 1012, 1044, 1038 and 1070, corresponding to the protonated molecular ions of LNYRFSNV, LNYRFSNM, LNYRFINV and LNYRFINM, respectively.



6 Structure Elucidation of Peptides and Proteins

6.1 HPLC Coupled to Mass Spectrometry

Modification reactions can be used to count the number of residues with functional groups in their side chains. Usually the modification reagent is used in large excess, and further components such as bases, acids or buffers are present, which makes direct measurement of the reaction mixture by ESI-MS difficult. These reagents can easily be removed by HPLC-MS. Mass spectrometric peptide mapping is of particular interest for protein structure analysis: proteins are first cleaved enzymatically or chemically, and the resulting smaller peptides are then characterized by HPLC-MS [Ling 1991; Hess 1993; Guzetta 1993] or HPLC-MS-MS.
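Counting modified residues reduces to dividing the observed mass shift by the mass increment of a single modification. As an assumed example (the text names no specific reagent), the minimal Python sketch below uses acetylation of amino groups, which adds C2H2O (42.0106 Da monoisotopic) per site; the tolerance is a hypothetical parameter.

```python
ACETYL = 42.010565  # monoisotopic mass increment of one acetylation (C2H2O), Da


def count_modifications(mass_before, mass_after, increment=ACETYL, tol=0.05):
    """Number of modified sites inferred from a mass shift.

    Raises ValueError if the shift is not close to an integer multiple
    of the increment (e.g. side reactions or a wrong assignment).
    """
    shift = mass_after - mass_before
    n = round(shift / increment)
    if abs(shift - n * increment) > tol:
        raise ValueError(f"shift {shift:.3f} Da is not a multiple of {increment} Da")
    return n


if __name__ == "__main__":
    native = 1091.634                 # SNKLYLKNI, monoisotopic
    acetylated = native + 3 * ACETYL  # hypothetical: N-terminus and both Lys acetylated
    print(count_modifications(native, acetylated))
```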

6.2 Capillary Electrophoresis Coupled to Mass Spectrometry

Capillary zone electrophoresis has attracted special attention because of its separation power, short analysis times and low consumption of sample and solvent. Whereas HPLC-ESI-MS can be used routinely, coupling CE to the mass spectrometer requires more optimization and more time [Smith 1993a]. This is due to the low flow rates of CE (nl/min), the small sample amounts (pmol range) and especially the buffers (such as citric acid or borate) necessary for effective separation. Organic buffers are ionized very easily and give interfering peaks over the whole mass range. The presence of non-volatile buffers such as phosphate or borate leads to a high conductivity in the droplet, causing an unstable spray [Mann 1990]. One way of solving the problem of the low electro-osmotic flow is to add a second flow (make-up flow of several microliters per minute). This can be achieved either with an additional coaxial capillary surrounding the CE capillary (liquid sheath flow) [Smith 1988b] or by mixing in a dead-volume-free mixing chamber behind the exit of the column [Lee 1988; Garcia 1992]. In both cases the analyte is diluted, and its concentration often falls below the mass spectrometric detection limit. For sample introduction, hydrostatic loading is advantageous. Despite all these problems, CE-MS has considerable analytical potential [Olivares 1987; Smith 1988a, 1989; Edmonds 1989].
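Because ESI responds to analyte concentration, the dilution caused by the make-up flow can be estimated from the flow ratio alone, assuming complete mixing. The minimal Python sketch below uses hypothetical values (100 nl/min electro-osmotic flow, 3 µl/min make-up flow); the text only states the orders of magnitude.

```python
def makeup_dilution(ce_flow_nl_min, makeup_flow_ul_min):
    """Dilution factor of the CE effluent after adding a make-up (sheath) flow."""
    total_nl_min = ce_flow_nl_min + makeup_flow_ul_min * 1000.0
    return total_nl_min / ce_flow_nl_min


if __name__ == "__main__":
    factor = makeup_dilution(100.0, 3.0)
    print(f"analyte diluted about {factor:.0f}-fold")
```

For these assumed flows the dilution is 31-fold, i.e. the analyte concentration reaching the spray tip is only about 3% of that at the capillary outlet.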

6.3 Microcapillary LC Coupled to Mass Spectrometry

Packed fused-silica columns are especially suitable for small sample volumes and low amounts (10 pmol) [Huang 1991; Perkins 1992]. Their low flow rates of ca. 1 µl/min suit the requirements of the ESI source well [Wong 1988; Smith 1990] and result in very low consumption of mobile phase. Because of this, even expensive solvents such as deuterated solvents can be used [Karlson 1993]. By building in a pre-column splitter for micro-LC, conventional micro-HPLC pumps can be used [Huang 1991]. HPLC with packed capillaries has so far been used especially for the analysis of peptides and proteins.



6.4 Practical Aspects

For coupling HPLC and MS in this work, a pulsation-free HPLC syringe pump was used in combination with a 2 mm I.D. reversed-phase column ("narrow-bore" column). Compared with wider columns, 2 mm columns have the advantage of low solvent consumption (ca. 5-10 ml per run); compared with narrower columns, they can still be operated with precolumns, which helps to prevent blocking of the column. Although the ion spray source can be operated at a flow rate of 200 µl/min without problems, it is advantageous to split off the main portion of the eluent to improve sensitivity.

7 References

Bruins, A.P., T.R. Covey, J.D. Henion (1987) Anal. Chem. 59, 2642-2646. Ion spray interface for combined liquid chromatography/atmospheric pressure ionization mass spectrometry.

Chowdhury, S.K., V. Katta, B.T. Chait (1990) J. Am. Chem. Soc. 112, 9012-9013. Probing conformational changes in proteins by mass spectrometry.

Covey, T.R., E.D. Lee, A.P. Bruins, J.D. Henion (1986a) Anal. Chem. 58, 1451A-1461A. Liquid chromatography/mass spectrometry.

Covey, T.R., E.D. Lee, J.D. Henion (1986b) Anal. Chem. 58, 2453-2460. High-speed liquid chromatography/tandem mass spectrometry for the determination of drugs in biological samples.

Covey, T.R., A.P. Bruins, J.D. Henion (1988) Org. Mass Spectrom. 23, 178-186. Comparison of thermospray and ion spray mass spectrometry in an atmospheric pressure ion source.

Dole, M., L.L. Mack, R.L. Hines, R.C. Mobley, L.D. Ferguson, M.B. Alice (1968) J. Chem. Phys. 49, 2240-2249. Molecular beams of macroions.

Edmonds, C.G., J.A. Loo, C.J. Barinaga, H.R. Udseth, R.D. Smith (1989) J. Chromatogr. 474, 21-37. Capillary electrophoresis-electrospray ionization-mass spectrometry.

Fenn, J.B., M. Mann, C.K. Meng, S.F. Wong, C.M. Whitehouse (1990) Mass Spectrom. Rev. 9, 37-70. Electrospray ionization - principles and practice.

Geysen, H.M., T.J. Mason (1993) Bioorg. Med. Chem. Lett. 3, 397-404. Screening chemically synthesized peptide libraries for biologically-relevant molecules.

Glassgen, W.E., H.U. Seitz, J.W. Metzger (1992) Biol. Mass Spectrom. 21, 271-277. High-performance liquid chromatography - ion spray mass spectrometry and tandem mass spectrometry of anthocyanins from plant tissues and cell cultures of Daucus carota L.

Guzetta, A.W., L.J. Basa, W.S. Hancock, B.A. Key, W.F. Bennett (1993) Anal. Chem. 65, 2953-2962. Identification of carbohydrate structures in glycoprotein peptide maps by the use of LC/MS with selected ion extraction with special reference to tissue plasminogen activator and a glycosylation variant produced by site directed mutagenesis.

Hess, D., T.C. Covey, R. Winz, R.W. Brownsey, R. Aebersold (1993) Protein Sci. 2, 1342-1351. Analytical and micropreparative peptide mapping by high performance liquid chromatography/electrospray mass spectrometry of proteins purified by gel electrophoresis.



Heath, T., A.B. Giordani (1993) J. Chromatogr. 638, 9-19. Reversed-phase capillary high-performance liquid chromatography with on-line UV, fluorescence and electrospray ionization mass spectrometric detection in the analysis of peptides and proteins.

Hemling, M.E., G.D. Roberts, W. Johnson, S.A. Carr, T.R. Covey (1990) Biomed. Environ. Mass Spectrom. 19, 677-691. Analysis of proteins and glycoproteins at the picomole level by on-line coupling of microbore high-performance liquid chromatography with flow fast atom bombardment and electrospray mass spectrometry: a comparative evaluation.

Hopfgartner, G., T. Wachs, K. Bean, J. Henion (1993a) Anal. Chem. 65, 439-446. High-flow ion spray liquid chromatography/mass spectrometry.

Huang, E.C., J. Henion (1991) Anal. Chem. 63, 732-739. Packed-capillary chromatography/ion-spray tandem mass spectrometry. Determination of biomolecules.

Iribarne, J.V., B.A. Thomson (1976) J. Phys. Chem. 64, 2287-2293. On the evaporation of small ions from charged droplets.

Karas, M., F. Hillenkamp (1988) Anal. Chem. 60, 2299-2301. Laser desorption ionization of proteins with molecular masses exceeding 10000 daltons.

Karlson, K.-E. (1993) J. Chromatogr. 647, 31-38. Deuterium oxide as a reagent for the modification of mass spectra in electrospray microcolumn liquid chromatography mass spectrometry.

Kebarle, P., L. Tang (1993) Anal. Chem. 63, 2709-2715. From ions in solution to ions in the gas phase.

Lee, E.D., J. Henion, T.R. Covey (1989) J. Microcolumn Sep. 1, 1-48. Microbore high-performance liquid chromatography - ion spray mass spectrometry for the determination of peptides.

Lee, E.D., J.D. Henion (1992) Rapid Commun. Mass Spectrom. 6, 727-733. Thermally assisted electrospray interface for liquid chromatography/mass spectrometry.

Ling, V., A.W. Guzzetta, E. Canova-Davis, J.T. Stults, W.S. Hancock, T.R. Covey, B.I. Shushan (1991) Anal. Chem. 63, 2909-2915. Characterization of the tryptic map of recombinant DNA derived tissue plasminogen activator by high performance liquid chromatography - electrospray ionization mass spectrometry.

Loo, J.A., C.G. Edmonds, H.R. Udseth, R.D. Smith (1990) Anal. Chem. 62, 693-698. Effect of reducing disulfide-containing proteins on electrospray ionization mass spectra.

Mann, M. (1990) Org. Mass Spectrom. 25, 575-587. Electrospray: its potential and limitations as an ionization method for biomolecules.

Metzger, J., G. Jung (1991) API-MS is a powerful new tool for on-line peptide and protein analysis with HPLC and CZE. In: Peptides 1990, Proceedings of the 21st European Peptide Symposium (Giralt, E., Andreu, D.; eds.) pp. 341-342, Escom, Leiden.

Metzger, J.W., G. Jung (1993) Peptide and protein analysis with ion spray mass spectrometry. In: Chemistry of peptides and proteins (Brandenburg, D., Ivanov, V., Voelter, W.; eds.) pp. 171-180, Vol. 5/6, DWI Reports 112 A, Verlag Mainz.

Metzger, J.W., K.-H. Wiesmüller, V. Gnau, J. Brünjes, G. Jung (1993) Angew. Chem. Int. Ed. Engl. 32, 894-896. Ion-spray mass spectrometry and high-performance liquid chromatography-mass spectrometry of synthetic peptide libraries.

Metzger, J.W., C. Kempter, K.-H. Wiesmüller, G. Jung (1994a) Anal. Biochem. 219, 261-277. Electrospray mass spectrometry and tandem mass spectrometry of multi-component peptide mixtures: determination of composition and purity.



Metzger, J.W., S. Stevanovic, J. Brünjes, K.-H. Wiesmüller, G. Jung (1994b) Methods Enzymol., in press. Electrospray mass spectrometry and multiple sequence analysis of synthetic peptide libraries.

Niessen, W.M.A., R.A.M. van der Hoeven, J. van der Greef (1992) Org. Mass Spectrom. 27, 341-342. Analysis of intact oligosaccharides by liquid chromatography/mass spectrometry.

Olivares, J.A., N.T. Nguyen, C.R. Yonker, R.D. Smith (1987) Anal. Chem. 59, 1230-1232. On-line mass spectrometric detection for capillary zone electrophoresis.

Perkins, J.R., C.E. Parker, K.B. Tomer (1992) J. Am. Soc. Mass Spectrom. 3, 139-149. Nanoscale separations combined with electrospray ionization mass spectrometry: sulfonamide determination.

Röllgen, F.W., E. Bramer-Weger, L. Bütfering (1987) J. Phys. Colloq. (Paris) 48, C6-253-C6-256. Field ion emission from liquid solutions: ion evaporation against electrohydrodynamic disintegration.

Röllgen, F.W., H. Nehring, U. Giessmann (1990) Mechanisms of field induced desolvation of ions from liquids. In: Ion Formation from Organic Solids (Hedin, A., B.U.R. Sundqvist, A. Benninghoven; eds.) Proceedings of the Fifth International Conference, Lövånger, Sweden, June 18-21, 1989. Wiley, Chichester.

Schmelzeisen-Redeker, G., L. Bütfering, F.W. Röllgen (1989) Int. J. Mass Spectrom. Ion Proc. 90, 139-150. Desolvation of ions and molecules in thermospray mass spectrometry.

Siu, K.W.M., R. Guevremont, J.C.Y. Le Blanc, R.T. O'Brian, S.S. Berman (1993) Org. Mass Spectrom. 28, 579-584. Is droplet evaporation crucial in the mechanism of electrospray mass spectrometry?

Smith, R.D., J.A. Olivares, N.T. Nguyen, H.R. Udseth (1988a) Anal. Chem. 60, 436-441. Capillary zone electrophoresis - mass spectrometry using an electrospray ionization interface.

Smith, R.D., C.J. Barinaga, H.R. Udseth (1988b) Anal. Chem. 60, 1948-1952. Improved electrospray ionization interface for capillary zone electrophoresis - mass spectrometry.

Smith, R.D., J.A. Loo, C.J. Barinaga, C.G. Edmonds, H.R. Udseth (1989) J. Chromatogr. 480, 211-232. Capillary zone electrophoresis and isotachophoresis mass spectrometry of polypeptides and proteins based upon an electrospray ionization interface.

Smith, R.D., J.A. Loo, C.G. Edmonds, C.J. Barinaga, H.R. Udseth (1990) J. Chromatogr. 516, 157-165. Sensitivity considerations for large molecule detection by capillary electrophoresis-electrospray ionization mass spectrometry.

Smith, R.D., J.H. Wahl, D.R. Goodlett, S.A. Hofstadler (1993) Anal. Chem. 65, 574A-584A. Capillary electrophoresis/mass spectrometry.

Smith, R.D., K.J. Light-Wahl (1993) Biol. Mass Spectrom. 22, 493-501. The observation of non-covalent interactions in solution by electrospray ionization mass spectrometry: promise, pitfalls and prognosis.

Stevanovic, S., K.-H. Wiesmüller, J. Metzger, A.G. Beck-Sickinger, G. Jung (1993a) Bioorg. Med. Chem. Lett. 3, 431-436. Natural and synthetic peptide pools: characterization by sequencing and electrospray mass spectrometry.

Stevanovic, S., G. Jung (1993b) Anal. Biochem. 212, 212-220. Multiple sequence analysis: pool sequencing of synthetic and natural peptide libraries.

Thomson, B.A., J.V. Iribarne (1979) J. Chem. Phys. 71, 4451-4463. Field induced ion evaporation from liquid surfaces at atmospheric pressure.


J.W. Metzger and C. Eckerskorn

Whitehouse, C.M., J.B. Fenn, S. Shen, C. Smith (1992) PCT Int. Appl.; Chemical Abstracts 118, 93614. Method and apparatus for improving electrospray ionization of solute species for mass spectrometry.
Wong, S.F., C.K. Meng, J.B. Fenn (1988) J. Phys. Chem. 92, 546-550. Multiple charging in electrospray ionization of poly(ethylene glycols).
Yamashita, M., J.B. Fenn (1984a) J. Phys. Chem. 88, 4451-4459. Electrospray ion source. Another variation on the free-jet theme.
Yamashita, M., J.B. Fenn (1984b) J. Phys. Chem. 88, 4671-4675. Negative ion production with the electrospray ion source.


V.3 Sequence Analysis of Proteins and Peptides by Mass Spectrometry

Christiane Weigt, Helmut E. Meyer and Roland Kellner

1 Introduction

A protein is characterized not only by its amino acid sequence but also by any groups covalently attached to it, e.g. phosphate, carbohydrate or lipid. Determining the primary structure, however, is often the first step towards understanding the shape and function of the protein under study. The classical method for sequencing proteins and peptides is Edman degradation. In this approach the purified protein is first cleaved enzymatically with different endoproteases or chemically, e.g. with cyanogen bromide. After the resulting peptides have been fractionated by reversed-phase high-performance liquid chromatography (HPLC), they can be sequenced individually. Another way of determining the primary structure is to sequence the gene or cDNA encoding the protein. More often, partial amino acid sequences are determined in order to construct oligonucleotide probes with which a cDNA or genomic DNA library can be screened for positive clones. Sequencing these clones then yields the amino acid sequence of the encoded protein. Both methods, Edman degradation and DNA sequencing, share an important limitation: they cannot identify post-translational modifications. Modified amino acids or disulphide bridges, for example, require special treatment before they can be analysed. Mass spectrometry has improved tremendously over the last few years, and new ionization techniques now allow peptides and even intact proteins to be analysed. Mass spectrometry can overcome the drawback mentioned above and is complementary to Edman degradation: it is faster and more efficient at determining the nature of an N-terminal blocking group and at recognizing and localizing post-translational modifications. Moreover, with interfaces that couple HPLC or capillary zone electrophoresis (CZE) to tandem mass spectrometry, a particular peptide can be separated and analysed on-line even in a complex peptide mixture.
This chapter describes new methods for sequencing proteins and peptides by mass spectrometry, in particular by tandem mass spectrometry.


C. Weigt, H.E. Meyer and R. Kellner

[Figure 1 plot: relative intensity versus m/z (600-1400); labelled peaks at m/z 1131.3, 1060.7, 893.15, 848.5, 808.2, 771.5, 738.0 and 707.25. Two adjacent charge states, ion 1 and ion 2, obey $m_1 = (M + n_1)/n_1$ and $m_2 = (M + n_2)/n_2$.]

Figure 1. Electrospray mass spectrum of equine myoglobin (theoretical molecular weight M = 16 951.5 Da). From the 12 multiply protonated ions carrying 13 H+ (m/z 1305) to 24 H+ (m/z 707.25), a molecular weight of 16 951.9 ± 1.7 Da is calculated [Biemann 1992].
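The calculation behind Figure 1 can be sketched in a few lines. Two neighbouring peaks of the same protein differ by one proton, so their m/z values fix the charge of each peak, and every peak then gives an independent estimate of M. This is a minimal sketch, assuming a single species, protons as the only charge carriers and an unbroken run of consecutive charge states; the function names and the synthetic peak list are illustrative only.

```python
import statistics

PROTON = 1.00728  # mass of one proton in Da (the figure rounds it to 1)


def charge_of_higher_peak(m_high, m_low):
    """Charge n of the higher-m/z member of two adjacent charge states.

    From m_high = (M + n*H)/n and m_low = (M + (n+1)*H)/(n+1)
    it follows that n = (m_low - H) / (m_high - m_low).
    """
    return round((m_low - PROTON) / (m_high - m_low))


def deconvolute(mz_values):
    """Mean and standard deviation of M from a multiply protonated series.

    Assumes one species, protons as the only charge carriers, and an
    unbroken run of consecutive charge states (no missing peaks).
    """
    peaks = sorted(mz_values, reverse=True)  # highest m/z = lowest charge
    z_first = charge_of_higher_peak(peaks[0], peaks[1])
    masses = [(z_first + i) * (mz - PROTON) for i, mz in enumerate(peaks)]
    return statistics.mean(masses), statistics.stdev(masses)


if __name__ == "__main__":
    # Synthetic peak list for a 16 951.5 Da protein carrying 13 to 24
    # protons, rounded to two decimals as a measured peak list would be.
    true_mass = 16951.5
    series = [round((true_mass + z * PROTON) / z, 2) for z in range(13, 25)]
    mean, sd = deconvolute(series)
    print(f"M = {mean:.1f} +/- {sd:.1f} Da")
```

Averaging over all charge states is what turns the modest m/z accuracy of each single peak into the small uncertainty quoted in the caption.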

2 Protein Sequencing by Mass Spectrometry

At the end of the 1970s mass spectrometry was subject to considerable limitations in the analysis of proteins and other biomolecules. The measurable molecular mass range ended at 1-2 kDa, and large proteins could not be ionized. In addition, mass spectrometry was not compatible with high-performance liquid chromatography (HPLC), an important tool for separating proteins and peptides. In 1981 a new ionization method, fast atom bombardment (FAB), was introduced that allowed peptides and small polar proteins to be ionized [Barber 1981]. The molecular weight of proteins up to 15 kDa could now be determined with an accuracy of ±0.01 % [Biemann 1993]. In FAB, the protein or peptide is embedded in a glycerol matrix, and this matrix-peptide mixture is bombarded with a beam of argon or xenon atoms. The process yields protonated or deprotonated peptide ions together with glycerol and glycerol cluster ions. In the mass spectrometer the ions are separated according to their mass or their mass-to-charge (m/z) ratio. A further development of the method, continuous-flow (CF-)FAB, makes coupling with HPLC or capillary zone electrophoresis (CZE) practicable: the capillary from the HPLC or CZE is connected directly to the mass spectrometer [Ito 1985], a few percent of glycerol is added on-line, and the effluent is ionized continuously in the FAB source. FAB is one of the "soft" ionization methods, i.e. the protonated peptides are very stable ions with little excess energy. The ions therefore undergo little fragmentation, and the spectra obtained mainly yield the molecular weight, with little structural information. To analyse the sequence of an unknown protein it is more efficient

V.3 Sequence Analysis by MS


to use tandem mass spectrometry (for details of this method see below). By combining FAB with tandem mass spectrometry, a great number of proteins have been sequenced [Biemann 1987, 1988] and sites of post-translational modifications, e.g. phosphorylation sites, have been identified [Labdon 1992]. Matrix-assisted laser desorption ionization (MALDI) mass spectrometry of large biomolecules was first described in 1988 [Karas 1988]. The peptide or protein of interest is embedded in a large excess of a UV- or IR-absorbing matrix, e.g. sinapinic acid, which is irradiated with a laser. Energy is transferred to the analyte molecules, and ionization and desorption take place. The analyte ions are emitted and fly towards the detector. Ions of different mass have different flight times in the time-of-flight (TOF) mass analyser, so they are separated before they reach the detector (see Chapter V.1 for a detailed description). Originally, it was reported as a characteristic feature of MALDI that the analyte molecules hardly fragment at all. However, metastable decomposition of peptides after the desorption process has recently been described [Spengler 1992; Kaufmann 1993]. The analyte ions gain additional activation energy through collisions with matrix molecules and residual gas molecules, and fragmentation occurs during their flight through the field-free drift path (Figure 2); this post-source decay creates metastable ions. By stepping the reflector voltage, the metastable fragment ions can be brought into the range of the detector, and their flight times, and hence their masses, can be measured. This "MALDI sequencing", in combination with Edman degradation, is a very promising tool for protein sequencing [Kellner 1994].
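Why different masses give different flight times can be shown with the idealised linear TOF relation. An ion of mass m and charge ze accelerated through a potential U acquires kinetic energy zeU = mv²/2, so it crosses a drift length L in t = L·sqrt(m/(2zeU)): the flight time grows with the square root of m/z. The sketch below assumes a hypothetical instrument (20 kV acceleration, 1 m drift length) and ignores initial velocity spread and the reflector; it illustrates the principle and is not a model of the instrument in Figure 2.

```python
import math

DALTON_KG = 1.66053906660e-27  # kg per dalton
E_CHARGE = 1.602176634e-19     # elementary charge, C


def flight_time(mass_da, charge, accel_voltage, drift_length):
    """Flight time (s) of an ion through a field-free drift region.

    Idealised linear TOF: every ion leaves the source with kinetic energy
    z*e*U and no initial velocity spread; no reflector is modelled.
    """
    mass_kg = mass_da * DALTON_KG
    velocity = math.sqrt(2 * charge * E_CHARGE * accel_voltage / mass_kg)
    return drift_length / velocity


if __name__ == "__main__":
    # Hypothetical instrument settings: 20 kV acceleration, 1 m drift tube.
    for name, mass in [("substance P", 1348), ("bombesin", 1620),
                       ("melittin", 2847), ("ACTH", 4567)]:
        t_us = flight_time(mass, 1, 20e3, 1.0) * 1e6
        print(f"{name:12s} {mass:5d} Da  {t_us:6.2f} us")
```

Because only m/z enters the result, a doubly charged ion of mass 2m arrives at the same time as a singly charged ion of mass m.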

[Figure 2 schematic: sample, acceleration grid and the field-free drift region (region where metastable decay takes place); inset: mass spectrum of the peptide mixture, relative intensity versus m/z.]

Figure 2. Schematic diagram of the reflectron time-of-flight mass spectrometer for matrix-assisted laser desorption. The mass spectrum of a peptide mixture of substance P (1348 Da), bombesin (1620 Da), melittin (2847 Da) and ACTH (4567 Da) shows the different types of ions. From [Spengler 1992] with permission from John Wiley & Sons.



Another new approach to protein sequencing with MALDI has recently been described [Chait 1993; Wang 1994]. The principle is to generate a nested set of N-terminally truncated fragments of an intact peptide by Edman degradation with phenylisothiocyanate (PITC) containing 5 % phenyl isocyanate (PIC). The phenylcarbamyl peptides formed with PIC are resistant to TFA, so Edman degradation stops for that fraction of the molecules. The resulting set of peptide fragments is then analysed in a single MALDI measurement, and the amino acid sequence is read from the mass differences between successive signals (see Chapter IV.1.4).

The latest technique for sequencing proteins and peptides by mass spectrometry uses electrospray ionization (ESI). Over the preceding two decades no method for ionizing large proteins had been available. In 1984 an ionization method was developed that produces an electrospray of a protein solution [Edmonds 1990]. The protein or peptide solution is pumped at a flow rate of 1-10 µl/min through a fused-silica capillary held at an electrostatic potential of 3-6 kV. The electrostatic field generates positively or negatively charged droplets. These droplets shrink by solvent evaporation, assisted by heating or by a curtain of dry gas, until the charge density at the liquid surface is high enough for ions to evaporate. These ions enter the mass analyser, where their mass-to-charge (m/z) ratio is determined [Smith 1990]. Because electrospray generates multiply charged ions, the molecular weight of even large proteins (up to 133 000 Da [Loo 1989]) can be determined directly with high sensitivity and precision. It is important to mention that several materials and substances used in protein isolation may interfere with mass spectrometric measurements [Burlingame 1993]: non-volatile buffers; detergents that are mixtures of polymers (e.g. Triton X-100); plasticized labware (e.g. Eppendorf tubes) or tubes that release sodium or potassium ions; and ion-exchange columns that bleed polymers.
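The ladder-sequencing idea described above (reading the sequence from mass differences between successive MALDI signals) reduces to matching each gap against the residue masses of Table 1. This is a minimal sketch, assuming each ladder member lacks exactly one more N-terminal residue than the next heavier one and that all members carry the same capping group, so the offset cancels in the differences; `read_ladder` and the test peptide are hypothetical illustrations, not the published procedure.

```python
# Monoisotopic residue masses (Da) from Table 1; Leu and Ile share one entry
# because they cannot be told apart from a mass difference.
RESIDUES = {
    "G": 57.021, "A": 71.037, "S": 87.032, "P": 97.053, "V": 99.068,
    "T": 101.048, "C": 103.009, "L/I": 113.084, "N": 114.043, "D": 115.027,
    "Q": 128.059, "K": 128.095, "E": 129.043, "M": 131.040, "H": 137.059,
    "F": 147.068, "R": 156.101, "Y": 163.063, "W": 186.079,
}


def read_ladder(masses, tolerance=0.02):
    """Read a sequence from a ladder of N-terminally truncated fragments.

    Walking from the heaviest signal downwards yields the residues in N- to
    C-terminal order. A constant offset (capping group, water, proton)
    cancels in the differences. Gaps with no residue within `tolerance`
    are reported as "?".
    """
    peaks = sorted(masses, reverse=True)
    sequence = []
    for heavier, lighter in zip(peaks, peaks[1:]):
        gap = heavier - lighter
        code, mass = min(RESIDUES.items(), key=lambda item: abs(item[1] - gap))
        sequence.append(code if abs(mass - gap) <= tolerance else "?")
    return sequence


if __name__ == "__main__":
    # Hypothetical ladder: peptide A-L-G-K-E plus an arbitrary 500 Da offset.
    peptide = ["A", "L/I", "G", "K", "E"]
    ladder = [500.0 + sum(RESIDUES[r] for r in peptide[i:])
              for i in range(len(peptide))]
    print("-".join(read_ladder(ladder)))
```

The tolerance matters: Lys and Gln differ by only 0.036 Da, so they can be separated this way only if the mass measurement is accurate to a few hundredths of a dalton.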
As with Edman degradation, sample preparation is an essential factor in MS. The characteristics of electrospray ionization combined with a quadrupole mass analyser are:
- Soft ionization, and therefore little structural information without tandem mass spectrometry.
- High, non-discriminating ionization efficiency.
- Low transmission of ions from the high-pressure region into the vacuum.
- Generation of multiply charged ions. These allow the molecular weight of molecules far beyond the mass range of the analyser to be determined (a mass range of over 100 kDa is obtainable [Loo 1989]), and they can be fragmented efficiently by collision with a neutral gas (tandem mass spectrometry [Smith 1990]).
- High molecular-mass accuracy (better than 0.005 % with quadrupole instruments [Edmonds 1990]).
- High sensitivity (low picomole level [Smith 1990]), depending on sample concentration, flow rate, detection efficiency, ionization efficiency and surface tension.
- High reliability.
- Compatibility with HPLC and CZE.
- Acceptance of aqueous mobile phases and buffers.
- Non-covalently bound species, e.g. protein subunits, are detected as individual components.



2.1 Tandem Mass Spectrometry

Electrospray ionization, fast atom bombardment and matrix-assisted laser desorption ionization deliver singly or multiply charged ions with little excess energy. These ions are very stable and fragment little. To obtain structural information, kinetic energy has to be converted into vibrational energy by collision of the ions with a neutral gas (for example helium or argon), which induces fragmentation. This is the principle of a tandem mass spectrometer [Biemann 1992]. The first mass spectrometer (the first quadrupole of a triple quadrupole instrument) acts as a mass filter that lets only selected ions (parent or precursor ions) pass into the second quadrupole. There the parent ions collide with helium or argon atoms (collision-induced decomposition, CID), producing daughter or product ions. These daughter ions are analysed in the third quadrupole, the second mass spectrometer. In a triple quadrupole instrument the ions have energies of 100 eV or less, i.e. the collisions are low-energy collisions, and fragmentation therefore occurs mainly along the peptide backbone. Electrospray tandem mass spectrometry mostly uses doubly charged ions for CID, since spectra obtained by CID of more highly charged ions are often difficult to interpret. In doubly charged tryptic peptides, for example, the two charges are located at opposite ends of the peptide, one at the N-terminus and the other at the C-terminal lysine or arginine. Fragmentation therefore yields singly charged daughter ions (see Figure 3). Cleavage gives fragments of type a_n, b_n and c_n if the charge is retained on the N-terminal part and fragments of type x_n, y_n and z_n if it is retained on the C-terminal part. If a complete ion series is obtained, the peptide can be sequenced, because adjacent signals in a series differ by the mass of one amino acid residue (Table 1).

[Figure 3: structures of the peptide fragment ions, including the side-chain cleavage ions d_n and w_n.]

Figure 3. Notation of fragment ions [Biemann 1990]. The most prominent fragments after low-energy collision-induced decomposition belong to the b_n and y_n series, which result from cleavage of the peptide (amide) bond. These fragments are mainly used to deduce the primary sequence of a peptide.
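The b and y series follow directly from the residue masses in Table 1: b_i is the sum of the first i residues plus a proton, and y_i the sum of the last i residues plus water and a proton. The sketch below assumes singly charged fragments of an unmodified peptide; `fragment_ions` is an illustrative helper. Applied to TLWVDPYEV, the class I peptide sequenced in Section 4.1, it gives b ions at nominal m/z 102, 215, 401, 500, 615, 712 and y ions at 118, 247, 410, 507, 622, 721, 907, 1020.

```python
# Monoisotopic residue masses (Da) from Table 1.
RESIDUES = {
    "G": 57.021, "A": 71.037, "S": 87.032, "P": 97.053, "V": 99.068,
    "T": 101.048, "C": 103.009, "I": 113.084, "L": 113.084, "N": 114.043,
    "D": 115.027, "Q": 128.059, "K": 128.095, "E": 129.043, "M": 131.040,
    "H": 137.059, "F": 147.068, "R": 156.101, "Y": 163.063, "W": 186.079,
}
WATER = 18.0106   # H2O, monoisotopic
PROTON = 1.0073   # charge-carrying proton of a singly charged fragment


def fragment_ions(peptide):
    """Singly protonated b and y ion masses of an unmodified peptide.

    b_i = sum of the first i residues + H+
    y_i = sum of the last i residues + H2O + H+
    """
    masses = [RESIDUES[aa] for aa in peptide]
    b = [sum(masses[:i]) + PROTON for i in range(1, len(masses))]
    y = [sum(masses[-i:]) + WATER + PROTON for i in range(1, len(masses))]
    return b, y


if __name__ == "__main__":
    b, y = fragment_ions("TLWVDPYEV")
    print("b:", [round(m, 1) for m in b])
    print("y:", [round(m, 1) for m in y])
```

Comparing such predicted ladders with the observed signals is how the underlined masses in the CID spectra of Section 4 are assigned.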


The isobaric amino acids leucine and isoleucine and the modified amino acid hydroxyproline (all three with the same nominal residue mass of 113 Da) cannot be differentiated with a triple quadrupole instrument. This is feasible only with a magnetic sector instrument, in which the precursor ions have a kinetic energy of 5-10 keV and undergo high-energy collisions [Biemann 1992]. High-energy collisions produce a different fragmentation pattern (additional d_n and w_n ions, see Figure 3) that allows leucine, isoleucine and hydroxyproline to be discriminated. Alternatively, Edman sequencing combined with amino acid analysis can establish which of hydroxyproline, leucine or isoleucine is present. The other pair of isobaric residues is lysine/glutamine (nominal mass 128 Da). These two amino acids can be distinguished by acetylating the peptide with acetic anhydride: acetylation adds 42 Da for the free amino terminus and for every ε-amino group of lysine, but nothing for glutamine.
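The acetylation test can be expressed as a simple prediction: each candidate sequence implies a different number of acetylatable amino groups and therefore a different mass increase. This is a minimal sketch, assuming that only the free N-terminal amino group and lysine ε-amino groups react (side reactions, e.g. O-acetylation, are ignored); the candidate sequences VLKDR and VLQDR are hypothetical.

```python
ACETYL = 42.011  # monoisotopic mass added per acetylated amino group, Da
LYS = 128.095    # residue masses from Table 1
GLN = 128.059


def acetylation_shift(peptide, free_n_terminus=True):
    """Predicted mass increase on acetylation with acetic anhydride.

    Counts one acetyl group for a free N-terminal amino group and one for
    each lysine epsilon-amino group; glutamine does not react. Other
    possible side reactions are ignored (simplifying assumption).
    """
    groups = peptide.count("K") + (1 if free_n_terminus else 0)
    return groups * ACETYL


if __name__ == "__main__":
    print(f"Lys - Gln residue mass difference: {LYS - GLN:.3f} Da")
    for candidate in ("VLKDR", "VLQDR"):  # hypothetical K/Q ambiguity
        print(candidate, f"+{acetylation_shift(candidate):.3f} Da")
```

The two candidates differ by 42 Da after acetylation, a difference any instrument resolves, whereas before derivatization they differ by only 0.036 Da.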

Table 1. Monoisotopic and average residue masses (Da) of the amino acids.

Amino acid     Three-letter  One-letter  Monoisotopic  Average
Glycine        Gly           G           57.021        57.052
Alanine        Ala           A           71.037        71.079
Serine         Ser           S           87.032        87.078
Proline        Pro           P           97.053        97.117
Valine         Val           V           99.068        99.133
Threonine      Thr           T           101.048       101.105
Cysteine       Cys           C           103.009       103.139
Isoleucine     Ile           I           113.084       113.159
Leucine        Leu           L           113.084       113.159
Asparagine     Asn           N           114.043       114.104
Aspartic acid  Asp           D           115.027       115.089
Glutamine      Gln           Q           128.059       128.131
Lysine         Lys           K           128.095       128.174
Glutamic acid  Glu           E           129.043       129.116
Methionine     Met           M           131.040       131.193
Histidine      His           H           137.059       137.141
Phenylalanine  Phe           F           147.068       147.177
Arginine       Arg           R           156.101       156.188
Tyrosine       Tyr           Y           163.063       163.176
Tryptophan     Trp           W           186.079       186.213

Another problem in sequencing proteins and peptides is the localization of disulphide bonds. The principle is to digest the non-reduced (oxidized) protein with a specific endoprotease and to determine the molecular weights of the resulting peptides by mass spectrometry. Redetermining the molecular weights of the peptides after reduction and alkylation gives information about the total number of disulphide bonds. Peptides with an internal -S-S- bond increase in mass by 57 (carbamidomethylation), 58 (carboxymethylation) or 105 Da (pyridylethylation) per cysteine relative to the free thiol form, plus 1 Da per cysteine for the hydrogen taken up on reduction. In contrast, the signal of two peptides linked by a disulphide bond disappears after reduction, and two new signals arise. Knowing the total number of disulphide bonds and the peptides in which they occur, the disulphide bonds can be assigned once the protein has been sequenced [Biemann 1992].

The last aspect to be discussed here is post-translational modifications, such as deamidation, oxidation, glycosylation, phosphorylation, sulfation, methylation, blocked N- or C-termini, and ragged N- or C-terminal ends [Covey 1991]. Once the amino acid sequence of a protein has been determined, the next step is to search for post-translational modifications, which may give a hint to the function or localization of the protein in the cell. Here again mass spectrometry is very useful. If the amino acid sequence of a protein is known, a difference between the expected molecular weight and the mass determined by mass spectrometry can be attributed to a post-translational modification. The accuracy of the mass measurement may allow the modification to be identified, possibly in combination with other available data. The assignment is confirmed if the modification can be reversed by a specific treatment: for example, a glycoprotein can be treated with a glycosidase and its molecular weight redetermined; a corresponding decrease in mass confirms the glycosylation.

One of the most common post-translational modifications is reversible phosphorylation. Amino acid phosphates occur in different forms: O-phosphates, N-phosphates, S-phosphates and acyl phosphates. The most common are O-phosphoamino acids such as phosphoserine, phosphothreonine and phosphotyrosine.
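The mass bookkeeping for disulphide mapping and for attributing a mass difference to a modification can be sketched as follows. The shift of a peptide with an internal S-S bond, measured relative to its disulphide-bonded form, includes one hydrogen per cysteine from the reduction step in addition to the alkyl group. Two peptides joined by one S-S bond weigh their sum minus two hydrogens. The modification table uses standard monoisotopic elemental-composition masses that are not given in the text above; the helper names and the example numbers are hypothetical.

```python
HYDROGEN = 1.008  # one H atom taken up per cysteine on S-S reduction

# Mass added to a cysteine thiol by each alkylating reagent (Da).
ALKYLATION = {
    "carbamidomethyl": 57.021,  # iodoacetamide
    "carboxymethyl": 58.005,    # iodoacetic acid
    "pyridylethyl": 105.058,    # 4-vinylpyridine
}

# Monoisotopic mass changes of some common modifications (Da).
MODIFICATIONS = {
    "deamidation": 0.984, "methylation": 14.016, "oxidation": 15.995,
    "acetylation": 42.011, "sulfation": 79.957, "phosphorylation": 79.966,
    "hexose": 162.053,
}


def intrachain_shift(n_disulphides, reagent):
    """Mass increase of a peptide with internal S-S bonds after reduction
    and alkylation of every freed cysteine."""
    return 2 * n_disulphides * (HYDROGEN + ALKYLATION[reagent])


def disulphide_linked_mass(mass_a, mass_b):
    """Neutral mass of two peptides joined by one S-S bond (loses 2 H)."""
    return mass_a + mass_b - 2 * HYDROGEN


def candidate_modifications(observed, expected, tolerance=0.02, max_count=3):
    """(count, name) pairs whose total mass matches observed - expected."""
    delta = observed - expected
    return [(count, name)
            for name, mass in MODIFICATIONS.items()
            for count in range(1, max_count + 1)
            if abs(count * mass - delta) <= tolerance]


if __name__ == "__main__":
    print(intrachain_shift(1, "carbamidomethyl"))  # one internal S-S bond
    # A hypothetical peptide measured 79.97 Da heavier than predicted:
    print(candidate_modifications(1079.97, 1000.0))
```

The second call returns both sulfation and phosphorylation, which differ by only about 0.01 Da; this is exactly the case in which a specific reversal (e.g. phosphatase treatment) or a neutral-loss experiment is needed to confirm the assignment.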
Peptides containing phosphoserine or phosphothreonine can be identified by the loss of their phosphate group (neutral loss) as they pass the nitrogen curtain before entering the mass spectrometer. The phosphate group may be lost as H3PO4 (mass decrease of 98 Da, i.e. 49 m/z units for the doubly charged ion) or as HPO3 (80 Da, i.e. 40 m/z units for the doubly charged ion). Serine- and threonine-phosphorylated peptides can therefore be identified with a quadrupole mass spectrometer that offers a neutral-loss scan function: all doubly charged peptides are screened for an additional signal, corresponding to the dephosphorylated peptide, that differs by 49 or 40 m/z units from the phosphopeptide signal [Covey 1991; Meyer 1993]. Phosphotyrosine-containing peptides do not undergo this neutral loss and must instead be recognized from the difference between the measured and the expected molecular mass. The particular serine, threonine or tyrosine residue that is phosphorylated in a peptide can be identified by tandem mass spectrometry or by other methods [Meyer 1993].
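The screening logic can be mimicked in software on a recorded peak list: look for pairs of signals whose m/z difference equals the neutral loss divided by the charge. This is a sketch of the idea only, not of the instrument's neutral-loss scan mode, which couples the first and third quadrupoles during acquisition. It assumes all ions in the list share one charge state; `neutral_loss_partners` and the peak values are hypothetical.

```python
H3PO4 = 97.977  # monoisotopic neutral loss, Da
HPO3 = 79.966


def neutral_loss_partners(peaks, charge=2, tolerance=0.05):
    """Find (precursor, product, loss) pairs in a peak list whose m/z
    difference equals a phosphate neutral loss divided by the charge.

    Assumes every listed ion carries `charge` charges.
    """
    hits = []
    for loss_name, loss in (("H3PO4", H3PO4), ("HPO3", HPO3)):
        delta = loss / charge
        for high in peaks:
            for low in peaks:
                if abs((high - low) - delta) <= tolerance:
                    hits.append((high, low, loss_name))
    return hits


if __name__ == "__main__":
    # Hypothetical doubly charged phosphopeptide at m/z 600.0 with its
    # two neutral-loss products and one unrelated peak.
    peaks = [600.0, 551.0115, 560.017, 700.0]
    for precursor, product, loss in neutral_loss_partners(peaks):
        print(f"{precursor:.3f} -> {product:.3f} (loss of {loss})")
```

A phosphotyrosine peptide would produce no partner peak here, which is why it has to be found through the mass difference instead.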

3 Strategy for Protein Sequencing with Electrospray Tandem Mass Spectrometry

Exact determination of the molecular weight of the intact protein by mass spectrometry may be the first step in characterizing a sample. The spectra obtained also show whether the sample is homogeneous. Proteins that are already listed in a database can be identified from a peptide map, i.e. from the masses of their proteolytic fragments [Pappin 1993]. For this, the


protein is digested with specific proteases and the masses of the resulting peptides are determined by mass spectrometry. The measured masses are then searched against a database (the molecular weight search (MOWSE) peptide-mass database of the SERC Daresbury Laboratory) containing fragments from over 50 000 proteins. Combined with other data, e.g. the amino acid composition of the peptides, the MOWSE search program identifies a known protein or finds an unknown but related one. This method allows a protein to be identified rapidly. Another rapid method for identifying known proteins, in which the molecular masses of peptide fragments are used to search a protein sequence database, has also been described [Henzel 1993]. After separation by 2D gel electrophoresis and electroblotting onto a PVDF membrane, the proteins are digested in situ. Peptide masses are determined at the subpicomole level by MALDI MS of the unfractionated supernatant. As few as three peptide masses, entered into a newly developed computer program, were sufficient to identify several proteins uniquely among over 91 000 protein sequences. After the cysteines have been modified by reduction and alkylation and their total number determined, primary structure analysis can begin by producing overlapping sets of peptides with two or more specific proteolytic enzymes. The first enzyme is often trypsin, which cleaves the polypeptide chain specifically on the C-terminal side of lysine and arginine. Because these two amino acids are relatively abundant in most proteins, the resulting peptides are of a size suitable for tandem mass spectrometry. LC-MS can be performed on an aliquot of the digest, the peptide mixture being separated by reversed-phase HPLC. An advantage of this procedure is that residual proteolytic enzymes and buffer salts, which may disturb the mass spectrometric measurement, are removed.
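Peptide-mass fingerprinting rests on two steps: an in-silico digest of each database sequence and a comparison of the predicted peptide masses with the measured ones. The sketch below assumes the usual trypsin rule (cleavage after Lys or Arg, but not before Pro), ignores missed cleavages and modifications, and simply lists the matching peptides; this plain match count is a simplification and not the MOWSE scoring scheme. The test sequence is the N-terminal 40 residues of β-lactoglobulin, assembled from the tryptic peptides 1-8, 9-14 and 15-40 listed in Figure 7.

```python
import re

# Monoisotopic residue masses (Da) from Table 1.
RESIDUES = {
    "G": 57.021, "A": 71.037, "S": 87.032, "P": 97.053, "V": 99.068,
    "T": 101.048, "C": 103.009, "I": 113.084, "L": 113.084, "N": 114.043,
    "D": 115.027, "Q": 128.059, "K": 128.095, "E": 129.043, "M": 131.040,
    "H": 137.059, "F": 147.068, "R": 156.101, "Y": 163.063, "W": 186.079,
}
WATER = 18.0106
PROTON = 1.0073


def tryptic_peptides(sequence):
    """Cleave C-terminal to K or R unless the next residue is P.

    The proline exception is the common textbook rule and is assumed here;
    missed cleavages are not modelled.
    """
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def mh_plus(peptide):
    """Monoisotopic [M+H]+ of an unmodified peptide."""
    return sum(RESIDUES[aa] for aa in peptide) + WATER + PROTON


def matched_peptides(observed, sequence, tolerance=0.5):
    """Theoretical tryptic peptides whose [M+H]+ lies within `tolerance`
    of an observed mass (a simple match list, not a MOWSE score)."""
    return [p for p in tryptic_peptides(sequence)
            if any(abs(mh_plus(p) - m) <= tolerance for m in observed)]


if __name__ == "__main__":
    segment = "LIVTQTMKGLDIQKVAGTWYSLAMAASDISLLDAQSAPLR"
    for pep in tryptic_peptides(segment):
        print(f"{pep:28s} [M+H]+ {mh_plus(pep):9.3f}")
```

For GLDIQK (residues 9-14) the sketch gives [M+H]+ ≈ 673.39, i.e. m/z ≈ 337.2 for the doubly charged ion, consistent with the value of 337 listed for this peptide in Figure 7.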
The sample buffer should contain some organic solvent [Langen 1993], and trifluoroacetic acid should be avoided (it drastically reduces the ion yield) and replaced by acetic acid or formic acid. Hydrophobic proteins and peptides that are insoluble in the solutions usually used for electrospray MS can also be analysed in chloroform/methanol/water mixtures. Samples dissolved in hexafluoroisopropanol and formic acid are likewise tolerated [Schindler 1993]. This first LC-MS experiment yields a total ion current chromatogram reflecting the peptide pattern of the digest. Analysis of the individual mass spectra reveals signals that contain more than one peptide and gives the m/z ratio of every peptide. The rest of the digest is then subjected to electrospray tandem mass spectrometry (LC-MS/MS), in which every single peptide can be sequenced by fragmentation. Ambiguous mass differences of 113 Da (leucine, isoleucine or hydroxyproline) and 128 Da (glutamine or lysine) prompt further studies. Because quadrupole instruments deliver only low-energy collision spectra, which lack d_n and w_n ions, leucine, isoleucine and hydroxyproline can be distinguished only by amino acid analysis or Edman degradation of the peptides concerned. Glutamine and lysine, in contrast, can be discriminated by mass spectrometry after acetylation (see above). The procedure is repeated with other endoproteases until the overlapping peptides cover the whole sequence of the protein. With this sequence information and the known molecular mass of the intact protein, post-translational modifications can be established, and any N-terminal blocking group or ragged N- or C-termini can also be identified. Once the whole amino acid sequence is known, disulphide bonds, if present, can be localized.



The strategy for protein sequencing by MS can be summarized as follows:
1. Determination of the molecular weight of the intact protein.
2. Determination of the total number of cysteines.
3. Digestion with specific proteases.
4. Fractionation of the proteolytic peptides by HPLC.
5. Mass spectrometry/tandem mass spectrometry.
6. Differentiation of lysine/glutamine and leucine/isoleucine/hydroxyproline.
7. Identification and localization of post-translational modifications and disulphide bonds.

4 Examples of Protein Sequencing Using Tandem Mass Spectrometry

4.1 Sequence Analysis of Peptides Presented to the Immune System by MHC Molecules

The immune response comprises two major arms, humoral and cellular, in which T lymphocytes play an important role. They recognize foreign antigens as short peptides that are presented by host cells bound to cell-surface class I proteins of the major histocompatibility complex (MHC). If the presented peptides are not self-peptides, they stimulate cytotoxic T lymphocytes, which proliferate and lyse the infected cells. After cell lysis, antibodies secreted by B cells bind to the foreign proteins, and these labelled proteins are ingested by macrophages, which digest them to peptides. These peptides are again presented at the cell surface, now in conjunction with class II MHC molecules. Class I and class II molecules are glycoproteins, each consisting of two chains. The presented peptides play an important role in activating the immune response. Hunt [1993] has sequenced some of these peptides by mass spectrometry. Several thousand different peptides are presented, so the resulting mixtures are very complex. Only a few of these peptides could be sequenced by Edman degradation, because their purification by HPLC is very tedious. Hunt [1993] therefore used microcapillary HPLC combined with electrospray ionization/tandem mass spectrometry to sequence several peptides. Class I molecules were purified by immunoprecipitation, and the bound peptides were released by acid treatment and separated by filtration. HPLC separation was performed on microcapillary columns (fused silica, 50 µm i.d.) packed with C-18 material. Peptides were eluted with a 15-min linear gradient from 0.1 M acetic acid to 80 % acetonitrile in 0.1 M acetic acid at a flow rate of 500 nl/min. A total ion current chromatogram of MHC class I-associated peptides is shown in Figure 4.
Sequence information on individual peptides was obtained by collision-induced decomposition (CID) of singly or doubly charged ions in a triple quadrupole mass spectrometer (for details see [Hunt 1993]). Figure 5 shows the CID mass spectrum of the (M+H)+ ions at m/z 1121 (Figure 4, insert). Fragmentation was achieved by low-energy collisions (15-40 eV) with argon at a pressure of 3 mtorr. The two spectra required 2-4 pmol and 100-300 fmol of peptide, respectively. The sequence of the peptide can be established by interpreting the resulting y and b ion series and is given at the top of Figure 5. The underlined masses


[Figure 4 plot: total ion current versus mass spectrum number (0-200).]

Figure 4. Total ion current chromatogram of MHC class I-associated peptides separated by microcapillary HPLC. The insert shows a single-ion current chromatogram of (M+H)+ ions with m/z values of 1120-1122, extracted from the total ion current chromatogram [Hunt 1993].

[Figure 5 plot: sequence Thr-Lxx-Trp-Val-Asp-Pro-Tyr-Glu-Val with b ions at m/z 102, 215, 401, 500, 615 and 712, (M+H)+ at m/z 1121 and y ions at m/z 1020, 721, 622, 507, 410, 247 and 118.]

Figure 5. CID mass spectrum of (M+H)+ ions at m/z 1121 (Figure 4, insert). The deduced sequence is shown at the top, with the observed masses underlined. The upper spectrum was recorded with 100-300 fmol of sample, the lower one with 2 pmol. These data directly confirmed the identity of residues 3-9. A single cycle of Edman degradation showed the first amino acid to be threonine [Hunt 1993]. The ambiguity between Ile and Leu at position 2 cannot be resolved by low-energy CID.



Table 2 a, b. Peptides associated with specific class I (a) and class II (b) MHC molecules, analysed by microcapillary HPLC coupled to tandem mass spectrometry [Hunt 1993]. X denotes Leu or Ile, which low-energy CID cannot distinguish.

(a) Class I-associated peptides

No.  (M+H)+ (m/z)  Peptide sequence  Database match
1    786           SXPSGGSGV         -
2    898           XXDVPTAAV         LLDVPTAAV
3    1011          XXXDVPTAAV        LLLDVPTAAV
4    1211          XXXDVPTAAVQA      LLLDVPTAAVQA
5    930           GXVPFXVSV         -
6    954           SXXPAXVEX         SLLPAIVEL
7    999           SXXVRAXEV         -
8    1037          KXNEPVXXX         -
9    1038          YXXPAXVHX         YLLPAIVHI
10   1121          TLWVDPYEV         TLWVDPYEV

Signal (×10^5), peptides 1-10 in source order: 9.1, 11.9, 6.0, 21.9, 4.9, 11.9, 3.5, 10.0, 39.?, 6.4
Yield (fmol), in source order: 137, 179, 90, 320, 74, 179, 53, 150, 591, 320

(b) Class II-associated peptides (Nos. 11-19)

(M+H)+ (m/z): 1887, 1774, 1900, 1677, 1606, 2089, 1960, 1989, 1926
Peptide sequences: WANLHEKIQASVATNPI, WANLMEKIQASVATNP, DAYHSRAIQWRARKQ, ASPEAQGALANIAVDKA, ASFEAQGALANIAVDK, EEQTQQIRLQAEIIQAR, EQTQQIRLQAEIIQAR, VFQLNQMVRTAAEVAGQX
Database match sequence: KPVSQWTPLLMIPM
Residue length: 17, 16, 16, 17, 16, 17, 16, 17, 18
Yield (pmol), in source order: 9, 1, 6, 5, 4, 5, 5
Protein source: Apo-E, Apo-E, Cys-C, H2-Edα, H2-Edα, Apo-E, Apo-E II, TF Recp.

were observed in the spectrum. In this way several peptides presented by class I and class II MHC molecules were sequenced; their amino acid sequences are shown in Table 2 a, b. Examination of the sequence data showed that class I-associated peptides are often nine residues long, with leucine or isoleucine at position 2 and a small hydrophobic aliphatic residue at position 9, whereas class II-associated peptides are 16-18 residues long and contain a six-residue binding motif.
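The class I pattern described above can be turned into a simple sequence filter. This is a rough sketch, assuming that "small hydrophobic aliphatic residue" means Ala, Val, Leu or Ile; real binding also depends on the MHC allele and on the other positions, so the filter only flags candidates. The regular expression and `class_i_candidates` are illustrative, not part of the cited work.

```python
import re

# Assumed encoding of the class I motif: any residue at position 1,
# Leu or Ile at position 2, six arbitrary residues, and a small aliphatic
# residue (taken here to be Ala, Val, Leu or Ile) at position 9.
# The lookahead lets overlapping 9-residue windows be reported.
CLASS_I_MOTIF = re.compile(r"(?=(.[LI].{6}[AVLI]))")


def class_i_candidates(protein):
    """All 9-residue windows of `protein` that fit the assumed motif."""
    return [m.group(1) for m in CLASS_I_MOTIF.finditer(protein)]


if __name__ == "__main__":
    for peptide in ("TLWVDPYEV", "SLLPAIVEL", "GASPVTGAS"):
        print(peptide, bool(class_i_candidates(peptide)))
```

The database-matched class I peptides of Table 2a, e.g. TLWVDPYEV and SLLPAIVEL, all pass this filter.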

4.2 Partial Sequencing and Identification of a Phosphorylation Site of Recombinant Mitogen-Activated Protein Kinase p42mapk

The cellular response to different mitogens includes activation of p42mapk, a mitogen-activated protein (MAP) kinase with a molecular weight of 42 kDa. MAP kinase is activated by phosphorylation of two amino acids, one threonine and one tyrosine. It belongs to a family of protein kinases that are controlled by phosphorylation of tyrosine and serine/threonine and contribute to signal transduction. The signalling



pathway begins with cell-surface growth factor receptors and proceeds via activation of ras, raf and MAP kinase kinase, which in turn activates MAP kinase. Three identified substrates of MAP kinase are microtubule-associated protein 2, myelin basic protein and the protein kinase p70s6k [Bandi 1993]. Some of these signals end in the cell nucleus at proteins that regulate gene expression (transcription factors). Rossomando et al. [1992] identified Tyr185 as the site of autophosphorylation in recombinant MAP kinase by peptide mapping and tandem mass spectrometry. They fractionated the tryptic digest of phosphorylated MAP kinase by microcapillary HPLC and analysed it with a triple quadrupole mass spectrometer (for details see [Rossomando 1992]). They identified a tryptic peptide (Val171-Arg189) that contains the regulatory phosphorylation sites Thr183 and Tyr185 of MAP kinase.

[Figure 6 plot: CID spectrum, intensity versus m/z; the labelled fragments include y12.]

Figure 6. CID mass spectrum of the (M+3H)3+ ions (m/z 742) of the tryptic peptide Val171-Arg189 from the phosphorylated form of p42mapk. A 5-pmol sample was ionized under electrospray conditions. Predicted monoisotopic masses of type b and y fragments of the deduced sequence are shown above and below the structure at the top of the figure; those observed in the spectrum are underlined. Fragments resulting from internal cleavage at proline are labelled with the appropriate single-letter codes to indicate the sequences they contain [Rossomando 1992].



To determine the site of phosphorylation in this monophosphorylated peptide, the (M+3H)3+ ions (m/z 742) were subjected to tandem mass spectrometry. The b and y ion series established the amino acid sequence and identified Tyr185 as the phosphorylation site. Figure 6 shows the CID mass spectrum of the tryptic peptide Val171-Arg189; the amino acid sequence is given at the top, with the masses found by tandem mass spectrometry underlined. Unlike peptides containing phosphoserine or phosphothreonine, the fragments containing the phosphotyrosine residue show only minor dephosphorylation during CID. For example, fragment y5 (m/z 689.3), which comprises the C-terminal phosphopentapeptide, yields almost none of the dephosphorylated species expected at m/z 609.3 (a loss of 80 Da, HPO3). This example shows how sites of post-translational modification, here phosphorylation, can be determined in proteins or peptides. The amino acid sequence of the protein used here was known, but the same results can also be obtained with an unknown protein.
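The m/z values quoted above can be checked by calculation. The sequence of Val171-Arg189 is not spelled out in the text; the sketch assumes it is VADPDHDHTGFLTEYVATR, the activation-loop peptide of p42mapk/ERK2 with Thr183 and Tyr185 at positions 13 and 15. Under that assumption, and with the phosphate on Tyr185 (written as lower-case "y"), the calculation reproduces m/z 742 for the triply charged precursor, 689.3 for y5 and 609.3 for dephosphorylated y5. The helper names are illustrative.

```python
RESIDUES = {"A": 71.037, "D": 115.027, "E": 129.043, "F": 147.068,
            "G": 57.021, "H": 137.059, "L": 113.084, "P": 97.053,
            "R": 156.101, "T": 101.048, "V": 99.068, "Y": 163.063}
PHOSPHO = 79.966  # HPO3 added per phosphorylated residue, Da
WATER = 18.011
PROTON = 1.007

# Assumed sequence of Val171-Arg189; lower-case "y" marks phosphotyrosine 185.
PEPTIDE = "VADPDHDHTGFLTEyVATR"


def residue_mass(aa):
    """Residue mass from Table 1; lower-case letters carry a phosphate."""
    return RESIDUES[aa.upper()] + (PHOSPHO if aa.islower() else 0.0)


def precursor_mz(peptide, charge):
    """m/z of the (M + zH)z+ ion."""
    mass = sum(residue_mass(aa) for aa in peptide) + WATER
    return (mass + charge * PROTON) / charge


def y_ion(peptide, n):
    """Singly charged y_n ion: C-terminal n residues + H2O + H+."""
    return sum(residue_mass(aa) for aa in peptide[-n:]) + WATER + PROTON


if __name__ == "__main__":
    print(f"(M+3H)3+  m/z {precursor_mz(PEPTIDE, 3):.1f}")
    print(f"y5        m/z {y_ion(PEPTIDE, 5):.1f}")
    print(f"y5 - HPO3 m/z {y_ion(PEPTIDE, 5) - PHOSPHO:.1f}")
```

Because the phosphate stays on y5 and on every larger y ion, the shift of 80 Da appears first at y5 and not at y4; this is what places the phosphate on Tyr185 rather than on Thr183.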

4.3 Protein Sequence Analysis by Tandem Mass Spectrometry in Combination with Microcapillary HPLC

The advantages and characteristics of tandem mass spectrometry coupled with capillary electrophoresis or microcapillary HPLC are demonstrated with β-lactoglobulin [Hunt 1991]. It has a molecular weight of 37 kDa and consists of two subunits, each with a mass of 18 kDa. Tryptic digestion of β-lactoglobulin was carried out in a fused silica column: 10 pmol of protein was digested directly in the column, which was then connected to a microcapillary HPLC column (50 µm i.d.) packed with C-18 material. Peptides were eluted within 15 min with a gradient of 0-80% acetonitrile containing 0.5% acetic acid. Figure 7 shows the total ion current chromatogram of the tryptic digest of β-lactoglobulin; at the bottom of Figure 7 the tryptic peptides identified in this chromatogram are listed in order of their elution from the microcapillary HPLC. The (M+2H)2+ ions (m/z 596.6) of the tryptic peptide containing residues 92-101 were chosen as an example for collision-induced decomposition (CID) and tandem mass spectrometry. The CID spectrum recorded with 5-10 pmol of this tryptic peptide is presented in Figure 8. Again, ions of the y and b series allow complete determination of the amino acid sequence of the analysed peptide.
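The two operations this section relies on, cutting a protein with trypsin and reading a sequence from b- and y-ion ladders, can be sketched as follows. This is an illustrative sketch, not the procedure of Hunt et al. It assumes the usual trypsin rule (cleave after Lys or Arg unless the next residue is Pro), singly charged fragments, and standard monoisotopic residue masses. The test protein is simply the peptides 1-8, 9-14 and 15-40 listed in Figure 7 joined end to end. The function names are hypothetical.

```python
# Illustrative helpers (hypothetical names): in silico trypsin cleavage and
# singly charged b/y fragment ladders from monoisotopic residue masses (Da).
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER, PROTON = 18.010565, 1.007276

def tryptic_digest(protein):
    """Cleave C-terminal to K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        nxt = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides

def fragment_ladders(peptide):
    """Singly charged b1..b(n-1) and y1..y(n-1) ions of a peptide."""
    masses = [RESIDUE[aa] for aa in peptide]
    n = len(masses)
    b = [sum(masses[:i]) + PROTON for i in range(1, n)]
    y = [sum(masses[n - i:]) + WATER + PROTON for i in range(1, n)]
    return b, y

for pep in tryptic_digest("LIVTQTMKGLDIQKVAGTWYSLAMAASDISLLDAQSAPLR"):
    print(pep)
b, y = fragment_ladders("GLDIQK")
print("b:", [round(m, 2) for m in b])
print("y:", [round(m, 2) for m in y])
```

For every cleavage point, b_i + y_(n-i) equals [M+H]+ plus one proton. The two ladders therefore read the sequence from opposite ends, which is why complete b and y series allow the whole sequence to be assigned.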

C. Weigt, H.E. Meyer and R. Kellner

[Figure 7: ion current of the tryptic peptides from a 10 pmol digest of β-lactoglobulin (MW 18,281) plotted against scan number; peaks are labelled with m/z values and residue numbers, and the peptide sequences are listed below the trace]

Figure 7. Total ion current chromatogram of a tryptic digest of β-lactoglobulin fractionated by microcapillary HPLC (10 pmol). The tryptic peptides of β-lactoglobulin found in this digest are listed at the bottom [Hunt 1991].

[Figure 8, top: mass spectrum of peptide VLVLDTDYKK (residues 92-101, MH+ = 1193.3) showing the (M+2H)2+ ion at m/z 596.6 and the (M+3H)3+ ion at m/z 397.8; bottom: CID spectrum of m/z 596.6 with b- and y-type fragment ions (e.g. y6, y7, y8) marked on the sequence V L V L D T D Y K K]

Figure 8. Mass spectrum of the tryptic peptide 92-101 of β-lactoglobulin. The doubly charged ions (M+2H)2+ at m/z 596.6 and the triply charged ions (M+3H)3+ at m/z 397.8 are observed. The CID spectrum of the doubly charged ions at m/z 596.6 and the amino acid sequence of the peptide are shown at the bottom [Hunt 1991].


5 References

Bandi, R.H.; Ferrari, S.; Krieg, J.; Meyer, H.E., Thomas, G. (1993) J. Biol. Chem. 268, 4530-4533. Identification of 40S ribosomal protein S6 phosphorylation sites in Swiss mouse 3T3 fibroblasts stimulated with serum.
Barber, M.; Bordoli, R.S.; Sedgwick, R.D., Tyler, A.N. (1981) J. Chem. Soc. Chem. Commun., 325-327. Fast atom bombardment of solids (F.A.B.): a new source for mass spectrometry.
Biemann, K. (1988) Biomed. Environ. Mass Spectrom. 16, 99-111. Contributions of mass spectrometry to peptide and protein structure.
Biemann, K. (1990) Applications of tandem mass spectrometry to peptide and protein structure. In: Biological Mass Spectrometry (Burlingame, A.L., McCloskey, J.A.; eds.), pp. 176-196, Elsevier, Amsterdam.
Biemann, K. (1992) Annu. Rev. Biochem. 61, 977-1010. Mass spectrometry of peptides and proteins.
Biemann, K. (1993) Recent advances in protein sequencing by mass spectrometry, introduction and overview. In: Methods in Protein Sequence Analysis (Imahori, K., Sakiyama, F.; eds.), pp. 119-126, Plenum Press, New York.
Biemann, K., Martin, S.A. (1987) Mass Spectrom. Rev. 6, 1-76. Mass spectrometric determination of the amino acid sequence of peptides and proteins.
Burlingame, A.L. (1993) Mass spectrometry in protein sequence and structural investigations. In: Techniques in Protein Chemistry IV (Angeletti, R.H.; ed.), pp. 3-21, Academic Press, New York.
Chait, B.T.; Wang, R.; Beavis, R.C., Kent, S.B.H. (1993) Science 262, 89-92. Protein ladder sequencing.
Covey, T.; Shushan, B.; Bonner, R.; Schroder, W., Hucho, F. (1991) LC/MS and LC/MS/MS screening for the sites of post-translational modification in proteins. In: Methods in Protein Sequence Analysis (Jornvall, H., Hoog, J.O., Gustavsson, A.M.; eds.), pp. 249-256, Birkhauser Verlag, Basel.
Edmonds, C.G.; Loo, J.A.; Fields, S.M.; Barinaga, C.J.; Udseth, H.R., Smith, R.D. (1990) Capillary electrophoresis combined with electrospray ionization mass spectrometry and tandem mass spectrometry. In: Biological Mass Spectrometry (Burlingame, A.L., McCloskey, J.A.; eds.), pp. 77-100, Elsevier, Amsterdam.
Edmonds, C.G., Smith, R.D. (1990) Methods Enzymol. 193, 412-431. Electrospray ionization mass spectrometry.
Henzel, W.J., Billeci, T.M., Stults, J.T., Wong, S.C., Grimley, C., Watanabe, C. (1993) Proc. Natl. Acad. Sci. USA 90, 5011-5015. Identifying proteins from two-dimensional gels by molecular mass searching of peptide fragments in protein sequence databases.
Hillenkamp, F., Karas, M. (1990) Methods Enzymol. 193, 280-295. Mass spectrometry of peptides and proteins by matrix-assisted ultraviolet laser desorption/ionization.
Hunt, D.F.; Shabanowitz, J.; Moseley, M.A.; McCormack, A.L.; Michel, H.; Martino, P.A.; Tomer, K.B., Jorgenson, J.W. (1991) Protein and peptide sequence analysis by tandem mass spectrometry in combination with either capillary electrophoresis or micro-capillary HPLC. In: Methods in Protein Sequence Analysis (Jornvall, H., Hoog, J.O., Gustavsson, A.M.; eds.), pp. 257-266, Birkhauser Verlag, Basel.
Hunt, D.F.; Shabanowitz, J.; Michel, H.; Cox, A.L.; Dickinson, T.; Davis, T.; Bodnar, W.; Henderson, R.A.; Sevilir, N.; Engelhard, V.H.; Sakaguchi, K.; Appella, E.; Grey, H.M., Sette, A. (1993) Sequence analysis of peptides presented to the immune system by class I and class II MHC molecules. In: Methods in Protein Sequence Analysis (Imahori, K., Sakiyama, F.; eds.), pp. 127-133, Plenum Press, New York.
Ito, Y.; Takeuchi, T.; Ishi, D., Goto, M. (1985) J. Chromatogr. 346, 161-166. Direct coupling of micro high-performance liquid chromatography with fast atom bombardment mass spectrometry.
Karas, M., Hillenkamp, F. (1988) Anal. Chem. 60, 2299-2301. Laser desorption ionization of proteins with molecular masses exceeding 10 000 daltons.
Kaufmann, R., Spengler, B., Lützenkirchen, F. (1993) Rapid Commun. Mass Spec. 7, 902-910. Mass spectrometric sequencing of linear peptides by product-ion analysis in a reflectron time-of-flight mass spectrometer using matrix-assisted laser desorption ionization.
Labdon, J.E.; Nieves, E., Schubart, U.K. (1992) J. Biol. Chem. 267, 3506-3513. Analysis of phosphoprotein p19 by liquid chromatography/mass spectrometry.
Langen, H.; Sander, B.; Vilbois, F., Lahm, H.W. (1993) Characterization of the proteins c-kit ligand and DHFR by electrospray mass spectrometry. In: Techniques in Protein Chemistry IV (Angeletti, R.H.; ed.), pp. 47-54, Academic Press, New York.
Loo, J.A.; Udseth, H.R., Smith, R.D. (1989) Anal. Biochem. 179, 404-412. Peptide and protein analysis by electrospray ionization-mass spectrometry and capillary electrophoresis-mass spectrometry.
Meyer, H.E.; Eisermann, B.; Heber, M.; Hoffmann-Posorske, E.; Korte, H.; Weigt, C.; Wegner, A.; Hutton, T.; Donella-Deana, A., Perich, J.W. (1993) FASEB J. 7, 776-782. Strategies for nonradioactive methods in the localization of phosphorylated amino acids in proteins.
Pappin, D.J.C.; Hojrup, P., Bleasby, A.J. (1993) Current Biology 3, 327-332. Rapid identification of proteins by peptide-mass fingerprinting.
Rossomando, A.J.; Wu, J.; Michel, H.; Shabanowitz, J.; Hunt, D.F.; Weber, M.J., Sturgill, T.W. (1992) Proc. Natl. Acad. Sci. USA 89, 5779-5783. Identification of Tyr-185 as the site of tyrosine autophosphorylation of recombinant mitogen-activated protein kinase p42mapk.
Schindler, P.A.; Van Dorsselaer, A., Falick, A.M. (1993) Anal. Biochem. 213, 256-263. Analysis of hydrophobic proteins and peptides by electrospray ionization mass spectrometry.
Smith, R.D.; Loo, J.A.; Edmonds, C.G.; Barinaga, C.J., Udseth, H.R. (1990) Anal. Chem. 62, 882-899. New developments in biochemical mass spectrometry: electrospray ionization.
Spengler, B., Kirsch, D., Kaufmann, R., Jaeger, E. (1992) Rapid Commun. Mass Spec. 6, 105-108. Peptide sequencing by matrix-assisted laser-desorption mass spectrometry.
Wang, R., Chait, B.T., Kent, S.B.H. (1994) Protein ladder sequencing: towards automation. In: Techniques in Protein Chemistry V (Crabb, J.W.; ed.), pp. 19-26, Academic Press, New York.

Section VI: Database Analysis


VI.1 Protein Sequences and Sequence Databases
Hans-Werner Mewes and David G. George

1 Introduction

Research in biology, biotechnology, and medicine draws heavily on the analysis of living material at the molecular level. Although the basic principles of energy conversion, information storage, biosynthesis, and metabolism are well understood, the details of the highly complex interactions of small, medium, large, and very large biomolecules in time and space are only beginning to be understood. Data concerning the molecules involved in these processes and their properties are a precious commodity, critical to the emergence of these new understandings. Databases for biological macromolecules are basic resources for research in the modern life sciences. Genome analysis, medical diagnostics, environmental research, molecular biology, gene therapy, and biotechnology depend not only on raw data but also on highly structured and interpreted databases of nucleic acid and protein sequences. These databases contain basic information concerning the living cell and its machinery. Understanding the complex processes in the cell and their highly efficient regulation requires knowledge of the molecular structure of its components, and the biological databases serve as a repository for this knowledge.

Cellular equilibrium is maintained by structural and functional proteins, encoded by large genomic DNA molecules. DNA itself is organized as two unbranched, paired chains of four different nucleotides in the form of a right-handed double helix. Part of its information is transcribed into messenger RNA, which is translated into proteins: folded, unbranched chains built from 20 different amino acids. These proteins may be modified by the cell after translation, resulting in nonstandard amino acids within the molecule. The one-dimensional structure of the DNA (its nucleotide sequence) specifies, but does not completely determine, the one-dimensional structure of the protein. The physico-chemical properties of the amino acid sequence ultimately determine the three-dimensional structure of the protein, which in turn determines its biological function. However, protein folding is currently a major research problem, and not all of the factors involved have been identified. The size of these molecules varies from a few hundred (mRNAs) up to hundreds of millions of nucleotides (mammalian chromosomes), and from a few (e.g., hormones) up to several thousand amino acids. It is customary to distinguish between the primary, secondary, and tertiary structure of these molecules. Primary structure is understood as the order of the building blocks, e.g., the sequence of the amino acids in a protein chain.
R. Kellner, F. Lottspeich, H. E. Meyer (1994) Microcharacterization of Proteins, VCH Weinheim


Secondary structure describes the spatial arrangement of localized substructures of idealized type (double helix for DNA molecules; helix, sheet, and random coil arrangements for proteins) within the molecule. Tertiary structure is the exact description of the three-dimensional configuration of the atoms in the molecule.
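The step from a DNA sequence to the primary structure of a protein, outlined in the introduction, is a table lookup once the reading frame is known. The sketch below uses the standard genetic code and ignores alternative genetic codes, splicing and post-translational modification. The function name and the example sequences are illustrative choices, not taken from the chapter.

```python
# Illustrative translation helper (hypothetical name) using the standard genetic code.
BASES = "TCAG"
# Amino acids for the 64 codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG; '*' = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}

def translate(seq, stop_at_stop=True):
    """Translate a DNA or mRNA reading frame that starts at its first base.

    Codons missing from the table (e.g. containing N) become 'X'.
    """
    seq = seq.upper().replace("U", "T")
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[pos:pos + 3], "X")
        if aa == "*" and stop_at_stop:
            break
        protein.append(aa)
    return "".join(protein)

print(translate("ATGGGTGACGTTGAGAAGTAA"))  # MGDVEK
print(translate("AUGUUUUGG"))              # MFW (mRNA input)
```

Because the genetic code is redundant, this mapping runs in one direction only. A protein sequence does not determine its coding DNA.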

2 Current Databases

Sequence determination by chemical methods became technically feasible in the early 1950s. Because protein sequencing required the isolation and purification of significant quantities of protein in a difficult, expensive, and time-consuming process, the number of protein sequences determined by sequential chemical degradation remained low until the late 1970s. The first systematic analysis of the available information was undertaken in the pioneering work of M. Dayhoff and her colleagues at the National Biomedical Research Foundation (NBRF) in the US. The first editions of the database appeared in printed form as the "Atlas of Protein Sequence and Structure" [Dayhoff 1965, 1972, 1979]. The group at Georgetown introduced the use of computers for collecting sequence data early in the 1960s. Using a flat-file organization of the data as structured text, separated by data type identifiers and line feeds, the first software to access and query sequence data (the PSQ and NAQ programs [Orcutt 1982, 1983]) was implemented in the early 1980s. The sequence part of the information was accessed through a tripeptide lookup table, giving the user an instant reply to the question: is a particular subsequence already known to the database?

[Figure 1: "Data Growth in the Protein Sequence Database", number of entries (10,000 to 70,000) by year, 1984-1994]
Figure 1. Growth of the Protein Sequence Database 1984-1994.

When the technology of DNA sequencing emerged, biological research changed its direction, scope and goals in a revolutionary way. Within a short time, the data flow entered an exponential growth phase that continues today, not only for DNA sequence data but also for the protein sequences inferred from them. In response, DNA databases were established in the early 1980s and have developed rapidly into relatively large groups (with staffs of 30-40). Data are submitted electronically and directly to these centers (EMBL, GenBank, NCBI); reliance on the published literature as a source of this information has declined dramatically. The macromolecular sequence databases double in size every 1-2 years; as of early 1994 the sequences of approximately 65,000 proteins and approximately 110,000 nucleic acids had been recorded in these databases (Figure 1). Historically, the macromolecular database centers emerged independently, each adopting its own format for data distribution (Figures 2 and 3) and each producing separate versions of the data sets. PIR-International is an exception to this rule: the PIR-International centers produce a single protein sequence database. The CODATA sequence data exchange format [George 1987] resulted from the first systematic investigation of sequence data representation by members of the PIR-International database centers (Washington, Martinsried, Tokyo). This work was developed further, specifically addressing database semantics, into a more general Sequence Database Definition Language (SDDL) [George 1993], which provides the facility to define a fully integrated nucleic acid-protein sequence database system. As an alternative, the National Center for Biotechnology Information (NCBI) has fostered the use of ASN.1 as a syntax for sequence data representation [Courteau 1991]; this latter work does not provide a mechanism for defining database semantics.
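The tripeptide lookup table that gave PSQ users an instant answer to the question "is this subsequence already known?" can be illustrated with a small index. This is a minimal sketch of the general idea under stated assumptions, not the PSQ implementation, whose data structures are not described here. Every tripeptide is mapped to the places where it starts, so a query only has to be checked at the candidate positions sharing its first tripeptide. The entries and function names are hypothetical.

```python
# Illustrative tripeptide index (hypothetical names and toy data).
from collections import defaultdict

def build_tripeptide_index(entries):
    """Map each tripeptide to the (entry_id, 0-based offset) pairs where it starts."""
    index = defaultdict(list)
    for entry_id, seq in entries.items():
        for i in range(len(seq) - 2):
            index[seq[i:i + 3]].append((entry_id, i))
    return index

def find_subsequence(query, entries, index):
    """Return sorted (entry_id, 1-based position) pairs where query occurs."""
    if len(query) < 3:
        raise ValueError("query must contain at least three residues")
    hits = []
    # The index narrows the search to positions that start with the query's
    # first tripeptide; each candidate is then confirmed against the sequence.
    for entry_id, i in index.get(query[:3], []):
        if entries[entry_id].startswith(query, i):
            hits.append((entry_id, i + 1))
    return sorted(hits)

# Hypothetical toy "database" of three entries.
ENTRIES = {"E1": "MNRAGYEIRETVSG", "E2": "PAVYEIREKDV", "E3": "YEIQK"}
INDEX = build_tripeptide_index(ENTRIES)
print(find_subsequence("YEIRE", ENTRIES, INDEX))  # [('E1', 6), ('E2', 4)]
```

The cost of a query depends on how often its first tripeptide occurs rather than on the total size of the database. That is the property that makes such lookups feel instantaneous.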

3 Data Processing and Principles of Data Organization

The technology for sequence data processing emerged from crude forms. Handling flat files with text editors and depositing the data in linear order for distribution is still one of the major operating principles. The introduction of commercial relational database management systems required a massive effort and encountered basic problems in schema development (GenBank, Sybase 1988 [Burks 1992]; EMBL, Oracle 1989 [Higgins 1992]). These difficulties reflect the properties of the data: there are few independent fields suitable for query through relational selection operations, and the relationships among the biological data are highly complex. As a result, there are few sensible views on the data. A critical issue not addressed in these early approaches was the influence of semantics. Much of the data is not stored directly but is encoded in specialized operator notations that specify transformations of the data. The associated semantics were not formally defined, resulting in ambiguous data specification and data corruption. Many of these issues have been addressed in the development of the SDDL. There are basic differences between the operating principles of the nucleic acid and protein sequence databases. In both cases, the starting point of data processing is a scientific report (either extracted from the literature or submitted directly to the database centers), based on


ENTRY            CCHU  #type complete
TITLE            cytochrome c - human
ORGANISM         #formal_name Homo sapiens  #common_name man
DATE             #sequence_revision 30-Sep-1991  #text_change 03-Feb-1994
ACCESSIONS       A31764; A05676; A00001
REFERENCE        A31764
   #authors      Evans, M.J.; Scarpulla, R.C.
   #journal      Proc. Natl. Acad. Sci. U.S.A. (1988) 85:9625-9629
   #title        The human somatic cytochrome c gene: two classes of processed pseudogenes demarcate a period of rapid molecular evolution.
   #cross-references MUID:89071748
   #accession    A31764
      ##molecule_type DNA
      ##residues 1-105  ##label EVA
      ##cross-references GB:M22877
REFERENCE        A05676
   #authors      Matsubara, H.; Smith, E.L.
   #journal      J. Biol. Chem. (1963) 238:2732-2753
   #title        Human heart cytochrome c. Chymotryptic peptides, tryptic peptides, and the complete amino acid sequence.
   #accession    A05676
      ##molecule_type protein
      ##residues 2-28;29-46;47-100;101-105  ##label MATS
REFERENCE        A00001
   #authors      Matsubara, H.; Smith, E.L.
   #journal      J. Biol. Chem. (1962) 237:3575-3576
   #title        The amino acid sequence of human heart cytochrome c.
   #contents     annotation
   #note         66-[illegible] is found in 10% of the molecules in pooled protein
GENETIC          #introns 57/1
CLASSIFICATION   #superfamily cytochrome c
KEYWORDS         acetylation; electron transfer; heme; mitochondrion; oxidative phosphorylation; polymorphism; respiratory chain
SUMMARY          #length 105  #molecular-weight 11749  #checksum 3247
FEATURE
   2-105         #product cytochrome c  #status experimental  <MAT>
   2             #site #class modified acetylated amino end (Gly) (in mature form)  #status experimental
   15,18         #binding_site heme (Cys) (covalent)  #status experimental
   19,81         #binding_site heme iron (His, Met) (axial ligands)
SEQUENCE         [residues 1-105, printed in rows of 30; not legible in the scan]

Figure 2. Cytochrome c - human; PIR-International CODATA representation.


ID   HSCYC1A    standard; DNA; PRI; 4622 BP.
XX
AC   [illegible];
XX
DT   22-APR-1989 (Rel. 19, Created)
DT   24-DEC-1990 (Rel. 26, Last updated, Version 2)
XX
DE   Human cytochrome c-1 gene, complete cds.
XX
KW   cytochrome; cytochrome c1.
XX
OS   Homo sapiens (human)
OC   Eukaryota; Animalia; Metazoa; Chordata; Vertebrata; Mammalia; Theria;
OC   Eutheria; Primates; Haplorhini; Catarrhini; Hominidae.
XX
RN   [1]
RP   1-4622
RA   Suzuki H., Hosokawa Y., Nishikimi M., Ozawa T.;
RT   "Structural organization of the human mitochondrial cytochrome c-1 gene";
RL   J. Biol. Chem. 264:1368-1374(1989).
XX
DR   CPGISLE; HSCYC1A; Release 2.0.
DR   SWISS-PROT; P08574; CY1_HUMAN.
XX
CC   Draft entry and computer-readable sequence [1] kindly submitted by
CC   T. Ozawa, 28-OCT-1988.
XX
FH   Key             Location/Qualifiers
FH
FT   source          1..4622
FT                   /organism="Homo sapiens"
FT   CDS             join(1407..1535,2138..2334,2429..2555,2641..2798,2888..3048,
FT                   3311..3411,3508..3612)
FT                   /note="cytochrome c-1"
FT   prim_transcript [remainder of the entry not legible in the scan]

[Figure 3: EMBL-format entry HSCYC1A, human cytochrome c-1 gene; caption not legible in the scan]
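Location strings such as the CDS join() expression in the entry above are an instance of the operator notations discussed in section 3. The entry stores an instruction for assembling exons rather than the spliced sequence itself. The following is a minimal sketch of evaluating such an expression, assuming only plain start..end ranges on the forward strand. Complement(), references to other entries and fuzzy bounds are deliberately not handled, and the function names are hypothetical.

```python
# Illustrative evaluator (hypothetical names) for simple join(a..b,c..d,...) locations.
import re

RANGE = re.compile(r"(\d+)\.\.(\d+)")

def parse_location(location):
    """Turn 'a..b' or 'join(a..b,c..d,...)' into 1-based inclusive (start, end) pairs."""
    loc = location.replace(" ", "")
    if loc.startswith("join(") and loc.endswith(")"):
        loc = loc[len("join("):-1]
    ranges = []
    for part in loc.split(","):
        match = RANGE.fullmatch(part)
        if match is None:
            raise ValueError(f"unsupported location element: {part!r}")
        ranges.append((int(match.group(1)), int(match.group(2))))
    return ranges

def splice(sequence, location):
    """Concatenate the segments of `sequence` named by `location`."""
    return "".join(sequence[start - 1:end] for start, end in parse_location(location))

CDS = ("join(1407..1535,2138..2334,2429..2555,2641..2798,2888..3048,"
       "3311..3411,3508..3612)")
exon_lengths = [end - start + 1 for start, end in parse_location(CDS)]
print(exon_lengths, sum(exon_lengths))
print(splice("ABCDEFGHIJ", "join(2..3,6..8)"))  # BCFGH
```

The seven segments add up to 978 nucleotides, a whole number of codons (326), which is consistent with a complete coding sequence including its stop codon.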

Database  Entries  Type  Rel.   Description
MIPSOWN     30593  PROT  39.01  Protein Seq DB MIPSOWN
PIR1        11982  PROT  39.00  Section 1. Classified and Annotated Entries
PIR2        32885  PROT  39.00  Section 2. Annotated Entries
PIR3        19893  PROT  39.00  Section 3. Unverified Entries
MIPSTRN      1781  PROT  39.01  Protein Seq DB MIPSTRN, preliminary
SWISS       33329  PROT  27.00  SWISS-PROT Protein Sequence Database
PATCHX       3890  PROT  39.00  Protein Seq DB PATCHX (subseq of MIPSX)
NRL-3D       1686  PROT  13.02  NRL Protein Sequences in Brookhaven PDB
GB-BCT      13800  NUCL  80.00  Nucleic Acid GB Bacterial
GB-EST      26342  NUCL  80.00  Nucleic Acid GB EST
GB-INV       9966  NUCL  80.00  Nucleic Acid GB Invertebrate
GB-MAM       5006  NUCL  80.00  Nucleic Acid GB other
GB-PAT       5281  NUCL  80.00  Nucleic Acid GB Patent
GB-PHG        929  NUCL  80.00  Nucleic Acid GB Phage
GB-PLN      14390  NUCL  80.00  Nucleic Acid GB Plant
GB-PRI      29168  NUCL  80.00  Nucleic Acid GB Primate
GB-RNA       3466  NUCL  80.00  Nucleic Acid GB Structural
GB-ROD      19054  NUCL  80.00  Nucleic Acid GB Rodent
GB-SYN       1663  NUCL  80.00  Nucleic Acid GB Synthetic

Figure 4. Databases available in ATLAS (MIPS on-line, Feb 1994).

information retrieval systems that access static copies of the data sets. Such systems are available from the NCBI (National Center for Biotechnology Information, Bethesda, Maryland, USA), from EMBL (SRS [Etzold 1993]) and from PIR-International (ATLAS, available through MIPS, MPI f. Biochemie, 82152 Planegg, Germany). As an example, we will introduce some of the properties of the ATLAS system.



4.1 The ATLAS Multidatabase Information Retrieval System

The ATLAS system provides simultaneous access to multiple databases. It addresses the problem of data access by using an inverted index data structure. Fields such as authors, journal citations, accession numbers, titles, feature names, species names, and superfamily names are indexed. Database entries are logically addressed as units of information. Text queries are evaluated as substring searches. Retrieval operations generate a current list (the active subset of database entries), which may be modified by Boolean combinations of successive commands. The user interface is a command language interpreter modeled after the DEC (Digital Equipment Corporation) Command Language of the VMS operating system, a command/command-modifier based query language. Although designed primarily to handle macromolecular sequence databases, ATLAS can operate on any structured-text database. The program employs a single multidatabase, multifield index that allows simultaneous retrieval from any selected set of indexed databases and any combination of indexed fields within those databases. The index can be partitioned by database (placing different sets of databases in different partitions); the combination of partitions in use at any given time is defined as a logical list in a configuration file. All information concerning the databases, the fields included in the indexes, and the paths to the physical locations of the database files is stored in this configuration file, external to the program. At start-up, the program loads this information and uses it to build the command language table. Figure 4 shows the set of databases available on the MIPS on-line system through ATLAS. The system supports macros that allow commands to be renamed, or new commands to be defined, from any command/command-modifier combination and set of field names declared in the configuration file.
These macros can be defined globally in the configuration file, locally in user-initialization files that are read at start-up, or interactively. A summary of the commands available in the configuration used at MIPS is shown in Figure 5.

HELP
  HELP displays the following types of information about the program:
  o An introduction for new users. The topic-name for this topic is: Introduction
  o Individual commands and command modifiers. The topic-names for these topics are: AUTHOR, FIND, TYPE, ...
  o Special topics. The topic-names for these topics are: genetic-code, symbols, ...
  Additional information available:
  ACCESSION  AUTHOR  BASES  commands  COPY  CROSS  DEFINE  FEATURE  FIND  GENE
  GET  HELP  HOST  Introduction  JOURNAL  KEYWORD  LIST  MATCH  MEMBERS  PRINT
  QUIT  REFERENCE  SCAN  SEARCH  SET  SHOW  SPECIES  SUPERFAMILY  symbols  TYPE

Figure 5. Commands of the ATLAS sequence database retrieval program.
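The inverted index, substring matching and Boolean refinement of a current list described for ATLAS can be sketched in a few lines. This is not ATLAS code, which is written in C and whose index format is not described here. It is a minimal illustration assuming a (field, word) index over whitespace-separated words. The entry identifiers, titles and species are taken from the search example in Figure 6, and the function names are hypothetical.

```python
# Illustrative multifield inverted index (hypothetical names; not the ATLAS format).
from collections import defaultdict

def build_index(entries, fields):
    """Inverted index mapping (field, word) to the set of entry ids containing it."""
    index = defaultdict(set)
    for entry_id, record in entries.items():
        for field in fields:
            for word in record.get(field, "").lower().split():
                index[(field, word)].add(entry_id)
    return index

def find(index, field, text):
    """Ids of entries with an indexed word in `field` that contains `text`."""
    text = text.lower()
    hits = set()
    # Linear scan over index keys keeps the sketch short; a real system would
    # use a sorted or tree-structured key list to locate matching words.
    for (indexed_field, word), ids in index.items():
        if indexed_field == field and text in word:
            hits |= ids
    return hits

ENTRIES = {
    "S20487": {"title": "cell division control protein cdc27",
               "species": "Schizosaccharomyces pombe"},
    "S24417": {"title": "dmpQ protein", "species": "Pseudomonas putida"},
    "S38239": {"title": "hypothetical protein", "species": "Coxiella burnetii"},
}
INDEX = build_index(ENTRIES, ["title", "species"])

current = find(INDEX, "title", "protein")     # current list: all three entries
current &= find(INDEX, "species", "pseudo")   # Boolean AND with a second query
print(sorted(current))                        # ['S24417']
```

Set union, intersection and difference map directly onto OR, AND and NOT refinements of the current list.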



ATLAS> scan YEIRE
10 matches found
Database: MIPSOWN
S20487   cell division control protein cdc27 - yeast (Schizosaccharomyces pombe)
         124  SDFDELLPAV YEIRE KDVLYKKEDA
S24417   dmpQ protein - Pseudomonas putida
           6  +MNRAG YEIRE TVSGQTFRCL
S38239   hypothetical protein - Coxiella burnetii
         220  VFGADREIFG YEIRE PIEFFRRRPS
Database: MIPSTRN
F33979C  Caenorhabditis elegans cosmid FSSH2
          41  VNLKIQDEKS YEIRE QWLIEAIQLV

Figure 6. Searching peptides in multiple databases.

Care has been taken to ensure format independence. In particular, the program can access the GenBank and EMBL data in their native forms (the line-type formats as supplied by GenBank and the EMBL Data Library). With minor exceptions, format dependence has been relegated to index-generating procedures that are external to the retrieval program. The ATLAS program is written in ANSI standard C. With limited use of precompiler definitions it has been ported to VAX/VMS, OpenVMS and DEC OSF/1 on Alpha AXP workstations, DEC ULTRIX, SunOS, Silicon Graphics, Macintosh, and PC/DOS under a wide variety of compilers, including Microsoft C, Turbo C++, and Borland C++. A prototype version of ATLAS that runs under a client-server architecture using the TCP/IP protocol has been developed. The ATLAS server uses an alternate I/O channel in the ATLAS program; the remainder of the source code is identical to that of the terminal-interface version. Our paradigm in developing client-server software is that the clients should reflect the native characteristics of the computers on which they operate. People buy Macintosh computers because they wish to work within a Macintosh environment; likewise, those working on PC/DOS systems have grown accustomed to that specific environment. A client-server architecture handles this very efficiently, because a variety of clients can be developed to reflect these characteristics. To date, ATLAS clients have been developed for the VAX/VMS, ULTRIX, UNIX, and Macintosh environments and have been distributed on a trial basis.
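Reading the EMBL line-type format natively, as described above, mostly comes down to splitting each line into its two-letter line code and its data. The following is a minimal sketch, assuming the layout of the HSCYC1A entry shown earlier: code in columns 1-2, data from column 6, XX spacer lines, and // as the entry terminator. It is not the ATLAS index-generating code, and the function name is hypothetical.

```python
# Illustrative EMBL line-type reader (hypothetical name); layout assumptions in the text.
def parse_embl(text):
    """Split EMBL line-type text into entries: dict of line code -> list of data lines."""
    entries, current = [], {}
    for line in text.splitlines():
        if line.startswith("//"):             # end of an entry
            if current:
                entries.append(current)
            current = {}
            continue
        code = line[:2]
        # Skip blank lines, sequence-data lines (which start with spaces) and XX spacers.
        if not code.strip() or code == "XX":
            continue
        current.setdefault(code, []).append(line[5:].rstrip())
    if current:                               # tolerate a missing final terminator
        entries.append(current)
    return entries

SAMPLE = """\
ID   HSCYC1A    standard; DNA; PRI; 4622 BP.
XX
DE   Human cytochrome c-1 gene, complete cds.
XX
KW   cytochrome; cytochrome c1.
XX
OS   Homo sapiens (human)
//
"""
entry = parse_embl(SAMPLE)[0]
print(entry["DE"][0])  # Human cytochrome c-1 gene, complete cds.
print(entry["OS"][0])  # Homo sapiens (human)
```

Keeping this field splitting in a separate step, as ATLAS does with its external index generators, lets the retrieval program itself stay independent of any one line-type format.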
ATLAS has also been designed to run in batch mode, which allowed the program to be incorporated easily into the PIR Network Request Server. ATLAS has been distributed on the Atlas of Protein and Genomic Sequences CD-ROM since July 1992. ATLAS thus provides uniform database access to a broad user community whose computing resources span the spectrum from low-end standalone PC systems to high-end workstations networked on the Internet.



4.2 Searching Molecular Sequence Databases

Molecular sequence databases provide an extraordinary resource for gaining insight into biological function. They serve as repositories for experimental results as well as reference compendia summarizing the current state of biological knowledge. In principle, they are structured as separate entries, each representing an unbranched chain of nucleic acid or amino acid residues (Figures 2 and 3). A primary use of the sequence databases is sequence analysis by homology searches. An experimentally determined sequence can be scrutinized against the most up-to-date information in the sequence databases, allowing the following questions to be asked:

Is the sequence already known in the database?
Do significantly similar sequences exist in the database?
Is additional information related to the similar sequences available in the database (e.g., functional assignment, 3D structure)?
Can we trust the information deduced from sequence comparisons?

These questions are typically addressed first by searching the sequence portion of a database. Most database searching programs do not use the text portion of the database directly in resolving such queries; rather, the text is made available after the search to aid in interpreting the results. Because of the size of the current databases (20-200 million residues), even simple searches require several minutes of read and compare operations. Hence the emphasis is on prestructuring the data and/or limiting the operations needed during the initial examination. Homology searches are conducted as complex string comparisons. For proteins, the alphabet of matching characters consists of the 20 naturally occurring amino acids. Based on similar physico-chemical properties and the redundancy of the genetic code, similarities between pairs of amino acids are quantified by comparison matrices [Jones 1992]. The optimal alignment of a sequence pair is the arrangement of the two sequences (including gaps) that yields the maximal total score, summed over the matched amino acid pairs. Although rigorously proven algorithms have been developed to compute such optimal alignments, the methodology is fundamentally limited by the nature of the problem. The essential question is whether two (or more) sequences specify proteins that fold into similar structures and/or exhibit similar biological functions. Because the structure/function problem remains unsolved, it is not possible to construct an effective measure for this propensity.
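The optimal-alignment criterion can be made concrete with the global (Needleman-Wunsch) dynamic-programming recurrence cited in this section. In the sketch, a toy score (+1 for identical residues, -1 otherwise) and a linear gap penalty stand in for a real comparison matrix such as Dayhoff's PAM250. Both choices, and the function names, are illustrative assumptions.

```python
# Illustrative global alignment (hypothetical names; toy scoring, linear gap penalty).
def needleman_wunsch(a, b, score, gap):
    """Global alignment of a and b; returns (best score, aligned a, aligned b).

    score(x, y) is the substitution score; gap (negative) is charged per gap position.
    """
    n, m = len(a), len(b)
    # F[i][j] holds the best score for aligning the prefixes a[:i] and b[:j].
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + score(a[i - 1], b[j - 1]),
                          F[i - 1][j] + gap,
                          F[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    top, bottom, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + score(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(top)), "".join(reversed(bottom))

def toy_score(x, y):
    """Placeholder for a comparison matrix: +1 for identity, -1 otherwise."""
    return 1 if x == y else -1

print(needleman_wunsch("ACD", "AD", toy_score, gap=-1))  # (1, 'ACD', 'A-D')
```

If every cell is clamped at zero and the traceback starts from the highest-scoring cell, the recurrence becomes the local (Smith-Waterman) variant. This is the exhaustive form of local comparison discussed in the following paragraphs.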
Hence, sequence comparison methods can only suggest (rather than prove) that an observed sequence similarity indicates similarity of structure or function. Early analytical methods focused on the global comparison of two complete sequences along their entire extents [Needleman 1970; Waterman 1983]. The classical concept of a protein superfamily was based on the assumption that point-mutational events relate the members of a superfamily to their common ancestors. The PIR-International Protein Sequence Database was originally designed on the basis of this concept. The term homology is used to indicate that pairs or sets of sequences are evolutionarily related. Provided that the sequences are closely related, identifying a query sequence as a member of a well-described family is straightforward. The problem becomes increasingly complex as the evolutionary distance between the proteins increases. It is now well recognized that genetic transposition has played a major role in the evolutionary process, resulting in multidomain proteins assembled from evolutionarily distinct segments (or domains). Nevertheless, all sequence comparison methods take the point-mutational process (and localized insertion/deletion events) as their model for comparing and identifying related protein segments. The current generation of sequence database searching algorithms is based on local sequence comparison methods. These methods do not differ fundamentally from their global progenitors; rather, they select locally matching subsegments for comparison instead of attempting to match the sequences along their entire extents. Exhaustive comparisons are computationally demanding. To obtain reasonable response times, either special parallel computer architectures [Brutlag 1993] or heuristic methods (e.g. FASTA [Lipman 1985]) are applied. Evaluating the results is often time-consuming, and statistical evaluations are open to misinterpretation [for a review see Altschul 1994]. The following observations must be taken into account whenever sequences of weak similarity are compared:

- The scores obtained for query or randomized sequences are not normally distributed but more closely follow an extreme value distribution. Probabilities calculated in the form of standard deviations from the average matching score against all sequences in a database (which assume a normal distribution) therefore underestimate the probability of a chance match.
- Many naturally occurring sequences show strong deviations from a random distribution in their amino acid composition. Standard sequence comparison methodology is not applicable in these cases.
- The length of an aligned domain has been shown to influence the significance of the comparison, i.e., longer stretches with the same statistical score are more reliable.
- Evolutionary modification of sequences occurs continuously in time. At any given moment in the evolutionary process, one may observe examples of every evolutionary distance, from near identity to statistically insignificant similarity. Thus, no mathematical algorithm per se is able to distinguish between related and unrelated sequences as long as the three-dimensional structures of the proteins to be analyzed are unknown.

Whenever possible, multiple alignments of sequence families should be used for the analysis of sequences. Conservation of specific residues in all members of a family can give a strong indication of mutual relatedness.
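The difference between global and local comparison comes down to one line of the dynamic-programming recurrence. The following is a minimal Python sketch of the Smith-Waterman local alignment score (the algorithm that BLAZE [Brutlag 1993] runs on parallel hardware). It assumes a toy scoring scheme (match +2, mismatch -1, linear gap -2) in place of a substitution matrix such as the PAM matrices, and it returns only the best score, not the alignment itself.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between sequences a and b.

    Toy scoring (an assumption): fixed match/mismatch scores and a linear
    gap penalty; real searches use substitution matrices and affine gaps.
    """
    prev = [0] * (len(b) + 1)  # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # residue of a aligned to a gap
            left = curr[j - 1] + gap  # residue of b aligned to a gap
            # The floor at 0 makes the alignment local: a poorly matching
            # prefix is abandoned instead of dragging the score down.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


# Only the shared "WWW" segment contributes: 3 matches x 2 = 6.
print(smith_waterman_score("AAAWWWAAA", "CCCWWWCCC"))  # 6
```

A global (Needleman-Wunsch style) score would instead initialize the first row and column with cumulative gap penalties, drop the 0 floor and report the bottom-right cell of the matrix.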

5 Future Developments

Current databases display a high level of redundancy. The nucleic acid sequence databases contain a large number of independent sequencing reports describing the same sequence, and discrepancies are often observed among these overlapping reports. The emphasis has been on assembling large collections of data at relatively low cost, with limited concern for data reliability. Current estimates indicate that as many as 10-20 in every 10 000 bases deposited in the nucleic acid sequence databases may be in error, suggesting that more than 50% of the translations into open reading frames are affected by error. Proper assessment of the problem is complicated by the widely varying reliability of the sequencing techniques currently employed. A thorough reinvestigation will be required as soon as appropriate and inexpensive technologies become available for determining molecular sequences within well-defined error limits.
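The step from a per-base error rate to "more than 50% of translations" is simple probability. The following is a minimal sketch. It assumes that errors occur independently at a uniform rate, and it assumes ORF lengths of 900 to 1200 nucleotides; these lengths are illustrative values, not figures taken from the text.

```python
def p_orf_has_error(per_base_error: float, orf_length_nt: int) -> float:
    """Probability that an ORF contains at least one wrong base, assuming
    independent errors at a uniform per-base rate (a simplification)."""
    return 1.0 - (1.0 - per_base_error) ** orf_length_nt


for rate in (10 / 10_000, 20 / 10_000):   # the 10-20 per 10 000 bases quoted
    for length in (900, 1200):            # assumed ORF lengths in nucleotides
        p = p_orf_has_error(rate, length)
        print(f"error rate {rate:.3f}, ORF {length} nt: P(>=1 error) = {p:.2f}")
```

Every combination gives a probability above 0.5. The estimate is only indicative, because the type of error matters. A silent base substitution leaves the encoded protein unchanged. A single insertion or deletion, by contrast, shifts the reading frame for the rest of the ORF.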

VI.1 Protein Sequences and Sequence Databases


Large sequencing projects will have a major impact on the future development of the macromolecular sequence databases. Because the systematic sequencing of complete genomes is still in its infancy and most experimentally determined sequences were selected to elucidate known cellular functions, the current collections are known to exhibit strong biases. Model organisms like Saccharomyces cerevisiae, Mycobacterium tuberculosis, Bacillus subtilis, Escherichia coli, and Arabidopsis thaliana are currently being systematically sequenced. These projects, together with the ongoing human genome project, will fundamentally change both the magnitude and the sampling of the sequence data available for processing by the sequence databases. The integration of information available from different biological databases presents a major challenge. In addition to the development and implementation of new informatics concepts, it will require a significantly increased level of collaboration between database centers [Fields 1992]. Some of the concepts proposed to address this problem are not clearly focused and do not provide a clear definition of the data structures to be considered [Fuchs 1992]. Modern software for navigating among data, and informatics tools for analyzing and manipulating them, must be developed. However, such tools alone cannot compensate for the redundancy, incompleteness, inconsistency, and outright errors found within today’s sequence databases. Effectively organizing the sheer volume of data being produced will require the concepts underlying the databases to be defined more precisely. It will also eventually expose the shortsightedness of accumulating all data without regard to quality, correctness, accuracy, or semantic consistency.

6 References

Altschul, S.F., Boguski, M.S., Gish, W., Wootton, J.C. (1994) Nature Genetics 6, 119-129. Issues in searching molecular sequence databases.
Brutlag, D.L. et al. (1993) Comput. Chem. 17, 203-207. BLAZE: an implementation of the Smith-Waterman sequence comparison algorithm on a massive parallel computer.
Burks, C., Cinkosky, M.J., Fischer, W.M., Gilna, P., Hayden, J. E.-D., Keen, G.M., Kelly, M., Kristofferson, D., Lawrence, J. (1992) Nucl. Acids Res. 20, 2065-206. GenBank.
Courteau, J.B. (1991) NCBI News 1(1), 3 and 7. National Center for Biotechnology Information, NLM, NIH, Bethesda, MD.
Dayhoff, M.O., Eck, R.V., Chang, M.A., Sochard, M.R. (1965) Atlas of Protein Sequence and Structure, National Biomedical Research Foundation, Silver Spring, MD.
Dayhoff, M.O. (1972) Atlas of Protein Sequence and Structure, Vol. 5, National Biomedical Research Foundation, Washington, DC.
Dayhoff, M.O. (1979) Atlas of Protein Sequence and Structure, Vol. 5, Supplement 3, National Biomedical Research Foundation, Washington, DC.
Etzold, T., Argos, P. (1993) CABIOS 9, 49-56. SRS: an indexing and retrieval tool for flat file data libraries.
Fields, C. (1992) Tibtech 10, 58-61. Data exchange and inter-database communication in genome projects.
Fuchs, R., Rice, P., Cameron, G.N. (1992) Tibtech 10, 61-66. Molecular biological databases: present and future.
George, D.G., Mewes, H.-W., Kihara, H. (1987) Prot. Seq. Data Anal. 1, 27-39. A standardized format for sequence data exchange.
George, D.G., Orcutt, B.C., Mewes, H.-W., Tsugita, A. (1993) Protein Seq. Data Anal. 5, 357-399. An object-oriented sequence database definition language (SDDL).
Heumann, K., George, D.G., Mewes, H.-W. (1994) Manuscript submitted to CABIOS. A new concept of data distribution on wide area networks.
Higgins, D.G., Fuchs, R., Stoehr, P.J., Cameron, G.N. (1992) Nucl. Acids Res. 20, 2071-2074. The EMBL data library.
Jones, D.T., Taylor, W.R., Thornton, J.M. (1992) CABIOS 8, 275-282. The rapid generation of mutation data matrices from protein sequences.
Lipman, D.J., Pearson, W.R. (1985) Proc. Natl. Acad. Sci. USA 85, 2444-2448. Improved tools for biological sequence comparison.
Needleman, S.B., Wunsch, C.D. (1970) J. Mol. Biol. 48, 443-453. A general method applicable to the search for similarities in the amino acid sequence of two proteins.
Orcutt, B.C., George, D.G., Frederickson, J.A., Dayhoff, M.O. (1982) Nucl. Acids Res. 10, 157-174. Nucleic acid database computer system.
Orcutt, B.C., George, D.G., Dayhoff, M.O. (1983) Annu. Rev. Biophys. Bioeng. 12, 419-441.
Waterman, M.S. (1983) Proc. Natl. Acad. Sci. USA 80, 3123-3124. Sequence alignments in the neighborhood of the optimum with general application to dynamic programming.


VI.2 Sequence Database Searching by Mass Spectrometric Data

Matthias Mann

1 Introduction

The cloning of gene products, the determination of their complete primary structure and the unraveling of complex regulatory networks place ever-increasing demands on the sensitivity and speed of protein microcharacterization. Specifically, one of its main duties is to obtain enough amino acid sequence data to find and clone the corresponding gene. Alternatively, if the protein is already known, it should be identified as quickly as possible. Presently, both tasks are typically handled by the following procedure: microdigestion, HPLC separation of the resulting peptides, Edman sequencing of some of them, and database searches with the resulting sequences. If the protein is new and of interest, oligonucleotide probes are constructed and the gene is cloned. Edman microsequencing is an established and powerful technique. However, it is not an easy one, and it faces limitations. These limitations are partly fundamental (e.g. blocked N-termini) but relate mostly to the sample amounts needed and to sample throughput. For example, it would be interesting to study the tens or hundreds of proteins that are up- or down-regulated in certain states of an organism or tissue and visualized on two-dimensional gels. Current techniques cannot tackle tasks like this. Two developments now together allow the needed analytical advances, complementing and in certain cases substituting for Edman microsequencing: the maturation of biological mass spectrometry and the exponential increase in the volume of sequence data in the databases. Together, these trends provide the means for new strategies for the microcharacterization of proteins. The introduction of electrospray mass spectrometry (ESMS) [Meng 1988; Fenn 1989] and matrix-assisted laser desorption/ionization mass spectrometry (MALDI) [Karas 1988; Hillenkamp 1991] has led to tremendous advances in biological mass spectrometry.
ESMS is turning out to be extremely useful in structural biology, whereas the simplicity and high throughput of MALDI are making it a method of choice for the modern protein facility. Using either method (ESMS or MALDI), it is now easy to obtain the molecular weights of many biomolecules for which mass spectrometry was not even an option a few years ago. Sensitivities below the picomole range have been demonstrated, and in some cases structural information could also be obtained at these levels. Furthermore, mass spectrometric development is still in full swing, and capabilities can be expected to improve in the coming years.

R. Kellner, F. Lottspeich, H.E. Meyer (1994) Microcharacterization of Proteins, VCH Weinheim


M.Mann

As one practical consequence of the rapid growth of the sequence databases, we and our colleagues find that today more than 30% of the "unknown" proteins delivered to a core facility are in fact already contained in the sequence databases. In the coming years this proportion is bound to increase significantly (see the chapter on databases). The case of E. coli gives a preview of how powerful the identification of genes by mass spectrometric information will be. Approximately 70% of the genes of E. coli have been sequenced. Thus, with the methods discussed in this chapter, any good peptide map or partial sequence has a very high chance of pinpointing the gene without the need for chemical sequencing. Several yeast chromosomes have already been sequenced completely, and the yeast genome, like that of E. coli, is expected to be completely known within a few years. The last human genes are expected to be entered into the databases by the year 2005, but even now most abundantly expressed genes have been at least partially sequenced. With these developments, the main task of analytical protein biochemistry will shift from finding new proteins to identifying them and correlating them with the databases. In a second step, protein analysis will be concerned with finding the differences between the mature protein and the primary structure described in the database (i.e. verification of the sequence and determination of post-translational modifications and of protein processing). For this second step, biological mass spectrometry is already well established. This chapter demonstrates that it can be a powerful tool for the identification of proteins as well. The concepts of database searching by mass spectrometric information will be explained using Peptidesearch, a program written by the author. Combining mass spectrometric information with sequence databases such as SWISSPROT or EMBL is a new development that the author and others are still actively investigating.
Developments are therefore continuing at a fast rate, and capabilities will still improve. The rest of the chapter is structured as follows. Four main sections deal with the program itself, with searching by total mass information, with searching by peptide masses and with searching by a combination of peptide mass and partial sequence. Each section is divided into subsections dealing with particular search parameters or search strategies. Not all of the parameters explained in detail here will be of interest to the casual reader. The main non-technical points are contained in the first parts of sections 3, 4 and 5 and in the recommendations and prospects subsection of each of these sections.

2 Program and Algorithm

Peptidesearch was written in C and C++ for the Macintosh line of computers using the Symantec THINK C compiler and development environment (Symantec, Cupertino, CA). Object-oriented programming techniques were used extensively, and the program was constructed with the help of the THINK Class Library.

2.1 Requirements for Installing Peptidesearch

Currently Peptidesearch runs on the Macintosh line of computers and uses 600 kByte of memory. The free hard disk space required depends on the database(s) to be searched and corresponds roughly to the number of amino acids or nucleotides in the database (plus a small

VI.2 MS Data for Sequence Analysis


overhead for the header for each entry). Examples at the time of writing are 11 Mbyte for SWISSPROT and 4.5 Mbyte for the Protein Identification Resource (PIR), division I.

2.2 Data Structures and Algorithm

Peptidesearch works with several database formats, including SWISSPROT/EMBL, PIR and FASTA. Other formats may be added in the future. A sequence database (nucleic acid or amino acid sequences) is translated into a data structure that allows efficient searching and is saved to the local hard disk. The format includes a header for each protein entry. The header consists of an internal index number, the average molecular weight calculated from the sequence and the isoelectric point (pI) of the protein. For easy identification of hits in the match list, it also includes part of the description line of the database entry. The algorithm for searching by intact protein molecular weight uses the precomputed average molecular weights in the header of each protein entry. It currently does not take account of signal sequences or other post-translational modifications of the proteins in the database. When searching by a set of peptide masses, the following procedure is applied. An ordered set of masses is generated from the user's mass spectrometric data. This set is split into two columns, the lower and the higher mass limit of each peptide, according to the mass accuracy that the user has specified. For example, the measurements 1000 Da and 2000 Da with a mass accuracy of 0.1% would lead to (999, 1001) and (1998, 2002). Then the "cleavage" amino acids, C-terminal to which the protease cleaves, are loaded into an array. Another array defines the "exception" amino acids, i.e. amino acids like proline N-terminal to which there is no cleavage. Enzymes that cleave N-terminal to certain amino acids (such as Asp-N) can also be accommodated in this scheme. Database searching in Peptidesearch is performed "on the fly", allowing great flexibility in search parameters and modes. For example, the user can specify all common cysteine modifications, whether average or monoisotopic masses should be used, and custom cleavage rules instead of the standard enzymes.
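Two preparatory steps can be sketched in Python. The first turns the measured masses into (lower, higher) limits. The second describes an enzyme by its cleavage residues and its exception residues. This sketch illustrates the scheme described above and is not Peptidesearch's C code. The `ENZYMES` table and the function names are hypothetical. Applying exception residues to N-terminal cleavers is an assumed generalization.

```python
def mass_windows(masses, rel_accuracy):
    """Sorted (low, high) limits per measured mass; 0.1% -> rel_accuracy=0.001."""
    return sorted((m * (1 - rel_accuracy), m * (1 + rel_accuracy)) for m in masses)


# Hypothetical rule table: residues at which the enzyme cuts, on which side of
# them it cuts, and "exception" residues on the other side that block the cut.
ENZYMES = {
    "trypsin": {"sites": set("KR"), "side": "C", "exceptions": set("P")},
    "Asp-N": {"sites": set("D"), "side": "N", "exceptions": set()},
}


def digest(seq, enzyme):
    """Split seq into peptides according to the enzyme's cleavage rule."""
    rule = ENZYMES[enzyme]
    peptides, start = [], 0
    for i in range(1, len(seq)):
        left, right = seq[i - 1], seq[i]  # candidate cut between these two
        if rule["side"] == "C":   # cut C-terminal to a site residue
            cut = left in rule["sites"] and right not in rule["exceptions"]
        else:                     # cut N-terminal to a site residue
            cut = right in rule["sites"] and left not in rule["exceptions"]
        if cut:
            peptides.append(seq[start:i])
            start = i
    peptides.append(seq[start:])
    return peptides


print(mass_windows([1000, 2000], 0.001))      # about (999, 1001), (1998, 2002)
print(digest("AKRPGKFQSLGVAFYR", "trypsin"))  # ['AK', 'RPGK', 'FQSLGVAFYR']
```

In the tryptic example, the R followed by P is left uncleaved, as the proline exception requires.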
Peptidesearch scans the database in the following way. For each entry in the database, the protein sequence is obtained from a buffer. The masses of the amino acids are then summed until one of the "cleavage" amino acids is encountered, provided that the next amino acid is not one of the exception amino acids. The amino acid masses are retrieved from a vector that can contain either average or monoisotopic masses, as well as the masses of standard and modified amino acids (e.g. carboxymethylated cysteine). The whole operation involves only pointer dereferencing, summing and binning and is thus extremely fast in the C programming language. When a valid peptide mass has been obtained, it is compared to all measured peptide molecular weights. If it falls between the lower and the higher mass limit of a measured peptide, a peptide hit is scored. If the user has specified that one or two cleavage sites may be uncleaved, the mass of the previous peptide is added to the current peptide, and the sum is in turn compared to the measured masses. At the end of each protein sequence the number of peptide hits is evaluated. If it is equal to or higher than the minimum number of peptide matches specified by the user, a protein match is recorded. Presently, up to 500 protein matches can be displayed.
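The scan can be sketched in Python under a few assumptions:

- Only trypsin is modeled (cleavage after K or R unless followed by P).
- Standard monoisotopic residue masses are used.
- Peptide masses are neutral (residue sum plus one water).
- Each measured mass is counted at most once.

Joining neighbouring peptides plays the role of adding "the mass of the previous peptide". To mimic a cysteine modification, the `"C"` entry could be raised, e.g. by about 58.005 Da for carboxymethylation. All function names are hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one H2O per peptide for the free termini


def tryptic_peptide_masses(seq, missed=1):
    """Neutral masses of tryptic peptides of seq with up to `missed` uncleaved sites."""
    singles, total = [], 0.0
    for i, aa in enumerate(seq):
        total += RESIDUE[aa]            # sum residue masses on the fly
        at_end = i == len(seq) - 1
        if at_end or (aa in "KR" and seq[i + 1] != "P"):
            singles.append(total)       # a fully cleaved peptide ends here
            total = 0.0
    masses = []
    for i in range(len(singles)):
        # Join up to `missed` following peptides to model uncleaved sites.
        for j in range(i, min(i + missed + 1, len(singles))):
            masses.append(sum(singles[i:j + 1]) + WATER)
    return masses


def count_peptide_hits(seq, windows, missed=1):
    """How many measured (low, high) windows contain a predicted peptide mass."""
    masses = tryptic_peptide_masses(seq, missed)
    return sum(any(low <= m <= high for m in masses) for low, high in windows)


def protein_matches(database, windows, min_hits, missed=1):
    """(name, hits) for entries with at least min_hits peptide hits, best first."""
    result = []
    for name, seq in database.items():
        hits = count_peptide_hits(seq, windows, missed)
        if hits >= min_hits:
            result.append((name, hits))
    return sorted(result, key=lambda item: -item[1])


print(round(tryptic_peptide_masses("FQSLGVAFYR", missed=0)[0], 2))  # 1186.61
```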


In the case of searching by partial sequence information, protein or nucleotide entries are first searched for amino acid patterns that match the one given by the user. For this search an error-tolerant string matching algorithm is used. After a pattern match has been found, the algorithm tries to extend the pattern N-terminally and/or C-terminally to obtain the correct peptide mass. If the sequence pattern and the molecular weight information match, the protein is displayed in the match list. Depending on the options set, the resulting peptide must also satisfy the cleavage conditions for the currently selected enzyme (N- or C-terminal or both). Peptidesearch also allows searching the database with incomplete tandem mass spectrometry (MS/MS) results. Again, the limited sequence pattern is matched first. Peptidesearch then tries to match the molecular weight information to the sequence under consideration. It does so using the known structures of the ions produced in MS/MS, such as B or Y'' ions [Roepstorff 1984]. By default, isoleucine is treated as equal to leucine ("I = L") and glutamine as equal to lysine ("Q = K"). Within each of these two pairs, the amino acids have the same nominal mass. The two members of a pair can be distinguished by mass spectrometry only in special experiments.
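The first two steps of this partial-sequence search might look like this in Python. Step one is error-tolerant pattern matching under the default "I = L" and "Q = K" equivalences. Step two is N- and/or C-terminal extension until the mass fits. The mismatch-counting matcher and the `peptide_mass` callback are assumptions for illustration, since Peptidesearch's own matching algorithm is not specified here. The cleavage-condition check is omitted.

```python
EQUIV = str.maketrans({"I": "L", "Q": "K"})  # default "I = L" and "Q = K"


def fuzzy_find(protein, pattern, max_mismatches=1):
    """Start positions where pattern matches with at most max_mismatches."""
    p, pat = protein.translate(EQUIV), pattern.translate(EQUIV)
    hits = []
    for s in range(len(p) - len(pat) + 1):
        mismatches = sum(x != y for x, y in zip(p[s:s + len(pat)], pat))
        if mismatches <= max_mismatches:
            hits.append(s)
    return hits


def extend_to_mass(protein, start, length, target, tol, peptide_mass):
    """(begin, end) spans containing the matched segment whose mass, as given
    by the peptide_mass callback, equals target within tol."""
    spans = []
    for begin in range(start, -1, -1):                      # grow N-terminally
        for end in range(start + length, len(protein) + 1):  # grow C-terminally
            m = peptide_mass(protein[begin:end])
            if abs(m - target) <= tol:
                spans.append((begin, end))
            if m > target + tol:   # longer spans can only be heavier
                break
    return spans


def toy_mass(segment):
    """Toy mass function (every residue 100 Da), just to show the mechanics."""
    return 100.0 * len(segment)


start = fuzzy_find("MAGLLDEKR", "GIID", max_mismatches=0)[0]  # "GIID" reads as "GLLD"
print(extend_to_mass("MAGLLDEKR", start, 4, 600.0, 1.0, toy_mass))
# [(2, 8), (1, 7), (0, 6)]
```

In practice `peptide_mass` would sum residue masses plus one water, and each span would then be checked against the selected enzyme's cleavage rule.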

2.3 Performance

The time needed for a search varies with the type of computer used and the size of the database. On a Centris or Quadra type of Macintosh, average search times are about 20 seconds for SWISSPROT (about 30 000 entries, data file of about 11 Mbyte). Search times for other databases are proportional to the size of their data files. Because Peptidesearch has precalculated the molecular weights of all entries in the database, it only considers entries within the user-defined mass window. Decreasing the size of this window of possible protein hits therefore speeds up the search further. When nucleotide databases are searched, there is an additional "on the fly" translation step from nucleotide to amino acid sequences. This step increases the search time, as does searching in all six reading frames.
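One way to get this speed-up from precalculated molecular weights is to keep the entry headers sorted by mass and use binary search, so that entries outside the user's mass window are never touched. This is an assumption: the text does not describe Peptidesearch's internal layout. The class name and the entry names are hypothetical.

```python
from bisect import bisect_left, bisect_right


class MassIndex:
    """Entry headers (average MW, name) kept sorted by MW for range queries."""

    def __init__(self, headers):
        self.headers = sorted(headers)              # list of (mw, name)
        self.mws = [mw for mw, _ in self.headers]

    def in_window(self, low, high):
        """Headers with low <= MW <= high, found by binary search."""
        return self.headers[bisect_left(self.mws, low):bisect_right(self.mws, high)]


index = MassIndex([(16950.6, "entry_a"), (12000.0, "entry_b"), (30000.0, "entry_c")])
print(index.in_window(10000.0, 20000.0))
# [(12000.0, 'entry_b'), (16950.6, 'entry_a')]
```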

2.4 Collaboration with Other Programs

Peptidesearch was designed to be used in conjunction with mass spectrometry software and other sequence and database programs. The goal was to enable interactive work with protein data and with databases. From some programs, data can be transferred directly to Peptidesearch. Examples are "LaserOne" for matrix-assisted laser desorption/ionization (MALDI) (M. Mann and P. Mortensen) and "MacSpec" for electrospray (Sciex, Ontario, Canada). For mass spectrometric programs from other manufacturers, an ASCII output file should be generated that contains the masses of the peptides (not the protonated peptide masses) in any order, separated by linefeeds. Peptidescan will then import these files, and manual entry of the peptide masses will not be necessary. It is recommended to obtain a copy of EMBL-Search, a program by R. Fuchs at EMBL, for accessing the EMBL nucleotide databases and SWISSPROT. EMBL-Search, SWISSPROT and the EMBL nucleotide database are on the EMBL CDs; contact the author about availability. The full information about hits in the database can then be retrieved immediately. Furthermore, since Peptidesearch is not a dedicated peptide mapping program, it is advisable to use a program for the detailed scoring of mass spectrometric information against one given sequence. Such programs can search against a single sequence in more detail. For example, the disulfide bond structure, if known, can be taken into account, as well as nonspecific cleavages and the like. On the Macintosh, T. Lee’s MacBioSpec (Sciex, Ontario, Canada) has most of these functions, whereas GPMA by P. Højrup is an excellent program for this task under Windows [Højrup 1990]. For LC MS data, PeptideMap by S. Carr et al. can be used (Sciex, Ontario, Canada). Similar programs are also being developed in the author’s group and will be made available. Peptidesearch will be ported to the PowerPC line of Macintosh computers, making search times even shorter. For completeness, the program will be extended with standard homology searching features so that it can be used as a “one-stop” first check of a short sequence against a database. At the time of writing, Peptidesearch is available through Bruker Instruments. A version of the program is available from Sciex/Perkin Elmer under the name Peptidescan. The author plans to additionally make the searching facilities available online through the Internet. In case of problems or questions regarding the availability of the program, please send e-mail to “Mann@EMBL-Heidelberg.de” or otherwise contact the author.
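Preparing such an import file is mostly a matter of parsing one value per line. If the instrument software only reports protonated [M+H]+ values, the proton mass must be subtracted first. The following is a minimal sketch, in which the function name is hypothetical. In the usage line, 1112.86 is the labeled peak of Figure 3; the second value is made up for illustration.

```python
PROTON = 1.007276  # proton mass in Da


def read_mass_list(text, protonated=False):
    """Parse peptide masses, one per line (blank lines ignored).
    With protonated=True, [M+H]+ values are converted to neutral masses."""
    masses = []
    for line in text.splitlines():
        line = line.strip()
        if line:
            value = float(line)
            masses.append(value - PROTON if protonated else value)
    return masses


print(read_mass_list("1112.86\n\n1187.90\n", protonated=True))  # about [1111.85, 1186.89]
```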

3 Searching by Total Molecular Weight

With the introduction of electrospray and MALDI, the upper mass limit of proteins that can be analyzed by mass spectrometry has been pushed to several hundred kilodaltons. Mass spectrometry can thus in principle measure the molecular weight of almost all proteins. An exact mass could be used to search the database and identify the protein in question. An example will illustrate the search by complete molecular weight. A mass of 16 951.9 Da has been measured with an estimated accuracy of 0.03%, a realistic mass accuracy obtainable with ESMS. (The mass accuracies cited in the literature are often optimistic and represent best performances or mean values rather than worst cases. Furthermore, to include outliers, it is recommended that the estimate of the mass error be doubled.) Figure 1 shows the result of the search. Three different myoglobins are matched, but with an uncertainty of ±0.03% there are five other possibilities. Double-clicking on each of the entries brings up information about the protein (Figure 2), which can be used to quickly discard most possibilities (by their isoelectric point, species of origin, etc.). For example, match 4, troponin C, has a pI of 3.8, much lower than that of myoglobin, which has a pI of 9. For the remaining possibilities, the accession number is copied to EMBL-Search. EMBL-Search immediately displays the complete database entry, including cross-links to other databases and literature references. In our example, presuming that myoglobin is the correct match, it would most likely be possible both to discard the other possibilities and to retain the correct protein as a strong candidate. One other independent piece of evidence can then be used to positively identify the protein. It is important to check the full database entry for signal sequences and known post-translational modifications.
A hit whose database entry lists these must of course be discarded, because the mass of the mature protein will not be the same as the one calculated from the sequence (see section 3.1). Thus for match 4, SWISSPROT lists an acetylation. Matches 5 and 7 turn out to be fragments of larger proteins. These proteins can therefore be taken off the list. (The two other remaining proteins are viral proteins.)
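The arithmetic behind such a search is a relative tolerance around the measured mass:

- 0.03% of 16 951.9 Da is about ±5.09 Da.
- Doubling the error estimate, as recommended above, gives about ±10.17 Da.
- At 0.008% the window shrinks to about ±1.36 Da.

The following is a minimal sketch in which the function names are hypothetical. Of the two candidates, the horse heart myoglobin mass is taken from the Figure 1 caption, and the second entry is hypothetical.

```python
def total_mass_window(measured, rel_accuracy, safety_factor=1.0):
    """(low, high) for a total-mass search; safety_factor=2 doubles the error."""
    half_width = measured * rel_accuracy * safety_factor
    return measured - half_width, measured + half_width


def search_total_mass(measured, rel_accuracy, database, safety_factor=1.0):
    """Names of entries whose sequence-derived average MW lies in the window."""
    low, high = total_mass_window(measured, rel_accuracy, safety_factor)
    return [name for name, mw in database.items() if low <= mw <= high]


candidates = {"myoglobin (horse heart)": 16950.6, "entry_b (hypothetical)": 16948.9}
print(search_total_mass(16951.9, 0.0003, candidates))   # both entries
print(search_total_mass(16951.9, 0.00008, candidates))  # only the myoglobin
```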

[Figure 1: "Search Result" window listing the matching SWISSPROT entries (columns: match number, Index, number of residues, accession number, MW [Da], protein name).]

Figure 1. Output of the search by total molecular weight (16 951.9 Da, measured with a mass accuracy of ±0.03%, i.e. ±5 Da). SWISSPROT was searched. The first column lists the match number and the Index column an internal index number. The subsequent columns give the number of amino acid residues, the accession number of the protein, its average molecular weight and a short description. The hypothetical mass accuracy of 0.03% assumed above is normal for electrospray mass spectrometry but can often be improved. In this example, a mass measurement accuracy of 0.008% was actually obtained. This means that only the correct myoglobin (horse heart, average molecular weight 16 950.6 Da) would be retained, and the protein could have been positively identified with very little additional information.

3.1 Limitations of Searching by Total Molecular Weight

As pointed out above, there are fundamental problems in comparing measured molecular masses of proteins with masses computed from the entries in the databases. These problems are:

- It is still difficult to measure the molecular mass of medium-sized and large proteins accurately enough for a meaningful search. This is particularly often the case for proteins extracted from gels, as they often show a chemically induced, broad distribution of masses.
- Any protein modification will lead to a discrepancy between the molecular weight recorded in the database and the measured molecular weight. If the modifications are known, programs can take account of them by reading the feature table of the database entry. Alternatively, the search can be repeated for the most common modifications. For example, if acetylation is possible, the search can be repeated with the measured molecular weight minus 42 Da (the mass difference induced by acetylation). In this way the most common modifications can be taken into account. However, there are myriad possibilities for modifications, and the complete primary structure has been solved for surprisingly few proteins.
- Almost any sequence error in the database will change the molecular weight appreciably. In genomic sequencing an error rate of 0.01% per nucleotide is aimed at. In our experience, however, many even of the well-characterized proteins in the most reliable databases (i.e. SWISSPROT or PIR I and II) still contain sequence errors or conflicts.

Note also that exactly the same protein must be in the database. As far as mass spectrometric searching is concerned, even near identity in sequence will lead to a substantially different molecular weight.
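Repeating the total-mass search for a few common modifications can be sketched as below. The shift table is an assumption, listing approximate average mass increments:

- about 42.04 Da for acetylation (the 42 Da of the text),
- about 79.98 Da for phosphorylation,
- about 16.00 Da for methionine oxidation.

A real list would be longer, and several modifications can co-occur. The function name and the toy database entries are hypothetical.

```python
# Assumed average mass shifts (Da) for a few common modifications.
MOD_SHIFTS = {"none": 0.0, "N-acetylation": 42.04,
              "phosphorylation": 79.98, "Met oxidation": 16.00}


def search_with_modifications(measured, rel_accuracy, database):
    """Repeat the total-mass search with the measured mass minus each shift.
    database maps entry name -> average MW calculated from the sequence."""
    hits = []
    for mod, shift in MOD_SHIFTS.items():
        target = measured - shift            # mass of the unmodified chain
        tol = target * rel_accuracy
        hits += [(name, mod) for name, mw in database.items()
                 if abs(mw - target) <= tol]
    return hits


db = {"prot_a": 10000.0, "prot_b": 20000.0}
print(search_with_modifications(10042.04, 0.0001, db))  # [('prot_a', 'N-acetylation')]
```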

[Figure 2: "Protein Info" window for TROPONIN C, ISOFORM 2B: pI 3.80, molecular weight 16950.8 Da, 14 peptides; the protein sequence is displayed.]

Figure 2. The "Protein Info" window. The accession number and part of the protein name and the protein sequence are displayed. The peptide masses are masses predicted by the currently selected cleavage agent.


3.2 Recommendations and Prospects for Searching by Total Molecular Weight

Whenever an accurate protein molecular weight has been measured, by MALDI of small proteins or by ESMS, it is useful to search the database using this information. Three conditions have to be met for the search to be successful:

- The molecular weight must be obtained fairly accurately, i.e. to within 0.1% to 0.01% or better.
- The protein must be unmodified, or the modifications must be known (or at least be guessable).
- The entry in the database must be correct.

These conditions can be fulfilled for many proteins in the range up to 20-30 kDa but are currently seldom met for larger proteins. There are now techniques that can yield the mass of small proteins with even higher accuracy. One example is ES-FT ICR MS (electrospray Fourier transform ion cyclotron resonance mass spectrometry). If these methods can be implemented in routine practice, searches with a very small mass error can be performed. For small, unmodified proteins this form of identification may be very useful in the future. Furthermore, it has been suggested that these techniques be used in protein and peptide charting [Feistner 1994]. The term refers to a wholesale mapping of the proteins in a specific tissue by obtaining the molecular weights of all proteins, preferably without pretreatment. The protein maps of different tissues, or of tissues in different biological states, can then be compared. In this way, proteins with important biological functions can in principle be identified. Another class of applications is the identification or confirmation of well-characterized proteins, such as recombinantly expressed ones, from a small number of possibilities. In this case the above limitations do not apply or are very much diminished. With a high mass accuracy the chance of a false positive is low. If the mass has been measured to within 0.01%, there are usually only zero to ten matching proteins. Furthermore, in such cases all except the correct one can be rejected by considering, for instance, the species of origin. In summary, searching by molecular weight alone is useful in certain well-defined circumstances. With further advances in mass spectrometry and in the completeness and accuracy of the molecular weight databases, this approach will become a powerful tool in protein characterization. It might even become possible to identify several unseparated proteins in a single measurement, obviating the need to separate and digest them.

4 Searching by the Molecular Weight of a Set of Peptides Generated by Sequence-Specific Cleavage of a Protein A mass spectrometric peptide map can now be measured fairly routinely even on small amounts of sample. In the case of electrospray mass spectrometry a peptide map is typically obtained by HPLC separation of a digest mixture and analysis of the eluting peptides either on line or by infusion of collected fractions. The former approach has the advantage of


231

Rab7.2hdiq~st.0.5p/~l,ref,wash,210 shots; Mon,29 Nov 1993.21 :34:13

n n m

I87 90

4

1112.86

I 1000

1200

1400

1600

1800

Figure 3. Part of the MALDI mass spectrum of the tryptic digest of a small protein. All peaks are isotope resolved and the monoisotopic masses are determined. The mass scale is in Dalton. 250 fmol of the digest was applied to the target according to the method of [Vorm 19941.The measurement was made with a Bruker REFLEX instrument. No internal calibrant was used. minimal sample handling and of speed, whereas the latter approach allows simultaneous uses of a fraction for MS, MS/MS and Edman degradation or other chemical procedures. Sequence coverage depends on many factors including solubility of the resulting peptides, amount of sample and completeness of the digest. Values range between larger than 90% for large amounts of protein (more than 100 pmol) digested in solution to smaller than 20% for difficult digestions of low amounts from gels. By MALDI MS on the other hand, a peptide map of a protein can be obtained on a small aliquot of the unseparated mixture. The speed of the technique as well as recent advances in sensitivity and mass accuracy [Vorm 1994a, b] make MALDI MS especially attractive for this type of search. Peptide maps obtained from small proteins digested in solution can be complete or nearly complete (larger than 90%) whereas peptide maps from larger proteins

232

!

M.Mann

obtained from gels have a smaller sequence coverage, of 20% to 50% using current protocols. Figure 3 shows an example of a peptide map obtained from 250 fmol of a peptide mixture with a MALDI reflector instrument (Bruker REFLEX, Bruker GmbH, Bremen). High-quality peptide maps are a very powerful tool in database searching, because they represent a set of independent, highly specific data points that together constitute a fingerprint of the protein. While any one peptide mass, measured with a given mass accuracy, could be found in a number of proteins, it is extremely unlikely that many measured peptide masses would be found in the same protein sequence by chance. The idea of database searching by a set of peptide masses was first put forward by Henzel et al., and it is being investigated by a number of groups [Henzel 1993; James 1993; Mann 1993; Pappin 1993; Yates 1993; Mortz 1994]. The practical steps involved in the search are as follows. First, the protein is digested with a sequence-specific protease such as trypsin or a sequence-specific chemical agent such as CNBr. Then the resulting peptide mixture is analyzed, typically by ESMS or MALDI, and a set of peptide masses is obtained. These masses, together with the estimated mass accuracy, are entered into a program and a user-specified database is searched. The result of the search is a list of proteins whose predicted peptide masses for the given cleavage agent contain at least a minimum, user-specified number of measured masses. To illustrate, the set of masses resulting from the MALDI analysis of the protein digest in Figure 3 is entered into the program and a search is run. The two top entries have all eight peptides as peptide matches; entries 3 through 11 have only four peptide matches each and are all larger than 100 kDa. Highlighting the peptide matches of the first two entries in the Protein Info window reveals that all eight peptides share the same sequence. Thus the proteins are substantially the same. (The SWISSPROT database entries reveal that they are the same protein from rat and dog, respectively.) Table 1 compares the mass spectrometric data to the sequence as retrieved by Peptidesearch. The post-calibration column shows how the peptide masses compare to the sequence when the systematic measurement error is removed (see section 4.6). A measurement and database search such as the example shown here positively identifies the protein in question.

Figure 4. Search result for the data of Figure 3. One missed cleavage site was allowed, the mass accuracy was set to ±0.5 Da (monoisotopic masses) and the full protein mass range was allowed.
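The search procedure just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Peptidesearch program: trypsin is modelled as cleaving after every K or R (the proline rule is ignored), masses are neutral monoisotopic masses with unmodified cysteine, and the function names and the two-entry toy database are hypothetical. The "target" sequence is simply the five limit peptides of Table 1 joined together.

```python
import re

# Monoisotopic residue masses in Da (unmodified residues assumed).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def peptide_mass(peptide):
    """Neutral monoisotopic mass: residue masses plus one water."""
    return sum(MONO[aa] for aa in peptide) + WATER


def tryptic_peptides(seq, missed=1):
    """Cleave C-terminal to K or R (proline rule ignored) and also return
    products with up to `missed` uncleaved internal sites."""
    limits = [p for p in re.split(r"(?<=[KR])", seq) if p]
    return ["".join(limits[i:j + 1])
            for i in range(len(limits))
            for j in range(i, min(i + missed + 1, len(limits)))]


def search(measured, database, tol=0.5, missed=1,
           max_protein_mass=None, min_matches=4):
    """Return (number of matched masses, entry name), best match first."""
    hits = []
    for name, seq in database.items():
        if max_protein_mass is not None and peptide_mass(seq) > max_protein_mass:
            continue  # outside the user-specified protein mass range
        predicted = [peptide_mass(p) for p in tryptic_peptides(seq, missed)]
        n = sum(any(abs(m - c) <= tol for c in predicted) for m in measured)
        if n >= min_matches:
            hits.append((n, name))
    return sorted(hits, key=lambda hit: hit[0], reverse=True)


# Hypothetical database; "target" is assembled from limit peptides of Table 1.
db = {
    "target": "FQSLGVAFYRNNIPYFETSAKDPENFPFVVLGNKEAINVEQAFQTIARLVTMQIWDTAGQER",
    "decoy": "MAGAGAGAGKGAGAGAGAGR",
}
measured = [1186.89, 1282.96, 1475.15, 1589.19, 1647.29]  # from Table 1
print(search(measured, db))  # [(5, 'target')]
```

The "decoy" entry matches none of the masses and is dropped by the minimum-match threshold, which plays the role of the user-specified minimum number of peptide matches.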

Table 1.

No.  Measured mass [Da] (a)  Δm [Da] (b)  Δm postcal. [Da] (c)  Sequence (d)
1    1111.85                  0.32          0.01                RAQAWCYSK
2    1186.89                  0.27         -0.05                FQSLGVAFYR
3    1282.96                  0.33         -0.01                NNIPYFETSAK
4    1325.04                  0.35          0.01                TSLMNQYVNKK
5    1475.15                  0.40          0.03                DPENFPFVVLGNK
6    1589.19                  0.36          0.02                EAINVEQAFQTIAR
7    1647.29                  0.47          0.08                LVTMQIWDTAGQER
8    1933.37                  0.39         -0.05                TLDSWRDEFLIQASPR

(a) Monoisotopic masses from Figure 3 (proton mass subtracted).
(b) Measured mass minus the mass calculated from the sequence.
(c) Mass difference between measured and calculated value after post-calibration. The accuracy is on average 0.002% after post-calibration and 0.025% before.
(d) Sequences as retrieved by Peptidesearch. Internal cleavage sites were underlined in the original; peptides 1, 4 and 8 each contain one.
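As a cross-check on Table 1, the calculated masses can be recomputed from the retrieved sequences. The sketch below assumes neutral monoisotopic masses and unmodified cysteine, which the table does not state explicitly; the names `TABLE_1`, `peptide_mass` and `missed_cleavages` are illustrative.

```python
# Monoisotopic residue masses in Da (unmodified residues assumed).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056

TABLE_1 = [  # (measured neutral mass in Da, sequence) from Table 1
    (1111.85, "RAQAWCYSK"), (1186.89, "FQSLGVAFYR"),
    (1282.96, "NNIPYFETSAK"), (1325.04, "TSLMNQYVNKK"),
    (1475.15, "DPENFPFVVLGNK"), (1589.19, "EAINVEQAFQTIAR"),
    (1647.29, "LVTMQIWDTAGQER"), (1933.37, "TLDSWRDEFLIQASPR"),
]


def peptide_mass(peptide):
    """Neutral monoisotopic mass: residue masses plus one water."""
    return sum(MONO[aa] for aa in peptide) + WATER


def missed_cleavages(peptide):
    """Internal K/R residues; the C-terminal residue is the expected cut."""
    return sum(aa in "KR" for aa in peptide[:-1])


rows = []
for measured, seq in TABLE_1:
    calc = peptide_mass(seq)
    dm = measured - calc
    ppm = 1e6 * dm / calc
    rows.append((seq, calc, dm, ppm, missed_cleavages(seq)))
    print(f"{seq:<17} calc {calc:9.3f}  dm {dm:+.2f} Da  "
          f"{ppm:4.0f} ppm  missed {missed_cleavages(seq)}")
```

Run as is, the recomputed Δm values lie between about 0.28 and 0.48 Da (roughly 0.02% to 0.03% of the mass) and agree with the Δm column to within about 0.01 Da; the missed-cleavage count flags peptides 1, 4 and 8 as the partials.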

A fundamental feature of the search by peptide masses is its error tolerance combined with high search specificity. (Search specificity means the power to discriminate: a search with a tenfold higher search specificity will produce a list of false positives that is only one tenth as long.) Every protein is characterized by a set of data points, each of which can be measured with high accuracy. Even a small protein will generate a number of peptides, only some of which are needed to pinpoint the protein in the database. For example, any four of the eight masses in the example are sufficient for identification. Thus we can tolerate very incomplete peptide maps covering only a small part of the sequence. We can also tolerate a number of discrepancies between the molecular weight expected from the sequence in the database and that of the mature protein (caused, for example, by post-translational processing and sequence errors). In the following, the influence of the main search parameters is explained, and some strategies for arriving at sound conclusions from the search results are outlined.


4.1 Influence of Mass Accuracy

Peptide molecular weights are the data we are searching by; therefore the mass accuracy is one of the central parameters. Attainable mass accuracies in the most important part of the tryptic mass range (500 to about 2500 Da) typically vary from ±2 Da (linear MALDI instruments without internal calibration) to less than 0.3 Da (high-accuracy ESMS data and MALDI reflector data). This roughly tenfold range of mass accuracies leads to an even larger range in search specificities, because the chance of randomly matching each peptide mass in a protein is proportional to the mass tolerance. Thus a tenfold higher mass accuracy will increase the search specificity by a factor of ten for a single peptide. To a first approximation, each peptide contributes an equal factor towards the search specificity. Hence, when we measure four peptides with a tenfold increased mass accuracy, the specificity will be 10^4 times higher. (A search output of 10 000 proteins would on average be reduced to one!) Statistically, an improvement in mass measurement accuracy will improve the search specificity by a factor of (improvement in mass accuracy) to the power of (number of peptides measured).
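The multiplicative argument reduces to a one-line formula. The sketch assumes, as the text does to a first approximation, that each peptide contributes independently and that the per-peptide chance of a random match scales with the mass tolerance; `specificity_gain` is a hypothetical name.

```python
def specificity_gain(old_tol, new_tol, n_peptides):
    """Factor by which random (false-positive) matches become rarer when the
    mass tolerance shrinks from old_tol to new_tol and n_peptides independent
    peptide masses must all match.

    Assumes the per-peptide chance of a random match is proportional to the
    tolerance, i.e. predicted masses are spread uniformly over the window.
    """
    per_peptide = old_tol / new_tol
    return per_peptide ** n_peptides


gain = specificity_gain(2.0, 0.2, 4)  # four peptides, tenfold better accuracy
print(gain)            # 10000.0
print(10_000 / gain)   # a 10 000-protein hit list shrinks to about one entry
```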

4.2 Influence of Target Protein Mass Range

When searching with a given set of peptide masses, large proteins in the database are much more likely to result in a false positive than small proteins. This is easily appreciated by performing a theoretical tryptic digest of a protein of 500 kDa. Such a protein will yield more than 500 masses between 800 and 2000 Da. A given peptide mass then has up to a 50% chance of being randomly matched, as every other mass is already "occupied" by an unrelated peptide. A 15 kDa protein, on the other hand, will yield only about 10 peptide masses in the same mass range, and the chance of randomly matching these masses would be fifty times lower. As in 4.1 above, the effect is multiplicative; therefore a larger protein can be orders of magnitude more likely to appear as a false positive than a small protein. The example in Figure 4 bears this out: of all the random matches, none has a mass below 100 kDa, and there are several matches to the very few proteins larger than 300 kDa. As a rule of thumb, the relative factor is (ratio of the upper protein mass limits) to the power of (number of peptides measured). Thus, if the upper limit of the proteins under consideration is reduced from 200 kDa to 100 kDa and 10 peptides have been measured, the search will be 2^10, or about 1000, times more specific. A lower limit on the protein molecular weight is not nearly as important in limiting the number of false positives, but it may of course significantly reduce the number of proteins to be considered. In most cases at least a very rough estimate of the protein molecular weight can be made. Thus, if the protein has been isolated or previously run on a gel, an upper mass limit can be given, such as 25% more than the apparent molecular weight determined by comparison with molecular weight markers on the gel. Even this limited information is very useful in the search, as is apparent from the above discussion.
Ideally the molecular weight would be obtained by mass spectrometry, which would yield a very tight upper mass limit. (The protein may appear larger in the database, however, due to signal sequences.) The influence of protein mass can also be taken into account automatically by incorporating the above relationships, expressed mathematically, into a score. However, it remains advantageous to determine experimentally an upper limit to the possible molecular weight range of the protein to be identified.
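Both the per-peptide picture and the rule of thumb can be checked numerically. The model below is a deliberate simplification (predicted masses spread uniformly over 800 to 2000 Da, overlapping tolerance windows ignored), and the function names are hypothetical.

```python
def random_match_probability(n_predicted, tol, window=(800.0, 2000.0)):
    """Chance that one measured mass lands within ±tol of one of n_predicted
    peptide masses, assuming those masses are spread uniformly over `window`
    (crude model: overlapping tolerance windows are ignored)."""
    width = window[1] - window[0]
    return min(1.0, n_predicted * 2.0 * tol / width)


def mass_limit_factor(old_upper, new_upper, n_peptides):
    """Rule of thumb: (ratio of upper protein mass limits) ** n_peptides."""
    return (old_upper / new_upper) ** n_peptides


p_large = random_match_probability(500, 0.5)  # ~500 kDa protein
p_small = random_match_probability(10, 0.5)   # ~15 kDa protein
print(p_large, p_small, p_large / p_small)
print(mass_limit_factor(200, 100, 10))        # 1024.0
```

With a ±0.5 Da window the model gives roughly a 42% chance per mass for the 500 kDa protein against under 1% for the 15 kDa protein, the fifty-fold difference described above; for four matched peptides this grows to a factor of about six million (50^4).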



4.3 Influence of Minimum Number of Peptide Matches

This parameter is purely one of convenience. The smaller this setting, the larger the reported number of protein matches. Since the program runs very fast, the best strategy is to try a medium or high value first, such as half the number of measured peptides, and then reduce the number if very few protein matches have been found. In the example above, we could have begun the search with a setting of five or six and found only the correct protein.

4.4 Influence of Partial Digestion Setting

This setting determines whether partial digestion products are counted as peptide hits. Thus in Table 1 above, five peptides are limit peptides (they contain no internal K or R) and three are "partials" containing one uncleaved site. Since the partial digestion parameter was set to "1", all peptides were matched. A setting of "2" would also have found peptides with two internal arginines or lysines. The obvious advantage of choosing settings 1 or 2 is that sequence-specific peptides will be scored which would otherwise go unnoticed. Almost every digest contains at least some partial peptides, and in some digests they can be the majority of peptides. In fact, we have found examples which could only be identified in the database with a setting of 2. The disadvantage of using a high setting is that there will be many more false positives. The effect is similar to that of having a larger protein mass window: the effective number of peptides which can generate a false positive is much higher. (It is effectively doubled when the setting is "1". However, the peptide mass distribution is shifted to the high-mass side, since non-limit peptides consist of the sum of two limit peptides, and they are therefore "less harmful" than a doubling of the protein mass range.) As a practical rule, this parameter can be set to "1" in average cases. If one is fairly certain about a complete digest (e.g. a digest in solution, or a relatively small mean of the measured peptide masses), then a setting of "0" may be the more discriminating yet sufficiently inclusive one. Obviously, a digest which has proceeded essentially to the limit, and hence allows the setting of "0", will produce a much more certain search result than a limited digestion. Thus experimental conditions should be optimized in the direction of a limit digest.
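The effect of the partial-digestion setting on the number of candidate peptides can be seen with a toy digest. The sequence is hypothetical, and the cleavage rule is simplified to "after every K or R".

```python
import re


def limit_peptides(seq):
    """Tryptic limit peptides: cleave C-terminal to every K or R."""
    return [p for p in re.split(r"(?<=[KR])", seq) if p]


def digest(seq, missed):
    """Limit peptides plus partials with up to `missed` uncleaved sites."""
    limits = limit_peptides(seq)
    return ["".join(limits[i:j + 1])
            for i in range(len(limits))
            for j in range(i, min(i + missed + 1, len(limits)))]


seq = "GAKLLRSSKEE"  # hypothetical sequence with four limit peptides
for setting in (0, 1, 2):
    print(setting, len(digest(seq, setting)))
# prints "0 4", then "1 7", then "2 9"
```

For n limit peptides the candidate list grows from n to 2n - 1 with setting "1" and to 3n - 3 with setting "2", which is the near doubling mentioned above; the added candidates are concatenations of limit peptides and therefore sit at higher masses.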

4.5 Choice of Enzyme

The choice of enzyme or other cleavage agent will depend mostly on the problem at hand, as there are few general sequence-specific proteases. Also, the digestion may not work with all of them, and the resulting peptides may or may not be soluble. As far as the database search is concerned, the most important properties of the enzyme are the absence of nonspecific cuts and a cleavage specificity limited to a few amino acids. Thus chymotrypsin, while a common choice of enzyme, is not ideal for database searches. However, the choice of enzyme directly influences the average length of the resulting peptides and is therefore an important, optimizable parameter for the search. Trypsin, Lys-C and CNBr are three prototype cleavage agents that lead to relatively small peptides (trypsin, cleavage C-terminal to R and K), medium-sized peptides (Lys-C, cleavage C-terminal to K) and large peptides (CNBr, cleavage C-terminal to the seldom-occurring M). All are specific and in general use, so they represent valid choices for many proteins. The choice between them hinges on the following considerations. What is the relative mass accuracy in the resulting mass range? For instance, tryptic peptides may be measurable far more accurately than CNBr peptides in the particular MS setup used. What is the expected agreement between the sequence as given in the database and the primary structure of the mature protein? The more differences, the smaller the peptides should be. If, for example, the protein is heavily glycosylated and otherwise modified, relatively small peptides should be produced in order to match as many of the predicted molecular weights as possible. If, on the other hand, there are few modifications and the database entry is presumed to be accurate, then larger peptides might be used to advantage. The peptide mass distribution is then spread out over a wider range (more than 10 kDa for CNBr compared to about 1 kDa for trypsin). This can drastically decrease the chance of randomly matching one or more peptide masses of an unrelated protein. Particularly in the case of large proteins, where a number of random peptide matches are statistically expected, it may be a good strategy to produce large protein fragments. Of course, when using longer peptides there are also fewer data points to search (assuming a constant sequence coverage).
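The link between cleavage specificity and fragment size can be illustrated by cleaving one sequence with the three agents. The sequence is hypothetical and only the cleavage rules stated in the text are modelled; a real CNBr map would also need the mass correction for the C-terminal homoserine/homoserine lactone, which this sketch leaves out.

```python
import re

# Residues after which each agent cleaves (C-terminal side), as in the text.
CLEAVAGE_AFTER = {"trypsin": "KR", "Lys-C": "K", "CNBr": "M"}


def cleave(seq, residues):
    """Split seq C-terminal to every residue listed in `residues`."""
    return [p for p in re.split(f"(?<=[{residues}])", seq) if p]


seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQMAPILSRVGDGTQDNLSGAEK"  # hypothetical
for agent, residues in CLEAVAGE_AFTER.items():
    fragments = cleave(seq, residues)
    print(f"{agent:8s} {len(fragments):2d} fragments, "
          f"mean length {len(seq) / len(fragments):.1f} residues")
```

On this sequence trypsin gives eight fragments, Lys-C four and CNBr three, so the mean fragment length grows in the order described above.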

4.6 Avoiding False Positives

For the practical application of the technique, it is of special importance how to deal with false negatives and false positives. A false negative occurs if we did not identify the protein in the database even though it was present. In such a case the data set was by definition poor (otherwise the protein would have been found). As long as that fact is realized and no positive conclusion is drawn (i.e. that the protein cannot be in the database), no harm is done except for additional work in the further characterization of this protein. More important is the avoidance of false positives, i.e. the incorrect identification of the protein in question. Some experimental procedures that can aid in the correct identification of proteins in difficult cases are listed below (section 4.7). However, some precautions can already be taken when deciding on a likely match. First, there should be a clear difference between the top match and the following matches (taking into account that larger proteins have a much larger probability of being matched randomly than small proteins). This process can be formalized by calculating a score based on the statistics of the search. However, since these statistics are complicated, no mathematically rigorous theory has been proposed yet. A simple way to estimate the statistical significance of the match is to change the search data slightly, for example by offsetting the masses by twice the mass error, and then to rerun the search. If proteins with a similarly high number of peptide matches are returned, the result was not significant. (A quick way to perform such a check in Peptidesearch is to switch from the current setting of monoisotopic or average masses to the other. This shifts the masses by about 0.07% and is thus a valid method when the mass accuracy is high.) In determining the significance of a match, the next step is the evaluation of all the information about the protein.
Obviously, the species of origin should make sense, as should the tissue type and characteristics of the protein such as the isoelectric point, if known. Secondly, the sequence of the proposed match should be "fine scored" against the mass spectrometric data. For example, the likelihood that the match is correct is increased if small peaks that have not been used in the search can be assigned to the sequence. Likewise, it increases the confidence in the search if previously unassigned peaks can be assigned using known post-translational modifications and if the absence of expected peptides can be rationalized (e.g. suppression of very hydrophobic peptides, or washing out of very hydrophilic ones). The mass differences between measured and calculated peptides should be determined. If they differ systematically, there could be a systematic error in the measurement. In the example (Table 1), the masses are systematically too high. Once the systematic error is removed by "post-calibration", a much better fit is obtained. Thus in the post-calibration column of Table 1, the mass differences are below 0.1 Da for all peptides. This would be extremely unlikely to happen if the protein were a false positive.
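The text does not specify how the post-calibration was carried out. One plausible approach, offered here as an assumption, is to fit the mass error as a linear function of mass over the assigned peptides and subtract it; the sketch applies this to the Table 1 values using a hand-written least-squares fit.

```python
# Table 1: measured monoisotopic masses and (measured - calculated), in Da.
MEASURED = [1111.85, 1186.89, 1282.96, 1325.04, 1475.15, 1589.19, 1647.29, 1933.37]
DELTA = [0.32, 0.27, 0.33, 0.35, 0.40, 0.36, 0.47, 0.39]


def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b


# Model the systematic error as a linear function of the measured mass ...
a, b = fit_line(MEASURED, DELTA)
# ... and subtract it; what remains is the post-calibrated mass difference.
residuals = [d - (a + b * m) for m, d in zip(MEASURED, DELTA)]
for m, d, r in zip(MEASURED, DELTA, residuals):
    print(f"{m:8.2f}  before {d:+.2f}  after {r:+.2f} Da")
```

With this fit every residual is below 0.1 Da, in the same range as the post-calibration column of Table 1, although the exact procedure behind that column may differ. Such a fit is only meaningful for peptides already assigned to a candidate sequence.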

4.7 Special Searches: Time Course Digestion, Parallel Digestion and Subdigestion

Some simple experimental procedures can be followed to allow positive identification of a suspected match. In a time course digestion, we can search the database with the final result of the digestion. If a likely match is found, it can be independently corroborated by examining the mass spectra at earlier stages of the digest: it should be possible to correlate some of the larger masses in these spectra, which are absent or much diminished in the final spectrum, with the proposed sequence. When we use parallel digestion, an aliquot of the protein solution is digested by another, preferably independent, enzyme (e.g. the pair trypsin and Glu-C). The proposed protein should appear at or near the top in both searches. In subdigestion, a large fragment of the protein is isolated, mass measured and further digested (e.g. a tryptic digest of a CNBr fragment is performed). The program can then search for matches within a "sliding window" given by the mass of the large fragment. Note that in this "sliding window" algorithm the detrimental influence of a large protein mass on the search statistics is removed; in fact, the mass of the parent protein is irrelevant. Thus this last strategy is particularly attractive for the identification of very large proteins that yield few peptide masses in a given digest. In addition to the strategies listed here, many others can be imagined. The main point is the addition of independent evidence to positively identify the protein of interest. Note also that in cases where decent sequence coverage and good mass accuracy have been achieved, the match will be immediately obvious and the above strategies will not be necessary.
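The subdigestion "sliding window" search can be sketched as follows: every contiguous stretch of a candidate sequence whose mass matches the isolated fragment is scored by how many subdigest masses its tryptic peptides explain. The tolerances, names and the toy protein are assumptions for illustration; a fuller implementation would also restrict the window boundaries to the cleavage sites of the first agent.

```python
import re

MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def peptide_mass(peptide):
    return sum(MONO[aa] for aa in peptide) + WATER


def tryptic_peptides(seq, missed=1):
    limits = [p for p in re.split(r"(?<=[KR])", seq) if p]
    return ["".join(limits[i:j + 1]) for i in range(len(limits))
            for j in range(i, min(i + missed + 1, len(limits)))]


def sliding_window_search(protein, fragment_mass, sub_masses,
                          frag_tol=2.0, pep_tol=0.5):
    """Score each stretch protein[i:j] whose mass matches the fragment by the
    number of subdigest masses it explains; returns (matches, i, j), best
    first. The mass of the parent protein never enters the score."""
    prefix = [0.0]
    for aa in protein:
        prefix.append(prefix[-1] + MONO[aa])
    hits = []
    for i in range(len(protein)):
        for j in range(i + 1, len(protein) + 1):
            mass = prefix[j] - prefix[i] + WATER
            if mass > fragment_mass + frag_tol:
                break  # windows only get heavier as j grows
            if abs(mass - fragment_mass) <= frag_tol:
                predicted = [peptide_mass(p) for p in tryptic_peptides(protein[i:j])]
                n = sum(any(abs(s - c) <= pep_tol for c in predicted)
                        for s in sub_masses)
                hits.append((n, i, j))
    return sorted(hits, key=lambda hit: hit[0], reverse=True)


# Hypothetical protein containing the isolated fragment.
fragment = "GDLLKVNQYRHTSEFK"
protein = "GSWKPLDNRAFTEKM" + fragment + "PAMR"
sub = [peptide_mass(p) + 0.2 for p in tryptic_peptides(fragment, missed=0)]
hits = sliding_window_search(protein, peptide_mass(fragment) + 0.5, sub)
print(hits[0])
```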

4.8 Special Searches: DNA Database Searching

Since almost all sequence data today originate from DNA sequencing, it would be attractive to search DNA databases directly. By searching in all six reading frames, one might avoid wrong assumptions about coding regions and might be able to find proteins in spite of frame-shift errors. Direct searching of DNA databases by MS information has been demonstrated [Mann 1993]; here we will only point out some caveats of this method. The main problem currently lies in the fact that search programs must be able to read the feature table of the database entry intelligently. This is more difficult for DNA than for amino acid databases, because in DNA databases the same gene can be spread over several entries. Furthermore, eukaryotic genes have an intron/exon structure that should be taken into account when searching for the protein. Otherwise, at least the peptides spanning introns would be lost, and most likely the correct reading frame would be lost as well. On the other hand, if all the information from the feature table is taken into account, then almost by definition no protein will be found in an unexpected reading frame. For the moment, the following procedure is recommended. First, search the amino acid translation of the database in question (e.g. Patch X or GenPept), which will take care of the matches in the most likely reading frame. Subsequently, the DNA database itself can be searched in all reading frames. However, it appears that DNA databases can more profitably be searched by partial sequence information, as described in section 5.6 below.
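Searching a DNA database in all six reading frames starts from a six-frame translation. A minimal sketch using the standard genetic code follows; splitting each frame at stop codons then gives the stretches that could be digested in silico. The function names are hypothetical, and no attempt is made to handle introns or feature tables, which is precisely the limitation discussed above.

```python
BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(dna):
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def six_frames(dna):
    """Translate frames +1, +2, +3 and -1, -2, -3; '*' marks stop codons
    and 'X' any codon containing a character other than A, C, G, T."""
    dna = dna.upper()
    frames = []
    for strand in (dna, reverse_complement(dna)):
        for offset in range(3):
            codons = (strand[p:p + 3] for p in range(offset, len(strand) - 2, 3))
            frames.append("".join(CODON_TABLE.get(c, "X") for c in codons))
    return frames


frames = six_frames("ATGGCCAAA")
print(frames)  # ['MAK', 'WP', 'GQ', 'FGH', 'LA', 'WP']
# Stretches between stop codons are what a peptide mass search would digest:
stretches = [s for f in frames for s in f.split("*") if s]
```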

4.9 Searching Incompletely Purified Proteins and Protein Mixtures

If the MS data are good, it is possible to identify two or more proteins from the same digest by an iterative procedure. Once a first match has been found, all the peptide masses corresponding to its sequence can be subtracted from the spectrum. A subsequent search will then identify the second protein. In the important case of protein-protein interaction where one partner is known, its peptide masses can immediately be subtracted from the spectrum and a search performed with the remaining masses.
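The iterative procedure can be sketched as a loop that picks the best-matching entry, removes the masses it explains and searches again. Candidate entries are represented directly by lists of predicted peptide masses; the names, masses and thresholds are hypothetical.

```python
def count_matches(measured, predicted, tol):
    """Number of measured masses within ±tol of any predicted mass."""
    return sum(any(abs(m - p) <= tol for p in predicted) for m in measured)


def iterative_identify(measured, candidates, tol=0.5, min_matches=3):
    """Repeatedly take the best-matching entry, subtract the masses it
    explains, and search again with the remaining masses.

    `candidates` maps entry names to lists of predicted peptide masses.
    Returns the identified (name, matches) pairs and the unexplained masses."""
    remaining = list(measured)
    pool = dict(candidates)
    found = []
    while remaining and pool:
        best = max(pool, key=lambda name: count_matches(remaining, pool[name], tol))
        n = count_matches(remaining, pool[best], tol)
        if n < min_matches:
            break
        predicted = pool.pop(best)
        found.append((best, n))
        remaining = [m for m in remaining
                     if not any(abs(m - p) <= tol for p in predicted)]
    return found, remaining


candidates = {  # hypothetical predicted masses (Da)
    "protein A": [1000.0, 1200.0, 1400.0, 1600.0, 1800.0],
    "protein B": [1100.0, 1300.0, 1500.0, 1700.0],
    "protein C": [1000.0, 1550.0],
}
measured = [1000.1, 1200.2, 1400.0, 1600.3, 1800.0, 1100.1, 1300.2, 1500.1, 1700.0]
found, left = iterative_identify(measured, candidates)
print(found)  # [('protein A', 5), ('protein B', 4)]
```

For a known interaction partner, its predicted masses can simply be removed from `measured` before the first round.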

4.10 Limitations of Searching by Peptide Masses

There are few limitations to searching by peptide masses. The MS technology for obtaining such spectra is in hand, and the most common proteins have already been sequenced or will shortly be sequenced. One limitation is that closely related, almost sequence-identical proteins may not be identified if the differences change every or almost every peptide. Thus the protein in question may be known in dog, but the corresponding protein in chicken may nevertheless not be identifiable by its tryptic peptides if there is on average one amino acid substitution every 10 amino acids.

4.11 Recommendations and Prospects of Searching by Peptide Masses

In summary, the following rules apply to the identification of proteins by their mass spectrometric peptide maps. The digest should be as close as possible to a limit digest (in which case the favorable "0" setting can be used, excluding partial digestion products), and the mass accuracy of the peptide map should be as high as possible. An upper limit to the possible protein mass greatly relaxes the requirements on the search. The average size of the peptides should be small for proteins where many discrepancies between database entry and mature protein are expected, and large for large proteins. The significance of a match can be determined statistically by performing closely related searches, and its plausibility is determined by comparing all available data on the protein with the information in the database. The strategies of time course digestion, parallel digestion and subdigestion can be used in difficult cases and for independent confirmation of a proposed match. DNA databases are currently best searched in their auto-translated form. It is reasonable to expect that the above procedure will soon be the method of choice for quickly identifying proteins in minimal quantities. Compared to Edman degradation, mass spectrometry should be more sensitive, and it has the added advantages of not consuming chemicals and of probing a larger percentage of the sequence, hence allowing identification not only of the protein family but of the actual protein.

5 Searching by the Molecular Weight of a Peptide and Its Partial Sequence

In the previous two sections, searches by either the intact protein molecular weight or a set of peptide molecular weights were considered. Now we will investigate searching by the mass of a single peptide combined with partial sequence information obtained either by Edman degradation or from MS/MS spectra. As will be discussed, this is the most powerful and general of the three mass spectrometric searching methods [Mann 1993]. Once peptides of an unknown protein have been produced, they can be mass measured as discussed in the previous section. Furthermore, sequence information can be obtained either by Edman degradation or by mass spectrometric fragmentation (MS/MS). The result of either method may be a complete sequence. In the general case, however, we will know the sequence of part of the peptide, while for other parts we only have the summed mass of their amino acids.

5.1 Searching by Partial Sequence Obtained by Edman Degradation and the Peptide Molecular Weight

When sequencing at very low sample levels by Edman degradation, part of the sequence may be tentatively assignable, yet the sequence obtained may not be enough to search the database for a meaningful identification. Since the amount of peptide used in Edman degradation is typically more than one picomole, a MALDI spectrum can be obtained from part of the sample. As described below, the combined information, peptide molecular weight and partial sequence, can then be used to search for the peptide in sequence databases. A hypothetical example will illustrate the search. In an Edman sequencing run the first and second amino acids are unclear, and the subsequent sequence is AXDE, where X could be any amino acid. The enzyme used is trypsin. Searching the database just by XXAXDE produces more than 2300 matches in SWISSPROT, or almost 8% of all entries. If we further assume that we have measured the molecular weight of the peptide to be 1800.3 Da with an uncertainty of 0.03%, we can rescan the database. Now only 19 matches are left. If we require that both termini of the peptide obey the cleavage rules for trypsin (i.e. that there should be R or K at the C-terminus of the preceding peptide and R or K at the end of the found peptide), then no peptides are matched at all. Thus the protein from which the peptide derives must still be unknown. This example shows how the search efficiency can be increased more than a thousandfold by taking into account a simple mass measurement. (Actually, the information that the peptide is tryptic is now used as well, something that could not have been done without the mass measurement.) The combination of Edman and MS information allows much weaker Edman data to be used in searching than was previously possible. In a second hypothetical example to prove this point, the sequence XEAXE has been obtained from a CNBr peptide. However, the data are ambiguous, and it is felt that even the three assigned amino acids might still contain an error. Searching by XEAXE while allowing for such an error results in more than one match per protein on average! Entering the measured molecular weight as 4557 Da with a mass accuracy of 0.03% results in serum albumin and ribulose bisphosphate carboxylase (the latter easily discarded because of its threefold lower molecular weight) as the only matches. Thus the combination of just three common amino acids, one of which could be wrong, can in this case uniquely identify a protein when coupled to a measured molecular weight. The amino acid sequence information used in the search does not necessarily have to originate from Edman degradation. Other possibilities include exopeptidase digestion and mass measurement of the resulting peptide mixture, which typically yields some N- or C-terminal sequence information. By choosing "C-terminal extension", the program will attempt to find a peptide sequence in the database which (in the case of carboxypeptidases) ends in the given sequence and matches the measured molecular weight. Again, just three amino acids will typically be enough to uniquely identify the protein if an accurate molecular weight has been obtained. Uses of this technique could be manifold. Sequencer time could be saved on peptides that are already known (e.g. the sequencer can be stopped after 3 cycles if sequence and molecular weight match a plausible protein in the database). A prototype application would be the identification of immune peptides such as MHC-bound peptides, of which typically only the molecular weight and very few amino acids can be obtained.
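A search by partial Edman sequence plus peptide mass can be sketched as a pattern scan followed by a mass-constrained extension of each hit. In this sketch "X" stands for any residue, masses are neutral monoisotopic masses, and the names and toy database are hypothetical; the optional tryptic filter implements the cleavage-rule check described in the first example.

```python
import re

MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def peptide_mass(peptide):
    return sum(MONO[aa] for aa in peptide) + WATER


def search_tag(database, pattern, mr, rel_tol=3e-4, tryptic=False):
    """Find peptides that start with `pattern` ('X' = any residue) and whose
    neutral mass lies within mr * rel_tol of mr. With tryptic=True, the
    residue before the peptide and its last residue must both be K or R."""
    finder = re.compile("(?=" + pattern.replace("X", ".") + ")")  # overlapping
    tol = mr * rel_tol
    hits = []
    for name, seq in database.items():
        for match in finder.finditer(seq):
            i = match.start()
            if tryptic and i > 0 and seq[i - 1] not in "KR":
                continue
            mass = WATER
            for j in range(i, len(seq)):
                mass += MONO[seq[j]]
                if mass > mr + tol:
                    break  # extending further only adds mass
                if (j + 1 - i >= len(pattern) and abs(mass - mr) <= tol
                        and (not tryptic or seq[j] in "KR")):
                    hits.append((name, seq[i:j + 1]))
    return hits


db = {"toy": "GGKWYAMDEFLKPRSTWGNEHHK", "other": "MWYAMDEFLKGG"}  # hypothetical
mr = peptide_mass("WYAMDEFLK")
print(search_tag(db, "XXAXDE", mr))                # [('toy', 'WYAMDEFLK'), ('other', 'WYAMDEFLK')]
print(search_tag(db, "XXAXDE", mr, tryptic=True))  # [('toy', 'WYAMDEFLK')]
```

The second call shows how the tryptic constraint, which only becomes usable once the peptide mass is known, removes the entry whose peptide is not preceded by K or R.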

5.2 Searching by an MS/MS Pattern

MS/MS is a technique in which peptides, typically in the mass range up to two kDa, are mass selected in a first step and fragmented in a second step, usually by collisions with gas molecules. The fragments from a single peptide are then mass analyzed, and the resulting pattern contains information about the sequence of the peptide. In favorable cases the complete sequence of the peptide can be reconstructed, but more usually only part of the spectrum can be assigned. The nomenclature employed for peptide MS/MS spectra [Roepstorff 1984] identifies the breakage point and whether the fragment contains the N-terminal or the C-terminal part of the peptide. For example, cleavage of a ten-residue peptide at the C-terminal amide bond of the third amino acid could lead to a B3 ion or a Y7 ion or both. MS/MS spectra can seldom be used to assign the complete sequence of the peptide ion positively, but it is usually possible to identify a run of fragment masses representing a stretch of amino acids. This information is highly specific for the peptide and is usually enough to identify it uniquely in the database when using the procedure below.
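The B/Y nomenclature can be made concrete by computing the ion masses. The sketch assumes singly protonated B ions (sum of N-terminal residue masses plus a proton) and Y ions (sum of C-terminal residue masses plus water plus a proton), the usual convention; the ten-residue peptide and function name are hypothetical.

```python
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.007276


def fragment_ions(peptide):
    """Singly protonated B and Y ion masses keyed by the number of residues
    they contain (B_i: first i residues; Y_i: last i residues)."""
    b, y = {}, {}
    total = 0.0
    for i, aa in enumerate(peptide[:-1], start=1):
        total += MONO[aa]
        b[i] = total + PROTON
    total = 0.0
    for i, aa in enumerate(reversed(peptide[1:]), start=1):
        total += MONO[aa]
        y[i] = total + WATER + PROTON
    return b, y


peptide = "AFILSDNEGK"  # hypothetical ten-residue peptide
b, y = fragment_ions(peptide)
neutral = sum(MONO[aa] for aa in peptide) + WATER
# Breaking the bond after residue 3 gives the complementary pair B3 / Y7:
print(round(b[3], 4), round(y[7], 4))
```

Consecutive B (or Y) ions differ by exactly one residue mass, which is how a run of sequence ions is read out, and B_i plus Y_(n-i) always equals the neutral peptide mass plus two protons.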



Figure 5 shows a schematic of the information contained in the MS and MS/MS spectra of peptides. As can be seen in the figure, there are three regions of the sequence about which there is information. The run of sequence ions provides the sequence of part of the peptide. From the start of the sequence ion run, i.e. from the mass of the lowest fragment in the series, we know the mass of the unsequenced part N-terminal to the sequence run, marked m1 in the figure. In the example of Figure 5, B3, B4, B5 and B6 would be known. The differences between them yield the sequence pattern A, F, I/L, and the value of B3 tells us the sum of the first three amino acid masses. By subtracting the largest fragment ion mass from the total molecular weight, we obtain m3, the mass of the unsequenced C-terminal part (region III). In the example, m3 is equal to Mr - B6. The three regions I, II and III, about which the total mass (I and III) and the sequence (II) are known, can be used in combination to search the database. Note that we do not necessarily know in which direction the sequence runs. Therefore, the search should generally be performed in both directions.

Figure 5. The information contained in an MS/MS spectrum. The data available are the complete molecular weight Mr, the mass up to a run of sequence ions (m1), the sequence in part of the peptide (A, F and I or L, in this case) and the mass of the part of the peptide C-terminal to the run of sequence ions (m3). Note that the direction of the sequence is not necessarily known. "I/L" stands for isoleucine/leucine, two amino acids that normally cannot be distinguished by mass spectrometry.
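The three-region search of Figure 5 can be sketched as follows. To keep it short, m1 and m3 are taken as neutral residue-mass sums (converting from ion values, e.g. subtracting the proton from B3, is left to the caller), I and L are treated as identical, and both reading directions are tried. The function names, the tolerance and the toy entry are hypothetical.

```python
import re

MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def residue_sum(s):
    return sum(MONO[aa] for aa in s)


def peptide_mass(peptide):
    return residue_sum(peptide) + WATER


def tryptic_peptides(seq, missed=1):
    limits = [p for p in re.split(r"(?<=[KR])", seq) if p]
    return ["".join(limits[i:j + 1]) for i in range(len(limits))
            for j in range(i, min(i + missed + 1, len(limits)))]


def tag_search(database, mr, tag, m1, m3, tol=0.5):
    """Find peptides of mass mr containing `tag` with residue mass m1 on the
    side where the ion series starts and m3 on the other side. Both reading
    directions are tried; I and L are treated as identical."""
    t = tag.replace("I", "L")
    hits = []
    for name, seq in database.items():
        for pep in tryptic_peptides(seq):
            if abs(peptide_mass(pep) - mr) > tol:
                continue
            norm = pep.replace("I", "L")
            for direction, motif, before, after in (
                    ("forward", t, m1, m3), ("reverse", t[::-1], m3, m1)):
                k = norm.find(motif)
                while k != -1:
                    if (abs(residue_sum(pep[:k]) - before) <= tol and
                            abs(residue_sum(pep[k + len(motif):]) - after) <= tol):
                        hits.append((name, pep, direction))
                    k = norm.find(motif, k + 1)
    return hits


db = {"toy": "MKGDNAFIPEKWWR"}  # hypothetical entry
mr = peptide_mass("GDNAFIPEK")
print(tag_search(db, mr, "AFL", residue_sum("GDN"), residue_sum("PEK")))
# [('toy', 'GDNAFIPEK', 'forward')]
```

Reading the same tag from the other end ("LFA", with m1 and m3 swapped) finds the same peptide in the "reverse" direction, which is why both directions are searched.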


An alternative way to search the database is to generate a list of all peptides that match the measured molecular weight and to predict their MS/MS fragmentation patterns. These patterns are then matched against the experimentally obtained one [R. Johnson, Washington University, personal communication]. The principal disadvantage of that technique is that it can only find exact matches in the database, because any sequence error can change most of a fragmentation series.

5.3 Matching Peptides with Sequence Errors

If the sequence of a peptide is in the database but contains errors, this peptide cannot be matched on the basis of the molecular weight alone. However, the high specificity of the search by partial sequence and peptide molecular weight often allows retrieval of even these peptides. We use the information about the three regions of the peptide, as explained above. One can search by the total masses of regions I and III as well as the sequence of region II, while allowing an error in one of the three. Suppose, for example, that there is an error in the sequence corresponding to region I of a peptide. The partial sequence and the mass m3 of region III can then be used to find a candidate sequence in the database. If region I can be reconciled with its measured mass by a plausible amino acid substitution, then the correct protein might have been found, and there may be a sequence error in the database. The same procedure can be used to find nearly identical sequences differing only by one amino acid substitution. Again the peptide can be matched using two of the three regions of the measured peptide. If a plausible substitution (such as a functionally neutral mutation) can explain the mass shift, one might have found a closely related protein.

5.4 Matching Peptides with Post-translational Modifications

The procedure for this search is nearly identical to that for searching with the possibility of a sequence error. In this case the mass shift is most likely in region I or III of the peptide, since it would have been noticed in the sequence run if it had been in region II. Furthermore, the most common modifications have a limited set of mass shifts associated with them, such as +80 Da for sulfations and phosphorylations. Therefore it is possible to apply the above technique of matching two of the three regions and later try to match the remaining region using a catalog of typical molecular weight shifts due to post-translational modifications.
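For sections 5.3 and 5.4, once two regions of a peptide match a candidate, the leftover mass shift of the third region can be compared with single amino acid substitutions and with a small catalog of modification masses. The catalog below lists a few common monoisotopic shifts and is illustrative, not exhaustive; the function name and the default tolerance are assumptions.

```python
from itertools import permutations

MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

# A few common modification mass shifts (monoisotopic, Da); illustrative only.
PTM_SHIFTS = {
    "phosphorylation": 79.96633,
    "sulfation": 79.95682,
    "acetylation": 42.01057,
    "oxidation": 15.99491,
    "methylation": 14.01565,
}


def explain_shift(delta, tol=0.02):
    """List modifications and single substitutions (old->new) whose mass
    change lies within ±tol of the unexplained shift (observed - predicted)."""
    found = [name for name, shift in PTM_SHIFTS.items() if abs(delta - shift) <= tol]
    for old, new in permutations(MONO, 2):
        if abs(delta - (MONO[new] - MONO[old])) <= tol:
            found.append(f"{old}->{new}")
    return found


print(explain_shift(79.966))              # both +80 Da modifications fit
print(explain_shift(14.0157, tol=0.005))  # methylation and several substitutions
```

At ±0.02 Da, phosphorylation and sulfation cannot be told apart, and a +14.016 Da shift is equally consistent with methylation or with substitutions such as D->E, so the remaining data on the peptide have to decide.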

5.5 Removing Contaminating Proteins

One obstacle to obtaining higher sensitivity in protein identification is that the protein needs to be purified to a very high degree; otherwise contaminating, abundant proteins (such as actin) will be sequenced instead. Using molecular weights and just a few amino acids, such proteins can quickly be discarded and peptides from the protein of interest can be identified.



5.6 Searching DNA Sequence Libraries

In section 4.8 it was mentioned that it would be desirable but not straightforward to search DNA databases directly. However, when searching by molecular weight and partial sequence information, the problems of extracting a whole protein sequence from a nucleotide database do not arise. Only one peptide needs to be matched at a time, and for any given peptide it is unlikely that its DNA sequence spans an intron. Since even a single peptide is enough to identify an entry in the database, the present method is ideally suited to scanning large DNA databases. No assumptions about coding regions and reading frames need to be made. Work is in progress to determine whether even insertion and deletion errors can be compensated for within a single peptide sequence. Lately, randomly sequenced cDNA libraries have attracted much interest. These so-called expressed sequence tags, or ESTs, are short pieces of randomly sequenced cDNA from cDNA libraries of different tissues [Adams 1993]. It is estimated that most human genes are already partially sequenced and that most of the remaining ones will be added in just a few years. If a protein can be matched to its corresponding EST, homology searching and cloning become trivial (at least when compared to a situation where only 10 to 15 amino acids are known). Matching the ESTs can only be done at the single-peptide level, because ESTs cover only a short piece of the total sequence (about 70 to 100 amino acids using current technology [Adams 1993]). Only minimal information about each peptide, such as two to three consecutive amino acids together with its molecular weight, is needed to perform the matching. If only one such peptide is contained in the EST, the protein can be cloned.

5.7 Recommendations and Prospects
Whenever a peptide yields partial sequence data in addition to molecular weight data, it is useful to search the databases with the combination of both. When Edman degradation is used, the mass of the peptide should be determined in advance. With partial sequence information of just two to three amino acids the peptide, if known, can be retrieved from the database. MS/MS information should be used in a similar way. Before starting time-consuming sequence interpretation, a run of sequence ions should be identified. These few amino acids provide information about part of the sequence. The masses of the lowest and highest fragment ions of the run yield the molecular weights of the unsequenced parts of the peptide. Together, this information is usually enough to find the protein in the database. The peptide can then be positively identified by matching the complete MS/MS spectrum against the one predicted from the proposed sequence. The procedure is specific enough to find peptides even with amino acid substitutions or post-translational modifications. The power of the technique also makes it applicable to the interpretation of unexpected peptide masses in a peptide map of a protein whose sequence (e.g. a cDNA sequence) is already at hand. Again, two to three amino acids will locate the peptide in the protein, and the remaining mass difference can be explained by a modification or sequence change in one part of the peptide. It is possible to search with a combination of several peptide masses and some partial sequence; however, the author recommends that each peptide should be searched separately.
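The combination of a short sequence run with the masses of the unsequenced flanking parts can be sketched as a simple database filter. This is a hypothetical illustration, not the author's software: it assumes the flanking masses have already been converted from the lowest and highest fragment ions into summed residue masses, and the names `tag_search`, `n_mass`, `c_mass` and the 0.5 Da tolerance are invented for the example. The same filter could be run against a list of known contaminants such as actin.

```python
# Monoisotopic residue masses (Da).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def residue_sum(seq):
    return sum(MONO[aa] for aa in seq)


def tryptic_digest(seq):
    # Cleave after every K or R (simplified rule).
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        if aa in "KR":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])
    return peptides


def tag_search(database, tag, n_mass, c_mass, tol=0.5):
    """Find peptides containing `tag` whose N-terminal part (before the tag) and
    C-terminal part (after it) have summed residue masses n_mass and c_mass."""
    hits = []
    for name, protein in database.items():
        for pep in tryptic_digest(protein):
            pos = pep.find(tag)
            while pos != -1:
                before = residue_sum(pep[:pos])
                after = residue_sum(pep[pos + len(tag):])
                if abs(before - n_mass) <= tol and abs(after - c_mass) <= tol:
                    hits.append((name, pep))
                    break
                pos = pep.find(tag, pos + 1)
    return hits
```

With a toy database `{"protA": "MSAGKLVDEFRG", "protB": "GGVDEWRK"}`, the tag "DE" flanked by 212.15 Da and 303.17 Da selects only LVDEFR from protA; the GGVDEWR peptide of protB carries the same tag but its N-terminal part is about 1 Da too heavy.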


M.Mann

The information in even a very ambiguous amino acid pattern is usually enough to find the peptide in the database. Furthermore, when several peptides are present, it is less straightforward to search DNA databases. Running the searches independently for each peptide also increases confidence in the results, since the correct protein will be the top match in several independent peptide searches.
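The gain in confidence from independent searches can be made concrete with a small sketch: each peptide search contributes its top-ranked protein, and proteins are ranked by how many searches agree. The function name, the `min_support` threshold of two peptides and the protein identifiers are hypothetical.

```python
from collections import Counter


def rank_by_agreement(top_hits, min_support=2):
    """top_hits: best-matching protein (or None) from each independent peptide search.
    Returns proteins supported by at least `min_support` searches, best supported first."""
    counts = Counter(hit for hit in top_hits if hit is not None)
    return [(protein, n) for protein, n in counts.most_common() if n >= min_support]
```

For the top hits `["P1", "P2", "P2", None, "P2", "P1", "P3"]` the result is `[("P2", 3), ("P1", 2)]`: P2 is the top match in three independent searches, while P3 is discarded as a single unsupported hit.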

6 Conclusion
Database searching with protein molecular weights, peptide molecular weights, and combinations of peptide molecular weight and partial sequence information is a powerful technique for the identification of proteins. It will become increasingly popular as instrumentation for obtaining high-quality MS and MS/MS spectra becomes more widely available. Even with currently widespread technology such as Edman degradation and MALDI mass spectrometry used in combination, tremendous gains can be made. For protein analysis ESMS remains the method of choice, and its high mass accuracy should lend itself to useful database searching. At the peptide level, MALDI, and in particular MALDI with a reflector instrument, holds the most promise for the identification of proteins with high throughput and very high sensitivity. Within a few years, hundreds of proteins will probably be identifiable in a single laboratory using standard protocols and software. Partial MS/MS data are the most potent tool for finding a protein in sequence databases. Currently the necessary instrumentation is available only in specialized laboratories, but the technology has already largely been developed. MALDI MS/MS in particular is a natural extension of MALDI MS, as the same sample preparation technique is used and there is no large increase in measurement time. It is currently being developed in laboratories such as the author's and will soon be used routinely. The approach developed here allows peptides to be found in large sequence libraries. Often, however, the problem will be to correlate a particular mass with a given protein whose DNA sequence is already known. Many of the techniques described above, especially those of Section 5, are also applicable to that problem.



7 References
Adams, M. D., Kerlavage, A. R., Fields, C., Venter, J. C. (1993) Nature Genet. 4, 256.
Feistner, G. J., Faull, K. F., Barofsky, D. F., Roepstorff, P. (1994) Biological Mass Spectrometry, to be submitted. Perspectives for Mass Spectrometric Peptide and Protein Charting.
Fenn, J. B., Mann, M., Meng, C. K., Wong, S. F., Whitehouse, C. M. (1989) Science 246, 64-71. Electrospray Ionization for the Mass Spectrometry of Large Biomolecules.
Henzel, W. J., Billeci, T. M., Stults, J. T., Wong, S. C. (1993) Proc. Natl. Acad. Sci. USA 90, 5011.
Hillenkamp, F., Karas, M., Beavis, R. C., Chait, B. T. (1991) Anal. Chem. 63, 1193A.
Højrup, P. (1990) General Protein Mass Analysis (GPMA), a convenient program in studies of proteins by mass analysis. IFOS V, Chichester, Wiley & Sons.
James, P., Quadroni, M., Carafoli, E., Gonnet, G. (1993) Biochem. Biophys. Res. Commun. 195, 58-64. Protein Identification by Mass Profile Fingerprinting.
Karas, M., Hillenkamp, F. (1988) Anal. Chem. 60, 2299.
Mann, M. (1993) Poster presented at the 7th Symposium of the Protein Society, San Diego. Identification of Proteins in Sequence Databases Using Molecular Weight and Partial Sequence Data.
Mann, M. (1993) Proceedings of the 41st ASMS Conference on Mass Spectrometry and Allied Topics, San Francisco, USA. Searching DNA Databases by Mass Spectrometric Information.
Mann, M., Højrup, P., Roepstorff, P. (1993) Biol. Mass Spectrom. 22, 338. Use of Mass Spectrometric Information for the Identification of Proteins in Sequence Databases.
Meng, C. K., Mann, M., Fenn, J. B. (1988) Zeitschr. Physik D, Atoms, Molecules and Clusters 10, 361-368. Of Protons and Proteins.
Mørtz, E., Vorm, O., Mann, M., Roepstorff, P. (1994) Biol. Mass Spectrom. 23, 249-261. Identification of Proteins in Polyacrylamide Gels by Mass Spectrometric Peptide Mapping Combined with Database Search.
Pappin, D. J. C., Højrup, P., Bleasby, A. J. (1993) Current Biology 3, 327.
Roepstorff, P., Fohlman, J. (1984) Biomed. Mass Spectrom. 11, 601. Proposed Nomenclature for Sequence Ions.
Vorm, O., Mann, M. (1994a) J. Am. Soc. Mass Spectrom., in press. Improved Mass Accuracy in Matrix-Assisted Laser Desorption/Ionization Time-of-Flight Mass Spectrometry of Peptides.
Vorm, O., Roepstorff, P., Mann, M. (1994b) Anal. Chem., submitted. Matrix Surfaces Made by Fast Evaporation Yield Improved Resolution and Very High Sensitivity in MALDI TOF.
Yates, J. R., Speicher, S., Griffin, P. R., Hunkapiller, T. (1993) Anal. Biochem. 214, 397-408. Peptide Mass Maps: A Highly Informative Approach to Protein Identification.

Acknowledgment: The work pursued by the author and his group at EMBL began in Professor Roepstorff's group at Odense University in Denmark. Many of the developments described here result from collaborations with that group.


VI.3 Program Packages for Personal Computers
Bernd Eisermann and Helmut E. Meyer

1 Introduction
In molecular biology and protein sequencing laboratories the computer is one of the most powerful tools supporting cloning and sequencing projects. It is used to handle and interpret a laboratory's own data as well as the flood of data being generated by the Human Genome Project and its companion efforts on model organisms from worm to mouse [Ken 1993]. The number of entries in nucleotide and protein sequence databases has grown exponentially during the past ten years, from 10 000 to 120 000 entries a year. As the data accumulate, researchers want information about the proteins and genes they are studying: they want to know whether they are working on a new protein or gene, or whether there are related ones with similar functions. One main problem is that the information is spread over nearly 50 different databases that use many mutually incompatible data formats. In this article we give an overview of the databanks and of the commonly available software packages for gaining access to the required information.

2 Overview of Protein and DNA Databanks
GenBank: GenBank is the main DNA database of the USA, compiled at the National Center for Biotechnology Information (NCBI) of the National Library of Medicine, part of the National Institutes of Health (NIH) [Benson 1993].
EMBL Data Library: The European counterpart to GenBank as a DNA databank is the data library of the Heidelberg-based European Molecular Biology Laboratory (EMBL) [Rice 1993].
NBRF-PIR Protein Database: The NBRF-PIR protein database (National Biomedical Research Foundation-Protein Identification Resource, USA) is produced cooperatively by the Protein Information Resource (PIR), the Martinsried Institute for Protein Sequences (MIPS, Germany) and the Japan International Protein Information Database (JIPID, Japan) [Barker 1993].
SWISS-PROT Protein Sequence Database: SWISS-PROT is another protein sequence database, including sequence data, indices and documentation [Bairoch 1993].
Besides these four main databanks there are a number of smaller specialized databases.
R. Kellner, F. Lottspeich, H.E. Meyer (1994) Microcharacterization of Proteins, VCH Weinheim


3 What a Program for Sequence Analysis Should Do
Today, numerous program packages for protein and DNA sequence analysis are available. Some of them are specialized for distinct purposes, such as creating restriction maps or the multiple sequence alignment of DNA or proteins. These diverse programs work with very different routines, which causes problems when the programs are used only occasionally rather than daily. Therefore, for irregular use an integrated program package for databank retrieval, sequence analysis and sequence management is the best choice. Clearly, the four main databanks mentioned before should be accessible. In compressed form they are stored on one or several CD-ROM disks together with some smaller databanks. To handle the data deluge, certain program packages already need a second CD-ROM drive so that they can swap from one drive to the other during a session. Special tools are required for the analysis of DNA or protein sequences.
Programs for protein sequence analysis should include:
- Determination of the amino acid composition
- Molecular weight calculation
- Sequence comparison of two sequences
- Dot matrix plot
- The ability to zoom into a particular region
- Sequence alignment of two or more sequences
- Graphical and text output of
  - hydropathy
  - antigenic determinants
  - alpha helix and beta sheet potential
  - net charge
  - amphipathy
  - PEST sequences
- Prediction of possible glycosylation sites
- Prediction of possible phosphorylation sites
- Calculation of the isoelectric point
- Reverse translation of a protein sequence into a DNA sequence or vice versa
Programs for DNA sequence analysis should include:
- Composition analysis
- Open reading frame analysis
- Translation into protein sequence
- Restriction maps
  - with a table of restriction data
  - with display of restriction data as maps
  - showing simulated gels of digests
- Creation of circular restriction maps
- Drawing of plasmids
- Planning of cloning experiments
- Planning of PCR experiments
- Sequence comparison of two sequences
- Dot matrix plot
- The ability to zoom into a particular region
- Recognition of inverted repeats
- Sequence alignment of two or more sequences
With both DNA and protein sequences, homology searches should be possible, including a scan function that finds 100% matching sequences within seconds. For sequence homology searches that allow mismatches, a program analogous to FASTA, working with the algorithm of Pearson and Lipman [Pearson 1990], should run in an acceptable time.
Annotations should be retrievable by:
- Entry name
- Accession number
- Authors
- Citation
- Journal name
- Organism
- Free text
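The first two functions in the protein list, amino acid composition and molecular weight calculation, are simple enough to sketch in a few lines of Python. The average residue masses are standard values; the function names and the example sequence are illustrative, and the computed molecular weight ignores post-translational modifications and disulfide bonds.

```python
from collections import Counter

# Average residue masses (Da); the polypeptide adds one water (H2O) in total.
AVERAGE = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER_AVG = 18.01524


def composition(seq):
    """Return {residue: (count, mol %)} for a protein sequence."""
    counts = Counter(seq.upper())
    total = sum(counts.values())
    return {aa: (n, 100.0 * n / total) for aa, n in sorted(counts.items())}


def average_molecular_weight(seq):
    """Average molecular weight of the unmodified polypeptide chain."""
    return sum(AVERAGE[aa] for aa in seq.upper()) + WATER_AVG


if __name__ == "__main__":
    seq = "MKTAYIAKQRQISFVKSHFSRQ"  # arbitrary example sequence
    for aa, (n, pct) in composition(seq).items():
        print(f"{aa}: {n:2d} ({pct:4.1f} %)")
    print(f"MW = {average_molecular_weight(seq):.1f} Da")
```

A single glycine, for instance, gives 57.05 + 18.02 = 75.07 Da, the molecular weight of the free amino acid.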

4 Available Program Packages
Services accessible on-line via Win-Net:
1) HUSAR
2) MIPS
3) BLITZ E-Mail Server
4) BLAST E-Mail Server
Program packages for personal computers:
1) DNASIS/PROSIS
2) GENEPRO
3) GeneWorks
4) CD-SEQ
5) ATLAS
6) Entrez
7) MassMap



4.1 On-Line Services
Since 1990 Win-Net has connected all universities in Germany as well as some scientific institutes, making direct dialog between them possible. This service is free at the moment. Win-Net is also the gateway to the European IXI-Net (International X.25 Infrastructure) and to the international Internet, so it is possible to connect to many networks worldwide. To get access to Win-Net, a modem connection to the local area network is needed. In Germany, on-line interactive data searches are possible at two places: at the Martinsried Institute for Protein Sequences (MIPS) and at the Heidelberg-based Unix Sequence Analysis Resources (HUSAR). By electronic mail you can gain access to the BLITZ E-mail server at EMBL, Heidelberg, Germany, or to the BLAST E-mail server at the NCBI, NIH, Bethesda, U.S.A. Using on-line services has two great advantages: (i) the integrated sequence libraries are regularly updated, some of them weekly or even daily, so the sequence data are as up to date as possible; (ii) the search can be done very fast, since the algorithms and the hardware used in the local area networks are optimized for high data throughput. To use the on-line search facilities at MIPS and HUSAR, users must register in order to be given the username needed to log in to the system. After registration, users receive a user's guide describing the different programs and routines running on the mainframe computer. Both institutions offer courses to introduce new users to the complex program features.

HUSAR
The HUSAR program package offers many different sequence manipulation tools. However, most external users mainly want to use the on-line data search, which at HUSAR can be done interactively. A FASTA homology search takes between 2 h and overnight, depending on the network load.
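The core idea behind FASTA's speed can be sketched briefly: identical k-tuples ("ktup") shared by the query and a database sequence are looked up in a hash table and counted per diagonal, so that only the most promising diagonals need to be aligned in detail. This minimal sketch covers only that first stage; FASTA's later rescoring with a substitution matrix, the joining of regions and the final optimized alignment are omitted, and the function names and default parameters are assumptions.

```python
from collections import defaultdict


def ktup_index(seq, k):
    """Map every k-tuple of `seq` to the list of positions where it occurs."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def best_diagonals(query, subject, k=2, top=3):
    """Count identical k-tuples on each diagonal (subject position minus query
    position) and return the `top` diagonals as (diagonal, number of hits)."""
    index = ktup_index(query, k)
    hits = defaultdict(int)
    for j in range(len(subject) - k + 1):
        for i in index.get(subject[j:j + k], ()):
            hits[j - i] += 1
    return sorted(hits.items(), key=lambda item: item[1], reverse=True)[:top]
```

For the query ACDEFGHIK and the database sequence XXACDEFGYY, all five shared 2-tuples lie on diagonal 2, so `best_diagonals` returns `[(2, 5)]`. Because each database sequence is only hashed and counted, this stage is much cheaper than a full dynamic-programming alignment.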

MIPS
MIPS includes many tools to retrieve and analyze protein and DNA sequences. The interactive on-line data search, which is what external users mostly want, takes between 10 min and 2 h, depending on the length of the peptide. For a fast search you can use the SCAN routine, which takes only seconds to sweep through the databases; in this mode only 100% matches are found. Compared with HUSAR, the search at MIPS is faster and the search routine is easier to use. The program packages at HUSAR and MIPS offer many more sequence manipulation tools, but in most cases they are difficult to handle in interactive mode, or they are accessible only to users within the institutes themselves.
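A scan for 100% matches amounts to an exact substring search over all database entries, as in the following sketch. How the MIPS SCAN routine is actually implemented is not described here; this brute-force version, its names and the toy database are assumptions, and a production system would rely on precomputed indices to reach search times of seconds.

```python
def scan(database, query):
    """Return (entry name, 1-based position) for every exact occurrence of `query`."""
    query = query.upper()
    hits = []
    for name, seq in database.items():
        seq = seq.upper()
        pos = seq.find(query)
        while pos != -1:
            hits.append((name, pos + 1))
            pos = seq.find(query, pos + 1)  # continue after this hit; overlaps allowed
    return hits
```

With the toy database `{"P1": "MKTAYIAKQRQISFVK", "P2": "GGAKQRQGG"}`, scanning for "akqrq" reports position 7 in P1 and position 3 in P2. A single mismatch in the query would make both hits disappear, which is exactly the limitation of the SCAN mode.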

BLITZ E-Mail Server
Another way to scan several databases is to use an E-mail server. The EMBL-based BLITZ is an automatic electronic mail server for the MPsrch program [Sturrock 1993]. MPsrch allows sensitive and fast comparison of protein sequences against the SWISS-PROT protein sequence database, using the Smith and Waterman algorithm [Smith 1981] for the best local similarity. A typical query takes approximately 40 s to search the entire SWISS-PROT database; additional time is required to reconstruct the alignments. MPsrch is the fastest implementation of the Smith and Waterman algorithm currently available on any machine. To use the EMBL BLITZ server, a formatted electronic mail message containing the query sequence is sent to BLITZ@EMBL-Heidelberg.DE, and the answer is sent back to the user automatically. The manual for the BLITZ E-mail server can be obtained with the HELP command. If there are any problems using the BLITZ service, a mail message to NETHELP@EMBL-Heidelberg.DE will give help. To obtain a complete copy of the matching sequences from the SWISS-PROT database, the EMBL file server must be used: sending a mail message containing a command with the required accession number or entry name to NETSERV@EMBL-Heidelberg.DE gives access to the complete sequence information. Using the HELP command, some introductory information can be obtained. The file server offers the latest sequence data, several other databases and free software for molecular biology.
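The Smith and Waterman algorithm itself fits in a short function: a dynamic-programming matrix in which scores are never allowed to drop below zero, so that the best-scoring cell marks the end of the best local alignment, which is then recovered by tracing back. The sketch below is for illustration only and makes simplifying assumptions (match +2, mismatch -1, linear gap penalty -2); practical protein searches, presumably including MPsrch, score with a substitution matrix and affine gap penalties.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment of a and b: returns (score, aligned a, aligned b)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a negative running score is reset to zero.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        score = H[i][j]
        if score == H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif score == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

Aligning XXHEAGAWYY with ZZHEAGAWQQ returns `(12, "HEAGAW", "HEAGAW")`: only the shared core is reported, and the unrelated ends are ignored rather than penalized. The quadratic cost of filling the matrix for every database entry is why fast implementations such as MPsrch matter.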

BLAST E-Mail Server
Similar possibilities are offered by the NCBI-based BLAST E-mail server, which permits users to send a specially formatted mail message containing a nucleotide or protein query sequence to blast@ncbi.nlm.nih.gov. A search is then performed against the specified database using the BLAST algorithm [Altschul 1990], and the results are returned in an E-mail message. Available databases are the four major databases and a set of several smaller ones. To receive the current set of instructions for using the BLAST E-mail server, a HELP message must be sent to the normal BLAST E-mail server address. Complete sequence records can be retrieved either by locus name or by accession number from the NCBI RETRIEVE server; full instructions for using the RETRIEVE server are obtained by sending a HELP message to retrieve@ncbi.nlm.nih.gov.

4.2 Program Packages for Personal Computers
Nearly all of the sequence manipulation tools can be handled much more comfortably on one's own personal computer. The sequence information only needs to be extracted from the databases and copied to a file on the hard disk. After the transferred file has been converted into the right format, the desired analysis can start.
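A typical conversion step is turning entries in the EMBL/SWISS-PROT flat-file layout (two-letter line codes such as ID, AC and SQ, indented sequence lines, and "//" closing each entry) into the simpler FASTA format that many analysis programs read. This is a minimal sketch: it handles only the line types mentioned, and the function name, output line width and example entry are assumptions.

```python
def flatfile_to_fasta(text, width=60):
    """Convert EMBL/SWISS-PROT-style flat-file entries into FASTA records."""
    records = []
    entry_id = accession = None
    residues, in_sequence = [], False
    for line in text.splitlines():
        code = line[:2]
        if code == "//":  # end of entry: emit one FASTA record
            seq = "".join(residues)
            header = ">" + (entry_id or "unknown") + (" " + accession if accession else "")
            lines = [header] + [seq[i:i + width] for i in range(0, len(seq), width)]
            records.append("\n".join(lines))
            entry_id = accession = None
            residues, in_sequence = [], False
        elif in_sequence:
            # Sequence lines hold residues in blocks; EMBL lines end with a position number.
            residues.append("".join(ch for ch in line if ch.isalpha()).upper())
        elif code == "ID":
            entry_id = line[5:].split()[0].rstrip(";")
        elif code == "AC" and accession is None:
            accession = line[5:].split(";")[0].strip()  # primary accession only
        elif code == "SQ":
            in_sequence = True
    return "".join(record + "\n" for record in records)
```

For a made-up entry with the lines `ID   TEST_HUMAN ...`, `AC   Z99999;`, `SQ   SEQUENCE ...`, `     MKTAYIAKQR` and `//`, the function returns `>TEST_HUMAN Z99999` followed by `MKTAYIAKQR`.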

DNASIS/PROSIS
DNASIS/PROSIS is a comprehensive package for DNA and protein sequence analysis that can be used on an IBM or compatible computer or on a Macintosh. With an additional CD-ROM drive, databases can be searched directly. Available databanks for this program package are the GenBank and EMBL DNA databanks and the PIR and SWISS-PROT protein databanks. DNASIS/PROSIS offers a wide range of input and editing functions for DNA, RNA or amino acid sequences. A translation editor is available, and subsequence searches can be done. A helpful tool is the option to have an edited sequence read back automatically by a voice synthesizer, which makes input errors easier to find. DNASIS can also use a semiautomatic digitizer for reading the bands of a sequencing gel: each band is entered by touching it with an ergonomic stylus pen on the digitizing surface.



DNASIS allows comparison and alignment of multiple DNA, RNA and protein sequences. A dot matrix homology plot is available, and the user can zoom in and out of the plot. Homology searches against GenBank, EMBL, SWISS-PROT, NBRF-PIR or the user's own database can be done. For sequencing projects, the contig manager handles and maintains the DNA sequencing data: it automatically assembles DNA fragment sequences into a consensus sequence, giving an overview of the project. For restriction site analysis DNASIS offers a wide range of mapping capabilities, including circular restriction maps, linear restriction maps and restriction fragment maps, and it allows the creation of new constructs. DNASIS provides a wide range of secondary structure analysis tools, including RNA secondary structure calculation. For protein structure DNASIS offers an enhanced protein analysis package with color graphics output; these graphics can be saved as PICT files and edited further with graphics software. Protein analyses include isoelectric point calculation, charge distribution and hydropathy score calculation, as well as amino acid composition and molecular weight determination. DNASIS requires an IBM PC/AT or fully compatible computer running MS-DOS version 3.1 or higher, with more than 640 KB of memory. One or two floppy disk drives and a hard disk with a minimum of 10 MB for the program are recommended; a hard disk with gigabyte capacity, however, will improve the speed of homology searches drastically. The MacDNASIS version runs on any Macintosh computer with system 6.0.5 or higher; a minimum of 2 MB RAM is needed with system 6 and 4 MB with system 7. The program costs DM 4200, and there is no university discount. The CD-ROM contains the main molecular biology databases used for sequence homology searches.
The data are conveniently accessible for fast homology searches of the GenBank, EMBL, NBRF-PIR and SWISS-PROT nucleic acid and protein databases. The CD-ROM is issued every three months at a cost of DM 490 per release, and a one-year subscription is available for DM 1250. There are no special rates for universities. The program is easy to install and can be adjusted to the computer environment. While the editor is clearly laid out, some of the analysis features are illogically placed in different menus. For example, the SEARCH submenu contains various functions for analyzing nucleic acids as well as the function for making a hydrophobicity plot of a protein. Thus, it takes a long time to become familiar with the program. The multiple alignment function is very time-consuming, needing hours to calculate one alignment, and the prediction of secondary structures in RNA molecules can take a whole night [Templin 1993]. DNASIS is a program package that is useful for most of the typical tasks that occur in DNA or protein research. The sequence editor offers many input and editing functions and is clearly organized, but the speed of some functions is not acceptable. One of the strong features of the package is sequence comparison against the CD-ROM-based databanks, even without a connection to an on-line service.

GENEPRO
This program allows retrieval and analysis of protein and DNA sequences on an IBM or compatible personal computer. GENEPRO will list all sequences that share some identity with a probe sequence almost instantly from either GenBank or the PIR databank. Alternatively, GENEPRO can compare a protein sequence with all the sequences in a DNA database (such as GenBank), translated on the fly into all six reading frames. However, to get this information in an acceptable time a fast 486 PC is required; otherwise the databank search must be run overnight. The FASTSEARCH routine compares a sequence with an entire database in less than a minute when looking for complete identity. There are many analysis functions for DNA and protein sequences: it is possible to make hydropathy plots, to determine the antigenicity and amphipathy of a protein, or to detect PEST sequences. Furthermore, possible glycosylation sites can be predicted. Additional features are homology searches, sequence alignment, sequence randomization and dot matrix plots. GENEPRO allows the user to manage DNA sequencing projects by finding overlaps between sequence fragments and assembling them into a consensus sequence. The optional GENEPRO digitizer is an electronic device that semiautomatically reads DNA sequence data from sequencing gels into a data file on a personal computer. The basic system requirement is an IBM 386 computer or higher with a minimum of 640 KB RAM, running MS-DOS 5.0 or higher. In addition, a CD-ROM drive is needed, because from the beginning of 1994 the databases and index files are delivered on CD-ROM. The price of the program package will be US $3000, with a university discount price of US $2000. A one-year subscription is available for US $995. The different menus of GENEPRO are logically structured and work fast. The great disadvantage of the program is the time a search takes: because every record of a database is shown during a search, much of the time is spent on display output.
A search algorithm analogous to the FASTA routine [Pearson 1990] would accelerate the search by a factor of about 3 to 4. On the other hand, the program is inexpensive compared with others.
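Hydropathy plots of the kind offered by GENEPRO and DNASIS are usually computed as a sliding-window average of a per-residue hydropathy scale. The sketch below uses the Kyte and Doolittle scale; the window length of 9 residues and the function name are illustrative choices, since the parameters used by the commercial packages are not stated here.

```python
# Kyte & Doolittle hydropathy values per residue.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=9):
    """Sliding-window mean hydropathy as a list of (1-based centre position, score)."""
    values = [KD[aa] for aa in seq.upper()]
    half = window // 2
    return [(i + half + 1, sum(values[i:i + window]) / window)
            for i in range(len(values) - window + 1)]
```

Plotting the scores against the centre positions gives the familiar curve: stretches with high positive averages indicate hydrophobic regions, such as candidate membrane-spanning segments, while strongly negative regions tend to be surface-exposed.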

GeneWorks
GeneWorks is a program for Macintosh computers that allows visualization and manipulation of sequence data. It includes a broad set of sequence analysis functions, such as multiple sequence alignment and dot matrix plots. GeneWorks permits databank searching in the GenBank, EMBL and SWISS-PROT data libraries, scanning the sequence databanks by author, keyword, organism or phylogenetic classification. Sequence data can also be searched by sequence similarity, by sequence patterns and motifs, and by amino acid or base composition. With its large collection of drawing tools, GeneWorks can produce color graphics, including printouts. To run GeneWorks, Macintosh system 6.0.5 or higher is required; if the sound capability is desired on a Mac II, system 6.0.7 or 7.0 is needed. A minimum of 20 MB hard disk space and 3 MB of RAM are required, as well as a CD-ROM drive for the databanks. GeneWorks costs US $3950, and the CD-ROMs including software updates are available for US $1100 a year. An optional sonic digitizer for the automatic reading of sequencing gels is available for US $2000. The great advantage of GeneWorks as a sequence analysis package lies in its display of nucleic acid and protein data: high-quality diagrams and graphics are easy to generate.



Overview of the CD-ROM retrieval programs CD-SEQ, ATLAS, Entrez and MassMap:
- CD-SEQ: IBM or compatible PC, MS-DOS 3.1, minimum RAM 640 KB; additional hardware 2 CD-ROM drives; program price included in the update; update DM 800 per year; distributor EMBL Data Library, P.O. Box 102209, D-69012 Heidelberg, Germany, FAX 06221-387519.
- ATLAS: IBM or Mac, ULTRIX (RISC), Sun, VAX/VMS; MS-DOS 3.0, ULTRIX 4.3, VAX/VMS 5.5; additional hardware 1 CD-ROM drive; program price included in the update; update DM 1200 per year; distributor MIPS at the Max Planck Institute for Biochemistry, Am Klopferspitz, D-82152 Martinsried, Germany, FAX 089-85782655.
- Entrez: Mac or IBM (Windows 3.1); additional hardware 1 CD-ROM drive, better 2; program price included in the update; update US $400 per year; distributor National Library of Medicine, National Institutes of Health, Bldg. 38A, 8600 Rockville Pike, Bethesda, MD 20894, U.S.A., FAX (202) 512-2233.
- MassMap: additional hardware 1 CD-ROM drive; program price DM 5000; distributor Finnigan MAT GmbH, P.O. Box 140162, D-28088 Bremen, Germany, FAX 0421-5493396.

Overview of the on-line services HUSAR, MIPS, BLAST and BLITZ:
- HUSAR on-line service: VT-100 terminal and modem; fee DM 250 per half year; German Cancer Research Centre, P.O. Box 101949, D-69009 Heidelberg, Germany, Tel. 06221-484372, FAX 06221-401271.
- MIPS on-line service: VT-100 terminal and modem; at the moment no fees; MIPS at the Max Planck Institute for Biochemistry, Am Klopferspitz, D-82152 Martinsried, Germany, Tel. 089-85782657, FAX 089-85782655.
- BLAST E-mail server: access to WIN-Net or Internet; no fees; NCBI at the National Library of Medicine, National Institutes of Health, Bldg. 38A, 8600 Rockville Pike, Bethesda, MD 20894, U.S.A., E-mail blast@ncbi.nlm.nih.gov and retrieve@ncbi.nlm.nih.gov.
- BLITZ E-mail server: access to WIN-Net or Internet; no fees; EMBL Data Library, P.O. Box 102209, D-69012 Heidelberg, Germany, E-mail NETHELP@EMBL-Heidelberg.DE.


Overview of the PC program packages GENEPRO, DNASIS and GeneWorks:
- GENEPRO: IBM 286; MS-DOS 5.0; minimum RAM 640 KB; hard disk space recommended for databases 1 GB; additional hardware 1 CD-ROM drive; program price US $3000 (university discount price US $2000); update US $995 per year; distributor Riverside Scientific Enterprises, 15705 Point Monroe Drive N.E., Bainbridge Island, WA 98110, U.S.A., Tel. (206) 842-9498, FAX (206) 842-9534.
- DNASIS: IBM or Mac; MS-DOS 3.1 or Mac system 6.0.5; minimum RAM 640 KB (IBM) or 2-4 MB (Mac); hard disk space for the program 10 MB, recommended for databases 1 GB; additional hardware 1 CD-ROM drive; program price DM 4200; update DM 1200 per year; distributor EuroSciSoft Ltd, 2 Britannia Centre, Point Pleasant, Tyne and Wear GB-NE28 6HQ, Tel. 0044-91/2953000, FAX 0044-91/2953030.
- GeneWorks: Mac; system 6.0.5 or later; minimum RAM 3 MB; hard disk space for the program 20 MB, recommended for databases 1 GB; additional hardware 1 CD-ROM drive; program price US $3950; update US $1100 per year; distributor IntelliGenetics, Amoco Laan 2, B-2440 Geel, Belgium, Tel. 0032(3)2195352, FAX 0032(3)2195354.



In comparison to other program packages GeneWorks is expensive, and if the graphics support of the program is not required other program packages will do the job as well.
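Searching "by sequence patterns and motifs", as GeneWorks does, can be illustrated with patterns written in the PROSITE notation, where elements are separated by "-", "x" stands for any residue, [..] lists allowed residues, {..} lists forbidden ones, (n) or (n,m) gives repetitions, and "<" or ">" anchor the pattern to a terminus. GeneWorks' own pattern syntax is not described in the text, so using PROSITE notation, and translating it into a regular expression, are assumptions of this sketch, as are the function names. The N-glycosylation pattern N-{P}-[ST]-{P} also covers the glycosylation-site prediction listed in Section 3.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern such as 'N-{P}-[ST]-{P}.' into a regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        start = element.startswith("<")
        end = element.endswith(">")
        element = element.strip("<>")
        core, low, high = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element).groups()
        if core == "x":
            rx = "."
        elif core.startswith("{"):
            rx = "[^" + core[1:-1] + "]"
        else:
            rx = core  # single residue or [..] class: same meaning in regex syntax
        if low:
            rx += "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if start else "") + rx + ("$" if end else ""))
    return "".join(parts)


def find_motifs(pattern, seq):
    """Return (1-based position, matched segment); a lookahead allows overlapping hits."""
    rx = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in rx.finditer(seq)]
```

For example, `prosite_to_regex("C-x(2,4)-C.")` gives `C.{2,4}C`, and searching the made-up sequence MNGSANPTANKSL with the glycosylation pattern finds NGSA at position 2 and NKSL at position 10, while NPTA is rejected because of the proline.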

CD-SEQ
CD-SEQ is CD-ROM retrieval software for MS-DOS systems. It enables a range of text queries on the EMBL nucleotide and/or SWISS-PROT protein databases on the CD-ROM: it can search for entry names, accession numbers and author names, as well as for citations and organisms. In the free-text line several keywords can be combined with the logical operators <and>, <or> and <not>. After the search, a results screen listing the hits of the latest query is displayed if there are any hits. Each hit is described by an entry name and a one-line description, and it is possible to browse through the hits and bring a selected entry to the screen. The EMBLScan program is delivered together with the CD-SEQ software. This is an MS-DOS program for quickly searching the EMBL nucleotide sequence database for sequences that are very similar to a particular DNA sequence; EMBLScan is not suited for finding distantly related sequences. Query sequences from 20 nucleotides up to 3000 base pairs can be used for the homology search. The hardware required is an IBM or compatible MS-DOS PC with at least 640 KB RAM, one floppy drive and a hard disk. From 1994 on, two CD-ROM drives will be necessary, or else free hard disk capacity to hold either the indices (180 MB) or the database files. CD-SEQ will be configurable so that index and database files can be located on either CD-ROM or hard disk. If only one CD-ROM drive is available, it is advisable to move the indices rather than the database files to the hard disk, because they are much smaller. The required operating system is MS-DOS 3.1 or higher. The yearly update costs DM 800, which includes the price of the program. CD-SEQ is a good tool for text searches in databases and works really fast: a search through both databases takes only a minute. The EMBLScan program, however, is too insensitive for database screening, and other programs will do this job more comfortably. A version of the program for Macintosh computers is also available.
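Keyword queries combined with logical operators can be evaluated as set operations over the entries that contain each keyword. The sketch below is hypothetical: the operator spelling <and>/<or>/<not>, strict left-to-right evaluation without precedence, case-insensitive substring matching and the toy entries are assumptions, not a description of CD-SEQ's actual query engine.

```python
def text_query(entries, query):
    """Evaluate e.g. 'actin <and> sapiens <not> muscle' over {entry name: text},
    strictly left to right; returns the matching entry names, sorted."""
    tokens = query.split()

    def matching(term):
        return {name for name, text in entries.items() if term.lower() in text.lower()}

    result = matching(tokens[0])
    for op, term in zip(tokens[1::2], tokens[2::2]):
        hits = matching(term)
        if op == "<and>":
            result &= hits
        elif op == "<or>":
            result |= hits
        elif op == "<not>":
            result -= hits
        else:
            raise ValueError(f"unknown operator {op}")
    return sorted(result)
```

With three toy entries describing human actin, yeast actin and human myosin, `"actin <and> sapiens"` selects only the human actin entry, while `"actin <or> myosin <not> cerevisiae"` keeps both human entries and removes the yeast one.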

ATLAS
The ATLAS retrieval system, developed by the NBRF, is a multidatabase information retrieval program specifically designed to access macromolecular sequence databases. The program has been configured to provide simultaneous retrieval from all or a subset of the databases on the CD-ROM. The ATLAS CD-ROM contains the ATLAS retrieval system, the FASTA database searching program, the PIR International protein sequence database, the MIPS PATCHX merged protein sequence database, the NRL3D sequence-structure database, the ALN protein sequence alignment database, and the JIPID E. coli database. ATLAS allows searching through all major protein databanks, which currently contain over 120 000 entries. Sequence searches can be done in different ways. The Scan mode is a rapid search for identically matching protein segments (up to 30 amino acids) that takes only seconds. If mismatches are to be allowed, the FASTA routine is needed; however, transferring the sequence data to a hard disk is recommended if such a search is to be performed in a reasonable time. The commands of the ATLAS program can be classified into several categories according to function:



The text search commands provide the primary retrieval capabilities of the ATLAS program by searching the term indices; examples are searches by accession number, author, gene name, journal citation, keyword and reference number. Several display commands are designed to display database information. File interface commands such as "copy" and "print" allow information derived from the ATLAS program to be exported to external files. The ATLAS program package runs on VAX/VMS, ULTRIX (RISC) and SunOS systems as well as on PC/DOS and Macintosh systems. The yearly update costs DM 1200 and includes the program package. The ATLAS program package is a tool for fast searching of DNA and protein sequences in databases; the Scan mode is especially fast, and even with the FASTA routine the result is obtained in an acceptable time. The CD-ROM is updated quarterly and is distributed in Germany by MIPS.

Entrez Entrez is a molecular sequence retrieval system developed at the NCBI. It provides integrated access to nucleotide and protein sequence information, to the MEDLINE citations in which the sequences were published, and to a sequence-associated subset of MEDLINE. The sequence records are derived from a variety of database sources, including the four main databases, and are distributed on CD-ROM together with the retrieval software. Entrez can be installed on multiple computers, including Macintosh and IBM or compatible PCs, and on local networks if desired. A particularly valuable feature for both MEDLINE documents and sequence records is the concept of "neighboring": after an initial query, neighboring lets the user find references or sequences related to a given paper or sequence. The neighbors are precomputed with algorithms that relate records within the same database by statistical measures of similarity. In addition, each article has hard links to any protein or nucleotide sequence published in it. The precomputed neighbors and hard links are stored on the CD-ROM along with the databases and retrieval indices. The Entrez databases are distributed on two CD-ROMs, one containing the sequences and the other the references; when both disks are used in a single session on a computer with only one CD-ROM drive, the user has to swap them. The annual update, which includes the program, costs US $400.
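The idea behind precomputed neighboring can be sketched as follows. The text says only that neighbors are computed from statistical measures of similarity; this sketch substitutes a deliberately simple measure, the Jaccard similarity of tripeptide (3-mer) sets, and the threshold, neighbor limit and function names are hypothetical. What it does show is the design: every pairwise score is computed once, in advance, so that a neighbor lookup at query time is a single dictionary access.

```python
from itertools import combinations


def kmers(seq, k=3):
    """Set of overlapping k-residue words in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Shared words divided by all distinct words (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def precompute_neighbors(records, k=3, threshold=0.2, max_neighbors=5):
    """records: id -> sequence.  Returns id -> [(neighbor id, score), ...], best first."""
    profiles = {rid: kmers(seq, k) for rid, seq in records.items()}
    neighbors = {rid: [] for rid in records}
    for a, b in combinations(records, 2):  # every pair scored once, offline
        score = jaccard(profiles[a], profiles[b])
        if score >= threshold:
            neighbors[a].append((b, score))
            neighbors[b].append((a, score))
    for hits in neighbors.values():
        hits.sort(key=lambda pair: pair[1], reverse=True)
        del hits[max_neighbors:]  # keep only the closest neighbors
    return neighbors


records = {
    "s1": "MKTAYIAKQRQISFVKSHFSRQ",
    "s2": "MKTAYIAKQRQISFVKSHFSRE",
    "s3": "GGGGGGGGGG",
}
table = precompute_neighbors(records)  # done once, stored with the database
print(table["s1"])                     # at query time: a single lookup
```

Here `s1` and `s2` differ only at the last residue, so each lists the other as its neighbor, while the unrelated `s3` has none.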

MassMap A very special program for identifying a protein is MassMap, which links data from mass spectrometric measurements with the known protein sequences published on the Entrez CD-ROM. The masses of 4 or 5 peptides from a proteolytic digest of a small amount of protein are determined by matrix-assisted laser desorption ionization (MALDI) [Karas 1988] or electrospray ionization (ESI) mass spectrometry [Dole 1968]. With this molecular weight fingerprint, the MassMap software searches a database of mass values calculated from a comprehensive database of protein sequences and identifies the protein [Pappin 1993; Yates 1993]. The scoring algorithm tolerates experimental errors of 2-3 Da. In many cases this procedure can be used instead of protein sequencing to confirm the identity of a protein, and the MassMap scoring system allows the user to discriminate even between related proteins.
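The tolerance-based matching step can be illustrated with a short sketch. MassMap's actual scoring algorithm is not described here, so this one simply counts how many observed peptide masses fall within ±2.5 Da (an assumed value inside the 2-3 Da range quoted above) of some calculated peptide mass for each candidate protein, then ranks the candidates by that count. The protein identifiers and mass values are invented for illustration.

```python
def count_matches(observed, calculated, tol=2.5):
    """Number of observed masses lying within +/- tol Da of any calculated mass."""
    return sum(any(abs(m - c) <= tol for c in calculated) for m in observed)


def rank_candidates(observed, digest_db, tol=2.5):
    """digest_db: protein id -> calculated peptide masses.  Best match first."""
    scores = [(pid, count_matches(observed, masses, tol))
              for pid, masses in digest_db.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical fingerprint of four peptide masses (Da) and a two-entry database.
observed = [870.2, 1000.0, 1500.4, 2100.0]
digest_db = {
    "P1": [870.0, 999.1, 1501.5, 2102.0, 3000.0],
    "P2": [1000.5, 1800.0, 2500.0],
}
print(rank_candidates(observed, digest_db))  # [('P1', 4), ('P2', 1)]
```

A plain match count favours large proteins, which yield many peptides and so match more masses by chance; weighted schemes such as MOWSE [Pappin 1993] correct for this.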



A successful match in the databases generates a report window containing the complete text of the database entry together with a tabular or graphical comparison of the experimental and calculated masses. MassMap runs on an IBM or compatible computer under Microsoft Windows 3.1. The source sequence database, distributed on CD-ROM by NCBI, currently contains about 140 000 protein sequences drawn from GenBank, EMBL, SWISS-PROT, PIR and several smaller databases. For each entry in the NCBI database, the MassMap software provides both the molecular weight of the whole sequence and the calculated molecular weights of the peptides from complete digests with a range of reagents. A 1 GB hard disk is recommended for storing this data set. The program costs about DM 5 000. MassMap opens up new possibilities for the fast identification of proteins to users of MALDI or ESI mass spectrometers: mass fingerprint information is easily obtained from a small amount of protein, and the high percentage of correct matches and the easy handling of the software make it a good tool for protein chemists.
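The text does not say which reagents or mass tables MassMap uses. As a minimal sketch of what "calculated molecular weights for complete digests" involves, the code below digests a sequence in silico with the common simplified trypsin rule (cleavage C-terminal to Lys or Arg, but not before Pro) and sums monoisotopic residue masses. The function names are hypothetical, and a real program would also handle other reagents, missed cleavages and modified residues.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide for the free termini
PROTON = 1.00728   # added to give [M+H]+ values as observed in MALDI


def tryptic_digest(seq):
    """Complete digest: cut after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        if aa in "KR" and (i + 1 == len(seq) or seq[i + 1] != "P"):
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):  # C-terminal peptide without a final K/R
        peptides.append(seq[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


protein = "MKWVTFISLLFLFSSAYSRGVFRR"  # arbitrary example sequence
for pep in tryptic_digest(protein):
    print(pep, round(peptide_mass(pep) + PROTON, 2))  # peptide and its [M+H]+
```

Precomputing such peptide lists for every database entry, and for each reagent, is what makes the stored data set large enough to call for the recommended hard disk.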

5 References
Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J. (1990) J. Mol. Biol. 215, 403-410. Basic local alignment search tool.
Bairoch, A., Boeckmann, B. (1993) Nucleic Acids Res. 21, 3093-3096. The SWISS-PROT protein sequence data bank, recent developments.
Barker, W.C., George, D.G., Mewes, H.W., Pfeiffer, F., Tsugita, A. (1993) Nucleic Acids Res. 21, 3089-3092. The PIR-International databases.
Benson, D., Lipman, D.J., Ostell, J. (1993) Nucleic Acids Res. 21, 2963-2965. GenBank.
Dole, M., Mack, L., Hines, R.L., Mobley, R.C., Ferguson, L.D., Alice, M.B. (1968) J. Chem. Phys. 49, 2240-2249. Molecular beams of macroions.
Karas, M., Hillenkamp, F. (1988) Anal. Chem. 60, 2299-2301. Laser desorption ionisation of proteins with molecular masses exceeding 10 000 Da.
Kerr, R.A. (1993) Science 262, 502-505. Managing the genome data deluge.
Pappin, D.J.C., Hojrup, P., Bleasby, A.J. (1993) Curr. Biol. 3, 327-332. Rapid identification of proteins by peptide-mass fingerprinting.
Pearson, W.R., Lipman, D.J. (1990) Proc. Natl. Acad. Sci. USA 85, 2444-2448. Improved tools for biological sequence analysis.
Rice, C.M., Fuchs, R., Higgins, D.G., Stoehr, P.J., Cameron, G.N. (1993) Nucleic Acids Res. 21, 2967-2971. The EMBL data library.
Smith, T.F., Waterman, M.S. (1981) J. Mol. Biol. 147, 195-197. Identification of common molecular subsequences.
Sturrock, S.S., Collins, J.F. (1993) MPsrch version 1.3. Biocomputing Research Unit, University of Edinburgh, UK.
Templin, M. (1993) Nachr. Chem. Tech. Lab. 41, 1004-1008. DNA-Sequenzanalyse im Paket.
Yates III, J.R., Speicher, S., Griffin, P.R., Hunkapiller, T. (1993) Anal. Biochem. 214, 397-408. Peptide mass maps: a highly informative approach to protein identification.

Microcharacterization of Proteins by R. Kellner, F. Lottspeich & H. E. Meyer © VCH Verlagsgesellschaft mbH, 1994

Index
absorption coefficient 119
acceleration voltage 152, 163
accession number 217
accuracy 161
acetonitrile 31
acrylamide 63
acrylamide monomer 15
activation energy 191
acylaminoacid releasing enzyme (AARE) 17
additive 18, 24
adsorption 11
affinity capture 164
alignment 219
alkylation 11, 14
Amido Black 12
amino acid analysis 93
  from blots 85
amino acid composition 15, 93
ammonium acetate 32, 158
ammonium bicarbonate 17, 158
amplifier 154
aniline 117
anilinothiazolinone (ATZ) amino acid 118
annotation 249
artefact peak 19
ATLAS 210, 216, 257
atmospheric pressure ionization (API) 175
autohydrolysis 18
autosampler 33, 176
average mass 194
back pressure 31
biocompatible 34
biopolymer analysis 149
blank sample 19
blot sandwich 76
blotting efficiency 81
blotting membrane 82
BNPS-skatole 16, 24
boron trifluoride etherate 120
n-butyl (C4) 34
by-products 101, 117
calibrant 157
calibration 154
capillary column 33
capillary electrophoresis (CE) 47
capillary LC 36
capillary surface 52
capillary zone electrophoresis (CZE) 49
carbohydrate content 157
carboxypeptidase 17, 18
carrier ampholyte 69
Cathepsin C 17
cathode buffer 67
CD-ROM 215, 248
CD-SEQ 257
CE-MS 55
chaotropic salt 12
charge density 169
chloramine T 99
chromatography
  affinity 30
  anion exchange 29
  cation exchange 29
  gradient systems 31
  HPLC 29
  hydrophilic interaction (HILIC) 30
  hydrophobic interaction (HIC) 30
  ion-exchange 29
  normal phase 30
  partition 30
  perfusive-particle 35
  reversed-phase (RPC) 30
  size-exclusion 29
  solvents 31
chymotrypsin 16
classification of proteases 17
Cleland's reagent 13
Cleveland 13
client-server software 218


client/server mode 215
cluster ion 155
coating 52
collision-induced decomposition (CID) 193, 197
computer science 215
computer search 162
continuous flow (CF-)FAB 190
convection 47
conventional LC 36
conversion 117
conversion dynode 154
Coomassie Blue 11
Coulomb force 169
counter electrode 168
countercurrent flow 169
cross-linker 66
α-cyano-4-hydroxycinnamic acid 151
cyanogen bromide (CNBr) 16, 19
cystine 13
cytochrome C 157
dabsyl chloride 102
dansyl chloride 103
data collection 154
data organization 211
data processing 211
database 209
deconvolution 171
dehydroalanine 135, 145
denaturation 13, 64, 69
derivatization 97
  post-column 97
  pre-column 97
desalting 11
desamidation 22, 168
desorption 150
detergent 11
dextran 161
dextrin 161
diethylene triamine-penta-acetic acid 157
digital oscilloscope 154
2,5-dihydroxy benzoic acid (DHB) 151
diode array detector 37
diphenylthiourea (DPTU) 117

disaggregation 13
discontinuous buffer system 52
discontinuous electrophoresis system 64
disulfide bond 13
disulfide bridges 11
disulphide bond 194
dithiothreitol (DTT) 13
DNASIS/PROSIS 251
E-mail server 250
  BLAST 250
  BLITZ 250
EC code number 17
Edman degradation 117
EDTA 18
Einstein-Nernst equation 49
electrical field strength 47
electroblotting 12, 69, 76
  buffer 80
electrochemical reaction 79
electrode plate 78
electroelution 68
electrolyte system 54
electron multiplier 154, 171
electronic mail query 215
electroosmotic flow (EOF) 50
electrophoretic velocity 50
electrospray ionization (ESI) 167
electrospray source 168
electrostatic interaction 29
EMBL Data Library 218, 247
EMBnet 215
endoglycosidase H 157
endoproteases 17, 94
endoproteinase Arg-C 16
endoproteinase Asp-N 16
endoproteinase Glu-C (V8) 16
endoproteinase Lys-C 16, 39
Entrez 258
Enzyme Commission 17
error tolerance 233
ethyl acetate 118
excitation
  electronic 150
  rovibrational state 150
exoprotease 17, 94


false positive 236
farnesyl-cysteine 139
fast atom bombardment 190
FASTA 225, 249
filtration 33
fingerprint 16, 232
fittings 34
flow cell
  U-shaped 38
  Z-shaped 38, 43
1-(9-fluorenyl)ethyl chloroformate (FLEC) 103
9-fluorenylmethyloxycarbonyl group (FMOC) 102
fluorescamine 101
fluorescent isothiocyanate 119
Fluorotrans 83
formic acid 21
fraction collection 57, 38, 121
  micropreparative 57
fragmentation 157
  notation 193
fragmentation 11
  chemical 11, 19
  enzymatic 11, 15
  excessive 16
  limited 16
  MALDI 157
fused-silica capillary 34, 50, 174
ganglioside 161
gas phase ion 169
gel electrophoresis 63
GenBank 211, 247
GENEPRO 252
Geneworks 253
glass fibre membrane 12, 82
Glassybond 83
gluco-arginine 138
glycans 161
glycerol 151
glyco-amino acids 134
glycoconjugate 161
glycoprotein 157
glycosphingolipid 161
gradient, HPLC 32
gradient gel 67

guanidine hydrochloride 13, 158
guard column 34
heptafluorobutyric acid 31
heterogeneity 157, 158
hexafluoroacetone (HFA) 32
high performance liquid chromatography (HPLC) 101
homology search 219, 249
homoserine 21
human genome project 221
HUSAR 250
hydrodynamic flow 51
hydrolysis 94
  acid 95
  alkaline 96
  conditions 95
  enzymatic 94
  gas-phase 95
  liquid-phase 95
3-hydroxy picolinic acid 159, 160
4-hydroxy picolinic acid 151
hydroxylamine 22
Immobilon P 83
Immobilon PSQ 83
in matrix 13
in solution 11
in-line filter 34
incubation buffer 18
infrared laser 153
initial yield 121
injection
  electrokinetic 52
  hydrodynamic 52
injector 33
inlet 169
insolubility 11
instrumentation 106, 121
internal sequence analysis 16
Internet 250
intrinsic charge 63
iodoacetamide 14
iodoacetic acid 14, 96
ion acceleration energy 154
ion desorption mechanism 169
ion detection 154


ion evaporation 169
ion source 149
ion spray 169
ion-pairing reagent 31
ionic strength 29
ionization technique 149
  electrospray 149
  fast atom bombardment 149, 190
  field desorption 149
  laser desorption 149
  plasma desorption 149
  thermospray ionization 149
isoaspartate 24
isobaric amino acids 194
isobaric peptides 181
isoelectric focusing (IEF) 65, 69
isoelectric point (pI) 63
isomerization 22
isopropanol 31
isotachophoresis (ITP) 48
Joule heating 47
journal citation 217
kinetic energy 151, 163
Kohlrausch function 48
ladder sequencing 192
lanthionine 145
lantibiotic 135
laser 150, 153
  far ultraviolet 150
  infrared 150
laser ion source 153
laser power 150
LC-MS coupling 174
leading electrolyte 48
linear TOF 152
lyophilization 11
magnetic media 215
magnetic sector analyzer 151
make-up flow 57
MALDI 149
  blot membrane 162
  CZE 163
  gel electrophoresis 162
  ion signal 157
  matrix 150, 159
  oligonucleotides 151
  peptide mixture 162
  salt tolerance 158
MALDI sequencing 163, 191
Marfey's reagent 103
mass accuracy 158, 228, 234
mass analyser 150, 170
  Fourier transform ion cyclotron resonance (FT-ICR) 150
mass determination accuracy 158
mass range 159, 161
mass resolution 152, 157
mass-to-charge ratio 167
MassMap 25
matrix molecule 150
MEDLINE 258
mellitin 152
2-mercaptoethanol 13, 96
3-mercaptopropionic acid 102
metal contaminant 159
metastable decay 163, 191
metastable fragment ion 163
metastable ion 191
methanol 31, 82
5-methoxy salicylic acid 151
1-Methyl-Histidine 133
MHC peptides 197
micro-column 32
microbore LC 36
microcapillary LC 184, 201
microchannel plate 154
microscale 11
migration velocity 63
MIPS on-line system 217, 250
mobile phase 30, 31
mobility 48, 49
molecular ion 155
molecular sieving effect 63
molecular weight search (MOWSE) 196
monoisotopic mass 194
N-acetyl group 17
N-ethylmorpholine 17


N-terminal blockage 13, 17, 21, 126, 132
  isoaspartate 24
Na-borate buffer 81
narrowbore LC 36
native technique 65, 70
NBRF-PIR Protein Database 247
NCBI 211
nebulizer gas 169
network access 215
neutral density filter 153
nicotinic acid 151
ninhydrin 99
2-nitro-5-thiocyanobenzoic acid (NTCB) 24
nitrocellulose 12, 18
Nn 36
nonspecific binding 18
notation 193
o-iodosobenzoic acid 24
n-octadecyl (C18) 34
octadecyl 30
n-octyl (C8) 34
oligonucleotides 159
oligosaccharide 161
on-line service 250
open reading frame 220
organic modifier 31
orthophosphoric acid 31
orthophthaldialdehyde (OPA) 99
partial acid hydrolysis 22
partial digestion 235
particle size 35
peak broadening 152, 163
peak dispersion 47
peak fractionation 38
peak volume 36
Pep5 135
pepsin 17
peptide library 179
peptide mapping 16, 162, 184, 195
PeptideMap 227
Peptidesearch 224
pH-stability 35
phenol 96


phenylisocyanate (PIC) 128, 192
phenylisothiocyanate (PITC) 101, 117
phospho-serine 140
phospho-threonine 142
phospho-tyrosine 135
phosphorylation 131
photoexcitation 151
photoionization 151
PIR-International 211
polarizer 153
polyacrylamide 63
polybrene 123
polyetheretherketone (PEEK) 33
polyisoprenyltransferase 139
polypropylene 82
polystyrene 35
polyvinylidene fluoride (PVDF) 12, 18, 21, 82, 96, 126, 162
polyvinylpyrrolidone 18
Ponceau S 13
pool sequencing 180
pore diameter 34
post-column split 174
post-translational modification 131, 168, 195
power supply 47
pre-column split 174
precipitation 11
program package 248
protease 15
protease activity 12, 16
Protein Identification Resource (PIR) 225
protein ladder sequencing 128, 192
protein modification 15
protein superfamily 219
proton transfer 151
protonation 31
pulsation 32
pulse duration 153
pulse widths 150
Pump
  reciprocating pistons 32
  syringe 32
purity control 177
PVP quenching 19
PVP-40 18


(4-pyridylethyl)cysteine 14
pyridylethylation 14
pyroglutamate 17
quadrupole ion trap 151
quadrupole mass filter 170
quantitation 11, 105, 126
quasi-molecular ion 169
racemization 22
radio frequency 170
Rayleigh limit 169
recovery of tryptophan 95
reducing agent 11, 13
redundancy 214, 220
reflection attenuator 153
reflector voltage 191
reflectron TOF 152
repetitive yield 121
resolution 152
resonance absorption 150
retention time 35
retrieval program 216
S-carboxymethyl cysteine 14
sample adsorption 34
sample loss 11, 16, 34
sample preparation 11
  MALDI 155
sample recovery 36
sample surface 153
scavenger 96
SDS-PAGE 63
  Laemmli 65
  tricine 65
search parameter 225
search specificity 233
secondary electron 154
selectivity 58
semidry blotting 78
sensitivity 159, 176
sequence comparison 220
Sequence Database Definition Language (SDDL) 211
sequence error 229
sequencer
  Biphasic Column 124
  gas phase 123
  liquid phase 122
  pulsed liquid phase 124
  solid phase 123
silanol group 30
silica gel 30
silica media 34
sinapinic acid 151, 158
singly charged dimer 157
soft ionization 167, 190
solid support 30
solubilization 13
solute-wall interaction 51
solvent split 33
spinning cup 122
stacking gel 13
stacking system 53
stationary phase 29, 34
strategy 11, 75, 197
substance P 163
succinic acid 151, 159, 160, 162
succinimide 22
5-sulphosalicylic acid 97
SWISS-PROT 224, 247
tandem mass spectrometry 191, 201
tank blotting 76
Taylor cone 169
Teflon 34
terminating electrolyte 48
theoretical plate 49, 51
thermolysin 17
thermospray 169
thioglycolic acid 96
thiol group 14
titanium 34
TOF mass analysis 151
total ion current (TIC) 175, 198
total molecular weight 227
toxicity 21
Trans-Blot 83
transfer yield 82
tributylphosphine 13
trifluoroacetic acid 21, 31
trimethylamine 123
triple quadrupole 170, 193
Triton X-100 19, 192


trypsin 16
tryptamine 96
tubing 34
Tween-20 19
ultraspray 169
urea 13, 69, 158
UV absorbance 37
UV transparency 31
UV-detection 55
vapor-phase reaction 14
variance 49
video-microscope 153
4-vinylpyridine 14, 96
viscosity 31
volatile by-products 21
voltage 171
Western blotting 76
Win-Net 250
zeta potential 50


