Functional and Structural Proteomics of Glycoproteins
Raymond J. Owens · Joanne E. Nettleship Editors
Editors

Raymond J. Owens
Oxford Protein Production Facility-UK, University of Oxford, The Research Complex at Harwell, R92 Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Oxfordshire OX11 0FA, UK

Joanne E. Nettleship
Oxford Protein Production Facility-UK, University of Oxford, The Research Complex at Harwell, R92 Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Oxfordshire OX11 0FA, UK

ISBN 978-90-481-9354-7
e-ISBN 978-90-481-9355-4
DOI 10.1007/978-90-481-9355-4
Springer Dordrecht Heidelberg London New York
© Springer Science+Business Media B.V. 2011
No part of this work may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, microfilming, recording or otherwise, without written permission from the Publisher, with the exception of any material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work.
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)
Preface
Large-scale sequencing of the human and other mammalian genomes has created an enormous database of protein sequences for functional and structural analyses. It has been predicted that nearly half of all human proteins are glycosylated, indicating the functional importance of glycoproteins in human health and disease. However, the study of glycoproteins presents major challenges. Unlike nucleic acid and amino acid sequences, the glycans attached to proteins are not directly coded for by a template. Rather, they are the result of a complex processing mechanism which acts on proteins destined for the cell surface, either to be secreted or retained in the membrane. The glycans attached to proteins are no longer regarded as a byproduct of biosynthesis but are functionally significant in their own right. Importantly, these glycans have emerged as biomarkers in the diagnosis of human diseases such as cancers and play a significant role in the mechanisms by which pathogenic viruses gain entry into human cells. Manipulation of the glycosylation patterns of therapeutic antibodies has led to improvements in their mechanism of action which may ultimately translate into increased clinical efficacy. In the last few years, technology developments, in particular advances in high throughput separation methods and detection techniques, have accelerated the characterization of the glycosylation patterns of cells and tissues. The use of lectin microarrays coupled to highly sensitive fluorescence-based detection systems has enabled the rapid profiling of glycan expression. Structural analysis is central to understanding the function of glycosylated proteins, though the heterogeneity of the attached glycans makes glycoproteins difficult to crystallize for X-ray crystallography. The recent development of glyco-engineering techniques coupled to rapid protein production using transient expression in mammalian cells is facilitating the structural determination of glycoproteins.
Key to exploiting the information generated by functional and structural studies of glycoproteins is the organization of the primary experimental data into public databases and the development of tools to search and analyse glycan structure and composition. In this volume, the state of the art in all these areas is reviewed by experts in the field of glycoproteomics. We are grateful to all the contributors to this book for sharing their experience and knowledge. We also thank Springer Verlag for the opportunity of undertaking this project and for their assistance during the production of the book.

Oxford, UK
Raymond J. Owens
Joanne E. Nettleship
Contents
1 Glycoproteomics in Health and Disease
  Weston B. Struwe, Eoin F.J. Cosgrave, Jennifer C. Byrne, Radka Saldova, and Pauline M. Rudd

2 Glyco-engineering of Fc Glycans to Enhance the Biological Functions of Therapeutic IgGs
  T. Shantha Raju, David M. Knight, and Robert E. Jordan

3 Bioinformatics Databases and Applications Available for Glycobiology and Glycomics
  René Ranzinger, Kai Maaß, and Thomas Lütteke

4 Lectin Microarrays: Simple Tools for the Analysis of Complex Glycans
  Lakshmi Krishnamoorthy and Lara K. Mahal

5 The Application of High Throughput Mass Spectrometry to the Analysis of Glycoproteins
  Sasha Singh, Morten Thaysen Andersen, and Judith Jebanathirajah Steen

6 Solutions to the Glycosylation Problem for Low- and High-Throughput Structural Glycoproteomics
  Simon J. Davis and Max Crispin

7 Role of Glycoproteins in Virus–Human Cell Interactions
  Thomas A. Bowden and Elizabeth E. Fry

Subject Index
Contributors
Thomas A. Bowden
The Division of Structural Biology, The Wellcome Trust Centre for Human Genetics, University of Oxford, Roosevelt Drive, Oxford, OX3 7BN, UK

Jennifer C. Byrne
National Institute for Bioprocessing Research and Training, Dublin-Oxford Glycobiology Group, Conway Institute for Biomolecular and Biomedical Sciences, University College Dublin, Belfield, Dublin 4, Dublin, Ireland

Eoin F.J. Cosgrave
National Institute for Bioprocessing Research and Training, Dublin-Oxford Glycobiology Group, Conway Institute for Biomolecular and Biomedical Sciences, University College Dublin, Belfield, Dublin 4, Dublin, Ireland

Max Crispin
Department of Biochemistry, Oxford Glycobiology Institute, University of Oxford, South Parks Road, Oxford, OX1 3QU, UK

Simon J. Davis
Nuffield Department of Clinical Medicine and MRC Human Immunology Unit, Weatherall Institute of Molecular Medicine, University of Oxford, John Radcliffe Hospital, Oxford OX3 9DS, UK

Elizabeth E. Fry
The Division of Structural Biology, The Wellcome Trust Centre for Human Genetics, University of Oxford, Roosevelt Drive, Oxford, OX3 7BN, UK

Robert E. Jordan
Discovery Technology Research, Biologics Research, Centocor R&D Inc, 145 King of Prussia Road, Radnor, PA 19087, USA

David M. Knight
Discovery Technology Research, Biologics Research, Centocor R&D Inc, 145 King of Prussia Road, Radnor, PA 19087, USA

Lakshmi Krishnamoorthy
Department of Chemistry, New York University, 100 Washington Square East, Room 1001, New York, NY 10003, USA

Thomas Lütteke
Faculty of Veterinary Medicine, Institute of Biochemistry and Endocrinology, Justus-Liebig University Gießen, Frankfurter Str. 100, 35392 Gießen, Germany

Kai Maaß
Department of Chemistry, Institute of Inorganic and Analytical Chemistry, Justus-Liebig University Gießen, Schubertstrasse 60, Building 16, 35392 Gießen, Germany

Lara K. Mahal
Department of Chemistry, New York University, 100 Washington Square East, Room 1001, New York, NY 10003, USA

T. Shantha Raju
Discovery Technology Research, Biologics Research, Centocor R&D Inc, 145 King of Prussia Road, Radnor, PA 19087, USA

René Ranzinger
Complex Carbohydrate Research Center, The University of Georgia, 315 Riverbend Road, Athens, Georgia 30602, USA

Pauline M. Rudd
National Institute for Bioprocessing Research and Training, Dublin-Oxford Glycobiology Group, Conway Institute for Biomolecular and Biomedical Sciences, University College Dublin, Belfield, Dublin 4, Dublin, Ireland

Radka Saldova
National Institute for Bioprocessing Research and Training, Dublin-Oxford Glycobiology Group, Conway Institute for Biomolecular and Biomedical Sciences, University College Dublin, Belfield, Dublin 4, Dublin, Ireland

Sasha Singh
Proteomics Center at Children’s Hospital Boston, Boston, MA 02115, USA; Departments of Pathology, Harvard Medical School and Children’s Hospital Boston, Boston, MA 02115, USA; F. M. Kirby Neurobiology Center, Children’s Hospital Boston, Boston, MA 02115, USA

Judith Jebanathirajah Steen
Proteomics Center at Children’s Hospital Boston, Boston, MA 02115, USA; F. M. Kirby Neurobiology Center, Children’s Hospital Boston, Boston, MA 02115, USA; Department of Neurobiology, Harvard Medical School, Boston, MA 02115, USA

Weston B. Struwe
National Institute for Bioprocessing Research and Training, Dublin-Oxford Glycobiology Group, Conway Institute for Biomolecular and Biomedical Sciences, University College Dublin, Belfield, Dublin 4, Dublin, Ireland

Morten Thaysen Andersen
Department of Biochemistry and Molecular Biology, University of Southern Denmark, DK-5230 Odense M, Denmark

Chapter 1
Glycoproteomics in Health and Disease
Weston B. Struwe, Eoin F.J. Cosgrave, Jennifer C. Byrne, Radka Saldova, and Pauline M. Rudd

Abstract The addition of oligosaccharides to proteins is a significant posttranslational modification that modulates protein structure, function and localization. Glycans are vital for development in all eukaryotes and are profoundly connected to a large number of human diseases, ranging from glycan genetic diseases to autoimmune disorders and cancer. Glycans present a difficult challenge in the analytical field because of the intricate dynamics of their synthesis as well as the complexity of the structures themselves. In addition to the role of glycans in development and disease, they are of great interest in the biotherapeutic industry, where modification of glycosylation can significantly enhance therapeutic efficacy and biological activity in a range of glycoprotein products. However, glycosylation on a global scale in humans is yet to be fully appreciated, as researchers are discovering that glycosylation is not only protein, cell or tissue specific, but is additionally influenced by individual genetics and environmental factors. Functional glycomics and glycoproteomics are emerging as a central field in systems biology and will continue to be a key focus in discerning health and disease.

Keywords Biotherapeutics · Cancer · Glycoproteomics · Glycosylation · Systems biology

Abbreviations
ADCC        antibody-dependent cell mediated cytotoxicity
AFP         α-fetoprotein
AGP         α1-acid glycoprotein
CCD         cross reactive carbohydrate determinant
CDG         congenital disorders of glycosylation
CE          capillary electrophoresis
CEA         carcinoembryonic antigen
CHO         Chinese hamster ovary
CID         collision induced dissociation
DMB         1,2-diamino-4,5-methylene-dioxybenzene
EAATs       excitatory amino acid transporters
EGFR        epidermal growth factor receptor
EPO         erythropoietin
ER          endoplasmic reticulum
ERT         enzyme replacement therapy
ESI         electrospray ionization
FcγR        Fc-gamma receptor
FDA         Food and Drug Administration
Fuc         fucose
FucT        α(1,6)-fucosyltransferase
Fuc-TIII    α(1,3/4)-fucosyltransferase
Gal         galactose
GalNAc      N-acetylgalactosamine
Gal-T       β(1,4)-galactosyltransferase
Glc         glucose
GlcNAc      N-acetylglucosamine
GnT-I       N-acetylglucosaminyltransferase-I
GnT-III     N-acetylglucosaminyltransferase-III
GnT-V       N-acetylglucosaminyltransferase-V
GU          glucose unit
GWAS        genome-wide association study
HA          hemagglutinin
HCC         hepatocellular carcinomas
β-HCG       β-human chorionic gonadotrophin
HILIC       hydrophilic interaction chromatography
HPLC        high performance liquid chromatography
IgG         immunoglobulin G
LacNAc      N-acetyllactosamine
Lex         Lewis x
LLO         lipid-linked oligosaccharide
LSD         lysosomal storage disease
MALDI       matrix assisted laser desorption ionization
Man         mannose
MBL         mannose-binding lectin
MS          mass spectrometry
MSn         sequential mass spectrometry
Neu5Gc      N-glycolylneuraminic acid
Neu5NAc     N-acetylneuraminic acid
NK          natural killer
NMR         nuclear magnetic resonance
OST         oligosaccharyltransferase
PGC         porous graphitized carbon
PNGase F    peptide-N-glycosidase F
PSA         prostate specific antigen
RA          rheumatoid arthritis
RNase 1     ribonuclease 1
RP-HPLC     reverse phase-HPLC
SLE         systemic lupus erythematosus
sLea        sialyl Lewis a
sLey        sialyl Lewis y
ST3GalIV    β-galactoside α(2,3)-sialyltransferase
ST8Sia II   α(2,8)-sialyltransferase II
ST8Sia IV   α(2,8)-sialyltransferase IV
TGFβR       transforming growth factor-β receptor
TOF         time of flight
WAX         weak anion exchange
Xyl         xylose
Xyl-T       β(1,2)-xylosyltransferase
2-AB        2-aminobenzamide
2D-DIGE     2D difference gel electrophoresis
2-DE        2D gel electrophoresis

W.B. Struwe (✉)
National Institute for Bioprocessing Research and Training, Dublin-Oxford Glycobiology Group, Conway Institute for Biomolecular and Biomedical Sciences, University College Dublin, Belfield, Dublin 4, Dublin, Ireland

R.J. Owens, J.E. Nettleship (eds.), Functional and Structural Proteomics of Glycoproteins, DOI 10.1007/978-90-481-9355-4_1, © Springer Science+Business Media B.V. 2011
Contents
1.1 Introduction
  1.1.1 Basic Glycan Structure
  1.1.2 Biosynthetic Pathway for N- and O-Linked Glycosylation
  1.1.3 Glycan Diversity and Biological Function
1.2 Glycan and Glycoprotein Analytics
  1.2.1 Mass Spectrometry
  1.2.2 High Performance Liquid Chromatography
  1.2.3 2D Gel Electrophoresis
1.3 Glycosylation and Disease
  1.3.1 Glycosylation in Cancer Biology
  1.3.2 Role of Glycosylation in Autoimmune Diseases
  1.3.3 Congenital Disorders of Glycosylation (CDGs)
  1.3.4 Lysosomal Storage Diseases (LSDs)
1.4 Glycobiology in the Treatment of Disease
  1.4.1 Bioproduction of Glycoprotein Therapeutics
  1.4.2 Manipulating Glycosylation for Enhanced Biotherapeutic Function
1.5 Systems Glycobiology, Glycoproteomics and Glycogenomics in Disease Diagnosis and Pathology
References
1.1 Introduction

Carbohydrates, when attached to proteins or lipids, form large complex biomolecules collectively termed glycoconjugates. The carbohydrate moieties of glycoconjugates fall into three main categories: those attached to lipids, and those attached to proteins either through a nitrogen atom (N-linked) or through an oxygen atom (O-linked). The attachment of glycans influences protein structure and function, as well as the localization of cell surface and secreted glycoproteins. Glycans can confer cell-type specificity and are critical components of cell-to-cell signaling [1]. Carbohydrates are also involved in the immune response and host-pathogen interactions [2]. Moreover, changes in N-glycan biosynthesis have been identified as key components in tumour progression in mice and humans [3]. The early stage of N-glycosylation is conserved from yeast to humans and the complete loss of function is lethal [4, 5].

Protein modification with carbohydrates is the most common and complex type of posttranslational modification. A glycoprotein can exist as a number of different glycoforms in which the glycan structure(s) vary. The structural complexity of N- and O-glycans confers a high degree of variability on a protein and can modulate its function and/or structure. Glycans also alter the solubility, half-life and aggregation properties of glycoproteins. It is indisputable that glycans play an essential role in the activity of proteins and a fundamental role in the biological functions of both native and recombinant glycoproteins. The cell dedicates 1% of its entire genome to glycosylation machinery, and upwards of 70% of all proteins are modified by glycans in human cells [6]. When compared to the fields of genomics or proteomics, the study of glycoconjugates is still in its early stages and lacks robust and all-encompassing analytical and bioinformatic tools to unravel its complexity. Understanding the glycome and glycoproteome is a much greater challenge, confounded by the elaborate mechanisms of glycosylation and by the fact that the glycome is intricately intertwined with the genome and proteome.
1.1.1 Basic Glycan Structure

In contrast to DNA, RNA and proteins, carbohydrates form branching structures and, as a result, a relatively small set of monosaccharides (Table 1.1) can generate considerably complex structures. The dynamics of glycan synthesis, the complexity of the glycans and the extrinsic properties attributed to glycans are significant factors that make glycobiology one of the less well understood disciplines in systems biology research today. Approximately 17 monosaccharides constitute the building blocks of N- and O-glycans, and it is the linkages and branching of these residues that give rise to the complexity of oligosaccharides. One report calculated the number of possible structural isomers of a hexasaccharide to be greater than 1.05 × 10^12 [7], although in practice far fewer N-glycan isomers occur in nature. Nevertheless, the degree of structural information carried by glycans is potentially a great deal more than that carried by proteins or nucleic acids.

Table 1.1 Common monosaccharides by type

Monosaccharide              Type         Abbreviation
2-Keto-3-deoxynononic acid  Sialic acid  Kdn
Fucose                      Deoxyhexose  Fuc
Galactose                   Hexose       Gal
Galactosamine               Aminohexose  GalN
Galacturonic acid           Uronic acid  GalA
Glucose                     Hexose       Glc
Glucosamine                 Aminohexose  GlcN
Glucuronic acid             Uronic acid  GlcA
Iduronic acid               Uronic acid  IdoA
Mannose                     Hexose       Man
Mannosamine                 Aminohexose  ManN
Mannuronic acid             Uronic acid  ManA
N-acetylgalactosamine       Aminohexose  GalNAc
N-acetylglucosamine         Aminohexose  GlcNAc
N-acetylneuraminic acid     Sialic acid  Neu5NAc
N-glycolylneuraminic acid   Sialic acid  Neu5Gc
Xylose                      Pentose      Xyl
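The way linkage position and anomericity inflate the number of possible structures can be illustrated with a deliberately simplified count. The Python sketch below compares linear chains only and is not the calculation reported in [7]. The number of monomer types (6), the number of linkage positions per residue (4) and the number of anomeric configurations (2) are assumptions chosen for illustration, and branching, ring size and substituents are ignored.

```python
def peptide_like_count(n_residues: int, n_monomers: int) -> int:
    """Linear sequences when every bond between residues is formed in one way only."""
    return n_monomers ** n_residues


def linear_glycan_count(n_residues: int, n_monomers: int,
                        linkage_positions: int = 4, anomers: int = 2) -> int:
    """Linear chains in which each glycosidic bond may also vary in
    linkage position and anomeric configuration (assumed values)."""
    bonds = n_residues - 1
    return n_monomers ** n_residues * (linkage_positions * anomers) ** bonds


if __name__ == "__main__":
    n, k = 6, 6  # a hexamer built from six monomer types (illustrative choice)
    print("single bond type:", peptide_like_count(n, k))
    print("linear glycan   :", linear_glycan_count(n, k))
```

Even without branching, the per-bond choices in this toy model multiply the count by (4 × 2)^5 = 32,768 for a hexamer, which gives a sense of why allowing branch points pushes the number of isomers so much higher.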
1.1.2 Biosynthetic Pathway for N- and O-Linked Glycosylation

The complexity of N-glycans (Fig. 1.1) is achieved through a non-template driven biosynthetic pathway that relies on the transcription and translation of glycosylation enzymes that are precisely localized throughout the cell. There are four distinct stages in eukaryotic cells:

(i) Formation of the lipid-linked oligosaccharide (LLO) precursor on the surface and in the lumen of the endoplasmic reticulum (ER). This process is highly conserved among all eukaryotes and defects of enzymes in this pathway are the basis for congenital disorders of glycosylation type I (CDG-I) in humans.
(ii) The en bloc transfer of the precursor oligosaccharide to the nascent polypeptide in the lumen of the ER, facilitated by the oligosaccharyltransferase (OST) complex, a hetero-oligomeric protein complex composed of eight subunits.
(iii) A series of quality control steps to ensure correct folding. This is achieved by the chaperones calnexin and calreticulin as well as the glycosidase enzymes glucosidase I, glucosidase II and ER α(1,2)-mannosidase.
(iv) Glycan processing by the addition of new sugar residues to the truncated glycan in the medial and trans-Golgi (Fig. 1.2).

Glycosylation in the Golgi is responsible for the highly complex structural diversity of N-glycans found in mammals and other higher species. The identification of over 200 glycosylation enzymes has aided in understanding the mechanism of glycosylation [8]. Several factors are important to glycan formation, including transport rates of glycopeptides from the ER to the Golgi, the time glycopeptides spend in the Golgi, sugar nucleotide metabolism, and localization of glycosyltransferases in the Golgi [9]. The focus of functional glycomics is to understand how glycan diversity and microheterogeneity result from and contribute to biology in development and disease. The direct links between glycan structure and gene expression are becoming increasingly important in the context of systems biology, in which all cellular factors, including genomics, proteomics, transcriptomics and metabolomics, are considered together.

Fig. 1.1 Structural examples of N- and O-linked glycans. Glycans can be attached to asparagine and either serine or threonine residues, resulting in the formation of N- and O-linked glycans, respectively. N-linked biosynthesis involves a multitude of glycosyltransferases and glycosidases, together acting to generate glycans of unparalleled complexity. These can broadly be categorized as either high mannose type, hybrid type, or complex type. Examples of each are presented. O-linked glycosylation is defined by the biosynthesis of eight core structures on which further complexity is normally found. The structure of each of the core structures is presented. Individual residues are shown with distinct shapes and shading. Linkage positions are represented by the angle of the line linking adjacent monosaccharides. Anomericity is indicated by using either a full line to represent a β-linkage or a dashed line to represent an α-linkage

Fig. 1.2 N-linked glycan biosynthesis in the endoplasmic reticulum and Golgi. Initiation of N-linked glycan biosynthesis occurs on the cytosolic side of the ER, where dolichol diphosphate acts as the scaffold for the extension of sugar moieties to the Man5GlcNAc2 glycoform. Following translocation to the luminal side of the ER, further extension is performed and the Glc3Man9GlcNAc2 structure is transferred from dolichol pyrophosphate to the nascent polypeptide emerging from the ribosome. Calnexin/calreticulin mediated quality control for proper protein folding precedes the migration of the naïve glycoprotein to the cis-Golgi, where further glycan processing occurs. Trimming of mannose residues and addition of a GlcNAc residue on the α(1,3)-mannose arm signals migration to the medial-Golgi, where further assembly of complex glycans occurs. This includes the replacement of the α(1,3)- and α(1,6)-mannose residues on the α(1,6) arm with a GlcNAc and the addition of a core α(1,6)-fucose. Transfer to the trans-Golgi results in the addition of galactose and sialic acid residues, completing the process of N-linked glycan biosynthesis. Glycoproteins are subsequently targeted to specific intra- or extra-cellular locations
1.1.3 Glycan Diversity and Biological Function

Deciphering the association between a particular glycan structure and its function is an essential question that plagues glycobiologists. Presently, more than 7000 glycan structures have been determined, but their significance in cellular function remains to be established [10]. Some suggest that glycomics is at least an order of magnitude more difficult than proteomics [11]. Considering the complex diversity of glycans and how they influence proteins, it is no surprise that Schachter [12] once asked, “will it ever be possible to determine the role that a specific posttranslational modification plays in the function of a specific protein for every protein in the genome?” Nonetheless, notable advances have been made in elucidating the role of N-linked glycans. It is known that glycans influence cell growth and development, tumour growth and metastasis, anticoagulation, immune recognition and response, cell-cell communication and microbial pathogenesis [13], while defects in the glycan biosynthetic machinery can have fatal or debilitating consequences that manifest in diseases such as autoimmune disorders and lysosomal storage diseases (Section 1.3.4). The goal of functional glycomics is to assign specific glycans to a particular protein and determine their function. There is increasing interest in functional glycomics and several international collaborative efforts have been established, striving to deduce the biological roles of glycans. These groups include the Consortium for Functional Glycomics, EuroCarbDB and the Japanese Consortium for Glycomics. It is becoming increasingly evident that the complexity of glycomics requires a systems approach that investigates biosynthesis, structural analysis and glycan-protein interactions to delineate glycan structure-function relationships.
The many experimental approaches taken to understand the roles of glycans include inhibition of glycosylation, alterations to processing mechanisms, elimination of glycosylation sites, enzymatic or chemical de-glycosylation of complete glycan chains and the study of glycosylation mutants [14]. The consequences of altering individual glycosylation mechanisms are highly unpredictable and the effects can range from virtually undetectable to lethal. Moreover, altering glycosylation changes all glycoprotein structures and functions in a cell simultaneously, which needs to be considered in such experiments. Recently, Chinese hamster ovary (CHO) glycosylation knockouts have revealed high mass complex N-glycans on the order of m/z ∼13,000 that consist of up to 26 N-acetyllactosamine repeats [15]. In many cases, investigating the role or structure of glycans through knockout models is beneficial, but the universal presence of glycoconjugates makes it difficult to understand cellular phenotypes as a function of their gene and protein components.

On the most basic level, glycans alter proteins either intrinsically or extrinsically, whereby the carbohydrate modulates the function of the underlying peptide. The external location of glycans on proteins can serve as a shield, protecting the protein from proteases or antibodies. Carbohydrates are exceedingly hydrophilic, which alters the conformation and solubility of proteins. Protein folding is driven by its folding energy landscape and is initiated by hydrophobic collapse. The free energy for each possible conformation is largely determined by the primary sequence and by the contacts of its nonpolar groups. The addition of carbohydrates during translation in the ER greatly alters the energy landscape of proteins. Depending on the size, glycoforms and extent of occupancy, a protein will fold until a native structure is formed and the lowest free energy is reached. Individual protein motifs (α-helices and β-turns) fold within microseconds, which is why quality control measures are in place to determine proper folding before any protein exits the ER [16].

The presence or absence of glycans can affect the primary function or activity of a protein. For example, β-human chorionic gonadotrophin (β-HCG) can bind with similar affinity to its receptor with and without its glycan component. β-HCG activates adenylate cyclase, leading to increased cAMP production, but fails to do so in its deglycosylated form [14, 17]. This illustrates that glycans can regulate the primary function of a glycoprotein without changing its binding properties. Glycans can also influence the longevity of the proteins to which they are attached. In the case of human erythropoietin (EPO), the presence of sialic acid on its glycan termini increases the half-life, but decreases the activity in vitro [18].
The extent of branching can determine binding of EPO to its receptor in specific tissues [19]. Glycan sequences thus have a tuning effect on protein function, although the effect may reflect a change in the binding mechanism brought about by changes in glycoprotein structure. Glycans act as specific ligands for endogenous and exogenous receptors. The role of glycans as ligands for lectins is perhaps the most well explored functional aspect of oligosaccharides in cellular systems. For example, the glycoprotein hemagglutinin (HA) on the surface of the avian influenza virus is responsible for binding to the viral host cell [20]. Avian HA binds specifically to α(2,3)-sialylated glycans, which are absent in the respiratory tract of humans. It is thought that a switch in HA binding from α(2,3)-sialylated glycans to α(2,6)-sialylated glycans, which are present in humans, enables infections in humans [20]. Currently, functional glycomics lacks any high-throughput method for determining the site-specific structures and functions of each glycan moiety on a case-by-case basis. Structural characterization of glycans is only one of many important aspects of functional glycomics. The challenges include understanding glycan structure as a function of extracellular signaling, determining the basis for glycan-protein specificity and interactions, and elucidating how glycan diversity is generated as a function of its biosynthesis. Furthermore, the biology and biosynthesis of glycosylation on a cellular, let alone multicellular, level remain unclear. Addressing the fundamental biology of glycosylation is vital in order to link all facets of functional glycoproteomics.
Genomics, proteomics and metabolomics all play a part in glycosylation, so researchers must consider additional factors when determining the role of oligosaccharides in disease. The new era of glycomics will not only set out to answer the structure-function relationship of glycans, but will seek to determine the extrinsic factors that lead to the variable glycosylation observed in disease and what the implications are for the patient. New analytical trends will aspire to comprehensive positional structural analysis of glycoproteins on a sensitive and high-throughput scale. However, the function of particular glycans on a specific protein, and their corresponding expression, is the key to understanding the role of glycoproteomics in the clinical setting.
1.2 Glycan and Glycoprotein Analytics The drive to understand glycoproteomics has fueled the technological development of new and innovative techniques that aim to determine the sequence of both the protein backbone and glycan component in addition to site occupancy. Glycoprotein analysis seeks to determine not only the overall glycan profile of a given glycoprotein, but the individual glycoforms on each site of glycosylation that together contribute to the complete glycan profile. Methods for analyzing glycoproteins and glycans are developing rapidly to investigate these problems. The majority of glycan analytical techniques employ high performance liquid chromatography (HPLC), nuclear magnetic resonance (NMR), mass spectrometry (MS), capillary electrophoresis (CE), lectin arrays, glycan arrays or 2D gel electrophoresis (2-DE). Mass spectrometry has emerged as the preferred tool to determine the structure of intact and/or digested glycoproteins because of its sensitivity and capability to determine the size of higher mass molecules [21–23]. The abundance and variability of glycosylated proteins in biological samples are the most daunting factors in the field of glycoproteomics. For this reason, glycoprotein analysis may incorporate enrichment or purification of specific glycoforms via lectins, immunoprecipitation, and single- or two-dimensional gel electrophoresis prior to MS analysis. In bottom-up methods glycoproteins are digested by a specific protease (e.g. trypsin) following enrichment. From this step, glycopeptides can be further enriched by lectins, hydrophilic interaction chromatography (HILIC), size-exclusion chromatography or hydrazine resin [24]. Mass mapping of the glycopeptides via tandem MS (MS/MS) will provide information on glycan structure and heterogeneity. Concurrently, MS of the deglycosylated peptide, after PNGase F for N-glycans and reductive β-elimination for O-glycans, aids in determining site occupancy. 
PNGase F removes N-glycans between the reducing end N-acetylglucosamine (with and without α (1,6) linked fucose) and the asparagine residue on the peptide. The asparagine residue is converted to aspartic acid by PNGase F treatment, resulting in a mass increase of approximately 1 Da (0.98 Da) in the mass spectrometer. Top-down methods analyze intact glycoproteins, which is valuable in identifying the type and location of a particular glycan's composition. Top-down approaches are not ideal for large, heterogeneous or novel glycoproteins. Moreover, top-down
1
Glycoproteomics in Health and Disease
11
strategies do not supply in-depth glycan information due to the loss of the glycan component during the fragmentation step or the type of glycan fragment detected, which is typically between the penultimate and reducing end N-acetylglucosamine. This common fragmentation does little more than confirm the presence of the glycan and its overall topology. Generally, additional experiments are required for complete glycan analysis.
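As a rough numerical check on the PNGase F mass shift described above, the following sketch derives the asparagine-to-aspartate mass difference from monoisotopic atomic masses and shows how the observed m/z shift scales with charge state. The function names and the choice of charge states are illustrative assumptions and are not taken from the chapter.

```python
# Monoisotopic atomic masses in Da (standard values).
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}

# Residue compositions (amino acid minus one water); standard formulas.
ASN = {"C": 4, "H": 6, "N": 2, "O": 2}
ASP = {"C": 4, "H": 5, "N": 1, "O": 3}


def mass(formula):
    """Monoisotopic mass of a composition given as {element: count}."""
    return sum(MONO[element] * count for element, count in formula.items())


def deamidation_shift():
    """Mass gained when Asn is converted to Asp (about 0.984 Da)."""
    return mass(ASP) - mass(ASN)


def mz_shift(charge):
    """Shift seen on the m/z axis for an ion carrying `charge` charges."""
    return deamidation_shift() / charge


if __name__ == "__main__":
    for z in (1, 2, 3):
        print(f"z={z}: m/z shift = {mz_shift(z):.4f}")
```

The shift is close to 1 only for singly charged ions; for multiply charged ions it is divided by the charge, which is why mass accuracy matters when the conversion is used to infer site occupancy.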
1.2.1 Mass Spectrometry Current approaches for glycan analysis mainly focus on the liberated glycan and overlook site occupancy. Conversely, analysis of intact glycoforms does not provide the detailed structural information achievable from purified glycans. The difficulty with glycoproteomics is the shortage of analytical and bioinformatics tools to support researchers in defining the structure of glycoproteins. Mass spectrometry is the chosen tool to analyze glycans because of the sensitivity, reproducibility and relative ease it provides. MS-based analyses of glycans employ both matrix-assisted laser desorption ionization (MALDI) and electrospray ionization (ESI) ion sources, either stand-alone or coupled to on-line chromatographic separation systems. Each source has its advantages and disadvantages. MALDI-time of flight (TOF) instruments offer analysis in the higher mass range with limited sample expenditure and can tolerate minor sample contaminants. However, the sensitivity and level to which a precursor ion can be isolated and fragmented (i.e. sequential mass spectrometry or MSn) is limited. MALDI and ESI ionization result in primarily protonated or sodiated ions in positive mode and deprotonated ions in negative mode, especially when sialic acids are present. However, the individual wet-lab workflow can greatly influence the type of ion/adduct present. In addition to sodium ions, lithium and potassium ions are also likely. ESI-based instruments can be joined with a greater range of mass analyzers and therefore can be used in different capacities and access more information based on the type of sample being analyzed. Negative mode ESI-QTOF MS/MS has been used with great success in analyzing native glycans from cancer samples [25–28]. 
The attraction of negative mode analysis is that the majority of fragments generated during the collision induced dissociation (CID) step are cross-ring cleavages, which are more informative than B-type and C-type fragments typical of positive mode analyses. Likewise MSn techniques are unparalleled in the amount of structural information that can be generated [29–31]. MSn techniques have been instrumental in detecting cancer specific glycans or potential biomarkers in metastasis [32]. Typically, ESI analyses require larger sample volumes that must also be more pure than those subject to MALDI. As a result, many ESI analyses will couple the MS instrument to a chromatographic separation, typically porous graphitized carbon (PGC), amide-80 resins, ion exchange or reverse phase C18 or C4 . The consequence of on-line MS approaches is the level of “mining” that can be performed. Specifically, for MSn applications, the procedure requires a direct and continuous
12
W.B. Struwe et al.
injection so a single m/z composition can be analyzed for greater periods of time, which is not possible in LC-MS/MS or LC-MSn methodologies. Generally, a full MS analysis of N- or O-glycans will incorporate both types of ion sources. All in all, sample preparation and purity are crucial in any MS approach, and LC separation, either off- or on-line, can be advantageous. Alternatively, solid phase extraction using PGC, C18 or cation exchange resin is also useful to purify carbohydrates from proteins and/or salt contaminants [33]. Glycans can also be derivatized by permethylation or methylation, which are compulsory for MSn and sialic acid linkage detection respectively [34, 35]. Permethylation also acts as an ad hoc purification step and the permethylated products ionize more readily as sodium adducts so that sialic acid and neutral structures can be analyzed simultaneously.
1.2.2 High Performance Liquid Chromatography In addition to mass spectrometry, HPLC is an equally robust analytical tool used to analyze glycans. The advantage that HPLC has over MS is its increased sensitivity, quantitative qualities, reproducibility and the ability to provide monosaccharide sequence and linkage information. However, novel glycans routinely require MS to confirm the overall composition or structure. The analysis of the serum glycome relies on the ability to detect low abundant glycans due to the minimal expression of the majority of serum glycoproteins. Although HPLC is not generally used for analysis of whole or digested glycoproteins, its attributes make it favorable for analysis of the glycan component. Additionally, HPLC-based glycan characterization has emerged as a powerful high-throughput method [36]. Amide-based hydrophilic interaction chromatography (HILIC) glycan separation is a well-established technique capable of providing highly reproducible profiles [37–40]. Since 2-aminobenzamide (2-AB) labeling of glycans has a stoichiometry of 1:1, the resulting chromatogram is quantitative. HPLC is also helpful to analyze charged and neutral glycans simultaneously, since the analysis does not cleave sialic acid bonds as in the case of MALDI-MS and ESI-MS. The use of a dextran ladder enables the glycan retention times to be standardized into glucose unit (GU) values, which allows the normalization of all glycans analyzed by HILIC, effectively eliminating variability between HPLC systems. HPLC (and MS) analysis can be strengthened by exoglycosidase digestion because of the specificity of the enzymes, which only cleave specific linkages, monomers and anomers. HPLC techniques have also grown to include the option of sialic acid quantitation and speciation, facilitated by the use of weak anion exchange HPLC (WAX-HPLC) and 1,2-diamino-4,5-methylenedioxybenzene (DMB) labeling. 
In sialic acid speciation, terminal sialic acid residues are typically released from glycoproteins by acid hydrolysis and the resulting isolated sialic acid-based saccharides are labeled with DMB. Similar to 2-AB, DMB couples with sialic acid in a 1:1 stoichiometry,
allowing for the relative quantitation of individual sialic acid species. Labeled glycans are separated by reverse phase-HPLC (RP-HPLC) using a C18 solid phase, where sialic acid species such as Neu5Gc (N-glycolylneuraminic acid), Neu5,7Ac2 and Neu5Gc9Ac are identified and quantified. Separation of glycans by WAX-HPLC assists in identifying the number of sialic acid moieties present on a given glycoprotein. Used in conjunction with sialidase digestions, glycans can then be verified as containing terminal sialic acids of a known number. It can also be determined whether glycans are present that carry a negative charge not provided by N-acetylneuraminic acid, such as phosphate or sulfate. From an analytical perspective, quantitation of individual sialic acid species provides essential information that cannot be obtained by MS methods alone.
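The glucose unit (GU) standardization described in this section can be sketched as a calibration against the dextran ladder. The example below converts a retention time into a GU value by linear interpolation between neighbouring ladder peaks. The ladder retention times are invented for illustration, the function name is hypothetical, and linear interpolation is a simplifying assumption; fitting a polynomial to the ladder is a common alternative.

```python
from bisect import bisect_right


def retention_to_gu(rt, ladder):
    """Convert a retention time to glucose units (GU).

    ladder: list of (retention_time_min, glucose_units) pairs for the
    dextran ladder peaks, sorted by retention time, at least two entries.
    """
    times = [t for t, _ in ladder]
    if not times[0] <= rt <= times[-1]:
        raise ValueError("retention time outside the calibrated ladder range")
    # Index of the ladder peak just after rt, clamped to the last peak.
    i = min(bisect_right(times, rt), len(times) - 1)
    (t0, g0), (t1, g1) = ladder[i - 1], ladder[i]
    return g0 + (g1 - g0) * (rt - t0) / (t1 - t0)


if __name__ == "__main__":
    # Hypothetical ladder: retention times (min) of glucose oligomers 2 to 5.
    example_ladder = [(10.0, 2), (15.0, 3), (19.0, 4), (22.0, 5)]
    print(retention_to_gu(17.0, example_ladder))
```

Because GU values are expressed relative to the ladder run on the same system, two instruments with different absolute retention times should assign similar GU values to the same glycan, which is the normalization the text refers to.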
1.2.3 2D Gel Electrophoresis 2D gel electrophoresis (2-DE) is a notable tool to analyze complex mixtures of glycoproteins from serum, tissue or whole cell lysates. 2-DE separates proteins and glycoproteins based on their isoelectric point in the first dimension and molecular mass in the second dimension. The advantage of 2-DE lies in the fact that whole proteomes are analyzed and the data reflect the presence of isoforms and changes in glycosylation of specific proteins between samples. 2-DE can also accompany MS analysis to further protein identification and characterization. Depletion of highly abundant proteins (albumin, IgG, anti-trypsin, IgA, transferrin and haptoglobin) is required for analysis of serum samples, and the failure to do so results in masking of medium- to low-abundant proteins. Reports suggest that relevant disease markers will not be one of the six aforementioned proteins, but will instead be proteins excreted or cleaved from specific tissue sites [41]. Aside from serum proteins, housekeeping proteins are also a consideration for 2-DE, where the number may be 10^5 to 10^6 copies per cell [42]. Comparatively speaking the number of protein receptors may only be present in [...] high- (> 100 eV) or low- (< 100 eV) energy collision fragmentation. Generally, TOF-TOF instruments use high energy collision fragmentation and Q-TOFs low energy CID. Low energy tandem MS was applied to a glycopeptide linked with a N-glycolylneuraminic acid (NeuGc)2GalGalNAc moiety. The MS/MS spectrum showed the loss of NeuGc, Gal, and GalNAc groups from the precursor ion [74]. Similar fragmentation studies have been performed
5
The Application of High Throughput Mass Spectrometry
[Figure 5.2, recovered panel content]
Fragmentation method: CID, HCD, IRMPD. Primary fragmentation: glycosidic bonds; mostly B/Y ions plus oxonium ions are produced. Structural information: glycan identity (monosaccharide composition and branching). Oxonium ions shown: Hex (m/z 163), HexNAc (m/z 204), LacNAc (m/z 366).
Fragmentation method: ECD, ETD. Primary fragmentation: N-Cα bonds of the peptide; mostly c/z ions are produced. Structural information: peptide identity (sequence, location of glycosylation site and mass of glycan).
Example peptide in both panels: H-Q-G-N-D-T-S-R.
Fig. 5.2 Overview of the fragmentation techniques used for glycopeptides in glycoproteomics. CID, HCD and IRMPD represent fragmentation techniques that preferentially cleave the glycosidic bonds whereas ECD and ETD cleave the peptide backbone. Thus, the former yield information on the glycan identity and the latter on the peptide identity and glycosylation site. Examples of the two groups of fragmentation techniques are given. Abundant fragments are marked with asterisks
on N-glycosylated peptides where the most abundant signals in the MS/MS spectra correspond to fragments from the glycan moiety [75–78]. In addition to glycan fragmentation, CID also induces the production of oxonium ions in the low m/z region of the MS/MS spectra, which can be of great diagnostic value. For example, production of oxonium ions was demonstrated for mono- and di-fucosylated O-linked peptides using an ESI-Q-TOF mass spectrometer under low-energy fragmentation conditions [79]. On individual instruments the collision energy can be manipulated to obtain optimal fragmentation of a given analyte. This has led to a special high energy variant of CID called higher-energy collision dissociation (HCD). If HCD is applied to glycopeptides, fewer, but more abundant fragments appear in the spectrum (unpublished observation). These fragments are mostly oxonium ions and specific glycan fragments. Occasionally, there will be low abundant fragments of the peptide backbone present in the spectra as well. The invention of ECD in FTICR MS instruments [80, 81] has facilitated the localization of labile PTMs such as O- and N-linked glycans to their specific glycosylation sites [82, 83]. ECD produces odd-electron, free radical driven fragmentation which results in cleavage of the N-Cα bond of the peptide backbone, and the preservation of the PTM on the peptide. Thus, ECD yields mostly information of the peptide moiety and the site of glycosylation. ECD has successfully been used in combination with IRMPD, which has been shown to selectively cleave glycosidic bonds rather than peptide bonds [82, 84].
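The diagnostic use of oxonium ions can be illustrated with a small peak-matching sketch. It computes the expected m/z of singly protonated oxonium ions from standard monoisotopic residue masses (which round to the nominal values 163, 204 and 366 given in Fig. 5.2, plus the sialic acid reporters at 274 and 292 listed later in the chapter) and reports which of them appear in a peak list. The peak list, tolerance and function name are assumptions made for illustration only.

```python
PROTON = 1.00727647   # mass of a proton, Da
WATER = 18.01056      # monoisotopic mass of H2O, Da

# Monoisotopic residue masses (Da); standard values.
RESIDUE = {"Hex": 162.05282, "HexNAc": 203.07937, "NeuAc": 291.09542}

# Expected m/z of singly charged oxonium ions.
OXONIUM = {
    "Hex": RESIDUE["Hex"] + PROTON,
    "HexNAc": RESIDUE["HexNAc"] + PROTON,
    "HexHexNAc": RESIDUE["Hex"] + RESIDUE["HexNAc"] + PROTON,
    "NeuAc": RESIDUE["NeuAc"] + PROTON,
    "NeuAc-H2O": RESIDUE["NeuAc"] - WATER + PROTON,
}


def find_oxonium_ions(peaks, tol=0.02):
    """Return {ion name: (mz, intensity)} for oxonium ions found in `peaks`.

    peaks: iterable of (mz, intensity) pairs from one MS/MS spectrum.
    tol: absolute m/z tolerance in Th (an assumed, instrument-dependent value).
    """
    peaks = list(peaks)
    hits = {}
    for name, target in OXONIUM.items():
        matches = [p for p in peaks if abs(p[0] - target) <= tol]
        if matches:
            # Keep the most intense peak within tolerance.
            hits[name] = max(matches, key=lambda p: p[1])
    return hits


if __name__ == "__main__":
    # Hypothetical MS/MS peak list.
    spectrum = [(204.087, 1000.0), (366.140, 500.0), (500.300, 50.0)]
    print(sorted(find_oxonium_ions(spectrum)))
```

A spectrum that contains several of these ions at once is a stronger indication of a glycopeptide precursor than a single match.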
116
S. Singh et al.
Similar to ECD, ETD has emerged as an alternative fragmentation technique complementary to CID and IRMPD [85, 86]. Peptide fragmentation is achieved through gas-phase electron-transfer reactions and the glycan moiety is left intact [87]. Generally, the efficiency of ETD fragmentation increases with the charge density of the glycopeptides meaning that analytes with higher charge states will fragment more readily. Thus, ETD is usually limited to glycopeptide ions with three or preferably more charges [88, 89].
5.3.2 Structural Characterization of Glycopeptides 5.3.2.1 Identification of Glycoproteins and Their Glycosylation Sites in Glycoproteomics Following enrichment (summarized in Fig. 5.3) and deglycosylation, previously glycosylated peptides derived from glycoproteins can be identified using regular proteomics techniques. The removal of N-glycans can be performed efficiently using endoglycosidases such as PNGase F or A or endo-β-N-acetylglucosaminidases (Endo H, D or F). PNGase F cleaves the GlcNAc-Asn bond, converting the asparagine to aspartate. This conversion results in a 0.98 Da mass increment, a discernible remnant at the mass spectral level [90]. The major pitfall of using the asparagine-to-aspartate conversion is that the approach does not distinguish between
[Figure 5.3, recovered workflow labels] Lysate; selective derivatization (metabolic or chemical labeling); streptavidin or hydrazide columns, or Staudinger reaction capture; lectin affinity chromatography; endoglycosidase treatment (PNGase F, Endo H, Endo D); glycan detection/verification (probe against tag or glycan); gel electrophoresis; protein digest (trypsin, chymotrypsin, etc.); streptavidin or hydrazide columns, or Staudinger reaction capture; HILIC; LC/MS with glycan-specific precursor ion scans.
Fig. 5.3 An overview of the strategies for glycoprotein/glycopeptide enrichment. These strategies are often combined for fine-tuned enrichment of a subset of the glycoproteome. This figure emphasizes that a significant amount of the work in MS-dependent glycoproteome analysis is in the preparation of the glycans themselves
deglycosylation-induced conversion and other causes of deamidation, either in vivo or in vitro. Deglycosylation in heavy isotopic water (H₂¹⁸O) has been performed as the +2.98 Da mass shift is more easily recognized [91]. The relative abundance of glycopeptides from two samples can also be determined by performing the deglycosylation of the two samples in H₂¹⁶O and H₂¹⁸O, respectively, with subsequent mixing and determination of the relative intensities of their respective MS signals. An alternative to PNGase F/A is Endo H/D/F, which cleaves the glycosidic bond between the two GlcNAc residues of the N-glycan core leaving a single GlcNAc residue attached to the peptide. The increased mass difference incurred at the modified peptide is 203 Da (349 Da for core fucosylated N-glycans), which represents a unique N-glycosylation site identifier [92]. 5.3.2.2 Characterization of Glycan Structures from Glycopeptides Site-specific characterization of glycans includes the determination of monosaccharide composition, linkage types and branch points. Targeted studies on specific glycoproteins can address all of these levels; however, for larger scale studies determination of the monosaccharide composition is usually the aim. In particular for N-glycans, the overall structure can be hypothesized from a given monosaccharide composition due to the presence of a conserved core structure and a well-defined and restricted glycan synthesis pathway. As described in Section 5.3.1.2, tandem MS fragmentation can be used to obtain information about the glycan structure of the glycopeptide. CID is efficient for generating glycan fragmentation of glycopeptides, but fails to give much information about the peptide and glycosylation site. Occasionally, low abundant peptide fragments appear in the CID MS/MS spectra from which the sequence can be deduced. 
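As an aside on the site-identification tags of Section 5.3.2.1, the mass increments quoted there (0.98, 2.98, 203 and 349 Da) can be reproduced from elemental compositions and used to label an observed mass difference on an asparagine residue. This is a minimal sketch: the tag names, the tolerance and the `classify` helper are hypothetical, and the atomic masses are standard monoisotopic values.

```python
# Monoisotopic atomic masses in Da (standard values); O18 is the heavy isotope.
H, C, N = 1.00782503, 12.0, 14.00307401
O16, O18 = 15.99491462, 17.99915961

# Monoisotopic residue masses from elemental composition.
GLCNAC = 8 * C + 13 * H + N + 5 * O16      # C8H13NO5
FUCOSE = 6 * C + 10 * H + 4 * O16          # C6H10O4

# Mass increments on Asn left behind by each deglycosylation route.
TAGS = {
    "deamidation_16O": O16 - N - H,        # PNGase F in normal water
    "deamidation_18O": O18 - N - H,        # PNGase F in heavy water
    "GlcNAc": GLCNAC,                      # Endo H/D/F remnant
    "GlcNAc+Fuc": GLCNAC + FUCOSE,         # remnant on core-fucosylated sites
}


def classify(delta_mass, tol=0.01):
    """Return the tag name closest to `delta_mass` within `tol` Da, else None.

    Note: a 'deamidation_16O' match cannot by itself rule out spontaneous
    deamidation, which is the pitfall described in the text.
    """
    name, value = min(TAGS.items(), key=lambda kv: abs(kv[1] - delta_mass))
    return name if abs(value - delta_mass) <= tol else None


if __name__ == "__main__":
    for name, value in TAGS.items():
        print(f"{name}: +{value:.4f} Da")
```

The 18O and GlcNAc tags are attractive precisely because their masses are unlikely to arise from other common modifications of asparagine.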
For N-linked glycosylation, the glycosylation site is usually quite easy to identify since N-glycans are linked to asparagine residues in the restricted sequence NXT/S, where X can be any amino acid residue except for proline. However, for O-linked glycosylation, the glycosylation site is often impossible to determine with CID because no consensus sequence is known for this type of glycosylation and O-linked glycans often appear in regions rich in serine and threonine residues. In these cases, ETD and ECD fragmentations are beneficial since they retain the glycan on the peptide backbone and generate peptide information. Thus, both fragmentation techniques are often required to obtain the information needed in glycoproteomics. Although several glycopeptide enrichment strategies are available (see Table 5.1 for overview), samples will often contain some non-glycosylated peptides, in particular when starting from complex peptide mixtures, which is usually the case in glycoproteomics. Even if only little “contamination” from non-glycosylated peptides is present, the sample can still be rather complex since a large fraction of the proteome is glycosylated. Hence, a separation step is often needed in front of the mass spectrometer, making ESI-LC-MS an attractive workflow. If a fairly well-enriched glycopeptide sample is analyzed with an ordinary LC-MS setup (90 min gradient, reversed-phase column, selecting the top 5 MS signals for CID MS/MS
fragmentation), a large amount of information can usually be obtained although interpretation can be time-consuming and laborious. However, for samples containing relatively low amounts of glycopeptides there is a risk that these signals will never be selected for fragmentation and will be lost in the “noise” of the non-modified peptides. Hence, some specific operation modes, which will be described below, have been designed to optimize the MS analysis. In the early 1990s a set of landmark studies established standard MS protocols to screen for glycan-specific reporter ions/fragments, which serve to guide the mass spectrometer to enhance further fragmentation of the glycosylated peptides from which the reporter ions were generated [93, 94]. In a typical triple quadrupole MS experiment (QqQ MS), the first quadrupole scans a mass range such that ions are transferred to the second quadrupole (q) for CID. These fragmented ions are then transferred to the third quadrupole where the mass scan range is specific for the glycan reporter ions. If the reporter ions are detected, the mass spectrometer prioritizes further fragmentation of their precursor masses, i.e. the intact glycopeptides from which they were derived; hence the term precursor ion scanning. The common glycan reporter ions include Hex+ (m/z 163), HexNAc+ (m/z 204), HexHexNAc+ (m/z 366); and m/z 274 and 292, which are reporters for the presence of sialic acid [17, 95, 96]. If multiple reporter ions co-elute or are derived from the same precursor mass, the confidence in the glycan identification is increased [17, 93, 97]. Neutral loss scanning is also used for isolating post-translationally modified peptides. The name is derived from the observation that a loss of a PTM (or part of the PTM) can affect the parent or precursor mass without affecting the charge. Observed neutral losses are often unique to a given PTM. 
For example, in a typical QqQ experiment, when samples are subjected to CID, ions are transferred to the third quadrupole which is set to scan for an offset in the precursor mass corresponding to the neutral loss of m/z 203 for the HexNAc moiety [43, 98]. A neutral loss corresponding to hexose (162 Da), m/z 81 and 54 for doubly and triply charged species respectively, can also be observed when collision energies are optimized for an ESI-QTOF instrument set-up [99]; a neutral loss of 146 Da is also an indicator of fucosylated glycans as observed by LTQ-FT [43]. In summary, complete glycan characterization is extremely difficult in glycoproteomics. At present, researchers are mainly focusing on the identification of glycosylated proteins, their glycosylation sites and the monosaccharide compositions of the attached glycans. Largely, these levels are achievable with the current techniques and instrumentation. However, as is the case for proteomics, only the most abundant subset of the glycoproteome is detectable with these methods. Higher sensitivity and dynamic range of the analytical techniques and better separation, pre-fractionation and enrichment of the sample might increase the depth of the glycoproteome coverage. In addition, the advancement of glycoproteomics is dependent on the development of bioinformatic tools for the automatic assignment of glycopeptide MS/MS spectra and of robust search engines similar to the ones available for regular proteomics.
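The neutral loss offsets quoted above follow from dividing the residue mass by the charge state, since the charge is unchanged by the loss. The sketch below computes these offsets and searches a fragment list for peaks at the expected positions below the precursor. The residue masses are standard monoisotopic values; the function names, tolerance and example peaks are illustrative assumptions, and the sketch assumes the fragment retains the full precursor charge.

```python
# Monoisotopic residue masses (Da); standard values.
NEUTRAL_LOSS = {"Hex": 162.05282, "HexNAc": 203.07937, "Fuc": 146.05791}


def neutral_loss_offset(residue, charge):
    """m/z offset produced by losing `residue` from an ion of given charge."""
    return NEUTRAL_LOSS[residue] / charge


def find_neutral_losses(precursor_mz, charge, fragment_mzs, tol=0.02):
    """List (residue, fragment m/z) pairs consistent with a neutral loss.

    Assumes the fragment keeps the precursor charge state.
    """
    hits = []
    for name in NEUTRAL_LOSS:
        expected = precursor_mz - neutral_loss_offset(name, charge)
        for mz in fragment_mzs:
            if abs(mz - expected) <= tol:
                hits.append((name, mz))
    return hits


if __name__ == "__main__":
    for z in (1, 2, 3):
        print(z, round(neutral_loss_offset("Hex", z), 3))
    # Hypothetical doubly charged precursor and fragment list.
    print(find_neutral_losses(1000.0, 2, [918.974, 500.0]))
```

For hexose this gives offsets of about 162, 81 and 54 for singly, doubly and triply charged ions, matching the values cited for the ESI-QTOF set-up.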
5.4 Conclusions Glycoproteomics is conceptually based on the large scale analysis of glycopeptides or alternatively directly on intact glycoproteins. Detailed structural characterization of glycopeptides generates biologically relevant information, however, it is technically challenging compared to the structural analysis of released glycans in glycomics. Some of the challenges are associated with mass analysis, for example, the higher analyte mass and generating (and interpreting) fragmentation of both the glycan and the peptide moieties. However, the main challenge is found at the sample preparation level, where the heterogeneity of protein glycosylation dictates selective glycopeptide enrichment. Different characteristics have been used to discriminate for glycopeptides: mass, hydropathy, structure and charge. Naturally, the most unique characteristic is the glycan structure; however, structural pattern recognition inherently introduces a bias for subsets of the glycoproteome, making it unsuited for quantitative experiments. In contrast, the hydrophilic character is a common feature of glycopeptides and hydrophilicity is consequently an ideal physicochemical parameter for the selective and non-biased enrichment of glycopeptides. Hence, it is expected that the use of HILIC will increase in the coming years. The development and application of HILIC are expected to be paralleled by the use of other enrichment techniques, in particular lectins. The bias of lectins can be reduced by using lectins with broad specificity or mixtures of lectins. For targeted glycoprotein analysis, proteolytic enzymes generating glycopeptides of appropriate mass can easily be predicted in silico. It is much harder to design universal strategies for complex protein mixtures because the polypeptide sequences around the glycosylation sites vary among the glycoproteins. 
The use of proteinase K or pronase, which generate short glycopeptides around the sequon irrespective of the polypeptide sequence, is an option when aiming for the development of universal workflows. Technology improvements are continuously increasing the sensitivity, resolution, accuracy and speed of mass spectrometers. However, it seems that the performance of modern mass spectrometers already fulfills the requirements for the majority of the reported studies. Thus, the quality of the obtained data depends more on the quality and condition of the sample applied to the mass spectrometer than on the last few percent of MS performance. It is anticipated that improvements in sample preparation (i.e. enrichment and desalting) and analyte separation techniques (pre-fractionation / on- or off-line LC-MS) will contribute more to the progress of the field than further improvements of the mass spectrometers. However, there are recent examples where MS technology inventions have aided carbohydrate analysis significantly, e.g. ETD for the selective fragmentation of the peptide backbone of glycopeptides to facilitate glycosylation site assignment. There is little doubt that MS will continue to be the main analytical tool for glycoproteomics in the coming years. In conclusion, judging by the technical challenges still associated with the analysis of glycopeptides from relatively simple peptide mixtures there is still a long
way to go before glycoproteomics is performed routinely in non-expert laboratories. Developments on the sample preparation level and on the general workflows are crucial for moving more rapidly in this direction and acquiring data of higher quality. In addition, improvements in the bioinformatics tools available are similarly required as a consequence of the increasing demand for large-scale glycoproteomics experiments.
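The in silico prediction of proteolytic glycopeptides mentioned in the conclusions can be sketched for the simplest case: a tryptic digest followed by a search for the N-glycosylation sequon N-X-T/S (X not proline). The cleavage rule used here (C-terminal to K or R, not before P, no missed cleavages) is the commonly used simplification, and the test sequence is hypothetical; it embeds the example peptide HQGNDTSR from Fig. 5.2 between invented flanking residues.

```python
import re

# Lookahead so that overlapping sequons are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def tryptic_peptides(sequence):
    """Split after K or R unless followed by P (no missed cleavages)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def sequon_peptides(sequence):
    """Return (peptide, [1-based Asn positions]) for sequon-bearing peptides."""
    result = []
    for peptide in tryptic_peptides(sequence):
        sites = [m.start() + 1 for m in SEQUON.finditer(peptide)]
        if sites:
            result.append((peptide, sites))
    return result


if __name__ == "__main__":
    # Hypothetical sequence; "NPS" is not a sequon because X is proline.
    print(sequon_peptides("MKHQGNDTSRAPNPSK"))
```

Replacing the cleavage pattern with that of another protease gives a quick way to compare which enzyme yields sequon-containing peptides of a convenient length for a given target protein.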
References 1. Roseman S (2001) Reflections on glycobiology. J Biol Chem 276:41527–41542 2. Rudd PM, Wormald MR, Dwek RA (2004) Sugar-mediated ligand-receptor interactions in the immune system. Trends Biotechnol 22:524–530 3. Apweiler R, Hermjakob H, Sharon N (1999) On the frequency of protein glycosylation, as deduced from analysis of the SWISS-PROT database. Biochim Biophys Acta 1473:4–8 4. Hart GW, Housley MP, Slawson C (2007) Cycling of O-linked beta-N-acetylglucosamine on nucleocytoplasmic proteins. Nature 446:1017–1022 5. Hwang HY, Olson SK, Esko JD, Horvitz HR (2003) Caenorhabditis elegans early embryogenesis and vulval morphogenesis require chondroitin biosynthesis. Nature 423:439–443 6. Collins BE, Paulson JC (2004) Cell surface biology mediated by low affinity multivalent protein-glycan interactions. Curr Opin Chem Biol 8:617–625 7. Lin X (2004) Functions of heparan sulfate proteoglycans in cell signaling during development. Development 131:6009–6021 8. Lowe JB, Marth JD (2003) A genetic approach to mammalian glycan function. Ann Rev Biochem 72:643–691 9. Dube DH, Bertozzi CR (2005) Glycans in cancer and inflammation – potential for therapeutics and diagnostics. Nat Rev Drug Discov 4:477–488 10. Inatani M, Irie F, Plump AS, Tessier-Lavigne M, Yamaguchi Y (2003) Mammalian brain morphogenesis and midline axon guidance require heparan sulfate. Science 302:1044–1046 11. Kinjo Y et al (2005) Recognition of bacterial glycosphingolipids by natural killer T cells. Nature 434:520–525 12. Casu B, Guerrini M, Torri G (2004) Structural and conformational aspects of the anticoagulant and anti-thrombotic activity of heparin and dermatan sulfate. Curr Pharm Des 10:939–949 13. Guo Y et al (2004) Structural basis for distinct ligand-binding and targeting properties of the receptors DC-SIGN and DC-SIGNR. Nat Struct Mol Biol 11:591–598 14. 
Liu D, Shriver Z, Venkataraman G, El Shabrawi Y, Sasisekharan R (2002) Tumor cell surface heparan sulfate as cryptic promoters or inhibitors of tumor growth and metastasis. Proc Natl Acad Sci U S A 99:568–573 15. Ludwig JA, Weinstein JN (2005) Biomarkers in cancer staging, prognosis and treatment selection. Nat Rev Cancer 5:845–856 16. Anderson NL, Anderson NG (2002) The human plasma proteome: history, character, and diagnostic prospects. Mol Cell Proteomics 1:845–867 17. Jebanathirajah J, Steen H, Roepstorff P (2003) Using optimized collision energies and high resolution, high accuracy fragment ion selection to improve glycopeptide detection by precursor ion scanning. J Am Soc Mass Spectrom 14:777–784 18. Jebanathirajah J, Stensballe H, Jensen A, Roepstorff P (2002) Modification specific proteomics: integrated strategy for glyco and phosphospecific proteomics. In Presented at the ASMS 2002, Orlando, FL 19. Zhang H, Li XJ, Martin DB, Aebersold R (2003) Identification and quantification of N-linked glycoproteins using hydrazide chemistry, stable isotope labeling and mass spectrometry. Nat Biotechnol 21:660–666
20. Khidekel N, Arndt S, Lamarre-Vincent N, Lippert A, Poulin-Kerstien KG, Ramakrishnan B, Qasba PK, Hsieh-Wilson LC (2003) A chemoenzymatic approach toward the rapid and sensitive detection of O-GlcNAc posttranslational modifications. J Am Chem Soc 125:16162–16163 21. Saxon E, Bertozzi CR (2000) Cell surface engineering by a modified Staudinger reaction. Science 287:2007–2010 22. Sprung R, Nandi A, Chen Y, Kim SC, Barma D, Falck JR, Zhao Y (2005) Tagging-via-substrate strategy for probing O-GlcNAc modified proteins. J Proteome Res 4:950–957 23. Khidekel N, Ficarro SB, Peters EC, Hsieh-Wilson LC (2004) Exploring the O-GlcNAc proteome: direct identification of O-GlcNAc-modified proteins from the brain. Proc Natl Acad Sci U S A 101:13132–13137 24. Khidekel N et al (2007) Probing the dynamics of O-GlcNAc glycosylation in the brain using quantitative proteomics. Nat Chem Biol 3:339–348 25. Vocadlo DJ, Hang HC, Kim EJ, Hanover JA, Bertozzi CR (2003) A chemical approach for identifying O-GlcNAc-modified proteins in cells. Proc Natl Acad Sci U S A 100:9116–9121 26. Prescher JA, Dube DH, Bertozzi CR (2004) Chemical remodelling of cell surfaces in living animals. Nature 430:873–877 27. Kho Y et al (2004) A tagging-via-substrate technology for detection and proteomics of farnesylated proteins. Proc Natl Acad Sci U S A 101:12479–12484 28. Zhao Y, Kwon SW, Anselmo A, Kaur K, White MA (2004) Broad spectrum identification of cellular small ubiquitin-related modifier (SUMO) substrate proteins. J Biol Chem 279:20999–21002 29. Hagglund P, Bunkenborg J, Elortza F, Jensen ON, Roepstorff P (2004) A new strategy for identification of N-glycosylated proteins and unambiguous assignment of their glycosylation sites using HILIC enrichment and partial deglycosylation. J Proteome Res 3:556–566 30. Carr SA, Roberts GD (1986) Carbohydrate mapping by mass spectrometry: a novel method for identifying attachment sites of Asn-linked sugars in glycoproteins. Anal Biochem 157:396–406 31. 
Zhang Y, Wolf-Yadlin A, Ross PL, Pappin DJ, Rush J, Lauffenburger DA, White FM (2005) Time-resolved mass spectrometry of tyrosine phosphorylation sites in the epidermal growth factor receptor signaling network reveals dynamic modules. Mol Cell Proteomics 4:1240–1250 32. Pan S, Zhang H, Rush J, Eng J, Zhang N, Patterson D, Comb MJ, Aebersold R (2005) High throughput proteome screening for biomarker detection. Mol Cell Proteomics 4:182–190 33. Wollscheid B, Bausch-Fluck D, Henderson C, O'Brien R, Bibel M, Schiess R, Aebersold R, Watts JD (2009) Mass-spectrometric identification and relative quantification of N-linked cell surface glycoproteins. Nat Biotechnol 27:378–386 34. Kobata A, Endo T (1992) Immobilized lectin columns: useful tools for the fractionation and structural analysis of oligosaccharides. J Chromatogr 597:111–122 35. Harada H, Kamei M, Tokumoto Y, Yui S, Koyama F, Kochibe N, Endo T, Kobata A (1987) Systematic fractionation of oligosaccharides of human immunoglobulin G by serial affinity chromatography on immobilized lectin columns. Anal Biochem 164:374–381 36. Cummings RD, Kornfeld S (1982) Characterization of the structural determinants required for the high affinity interaction of asparagine-linked oligosaccharides with immobilized Phaseolus vulgaris leukoagglutinating and erythroagglutinating lectins. J Biol Chem 257:11230–11234 37. Hirabayashi J (2004) Lectin-based structural glycomics: glycoproteomics and glycan profiling. Glycoconj J 21:35–40 38. Kaji H et al (2003) Lectin affinity capture, isotope-coded tagging and mass spectrometry to identify N-linked glycoproteins. Nat Biotechnol 21:667–672 39. Kaji H, Yamauchi Y, Takahashi N, Isobe T (2006) Mass spectrometric identification of N-linked glycopeptides using lectin-mediated affinity capture and glycosylation site-specific stable isotope tagging. Nat Protoc 1:3019–3027
40. Jung K, Cho W, Regnier FE (2009) Glycoproteomics of plasma based on narrow selectivity lectin affinity chromatography. J Proteome Res 8:643–650 41. Kubota K et al (2008) Analysis of glycopeptides using lectin affinity chromatography with MALDI-TOF mass spectrometry. Anal Chem 80:3693–3698 42. Gallagher JT, Morris A, Dexter TM (1985) Identification of two binding sites for wheat-germ agglutinin on polylactosamine-type oligosaccharides. Biochem J 231:115–122 43. Jia W et al (2009) A strategy for precise and large scale identification of core fucosylated glycoproteins. Mol Cell Proteomics 8:913–923 44. Hirabayashi J, Hayama K, Kaji H, Isobe T, Kasai K (2002) Affinity capturing and gene assignment of soluble glycoproteins produced by the nematode Caenorhabditis elegans. J Biochem 132:103–114 45. Hemstrom P, Irgum K (2006) Hydrophilic interaction chromatography. J Sep Sci 29:1784–1821 46. Thaysen-Andersen M, Thogersen IB, Nielsen HJ, Lademann U, Brunner N, Enghild JJ, Hojrup P (2007) Rapid and individual-specific glycoprofiling of the low abundance N-glycosylated protein tissue inhibitor of metalloproteinases-1. Mol Cell Proteomics 6:638–647 47. Thaysen-Andersen M et al (2008) Investigating the biomarker potential of glycoproteins using comparative glycoprofiling – application to tissue inhibitor of metalloproteinases-1. Biochim Biophys Acta 1784:455–463 48. Thaysen-Andersen M, Mysling S, Hojrup P (2009) Site-specific glycoprofiling of N-linked glycopeptides using MALDI-TOF MS: strong correlation between signal strength and glycoform quantities. Anal Chem 81:3933–3943 49. Zhang J, Wang DI (1998) Quantitative analysis and process monitoring of site-specific glycosylation microheterogeneity in recombinant human interferon-gamma from Chinese hamster ovary cell culture by hydrophilic interaction chromatography. J Chromatogr B Biomed Sci Appl 712:73–82 50. 
Takegawa Y, Deguchi K, Ito H, Keira T, Nakagawa H, Nishimura SI (2006) Simple separation of isomeric sialylated N-glycopeptides by a zwitterionic type of hydrophilic interaction chromatography. J Sep Sci 29:2533–2540 51. Takegawa Y, Deguchi K, Keira T, Ito H, Nakagawa H, Nishimura S (2006) Separation of isomeric 2-aminopyridine derivatized N-glycans and N-glycopeptides of human serum immunoglobulin G by using a zwitterionic type of hydrophilic-interaction chromatography. J Chromatogr A 1113:177–181 52. Takegawa Y, Ito H, Keira T, Deguchi K, Nakagawa H, Nishimura S (2008) Profiling of N- and O-glycopeptides of erythropoietin by capillary zwitterionic type of hydrophilic interaction chromatography/electrospray ionization mass spectrometry. J Sep Sci 31:1585–1593 53. Wuhrer M, Koeleman CA, Hokke CH, Deelder AM (2005) Protein glycosylation analyzed by normal-phase nano-liquid chromatography – mass spectrometry of glycopeptides. Anal Chem 77:886–894 54. Wuhrer M, de Boer AR, Deelder AM (2009) Structural glycomics using hydrophilic interaction chromatography (HILIC) with mass spectrometry. Mass Spectrom Rev 28: 192–206 55. Packer NH, Lawson MA, Jardine DR, Redmond JW (1998) A general approach to desalting oligosaccharides released from glycoproteins. Glycoconj J 15:737–747 56. Itoh S, Kawasaki N, Ohta M, Hyuga M, Hyuga S, Hayakawa T (2002) Simultaneous microanalysis of N-linked oligosaccharides in a glycoprotein using microbore graphitized carbon column liquid chromatography-mass spectrometry. J Chromatogr A 968:89–100 57. Alley WR Jr, Mechref Y, Novotny MV (2009) Use of activated graphitized carbon chips for liquid chromatography/mass spectrometric and tandem mass spectrometric analysis of tryptic glycopeptides. Rapid Commun Mass Spectrom 23:495–505 58. Larsen MR, Hojrup P, Roepstorff P (2005) Characterization of gel-separated glycoproteins using two-step proteolytic digestion combined with sequential microcolumns and mass spectrometry. Mol Cell Proteomics 4:107–119
5
The Application of High Throughput Mass Spectrometry
123
59. Sparbier K, Koch S, Kessler I, Wenzel T, Kostrzewa M (2005) Selective isolation of glycoproteins and glycopeptides for MALDI-TOF MS detection supported by magnetic particles. J Biomol Tech 16:407–413 60. Sparbier K, Wenzel T, Kostrzewa M (2006) Exploring the binding profiles of ConA, boronic acid and WGA by MALDI-TOF/TOF MS and magnetic particles. J Chromatogr B Analyt Technol Biomed Life Sci 840:29–36 61. Zhang L, Xu Y, Yao H, Xie L, Yao J, Lu H, Yang P (2009) Boronic acid functionalized coresatellite composite nanoparticles for advanced enrichment of glycopeptides and glycoproteins. Chemistry 15:10158–10166 62. Larsen MR, Jensen SS, Jakobsen LA, Heegaard NHH (2007) Exploring the sialiome using titanium dioxide chromatography and mass spectrometry. Mol Cell Proteomics 6:1778–1787 63. Lewandrowski U, Zahedi RP, Moebius J, Walter U, Sickmann A (2007) Enhanced Nglycosylation site analysis of sialoglycopeptides by strong cation exchange prefractionation applied to platelet plasma membranes. Mol Cell Proteomics 6:1933–1941 64. Alvarez-Manilla G, Atwood J, 3rd, Guo Y, Warren NL, Orlando R, Pierce M (2006) Tools for glycoproteomic analysis: size exclusion chromatography facilitates identification of tryptic glycopeptides with N-linked glycosylation sites. J Proteome Res 5:701–708 65. Morris HR (1980) Biomolecular structure determination by mass spectrometry. Nature 286:447–452 66. Barber M, Bordoli RS, Sedgwick RD, Tyler AN, Bycroft BW (1981) Fast atom bombardment mass spectrometry of bleomycin A2 and B2 and their metal complexes. Biochem Biophys Res Commun 101:632–638 67. Fenn JB, Mann M, Meng CK, Wong SF, Whitehouse CM (1989) Electrospray ionization for mass spectrometry of large biomolecules. Science 246:64–71 68. Lane CS (2005) Mass spectrometry-based proteomics in the life sciences. Cell Mol Life Sci 62:848–869 69. 
Papac DI, Wong A, Jones AJ (1996) Analysis of acidic oligosaccharides and glycopeptides by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry. Anal Chem 68:3215–3223 70. Hoffmann Ed., Stroobant V (2007) Mass spectrometry principles and applications. John Wiley & Sons Ltd, New York 71. Schaeffer-Reiss C (2008) A brief summary of the different types of mass spectrometers used in proteomics. Methods Mol Biol 484:3–16 72. Roepstorff P, Fohlman J (1984) Proposal for a common nomenclature for sequence ions in mass spectra of peptides. Biomed Mass Spectrom 11:601 73. Medzihradszky KF, Gillece-Castro BL, Settineri CA, Townsend RR, Masiarz FR, Burlingame AL (1990) Structure determination of O-linked glycopeptides by tandem mass spectrometry. Biomed Environ Mass Spectrom 19:777–781 74. Hirayama K, Yuji R, Yamada N, Kato K, Arata Y, Shimada I (1998) Complete and rapid peptide and glycopeptide mapping of mouse monoclonal antibody by LC/MS/MS using ion trap mass spectrometry. Anal Chem 70:2718–2725 75. Bateman KP, White RL, Thibault P (1998) Evaluation of adsorption preconcentration/capillary zone electrophoresis/nanoelectrospray mass spectrometry for peptide and glycoprotein analyses. J Mass Spectrom 33:1109–1123 76. Hui JP, White TC, Thibault P (2002) Identification of glycan structure and glycosylation sites in cellobiohydrolase II and endoglucanases I and II from Trichoderma reesei. Glycobiology 12:837–849 77. Zeng R, Xu Q, Shao XX, Wang KY, Xia QC (1999) Characterization and analysis of a novel glycoprotein from snake venom using liquid chromatography-electrospray mass spectrometry and Edman degradation. Eur J Biochem 266:352–328 78. Zhu X, Borchers C, Bienstock RJ, Tomer KB (2000) Mass spectrometric characterization of the glycosylation pattern of HIV-gp120 expressed in CHO cells. Biochemistry 39: 11194–11204
124
S. Singh et al.
79. Macek B, Hofsteenge J, Peter-Katalinic J (2001) Direct determination of glycosylation sites in O-fucosylated glycopeptides using nano-electrospray quadrupole time-of-flight mass spectrometry. Rapid Commun Mass Spectrom 15:771–777 80. McLafferty FW, Horn DM, Breuker K, Ge Y, Lewis MA, Cerda B, Zubarev RA, Carpenter BK (2001) Electron capture dissociation of gaseous multiply charged ions by Fourier-transform ion cyclotron resonance. J Am Soc Mass Spectrom 12:245–249 81. Zubarev RA, Horn DM, Fridriksson EK, Kelleher NL, Kruger NA, Lewis MA, Carpenter BK, McLafferty FW (2000) Electron capture dissociation for structural characterization of multiply charged protein cations. Anal Chem 72:563–573 82. Hakansson K, Cooper HJ, Emmett MR, Costello CE, Marshall AG, Nilsson CL (2001) Electron capture dissociation and infrared multiphoton dissociation MS/MS of an Nglycosylated tryptic peptic to yield complementary sequence information. Anal Chem 73:4530–4536 83. Mirgorodskaya E, Roepstorff P, Zubarev RA (1999) Localization of O-glycosylation sites in peptides by electron capture dissociation in a Fourier transform mass spectrometer. Anal Chem 71:4431–4436 84. Hakansson K, Emmett MR, Hendrickson CL, Marshall AG (2001) High-sensitivity electron capture dissociation tandem FTICR mass spectrometry of microelectrosprayed peptides. Anal Chem 73:3605–3610 85. Hogan JM, Pitteri SJ, Chrisman PA, McLuckey SA (2005) Complementary structural information from a tryptic N-linked glycopeptide via electron transfer ion/ion reactions and collision-induced dissociation. J Proteome Res 4:628–632 86. Wuhrer M, Catalina MI, Deelder AM, Hokke CH (2007) Glycoproteomics based on tandem mass spectrometry of glycopeptides. J Chromatogr B Analyt Technol Biomed Life Sci 849:115–128 87. Catalina MI, Koeleman CA, Deelder AM, Wuhrer M (2007) Electron transfer dissociation of N-glycopeptides: loss of the entire N-glycosylated asparagine side chain. Rapid Commun Mass Spectrom 21:1053–1061 88. 
Wiesner J, Premsler T, Sickmann A (2008) Application of electron transfer dissociation (ETD) for the analysis of posttranslational modifications. Proteomics 8:4466–4483 89. Good DM, Wirtala M, McAlister GC, Coon JJ (2007) Performance characteristics of electron transfer dissociation mass spectrometry. Mol Cell Proteomics 6:1942–1951 90. Picariello G, Ferranti P, Mamone G, Roepstorff P, Addeo F (2008) Identification of N-linked glycoproteins in human milk by hydrophilic interaction liquid chromatography and mass spectrometry. Proteomics 8:3833–3847 91. Gonzalez J, Takao T, Hori H, Besada V, Rodriguez R, Padron G, Shimonishi Y (1992) A method for determination of N-glycosylation sites in glycoproteins by collision-induced dissociation analysis in fast atom bombardment mass spectrometry: identification of the positions of carbohydrate-linked asparagine in recombinant alpha-amylase by treatment with peptide-N-glycosidase F in 18O-labeled water. Anal Biochem 205:151–158 92. Hagglund P, Matthiesen R, Elortza F, Hojrup P, Roepstorff P, Jensen ON, Bunkenborg J (2007) An enzymatic deglycosylation scheme enabling identification of core fucosylated N-glycans and O-glycosylation site mapping of human plasma proteins. J Proteome Res 6:3021–3031 93. Carr SA, Huddleston MJ, Bean MF (1993) Selective identification and differentiation of Nand O-linked oligosaccharides in glycoproteins by liquid chromatography-mass spectrometry. Protein Sci 2:183–196 94. Huddleston MJ, Bean MF, Carr SA (1993) Collisional fragmentation of glycopeptides by electrospray ionization LC/MS and LC/MS/MS: methods for selective detection of glycopeptides in protein digests. Anal Chem 65:877–884 95. Annan RS, Carr SA (1997) The essential role of mass spectrometry in characterizing protein structure: mapping posttranslational modifications. J Protein Chem 16:391–402 96. Sullivan B, Addona TA, Carr SA (2004) Selective detection of glycopeptides on ion trap mass spectrometers. Anal Chem 76:3112–3118
5
The Application of High Throughput Mass Spectrometry
125
97. Adamson JT, Hakansson K (2006) Infrared multiphoton dissociation and electron capture dissociation of high-mannose type glycopeptides. J Proteome Res 5:493–501 98. Jiang H, Desaire H, Butnev VY, Bousfield GR (2004) Glycoprotein profiling by electrospray mass spectrometry. J Am Soc Mass Spectrom 15:750–758 99. Gadgil HS, Bondarenko PV, Treuheit MJ, Ren D (2007) Screening and sequencing of glycated proteins by neutral loss scan LC/MS/MS method. Anal Chem 79:5991–5999
Chapter 6
Solutions to the Glycosylation Problem for Low- and High-Throughput Structural Glycoproteomics
Simon J. Davis and Max Crispin
Abstract
N- and O-glycosylation profoundly affect the biological properties of glycoproteins, principally by influencing their structures and cellular trafficking, and by forming the recognition sites of carbohydrate-binding ligands. For crystallographers interested in studying the protein component of glycoproteins, the two most important aspects of glycosylation are (1) that it is often essential for the correct folding of a given protein and for ensuring its solubility, which generally necessitates expression of the molecule in eukaryotic cells, and (2) that there are now procedures for the efficient post-folding removal of N-linked glycans from glycoproteins and for minimizing the effects of O-glycosylation, which will generally benefit crystallogenesis. We provide an overview of how glycans influence glycoprotein folding and then identify the sources of structural heterogeneity at the heart of the ‘glycosylation problem’. We then discuss the options available to structural biologists for circumventing the problems associated with protein N- and O-glycosylation. Our emphasis is on methods for producing glycoproteins with homogeneous and/or removable N-glycosylation in mammalian cells that can be implemented in both very high yield, stable expression systems and in a high throughput format based on transient expression protocols. We also consider whether deglycosylation reduces protein stability and end by emphasizing the importance of using rigorous stereochemical and biosynthetic data when building glycosylation into partial or complete electron density.

Keywords
Glycosylation · Endoglycosidases · Mammalian expression systems · Structural biology · High throughput
S.J. Davis (✉)
Nuffield Department of Clinical Medicine and MRC Human Immunology Unit, Weatherall Institute of Molecular Medicine, University of Oxford, John Radcliffe Hospital, Oxford OX3 9DS, UK
e-mail: [email protected]

R.J. Owens, J.E. Nettleship (eds.), Functional and Structural Proteomics of Glycoproteins, DOI 10.1007/978-90-481-9355-4_6, © Springer Science+Business Media B.V. 2011
Abbreviations
CHO           Chinese hamster ovary
CNX           calnexin
CRT           calreticulin
Endo          endoglycosidase
ER            endoplasmic reticulum
GalNAc        α-N-acetylgalactosamine
GlcNAc        N-acetylglucosamine
GnT I         β1−2 N-acetylglucosamine transferase I
HEK           human embryonic kidney
IL            interleukin
IgSF          immunoglobulin superfamily
MALDI-TOF MS  matrix-assisted laser desorption/ionization-time of flight mass spectrometry
NB-DNJ        N-butyldeoxynojirimycin
PDB           Protein Data Bank
PNGase F      peptide-N-glycosidase F
s             soluble
S2            Schneider 2
SG            structural genomics
STP           serine-, threonine- and proline-rich
TCR           T-cell receptor
UGGT          UDP-glucose glycoprotein:glucosyltransferase
Contents
6.1 Introduction
6.2 Glycosylation and Protein Folding
6.3 The “Glycosylation Problem”
  6.3.1 O-Glycosylation
  6.3.2 N-Glycosylation
6.4 Solving the Glycosylation Problem
  6.4.1 Shielding N-Linked Glycans from Lattice Contacts
  6.4.2 Depletion of Glycosylation by Sequon Mutation
  6.4.3 Deglycosylation with peptide-N-glycosidase
  6.4.4 Exoglycosidase Treatment
  6.4.5 N-Glycan Remodeling and Endoglycosidase Treatment
6.5 Does Glycan Removal Affect Protein Function?
6.6 Putting the Sugar Back
6.7 Conclusions
References
6.1 Introduction
The surfaces of eukaryotic cells are covered by a diverse array of complex carbohydrates that mediate a range of biological functions. These covalent posttranslational modifications can prevent the formation of the crystals necessary for the structural analysis of glycoproteins by X-ray diffraction, because the crystallization of macromolecules is generally hindered by structural and conformational disorder. Even small regions of high mobility in protein loops can prevent the identification of crystallization conditions [1]. In the case of glycoproteins, this heterogeneity derives largely from the carbohydrate surrounding the protein core. Carbohydrate groups, or glycans, can occupy a volume equivalent to that of the protein domain to which they are attached for domains of up to ∼100 amino acids. The barrier to crystallization imposed by glycosylation applies to many biologically important surface-expressed proteins: virus envelopes, membrane transport proteins, and cell-signaling receptors, including those involved in the neurological and immunological synapses. Indeed, glycosylation is a posttranslational modification affecting over half of all proteins [2] and can be essential for correct protein folding and stability. The problem of glycosylation is almost certainly among the major factors resulting in membrane proteins constituting less than 1% of the structures in the Protein Data Bank (PDB) [3]. As has now been confirmed by quantitative analysis of the performance of structural genomics (SG) pipelines, the key hurdles to obtaining a new protein structure generally occur at two steps: between the expression and purification of a protein, and between obtaining the purified protein and crystallizing it [4].
Virtually all of the glycoproteins of higher eukaryotes, and particularly those of medical interest, are translated into the lumen of the endoplasmic reticulum (ER), which favours disulphide bond formation when this is required, and are thereafter directed to secretory pathways or expressed at the cell surface. Heterologous protein production in mammalian expression systems has the advantage that such proteins are able to fold under largely native conditions and can be purified from the extracellular milieu if they can be expressed in soluble forms. Most proteins in this class would otherwise have to be refolded from denatured precipitates produced in bacteria. As we will discuss, a number of methods became established for the post-folding removal of the glycosylation from recombinant glycoproteins expressed stably in mammalian cells [5–7]. With the advent of structural genomics, however, the need arose to have parallel methods that could be implemented in a high throughput setting, with the expectation that this might help relieve both of the bottlenecks at the protein expression and crystallization stages of structure determination experienced in SG pipelines. A little-heralded advantage of the SG approach is that, in addition to the rapid analysis of proteins that can be expressed and crystallized readily, the so-called “low-hanging fruit” [8], the opportunity afforded by robotics and miniaturization has made it possible to address very difficult cases, using a variety of approaches in parallel. A recent example from our laboratories involves a large, very heavily glycosylated cell-surface protein for which it has been necessary to prepare >50 different forms of the protein in order to obtain well-diffracting crystals (VT Chang, AR Aricescu, EY Jones and SJD, unpublished data). Taking advantage of these opportunities necessitated the development of approaches based on transient rather than stable protein expression [9]. The development of tools for producing deglycosylatable protein in transient mammalian expression systems coincided with great improvements in the yields, efficiency and scalability of transient expression protocols. Due to the advent of new episomal expression vectors, transfection protocols and multiplex, deep-well tissue culture methods, the scalability of transient expression in mammalian cells is now comparable to that of high throughput bacterial systems [10–13].

We present an overview of the problems posed by the glycosylation of cell surface and secreted proteins for structural analysis, drawing on our own experience of them, and describe the development of methods for circumventing these issues in both a stable expression context and, more recently, in transient expression systems. We start with a discussion of the impact of glycosylation on protein folding.
6.2 Glycosylation and Protein Folding
There are two principal modes of glycosylation that occur in eukaryotic proteins: the glycans are attached via Asn residues or via Ser and Thr residues. These processes are termed N- and O-linked glycosylation, respectively. N-linked glycosylation occurs in the ER and has a major role in early protein folding. O-linked glycans are attached in the Golgi apparatus and therefore have little if any bearing on glycoprotein folding, except insofar as they influence the overall shape and dimensions of a molecule. In contrast to O-linked glycosylation, which tends to cluster in regions rich in serine, threonine and proline, the sites of potential N-linked glycosylation are well defined by a consensus motif. N-linked glycosylation occurs co-translationally when the oligosaccharyltransferase complex recognizes a glycosylation sequon, defined as Asn-X-Ser/Thr/Cys-X, where X is generally any amino acid except proline [14, 15]. The actual sequence has a bearing on the occupancy of a glycosylation sequon. For example, occupancy of the Asn-X-Cys sequon in the human epidermal growth factor receptor is known to be very low [15]. N-linked glycosylation is closely linked to protein folding and secretion [16]. At one level, glycans have a direct structural role by forming intramolecular glycan-protein interactions. Such stabilising interactions often shield hydrophobic residues on the protein surface and have important consequences for the conformational and chemical structure of a glycan at a given site, which can influence the design of crystallization strategies. However, glycosylation also plays a central role in folding per se insofar as it mediates the interaction of nascent glycoproteins with lectin-type chaperones. The chaperones of eukaryotic cells are generally highly conserved, which allows many, but not all, glycoproteins from higher eukaryotes to be expressed in lower-eukaryotic expression systems.
Exceptions known to the authors are members of the immunoglobulin superfamily, such as CD48 and LFA-3, which could be expressed at high levels but failed to fold detectably in Pichia pastoris (SJD, unpublished data). The translation and translocation of nascent glycoproteins into the ER are coupled through the ribosome-Sec61 complex. Cryo-electron microscopy analysis of translating versus non-translating ribosomes suggests that nascent polypeptide chains start to fold on the ribosome [17]. N-linked glycosylation occurs via the co-translational transfer of Glc3Man9GlcNAc2 to the asparagine residue of the sequon. Completion of the folding process can take considerable time. For example, although initial folding of the HIV-1 envelope glycoprotein gp160 is achieved within approximately five minutes, the glycoprotein remains in the ER for several hours whilst undergoing further folding involving disulphide bond isomerisation and folding-dependent leader sequence cleavage [18]. A nascent, incompletely folded protein is able to recruit numerous chaperones. For example, thyroglobulin folds in complex with BiP/Hsp70, which is also associated with Grp94, Grp170, ERp72 and protein disulphide isomerase [19]. The N-linked glycan of nascent glycoproteins recruits the lectin-type chaperones calnexin (CNX) and calreticulin (CRT), which are capable of recruiting additional chaperones such as ERp57. Following glycosylation by the oligosaccharyltransferase, the Glcα1-2Glcα1-3Glcα1-3 cap of the glycans is hydrolysed by α-glucosidases I and II, which act on the α1-2 and α1-3 linked groups, respectively. This leaves a monoglucosylated structure that comprises the recognition signal for both the membrane-bound CNX and the soluble CRT. CNX is closely associated with the translocon, as demonstrated by the rapid binding of hemagglutinin to CNX following the translocation of as few as thirty amino-acid residues [20]. The central role of CNX and CRT in glycoprotein folding is highlighted by the increased expression of these chaperones by B-cells in preparation for increased antibody expression [21].
Glucosidase II can liberate the glycoprotein from the CNX/CRT complexes, allowing fully folded glycoproteins to exit the ER. However, incompletely folded glycoproteins may be retained for further lectin-mediated folding events upon reglucosylation by UDP-glucose glycoprotein:glucosyltransferase (UGGT) at N-linked glycosylation sites distal to the misfolded site [22]. UGGT determines the folding status of a glycoprotein via the exposure in misfolded glycoproteins of hydrophobic regions normally buried in the native fold [23]. The ER-associated degradation pathway eliminates irretrievably misfolded glycoproteins. The presence of Glc1Man8GlcNAc2 on glycoproteins unable to complete folding is recognized by ER degradation-enhancing α-mannosidase-like protein, facilitating transport of the glycoprotein out of the ER for degradation. Fully folded glycoproteins cannot be reglucosylated by UGGT and therefore exit the ER and traffic to the Golgi apparatus. Endomannosidase shunt pathways can in some cells rescue glycoproteins trapped in the CNX/CRT cycle. Whilst the majority of mammalian cells contain active endomannosidase, Chinese hamster ovary (CHO) cells appear to belong to a small subset of cells completely lacking in this activity. In exceptional circumstances, monoglucosylated glycans can be retained in fully folded glycoproteins when a glycosylation site is substantially protected by the protein [24, 25].
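The sequon rule stated earlier in this section (Asn-X-Ser/Thr/Cys-X, with X generally not proline) lends itself to a simple sequence scan. The following is a minimal sketch only: the function name `find_sequons` and the toy sequence are hypothetical, both X positions are treated as "anything except proline", and a sequon at the very C-terminus is accepted. It lists potential sites and says nothing about occupancy, which, as noted above for Asn-X-Cys, can be very low.

```python
import re

# Potential N-glycosylation sequon: Asn, any residue except Pro, then Ser/Thr/Cys,
# not followed by Pro. The outer lookahead lets overlapping sequons be reported.
SEQUON = re.compile(r"(?=(N[^P][STC])(?!P))")


def find_sequons(sequence):
    """Return (1-based Asn position, tripeptide) for each potential sequon."""
    seq = sequence.upper()
    return [(match.start() + 1, match.group(1)) for match in SEQUON.finditer(seq)]


if __name__ == "__main__":
    toy = "MNGSANPTNATPNCSA"  # invented sequence, for illustration only
    for position, tripeptide in find_sequons(toy):
        print(position, tripeptide)
```

In the toy sequence, the Asn followed directly by proline and the Asn-Ala-Thr followed by proline are both skipped, whereas Asn-Gly-Ser and Asn-Cys-Ser are reported.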
6.3 The “Glycosylation Problem”
Whereas chemical and conformational heterogeneity has relatively little impact on solution nuclear magnetic resonance-based experiments, it is anathema to crystallography because it prevents the formation of reproducible lattice contacts and therefore crystals. Strategies have emerged to promote crystallogenesis through either shielding or reducing surface disorder. Antibody fragments can be utilized in co-crystallization trials in the hope that they might dominate lattice interactions [26]. Alternatively, the protein can be engineered to eliminate flexible regions, or point mutations can be introduced to convert highly mobile side-chains, such as that of lysine, to more structurally ordered ones [27]. Similarly, surface entropy can be reduced, and crystallogenesis promoted, by the reductive methylation of lysines [28]: reductive methylation rescued 7% of a test set of 360 targets otherwise resistant to crystallization [29]. The extreme heterogeneity of the native glycans that often coat the surface of glycoproteins comprises the “glycosylation problem”. The approaches taken to minimize its impact on crystallization have parallels with the entropy reduction techniques applied to lysine. However, the impact of carbohydrate on the protein surface is generally much more significant than that of lysine. Each N-glycan chain can potentially be several thousand Daltons in mass, exhibit tremendous internal and global flexibility, and occlude a large percentage of the total surface area. Furthermore, each glycosylation site can, especially in glycoproteins from mammalian expression sources, contain several hundred chemically distinct oligosaccharide structures, termed “glycoforms”. Finally, each glycoform may be heterogeneous not only in mass and structure, but also in charge.
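To give a rough sense of the scale of this chemical heterogeneity, the number of distinct molecular species in a preparation can be estimated as the product of the glycoform counts at each site. This is a back-of-envelope sketch under assumptions that are ours rather than the chapter's: every site is taken to be fully occupied and to vary independently of the others, and the per-site counts below are illustrative placeholders, not measured values.

```python
from math import prod


def count_glycoprotein_species(glycoforms_per_site):
    """Upper-bound count of distinct molecules, one glycoform count per site.

    Assumes full occupancy and independent variation between sites
    (a simplification made for this illustration).
    """
    if any(count < 1 for count in glycoforms_per_site):
        raise ValueError("each site needs at least one glycoform")
    return prod(glycoforms_per_site)


if __name__ == "__main__":
    # Hypothetical protein with three sites carrying 100 glycoforms each.
    print(count_glycoprotein_species([100, 100, 100]))
```

Even with only three sites, the hypothetical preparation could contain up to a million chemically distinct molecules, which is why strategies that reduce every site to a single, compact glycoform are attractive.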
6.3.1 O-Glycosylation
The crystallographer interested in glycoproteins will generally have to contend at some point with both N- and O-linked glycans. O-linked glycosylation occurs in the Golgi via the sequential addition of monosaccharides to serine or threonine residues clustered in serine-, threonine- and proline-rich (STP) or “mucin-like” domains [30]. O-linked glycosylation occurring outside of STP domains cannot presently be accurately predicted and is not specifically addressed in this article. In higher eukaryotic expression systems, O-glycosylation is initiated by the transfer of α-N-acetylgalactosamine (GalNAc) to the Ser or Thr residues, which is subsequently elaborated by a wide range of glycosyltransferases leading to tremendous glycan heterogeneity. STP domains required only to project glycoproteins such as CD8 [31] and CD55 [32] from the cell surface are usually deleted from expression constructs for crystallography. The high proline content of STP domains makes them readily identifiable by disorder prediction algorithms, such as RONN [33], and the O-link prediction server, NetOGlyc [30]. However, the structure of C-cadherin confirms that O-linked glycosylation can occur outside classic disordered STP domains, in domains with loops rich in proline [34]. This type of intradomain O-glycosylation is, however, rare compared to STP O-linked glycosylation [30], and negligible compared to N-linked glycosylation [35]. The “glycosylation problem” is for the most part, therefore, restricted to N-linked glycans. In the rare cases where they cannot simply be removed from expression constructs, for example when required to obtain usable levels of expression, STP domains pose significant challenges to crystallization since they are most likely to form extended, somewhat flexible structures [31]. To overcome such problems, Leahy et al. [36] crystallized the globular immunoglobulin domains of CD8αα after post-purification treatment of the STP domain with neuraminidase, core 1 O-glycanase and Staphylococcal V8 protease. Such a strategy is likely to be effective only for the subset of STP domains containing sialylated core 1 structures (Galβ1−3GalNAc).
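Before turning to dedicated predictors, a construct can be given a first-pass screen for candidate STP regions using nothing more than amino acid composition. The sketch below is a crude stand-in and does not reproduce RONN or NetOGlyc: the function name `stp_rich_regions`, the 20-residue window and the 50% threshold are all arbitrary assumptions chosen for illustration.

```python
def stp_rich_regions(sequence, window=20, threshold=0.5):
    """Return merged 1-based (start, end) spans whose windows are rich in S/T/P.

    A window is flagged when the fraction of Ser, Thr and Pro residues is at
    least `threshold`; overlapping or adjacent flagged windows are merged.
    Flagged spans can extend beyond the true domain boundary by up to part of
    a window, so the boundaries are approximate.
    """
    seq = sequence.upper()
    regions = []
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        fraction = sum(segment.count(residue) for residue in "STP") / window
        if fraction < threshold:
            continue
        first, last = start + 1, start + window  # 1-based, inclusive
        if regions and first <= regions[-1][1] + 1:
            regions[-1] = (regions[-1][0], last)  # extend the previous span
        else:
            regions.append((first, last))
    return regions


if __name__ == "__main__":
    # Invented sequence: a 30-residue STP repeat flanked by alanines.
    toy = "A" * 30 + "STP" * 10 + "A" * 30
    print(stp_rich_regions(toy))
```

Any span flagged in this way would still need to be checked against a proper disorder or O-glycosylation predictor before a construct boundary is chosen.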
6.3.2 N-Glycosylation
An important consideration in tackling the glycosylation problem is that the different eukaryotic expression systems available to crystallographers, such as mammalian, insect and yeast expression systems, exhibit different N-glycosylation profiles. All eukaryotic expression systems share a highly conserved N-linked glycosylation pathway in the ER and early Golgi apparatus, presumably reflecting the important role of glycosylation in protein folding. Understanding the conserved features of the enzymology of glycosylation and the key differences between the common expression systems may not only guide approaches to the remodeling of glycosylation to promote crystallogenesis, but will assist the accurate model building and interpretation of electron density in the absence of detailed site-specific chemical characterization of the carbohydrate moiety [37].

6.3.2.1 Chemical Heterogeneity of N-Linked Glycans
Higher eukaryotic expression systems are characterized by the significant chemical diversity of their glycans (Fig. 6.1a). Following folding, yeast oligomannose-type glycans are extended by mannosyltransferases to form high-mannose N-linked glycans. In insect and mammalian eukaryotic expression systems the terminal α1−2 mannose residues are trimmed by ER α-mannosidase I and Golgi α-mannosidase IA, IB and IC to give Man5GlcNAc2 [38–40]. The subsequent transfer of β1−2 N-acetylglucosamine (GlcNAc) by GlcNAc transferase I (GnT I) forms so-called “hybrid-type” glycans which are acted upon by numerous Golgi-resident glycosyltransferases. In higher eukaryotes, these include β-galactosyltransferase, GnT III, fucosyltransferase VIII, and Golgi α-mannosidase II [41]. Golgi α-mannosidase II catalyzes the formation of monoantennary glycans by the cleavage of the two terminal α-mannose residues of the 6-arm of the trimannosyl core [42]. In higher eukaryotes these structures are subsequently elaborated to form highly heterogeneous “complex-type” glycans.
The antennae are initially extended with galactose. Such Galβ1−3/4GlcNAc structures can be further elaborated with moieties such as
Fig. 6.1 The glycosylation problem. Glycosylation exhibits chemical and conformational heterogeneity. (a) MALDI-TOF MS of the N-linked glycans from the extracellular region of a mammalian type-1 membrane glycoprotein (19A) transiently expressed in HEK 293T cells [9, 143]. (b) Overlay of different conformations of an N-linked glycan revealing flexibility about the glycosidic linkages [144]. (c) Molecular model of a prion glycoprotein displaying the conformational freedom of the two N-linked glycans (yellow and green) projected away from the underlying protein core (grey). The glycosylphosphatidylinositol anchor is shown in blue. The key to the symbols used in panel a is presented in Fig. 6.2 (panel a was provided courtesy of Prof. David J. Harvey, University of Oxford; Panel b is adapted from Frank et al [144]; Panel c is courtesy of Dr Mark R. Wormald, University of Oxford)
134 S.J. Davis and M. Crispin
6
Solutions to the Glycosylation Problem
135
a
2DTQ
2HCZ
1P8J
b
1S4P
1JUH
2B8H
Key: xylose
N-acetylglucosamine
galactose
fucose
mannose
6 4 3
1
α-linkage β-linkage
2
Fig. 6.2 Ordered glycans in crystal structures. Structures of well-ordered N-linked glycans displayed as sticks (oxygen, red; carbon, green; nitrogen, blue) with the associated 2Fo –Fc electron density map contoured at 1σ and shown as a blue mesh. Glycans are of complex- (a) or oligomannose- (b) type. Cartoon representations of the glycan structures are depicted below each glycan. The key to the nomenclature is presented in the bottom panel and uses the angle between monosaccharide symbols to indicate linkage position [145]. The accession codes for the corresponding PDB entries are indicated; 2DTQ [146]; 2HCZ [147]; 1P8J [148]; 1S4P [96]; 2B8H [149]; 1JUH [152]. Molecular graphics were prepared using PyMol
fucose, sialic acid, sulphate and GalNAc, some of which make up the classic blood group antigens. Thus, as a glycoprotein progresses through the Golgi apparatus, the competing actions of both lumenal and membrane-associated glycosyltransferases give rise to a vast range of final glycoforms. For example, at least 123 glycan structures were identified at the single N-linked glycosylation site of human erythrocyte CD59 [43]. Insect expression systems such as D. melanogaster and S. frugiperda yield much simpler glycosylation: following the action of Golgi α-mannosidase II, a membrane-bound β-hexosaminidase rapidly cleaves the β1−2 GlcNAc, resulting in compact paucimannose-type glycans such as fucosylated Man3GlcNAc2 [44, 45].
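The compositions named in this section (Man5GlcNAc2, fucosylated Man3GlcNAc2) can be related to the peaks of a MALDI-TOF spectrum such as Fig. 6.1a by simple arithmetic. The following is a minimal sketch, assuming standard monoisotopic residue masses and detection of the free glycans as sodium adducts. The function name, the dictionary layout and the choice of adduct are illustrative assumptions and are not taken from the chapter.

```python
# Monoisotopic residue masses in Da (standard values, rounded to 4 decimals).
RESIDUE_MASS = {
    "Hex": 162.0528,     # hexose: mannose, galactose
    "HexNAc": 203.0794,  # N-acetylhexosamine: GlcNAc, GalNAc
    "dHex": 146.0579,    # deoxyhexose: fucose
    "NeuAc": 291.0954,   # N-acetylneuraminic (sialic) acid
}
WATER = 18.0106          # added once for the free reducing glycan
SODIUM_ADDUCT = 22.9892  # assumed [M+Na]+ ion; other adducts are possible


def glycan_mass(composition, sodiated=True):
    """Return the monoisotopic mass of a released glycan from its composition."""
    mass = WATER + sum(RESIDUE_MASS[res] * n for res, n in composition.items())
    return mass + SODIUM_ADDUCT if sodiated else mass


# Compositions for glycans named in the text (labels are illustrative).
GLYCANS = {
    "Man5GlcNAc2": {"Hex": 5, "HexNAc": 2},
    "Fuc-Man3GlcNAc2": {"Hex": 3, "HexNAc": 2, "dHex": 1},
    "Man9GlcNAc2": {"Hex": 9, "HexNAc": 2},
}

for name, composition in GLYCANS.items():
    print(f"{name}: [M+Na]+ = {glycan_mass(composition):.2f}")
```

Because composition alone does not distinguish isomers or linkages, a mass of this kind can only suggest a glycan class; it does not replace the site-specific characterization discussed above.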
6.3.2.2 Conformational Heterogeneity of N-Linked Glycans
Even homogeneous glycoproteins with a single glycoform may be recalcitrant to crystallization due to the extensive conformational disorder of the glycan. The conformational space explored by a glycan is a product of both the internal flexibility (Fig. 6.1b) and the degree of orientation (if any) imposed by intramolecular glycan-protein interactions (Fig. 6.1c). NMR structures of N-linked glycans have revealed that the overall topology of a glycan can be relatively well defined despite a high degree of internal flexibility [46]. Furthermore, some motifs within a glycan can be very well ordered. For example, the structure of sialyl Lewis x, NeuNAcα2−3Galβ1−4[Fucα1−3]GlcNAc, is ordered by hydrophobic packing between the α1−3 linked fucose and the galactose [47]. Small conformational changes of the GlcNAc-Asn bond may nevertheless result in positional changes at the non-reducing termini of glycans, leading to dramatic conformational heterogeneity (Fig. 6.1c). Notwithstanding some exceptions, large glycans are rarely observed in crystal contacts, presumably due to the entropic penalties associated with their immobilisation. The majority of glycans observed in crystal structures are stabilized by existing intramolecular glycan-protein interactions. For example, in the crystal structure of Sf9 cell-expressed CD26 (dipeptidylpeptidase IV) in complex with adenosine deaminase, the N-linked glycan at Asn229 of CD26 lies along the dimer interface with a hydrogen bond from the central β-mannose to the backbone of adenosine deaminase [48]. Consequently, Manα1−6Manβ1−4GlcNAcβ1−4GlcNAcβ1−Asn229 is well ordered. Although examples of complex-type glycans observed in crystal structures of glycoproteins are known (Fig. 6.2a), such extreme examples of ordered complex-type glycans as those shown are rare, with most participating in extensive lattice contacts.
Many observable glycans are of the oligomannose-type, implying that the same contacts that stabilize the glycan also inhibit its ER and Golgi processing (Fig. 6.2b). Many partial glycans observable in crystal structures involve contacts with the surface of the protein, particularly with the first GlcNAc at the reducing terminus of the glycan [35, 49]. The “glycosylation problem” is compounded not only by the reliance of the majority of glycoproteins on N-linked glycans for correct protein folding at the ER calnexin/calreticulin checkpoint, but by their direct effects on protein folding and stabilization [50]. Together, these factors often preclude the bacterial expression of glycoproteins, and limit the success of refolding strategies.
6.4 Solving the Glycosylation Problem
6.4.1 Shielding N-Linked Glycans from Lattice Contacts
A key objective of any attempt to improve the crystallizability of a macromolecule is a reduction in surface disorder. By minimizing surface disorder, the entropic penalties associated with crystal contact formation are reduced. This can be achieved either by shielding the disordered region or reducing its disorder. The generation of complexes with functionally relevant binding partners is particularly
attractive because it yields new biological insights along with shielding disordered regions. Greater flexibility is, however, afforded by the use of antibody Fab fragments reactive with the target. The effect of Fabs, which are almost invariably unglycosylated, is to reduce the proportion of glycosylated surface area. Fab-Fab lattice contacts may therefore dominate over glycoprotein-glycoprotein interactions, particularly if the target is small relative to the Fab. For example, Evans et al. [26] used a Fab from a mitogenic antibody to crystallize the immunoglobulin domain of CD28. In the crystals CD28 homodimerized and the homodimer was suspended within the lattice by the Fab fragments, which formed the great majority of the contacts. Without these contributions the crystal lattice would have had to accommodate the ten N-linked glycans attached to the homodimer. A major drawback to the Fab-based approach to crystallization is the requirement to produce the Fab. Moreover, Fab complex formation may not completely eliminate the inherent disorder of the target and may be insufficient for the generation of high-quality crystals.
6.4.2 Depletion of Glycosylation by Sequon Mutation
Point mutations allow the elimination of chemical heterogeneity at source. For example, heterogeneity in the human EPO receptor expressed in P. pastoris resulted from glycosylation at N52 and from isoformylization of the Asn164-Gly165 peptide bond, both of which were eliminated by N52Q and N164Q mutations, allowing structural determination of the EPO-receptor complex to 1.9 Å [51]. Similarly, a series of glycosylation mutants of the ADP-ribosyl cyclase, CD157, were screened for functionality and crystallizability, and only the mutant lacking the most C-terminal of the four glycans was functional and could be crystallized [52, 53]. Other examples of proteins whose crystallization was aided by the deletion of glycosylation sites include human butyrylcholinesterase [54], human testis angiotensin-converting enzyme [55], carboxypeptidase from yeast [56], and rat procathepsin B [57]. The screening of glycosylation mutants is labour intensive, and many glycoproteins are likely to require glycosylation at one sequon at least for folding, meaning that all the glycosylation sites cannot necessarily be deleted. By definition, the existence of variable occupancy at a particular site indicates that glycosylation of this site is not essential for protein folding [58–60]. Mutation of a variably occupied glycosylation sequon is thus unlikely to affect protein folding due to loss of the glycan, allowing such sites to be deleted without unduly affecting expression. It has recently been shown that these sites can be detected by liquid chromatography electrospray ionization mass spectrometry [61]. This involves treating the glycoprotein with peptide N-glycosidase F (PNGase F), which removes glycans from N-glycosylated proteins and peptides, in the process converting the asparagine into an aspartic acid residue and increasing the mass of the fragment by 1 Da.
Combined with proteolytic digestion, this mass conversion allows the identification of glycan-modified oligopeptides [61]. Alternatively, identification of the degree of conservation of the glycosylation sites combined with the use of occupancy prediction servers (http://www.cbs.dtu.dk/services/NetNGlyc/) can be used to identify
and mutate glycosylation sites likely to be variably occupied. However, such glycosylation mutants may disrupt protein folding through new glycan-independent structural effects. As discussed in Section 6.4.5, procedures are now in place for reducing individual N-linked glycans to single GlcNAc residues [5–7, 9]. However, variable site occupancy, resulting in heterogeneity in the number of single GlcNAc residues remaining at each site, may still be a problem. Therefore, in some cases it may even be necessary to eliminate this level of heterogeneity by mutating the variably occupied glycosylation acceptor site.
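The two ideas in this section, locating candidate sequons and reading site occupancy from the PNGase F-induced Asn-to-Asp mass shift, can be sketched in a few lines. This is an illustrative sketch only. It assumes the standard N-X-S/T sequon definition (X being any residue except proline) and the monoisotopic shift of about 0.984 Da per converted asparagine, which the text rounds to 1 Da. The function names, the mass tolerance and the example peptide are hypothetical and are not part of the published method [61].

```python
import re

# N-X-S/T sequon, where X is any residue except proline. The lookahead
# keeps overlapping sequons (e.g. "NNTS") from being skipped.
SEQUON = re.compile(r"N(?=[^P][ST])")

# Monoisotopic mass gain (Da) when PNGase F converts Asn to Asp.
DEAMIDATION_SHIFT = 0.984


def find_sequons(sequence):
    """Return 1-based positions of Asn residues that lie in N-X-S/T sequons."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence.upper())]


def converted_sites(expected_mass, observed_mass, tolerance=0.02):
    """Estimate how many Asn->Asp conversions explain a peptide mass shift.

    expected_mass is the mass of the unmodified peptide; observed_mass is
    measured after PNGase F treatment. Returns None if the shift is not a
    whole multiple of the deamidation shift within the (assumed) tolerance.
    """
    shift = observed_mass - expected_mass
    count = round(shift / DEAMIDATION_SHIFT)
    if count < 0 or abs(shift - count * DEAMIDATION_SHIFT) > tolerance:
        return None
    return count


# Hypothetical peptide: N2 and N10 are in sequons, N6 is followed by proline.
print(find_sequons("MNGTANPSQNLSK"))
print(converted_sites(1000.000, 1000.984))
```

A peptide that contains a sequon but shows no mass shift after PNGase F treatment would be a candidate for an unoccupied site, which is the situation in which mutating the sequon is least likely to affect folding.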
6.4.3 Deglycosylation with peptide-N-glycosidase
PNGase F is commonly used to completely remove mammalian N-linked glycans. PNGase F hydrolyses the secondary amide bond that links N-linked glycans to Asn and thus differs from the endoglycosidases that cleave glycosidic linkages. Under denaturing conditions, PNGase F is effective at removing N-linked glycans that do not contain core α1−3 fucose, which occurs in glycans from both plants and insects. However, the differential presentation of glycosylation sites on the surface of a mature, folded glycoprotein results in a significant number of glycosylation sites that are inaccessible to PNGase F hydrolysis, as the enzyme requires extended peptide substrates [62]. For example, despite extensive treatment of bovine amine oxidase with PNGase F and a number of exoglycosidases, electron density corresponding to the reducing termini of all three glycosylation sites was observed [63, 64], indicating that PNGase F treatment had been ineffective. This inherent limitation led to the inclusion of 1.5 M urea in an effort to deglycosylate the Ebola virus surface glycoprotein [65]. Furthermore, many N-linked glycosylation sites are surrounded by hydrophobic surface patches that are stabilized by the interaction with the hydrophobic face of the core GlcNAc created by the axial C−H groups. In our experience, in contrast to the behaviour of endoglycosidase (Endo) H-treated protein (Fig. 6.3a; see below), the introduction by PNGase F of a point charge at glycosylation sites owing to the glycan-Asn/Asp transition can often be destabilising and result in aggregated protein (Fig. 6.3b [6]). Despite its lack of generality, deglycosylation with PNGase F has been successfully employed in the generation of a number of crystal structures, including interleukin (IL)-5 [66], IL-10:shIL-10R1 and vIL-10:shIL-10R1 [67], human α-thrombin-thrombomodulin complex [68], lactoferrin [69, 70], laccase [71], and the lectin and EGF domains of E-selectin [72].
PNGase F-mediated deglycosylation of transmembrane proteins, such as the human erythrocyte anion-exchanger membrane domain [73] and the water channel, aquaporin-1 [74, 75], has also facilitated their crystallization.
6.4.4 Exoglycosidase Treatment
It has long been appreciated that enzymatic remodeling of glycoproteins with exoglycosidases can promote crystallization [69]. Exoglycosidases are specific for a
[Fig. 6.3 image. Panels a and b: size-exclusion chromatograms plotting absorbance (280 nm) against fraction number (18 to 36), with insets labelled by fraction number (Fn no) and 43, 30 and 20 kD markers.]
Fig. 6.3 PNGase F-, but not Endo H-mediated deglycosylation leads to protein aggregation. Human sCD58 expressed in CHO-K1 cells in the presence of NB-DNJ was digested with Endo H (a) and PNGase F (b) overnight at 37 °C and pH 5.2 and pH 8, respectively. Both mixes were concentrated and analyzed by Sephacryl S-100 size-exclusion chromatography. Virtually all of the PNGase F-deglycosylated material formed large aggregates that eluted with the void volume, whereas the Endo H-treated material eluted at the positions expected for variably glycosylated monomers
unique, or a very narrow range of glycosidic linkages at the non-reducing termini of glycans. Exoglycosidases have the advantage that the exposure of underlying hydrophobic surface areas can be limited. Moreover, exoglycosidases can be used to control the length of the associated glycans. This approach has been used to demonstrate the structural impact of IgG Fc glycosylation, via the generation and crystallization of a panel of IgG Fc domains with sequential truncations of the Asn297 oligosaccharide [76]. Exoglycosidase trimming of glycoproteins has facilitated the determination of several glycoprotein structures. The structures of both human chorionic gonadotropin [77] and the atrial natriuretic peptide receptor [78] were determined following neuraminidase treatment. Similarly, the crystallization of human acid β-glucocerebrosidase was achieved following treatment with neuraminidase, β-galactosidase and N-acetylglucosaminidase to expose the underlying mannose residues. The use of exoglycosidases is, however, limited as a general approach for carbohydrate remodelling due to the large number of specific enzymes required for effective uniform truncation and the requirement of prior knowledge of the glycan structures [79]. In addition, the availability and expense of such a panel of enzymes is likely to be prohibitive for high-throughput structure determination of glycoproteins. An exception is the use of a single α-mannosidase with glycoproteins from known mannose-based expression systems. Illustrating this approach, Jack bean α-mannosidase was used in the crystallization of human CD1a from Schneider 2 (S2) D. melanogaster cells [80, 81]. Due to uniform α-mannosylation, Jack bean α-mannosidase also readily trims the O-linked glycosylation of fungally-derived glycoproteins [82].
6.4.5 N-Glycan Remodeling and Endoglycosidase Treatment
By choosing a suitable cell line or by using inhibitors of glycosylation pathways, it is possible to produce glycoproteins that fold normally but whose glycosylation is sensitive to endo-N-acetylglucosaminidases. The key advantage of this approach is that protein folding proceeds normally in the ER in the presence of normal levels of glycosylation, whilst the subsequent, aberrant Golgi-dependent processing, resulting in the formation of oligomannose-type N-linked glycans, facilitates the removal of the glycans once the protein has been purified. Endo-β-N-acetylglucosaminidases specific for the β1−4 linkage of the di-N-acetylchitobiose core of N-linked glycans have been isolated from a range of microorganisms. Although endo-β-N-acetylglucosaminidases all hydrolyse the same glycan linkage, they are each specific for only a subset of all N-linked glycans. For example, the endo-β-N-acetylglucosaminidases from Flavobacterium meningosepticum [83, 84] cleave only oligomannose-type glycans (Endo F1), mono- and biantennary glycans (Endo F2) and fucosylated mono-, bi- and isomer-specific triantennary glycans (Endo F3). Other examples are Endo H from Streptomyces plicatus, which cleaves oligomannose and hybrid-type glycans [85, 86], Endo D from Streptococcus pneumoniae, which cleaves paucimannose structures and Man5GlcNAc2 [87], and Endo S from Streptococcus pyogenes, which cleaves biantennary complex-type glycans [88]. All the endo-β-N-acetylglucosaminidases leave single GlcNAc residues, with or without the potential core α1−6 linked fucose, attached to the protein surface. As this GlcNAc dominates the stabilizing intramolecular glycan-protein interactions, endoglycosidases are powerful tools for preparing largely unglycosylated but natively folded and active proteins. The development of deglycosylation strategies can be guided by the knowledge of the substrate specificities of the endo-β-N-acetylglucosaminidases.
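As a rough illustration of how these specificities might guide a deglycosylation strategy, the enzyme and substrate pairings listed above can be written as a lookup table and queried for the glycan classes expected on a target. This is a sketch under stated assumptions: the glycan class labels are simplified paraphrases of the text, the function name is hypothetical, and real specificities depend on details (core fucosylation, antennary isomers, accessibility on the folded protein) that a table of this kind does not capture.

```python
# Substrate classes per enzyme, paraphrased from the paragraph above.
# The labels are simplified for illustration and are not formal nomenclature.
ENDO_SUBSTRATES = {
    "Endo F1": {"oligomannose"},
    "Endo F2": {"monoantennary", "biantennary"},
    "Endo F3": {
        "fucosylated monoantennary",
        "fucosylated biantennary",
        "fucosylated triantennary (isomer-specific)",
    },
    "Endo H": {"oligomannose", "hybrid"},
    # Man5GlcNAc2 is listed separately because the text names it explicitly.
    "Endo D": {"paucimannose", "Man5GlcNAc2"},
    "Endo S": {"biantennary complex"},
}


def candidate_enzymes(glycan_classes):
    """Map each expected glycan class to the enzymes listed as cleaving it."""
    return {
        glycan: sorted(
            enzyme
            for enzyme, substrates in ENDO_SUBSTRATES.items()
            if glycan in substrates
        )
        for glycan in glycan_classes
    }


# Example: a target expected to carry oligomannose and paucimannose glycans,
# plus one class that no enzyme in the table is listed as cleaving.
plan = candidate_enzymes(["oligomannose", "paucimannose", "tetraantennary"])
for glycan, enzymes in plan.items():
    print(glycan, "->", ", ".join(enzymes) if enzymes else "no enzyme listed")
```

An empty result for a glycan class signals that the glycan diversity would first need to be restricted, for example by the choice of expression system, before endoglycosidase treatment could be expected to work.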
Because of the substrate diversity of the endoglycosidases and the multitude of glycan structures that can be present on a glycoprotein, endoglycosidases were initially used when the glycosylation was naturally restricted, as in insect and fungal expression systems. However, endoglycosidases have also been utilized in mammalian expression systems wherein the glycan diversity is artificially restricted by the use of inhibitors or through the use of glycosylation processing mutants.
6.4.5.1 Insect Cell-Derived Glycoproteins
Insect expression systems, such as D. melanogaster and S. frugiperda, are characterized by oligo- and paucimannose-type glycosylation. Expression of target glycoproteins in such systems has the advantage of glycosylation being dominated by the paucimannose core α1−6-fucosylated Man3GlcNAc2 glycan. This structure corresponds to the trimannosyl core of glycans from the mammalian expression systems and has a well defined, relatively stable conformation [35]. The compactness and stability of the Man3GlcNAc2 glycan has allowed the crystallization of numerous proteins (see e.g. Garcia et al. [89]). The presence of trace amounts of
larger oligomannose-type glycans can nevertheless hinder crystallization. It is possible, however, to remove these contaminants and the dominating Man3GlcNAc2 glycan using endoglycosidases. In the case of the HIV-1 envelope glycoprotein, gp120, Kwong et al. employed specific endoglycosidases to remove the oligomannose series (Endo H) and the paucimannose Man3GlcNAc2 structures (Endo D) [90]. The reaction efficacy was evaluated by structural analysis of the PNGase F-released glycans. Since all the glycoproteins expressed in D. melanogaster and S. frugiperda contain entirely α-mannose-based glycans, these can also be trimmed from the non-reducing termini with, for example, Jack bean α-mannosidase, as used in the crystallization of human CD1a [80, 91].
6.4.5.2 Fungally-Expressed Glycoproteins
Because of the high expression levels that can be achieved, fungal expression systems have been a popular choice for targets that require glycosylation for folding and therefore cannot be obtained using bacterial expression systems. Many, but not necessarily all, mammalian glycoproteins can be expressed in these systems. Fungal expression systems give rise to glycan structures that can be readily cleaved by endoglycosidase digestion. Indeed, the N-linked glycans are entirely sensitive to Endo H and Endo F1. This Endo H/F1 sensitivity has been exploited in the structural determination of both native and heterologous glycoproteins from fungal sources. For example, endoglycosidase treatment was used in the crystallization of phosphomonoesterase from Aspergillus ficuum [92] and the dye-decolorizing peroxidase produced by A. oryzae [93]. Among other proteins, Endo H digestion facilitated the crystallization of peroxidase from the inky-cap mushroom Coprinus cinereus [94], endopolygalacturonase from the pathogenic fungus Stereum purpureum [95], and quercetin 2,3-dioxygenase from A. japonicus [96]. The quercetin 2,3-dioxygenase structure revealed an important advantage of using Endo H.
The homodimer contained ten N-linked glycans, all of which were cleaved by Endo H except for two symmetry-related glycans forming extensive stabilising interactions at the dimer interface. Thus endoglycosidase action does not remove glycans that are tightly associated with the protein and maintain protein stability and solubility. A number of crystal structures have been obtained using yeast expression systems such as Pichia pastoris in combination with deglycosylation with Endo H or Endo F1, including human complement receptor type 2 (CD21) [97], A. fumigatus phytase [98], human pancreatic α-amylase [99], human tissue kallikrein [100] and human neutral endopeptidase (neprilysin) complexed with phosphoramidon [101]. Crystals of the deglycosylated endopeptidase diffracted to 2.1 Å, whereas the crystals of the glycosylated enzyme only diffracted to 7.5 Å resolution [102].
6.4.5.3 Mammalian Expression Systems
Expression in Lec3.2.8.1 Cells: Rat sCD2
The problems associated with crystallizing glycoproteins became apparent to the authors upon observing that, whereas the N-terminal two domains of human CD4,
which is unglycosylated, crystallizes readily [103, 104], the equivalent region of rat CD4 containing a single N-linked glycan did not (SJD et al. unpublished observations). Proteins with simplified, smaller N-linked glycans were known to be expressible using baculovirus expression systems and insect cells, but our experience was that the expression levels obtainable with such systems were relatively poor in comparison to those achievable with mammalian expression systems [105]. CHO cells, on the other hand, give very high levels of expression with fully processed (i.e. complex) N-glycosylation [106]. The generation by Pamela Stanley and co-workers of a panel of mutated CHO cell lines with defective glycosylation offered the possibility of generating high levels of protein expression with simplified N-linked glycans [107]. One cell line, Lec3.2.8.1, was particularly promising as it produced oligomannose-type Man5GlcNAc2 N-linked glycans known to be sensitive to Endo H. These cells harboured mutations resulting in an undefined deficiency in sialic acid metabolism, defective CMP-sialic acid and UDP-galactose translocation, and disrupted GnT I [108]. The sensitivity of the N-linked glycans produced by these cells, under conditions that preserved the folding of the proteins to which they were attached (i.e. in the absence of denaturants), was unknown. This was first tested in structural studies of soluble (s) forms of rat CD2, which has four N-glycosylation sequons, and CD4, which has two. The Lec3.2.8.1 cell line was transfected with calcium phosphate/DNA mixtures using the very high-yield glutamine synthetase-based stable gene expression system [109]. Although this was initially very inefficient, transfection was improved using cationic liposomes. The first few Lec3.2.8.1 clones expressing rat sCD2 and sCD4 that were obtained did, however, secrete the recombinant glycoproteins at very high levels (40−80 mg/l).
The expressed proteins were purified by affinity chromatography and gel filtration, and their Endo H sensitivity was compared to that of the analogous forms expressed in wild-type (CHO-K1) cells. Under non-denaturing conditions, Endo H only partially deglycosylated CHO-K1-derived sCD4 and had no effect on CHO-K1-derived sCD2 over a 16 h incubation. In contrast, the deglycosylation of Lec3.2.8.1-derived material went to virtual completion within 90 minutes. The reduction in the heterogeneity of sCD2 following deglycosylation was marked (Fig. 6.4a), although somewhat exaggerated by the variable sequon occupancy characterizing rat sCD2 [61]. The molecular weight of the deglycosylated sCD4 was virtually indistinguishable from that of an unglycosylated variant, allowing for the presence of 1−4 GlcNAc residues left by Endo H [105]. Glycosylation analysis showed that the majority of the N-linked glycans from the CHO-K1-derived products were acidic, whereas greater than 90% of those from the Lec3.2.8.1-derived sCD2 and sCD4 were neutral. Gel filtration showed that the N-linked glycans from sCD2 and sCD4 expressed in CHO-K1 cells were large and heterogeneous, and combined β-galactosidase and hexosaminidase digestion confirmed that all of the N-linked glycans were of the complex type. The N-linked glycans from sCD4 and sCD2 expressed in Lec3.2.8.1 cells, in contrast, were smaller and less heterogeneous than those present on the CHO-K1-derived glycoproteins. Digestion with α-mannosidase confirmed that they were of the oligomannose type.
[Fig. 6.4 image. Panel a: gel with lanes labelled CHO-K1, Lec3.2.8.1 and Lec3.2.8.1 + Endo H, with 43, 30 and 20 kD markers. Panel b: crystals.]
Fig. 6.4 The deglycosylation and crystallization of rat sCD2. (a) sCD2 expressed in Lec3.2.8.1 cells was treated with Endo H. The starting material (lane 2) and deglycosylated sCD2 (lane 3) were analysed on an SDS-PAGE gel stained with Coomassie Blue. Comparison was made with sCD2 expressed in CHO-K1 cells (lane 1). The four bands comprising the non-Endo H-treated CHO-K1- and Lec3.2.8.1-derived forms arise as a result of variable sequon occupancy [61]. (b) Crystals appearing in the first well of a hanging-drop crystallization trial of the deglycosylated sCD2 are shown
Lec3.2.8.1 cell-derived, Endo H-digested sCD4 failed to give better crystals than those previously obtained. However, the sensitivity of the Lec3.2.8.1-derived sCD2 meant that 3 mg of deglycosylated sCD2 could be prepared from 10 mg of total protein using 700 mU of Endo H, and this material crystallized very readily in the presence of polyethylene glycol (Fig. 6.4b), whereas the fully glycosylated Lec3.2.8.1 product formed only amorphous precipitates. The structure of sCD2, solved to 2.8 Å resolution, offered the first complete view of the extracellular region of a cell adhesion molecule [110]. Expression in Lec3.2.8.1 cells has since been used in structural analyses of the T-cell receptor (TCR) in complex with an anti-TCR Fab [111], murine CD8αα and of CD8αα in complex with both the class I MHC H-2 Kb [112] and the non-classical MHC-I molecule, TL [113]. An important, more recent development is that, even though it is much less efficient than in buffers at pH 5.2, there is enough residual Endo H activity at pH 7.4 or even pH 8 to attempt deglycosylation under non-acidic conditions if the target protein is unstable under acidic conditions. The use of glutathione S-transferase-linked forms of e.g. Endo F1 allows their subsequent removal from the digestion mix. It is worth noting that the truncated sugars present on Lec3.2.8.1-derived material occasionally seem also to be conducive to crystallization. For example, non-Endo H digested, Lec3.2.8.1-derived ICAM-1 [114] and ICAM-2 [115], CEACAM1a [42], Semaphorin 4D [116] and a TCR/MHC class II complex [117] also produced crystals, presumably due to the relative uniformity and compactness of Lec3.2.8.1 N-linked glycans. In the case of Semaphorin 4D, the glycosylated, but not the Endo H-deglycosylated form of Lec3.2.8.1-derived protein formed crystals. The glycosylated and deglycosylated forms of any given protein with compact N-linked glycans may therefore be worth considering as separate crystallization targets.
Inhibition of Processing with N-Butyldeoxynojirimycin: Human sCD2
Methods were sought that avoided difficult transfections of Lec3.2.8.1 cells and allowed Endo H-sensitive protein to be generated from pre-existing cell lines. The α-glucosidase I inhibitor, N-butyldeoxynojirimycin (NB-DNJ), was known to prevent the maturation of N-linked oligosaccharides on HIV-1 gp120 expressed in CHO cells [118], to the extent that they remained in Endo H-sensitive oligomannose forms. Quantitative analysis of a human sCD2-expressing cell line [6] cultured in the absence or in the presence of 0.5, 1, 1.5, or 2 mM NB-DNJ indicated that the inhibitor reduced expression 3-4 fold. On the other hand, sensitivity to Endo H increased with increasing NB-DNJ concentrations (Fig. 6.5a). At the highest NB-DNJ concentration less than ∼15% of the hsCD2 was Endo H-resistant, reflecting the absence of the endomannosidase shunt pathway in these cells. The resistant glycoforms could nevertheless be readily removed using a mixture of lentil lectin-, concanavalin A- and phytohaemagglutinin-coupled agarose beads. The Endo H-treated hsCD2 readily formed crystals that diffracted to 2.5 Å resolution (Fig. 6.5b). Subsequent work indicated that for some proteins expressed in Lec3.2.8.1 cells or in CHO-K1 cells in the presence of NB-DNJ, as little as 10−30% of the total secreted protein was Endo H sensitive. While this represented a substantial increase in Endo H sensitivity over that of, for example, rat sCD2 expressed in wild-type CHO-K1 cells (