Clinical Proteomics
METHODS IN MOLECULAR BIOLOGY™
John M. Walker, SERIES EDITOR
Clinical Proteomics
Methods and Protocols

Edited by
Antonia Vlahou
Biomedical Research Foundation, Academy of Athens, Athens, Greece
Editor
Antonia Vlahou
Biomedical Research Foundation
Academy of Athens
Athens 115 27, Greece
e-mail: [email protected]

Series Editor
John M. Walker
School of Life Sciences
University of Hertfordshire
Hatfield, Herts., AL10 9AB
UK
ISBN: 978-1-58829-837-9
e-ISBN: 978-1-59745-117-8
Library of Congress Control Number: 2007939413

©2008 Humana Press, a part of Springer Science+Business Media, LLC

All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Humana Press, 999 Riverview Drive, Suite 208, Totowa, NJ 07512 USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.

The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

While the advice and information in this book are believed to be true and accurate at the date of going to press, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Printed on acid-free paper

9 8 7 6 5 4 3 2 1

springer.com
Preface
Clinical proteomics has evolved rapidly over the past few years and continues to grow as new methodologies and technologies emerge. In this volume, leading researchers in the field have contributed their state-of-the-art methodologies on protein profiling and identification of disease biomarkers in tissues, microdissected cells, and body fluids. Experimental approaches involving application of two-dimensional electrophoresis, multidimensional liquid chromatography, SELDI/MALDI mass spectrometry, and protein arrays, as well as the bioinformatics and statistical tools pertinent to the analysis of proteomics data, are described. As stated in the introductory chapter by Prof. Paik, the Vice President of the Human Proteome Organization, “clinical proteomics needs the integration of biochemistry, pathology, analytical technology, bioinformatics, and proteome informatics to develop highly sensitive diagnostic tools for routine clinical care in the future.” The multidisciplinary character of clinical proteomics approaches is evident in the detailed step-by-step protocols described in this volume, which makes them of potential use to a wide range of researchers, including clinicians, molecular biologists, chemists, bioinformaticians, and computational biologists.

Antonia Vlahou
Acknowledgments
The editor gratefully acknowledges all contributing authors for their collaboration, which made this project possible and brought it to fruition; the series editor, Prof. John Walker, whose help and guidance have been instrumental; and Mr. Patrick Marton, Mr. David Casey, and the whole production team at Humana, headed by the late Mr. Tom Lanigan, for the excellent production of this book.
Contents
Preface
Acknowledgments
Contributors

1. Overview and Introduction to Clinical Proteomics
   Young-Ki Paik, Hoguen Kim, Eun-Young Lee, Min-Seok Kwon, and Sang Yun Cho

Part I: Specimen Collection for Clinical Proteomics

2. Specimen Collection and Handling: Standardization of Blood Sample Collection
   Harald Tammen
3. Tissue Sample Collection for Proteomics Analysis
   Jose I. Diaz, Lisa H. Cazares, and O. John Semmes

Part II: Clinical Proteomics by 2DE and Direct MALDI/SELDI MS Profiling

4. Protein Profiling of Human Plasma Samples by Two-Dimensional Electrophoresis
   Sang Yun Cho, Eun-Young Lee, Hye-Young Kim, Min-Jung Kang, Hyoung-Joo Lee, Hoguen Kim, and Young-Ki Paik
5. Analysis of Laser Capture Microdissected Cells by 2-Dimensional Gel Electrophoresis
   Daohai Zhang and Evelyn Siew-Chuan Koay
6. Optimizing the Difference Gel Electrophoresis (DIGE) Technology
   David B. Friedman and Kathryn S. Lilley
7. MALDI/SELDI Protein Profiling of Serum for the Identification of Cancer Biomarkers
   Lisa H. Cazares, Jose I. Diaz, Rick R. Drake, and O. John Semmes
8. Urine Sample Preparation and Protein Profiling by Two-Dimensional Electrophoresis and Matrix-Assisted Laser Desorption Ionization Time of Flight Mass Spectroscopy
   Panagiotis G. Zerefos and Antonia Vlahou
9. Combining Laser Capture Microdissection and Proteomics Techniques
   Dana Mustafa, Johan M. Kros, and Theo Luider

Part III: Clinical Proteomics by LC-MS Approaches

10. Comparison of Protein Expression by Isotope-Coded Affinity Tag Labeling
    Zhen Xiao and Timothy D. Veenstra
11. Analysis of Microdissected Cells by Two-Dimensional LC-MS Approaches
    Chen Li, Yi-Hong, Ye-Xiong Tan, Jian-Hua Ai, Hu Zhou, Su-Jun Li, Lei Zhang, Qi-Chang Xia, Jia-Rui Wu, Hong-Yang Wang, and Rong Zeng
12. Label-Free LC-MS Method for the Identification of Biomarkers
    Richard E. Higgs, Michael D. Knierman, Valentina Gelfanova, Jon P. Butler, and John E. Hale
13. Analysis of the Extracellular Matrix and Secreted Vesicle Proteomes by Mass Spectrometry
    Zhen Xiao, Thomas P. Conrads, George R. Beck, Jr., and Timothy D. Veenstra

Part IV: Clinical Proteomics and Antibody Arrays

14. Miniaturized Parallelized Sandwich Immunoassays
    Hsin-Yun Hsu, Silke Wittemann, and Thomas O. Joos
15. Dissecting Cancer Serum Protein Profiles Using Antibody Arrays
    Marta Sanchez-Carbayo

Part V: Statistics and Bioinformatics in Clinical Proteomics Data Analysis

16. 2D-PAGE Maps Analysis
    Emilio Marengo, Elisa Robotti, and Marco Bobba
17. Finding the Significant Markers: Statistical Analysis of Proteomic Data
    Sebastien Christian Carpentier, Bart Panis, Rony Swennen, and Jeroen Lammertyn
18. Web-Based Tools for Protein Classification
    Costas D. Paliakasis, Ioannis Michalopoulos, and Sophia Kossida
19. Open-Source Platform for the Analysis of Liquid Chromatography-Mass Spectrometry (LC-MS) Data
    Matthew Fitzgibbon, Wendy Law, Damon May, Andrea Detter, and Martin McIntosh
20. Pattern Recognition Approaches for Classifying Proteomic Mass Spectra of Biofluids
    Ray L. Somorjai

Index
Contributors

Jian-Hua Ai • Eastern Hepatobiliary Surgery Hospital, Shanghai, China
George R. Beck, Jr. • Division of Endocrinology, Metabolism and Lipids, Emory University, School of Medicine, Atlanta, GA
Marco Bobba • University of Eastern Piedmont, Department of Environmental and Life Sciences, Alessandria, Italy
Jon P. Butler • Lilly Corporate Center, Indianapolis, IN
Sebastien Christian Carpentier • Faculty of Bioscience Engineering, Division of Crop Biotechnics, K.U. Leuven, Leuven, Belgium
Lisa H. Cazares • The George L. Wright Jr. Center for Biomedical Proteomics, Eastern Virginia Medical School, Norfolk, VA
Sang Yun Cho • Yonsei Biomedical Proteome Research Center, Department of Biochemistry, College of Sciences, Seoul, Korea
Thomas P. Conrads • Laboratory of Proteomics and Analytical Technologies, SAIC-Frederick, Inc., National Cancer Institute at Frederick, Frederick, MD
Andrea Detter • Fred Hutchinson Cancer Research Center, Seattle, WA
Jose I. Diaz • Cancer Therapy Research Center’s Institute for Drug Development, University of Texas, Health Science Center, San Antonio, TX
Rick R. Drake • Eastern Virginia Medical School, Norfolk, VA
Matthew Fitzgibbon • Fred Hutchinson Cancer Research Center, Seattle, WA
David B. Friedman • Proteomics Laboratory, Mass Spectrometry Research Center, Department of Biochemistry, Vanderbilt University School of Medicine, Nashville, TN
Valentina Gelfanova • Lilly Corporate Center, Indianapolis, IN
John E. Hale • Lilly Corporate Center, Indianapolis, IN
Richard E. Higgs • Lilly Corporate Center, Indianapolis, IN
Yi-Hong • Eastern Hepatobiliary Surgery Hospital, Shanghai, China
Hsin-Yun Hsu • Biochemistry Department, NMI Natural and Medical Sciences Institute at the University of Tuebingen, Reutlingen, Germany
Thomas O. Joos • Biochemistry Department, NMI Natural and Medical Sciences Institute at the University of Tuebingen, Reutlingen, Germany
Min-Jung Kang • Yonsei Biomedical Proteome Research Center, Department of Biochemistry, College of Sciences, Seoul, Korea
Hoguen Kim • Department of Pathology, College of Medicine, Yonsei University, Seoul, Korea
Hye-Young Kim • Yonsei Biomedical Proteome Research Center, Department of Biochemistry, College of Sciences, Seoul, Korea
Michael D. Knierman • Lilly Corporate Center, Indianapolis, IN
Evelyn Siew-Chuan Koay • Department of Pathology, Yong Loo Lin School of Medicine, National University of Singapore, and Molecular Diagnosis Center, Department of Laboratory Medicine, National University Hospital, Singapore
Sophia Kossida • Division of Biotechnology, Biomedical Research Foundation, Academy of Athens, Athens, Greece
Johan M. Kros • Department of Pathology, Josephine Nefkens Institute, Erasmus Medical Center, Rotterdam, The Netherlands
Min-Seok Kwon • Yonsei Biomedical Proteome Research Center, Department of Biochemistry, College of Sciences, Seoul, Korea
Jeroen Lammertyn • Faculty of Bioscience Engineering, Division of Mechatronics, Biostatistics and Sensors, K.U. Leuven, Leuven, Belgium
Wendy Law • Fred Hutchinson Cancer Research Center, Seattle, WA
Eun-Young Lee • Yonsei Biomedical Proteome Research Center, Department of Biochemistry, College of Sciences, Seoul, Korea
Hyoung-Joo Lee • Yonsei Biomedical Proteome Research Center, Department of Biochemistry, College of Sciences, Seoul, Korea
Chen Li • Research Center for Proteome Analysis, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
Su-Jun Li • Research Center for Proteome Analysis, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
Kathryn S. Lilley • Cambridge Centre for Proteomics, Department of Biochemistry, University of Cambridge, United Kingdom
Theo Luider • Laboratories of Neuro-Oncology/Clinical and Cancer Proteomics, Josephine Nefkens Institute, Erasmus Medical Center, Rotterdam, The Netherlands
Emilio Marengo • Department of Environmental and Life Sciences, University of Eastern Piedmont, Alessandria, Italy
Damon May • Fred Hutchinson Cancer Research Center, Seattle, WA
Martin McIntosh • Fred Hutchinson Cancer Research Center, Seattle, WA
Ioannis Michalopoulos • Biomedical Research Foundation, Academy of Athens, Athens, Greece
Dana Mustafa • Department of Pathology, Josephine Nefkens Institute, Erasmus Medical Center, Rotterdam, The Netherlands
Young-Ki Paik • Department of Biochemistry, Yonsei Proteome Research Center & Biomedical Proteome Research Center, Seoul, Korea
Costas D. Paliakasis • Biomedical Research Foundation, Academy of Athens, Athens, Greece
Bart Panis • Faculty of Bioscience Engineering, Division of Crop Biotechnics, K.U. Leuven, Leuven, Belgium
Elisa Robotti • Department of Environmental and Life Sciences, University of Eastern Piedmont, Alessandria, Italy
Marta Sanchez-Carbayo • Tumor Markers Group, Spanish National Cancer Center (CNIO), Madrid, Spain
O. John Semmes • The George L. Wright Jr. Center for Biomedical Proteomics, Eastern Virginia Medical School, Norfolk, VA
Ray L. Somorjai • Biomedical Informatics, Institute for Biodiagnostics, National Research Council, Winnipeg, Manitoba, Canada
Rony Swennen • Faculty of Bioscience Engineering, Division of Crop Biotechnics, K.U. Leuven, Leuven, Belgium
Harald Tammen • Digilab BioVisioN GmbH, Hannover, Germany
Ye-Xiong Tan • Eastern Hepatobiliary Surgery Hospital, Shanghai, China
Timothy D. Veenstra • Laboratory of Proteomics and Analytical Technologies, SAIC-Frederick, Inc., National Cancer Institute at Frederick, Frederick, MD
Antonia Vlahou • Division of Biotechnology, Biomedical Research Foundation, Academy of Athens, Athens, Greece
Hong-Yang Wang • Eastern Hepatobiliary Surgery Hospital, Shanghai, China
Silke Wittemann • Biochemistry Department, NMI Natural and Medical Sciences Institute at the University of Tuebingen, Reutlingen, Germany
Jia-Rui Wu • Research Center for Proteome Analysis, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
Qi-Chang Xia • Research Center for Proteome Analysis, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
Zhen Xiao • Laboratory of Proteomics and Analytical Technologies, SAIC-Frederick, Inc., National Cancer Institute at Frederick, Frederick, MD
Rong Zeng • Research Center for Proteome Analysis, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
Panagiotis G. Zerefos • Division of Biotechnology, Biomedical Research Foundation, Academy of Athens, Athens, Greece
Daohai Zhang • Molecular Diagnosis Center, Department of Laboratory Medicine, National University Hospital, Singapore, and Department of Pathology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
Lei Zhang • Research Center for Proteome Analysis, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
Hu Zhou • Research Center for Proteome Analysis, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
1
Overview and Introduction to Clinical Proteomics
Young-Ki Paik, Hoguen Kim, Eun-Young Lee, Min-Seok Kwon, and Sang Yun Cho

Summary
As the field of clinical proteomics progresses, discovery of disease biomarkers becomes paramount. However, the immediate challenges are to establish standard operating procedures for both clinical specimen handling and reduction of sample complexity and to increase the ability to detect proteins and peptides present in low amounts. The traditional concept of a disease biomarker is shifting toward a new paradigm, namely, that an ensemble of proteins or peptides would be more efficient than a single protein/peptide in the diagnosis of disease. Because clinical proteomics usually requires easy access to well-defined fresh clinical specimens (including morphologically consistent tissue and properly pretreated body fluids of sufficient quantity), biorepository systems need to be established. Here, we address these questions and emphasize the necessity of developing various microdissection techniques for tissue specimens, multidimensional fractionation for body fluids, and other related techniques (including bioinformatics), tools which could become integral parts of clinical proteomics for disease biomarker discovery.
Key Words: biomarker; body fluids; clinical proteomics; translational proteomics; depletion; biorepository; multidimensional fractionation; specimen bank; biomarker panel.

Abbreviations: CSF: Cerebrospinal Fluid, SILAC: Stable Isotope Labeling with Amino acids in Cell culture, FFE: Free Flow Electrophoresis, IMAC: Immobilized Metal Affinity Chromatography, 2DE: 2-dimensional Gel Electrophoresis, CBB: Coomassie Brilliant Blue, SELDI: Surface-Enhanced Laser Desorption/Ionization, MALDI: Matrix-Assisted Laser Desorption/Ionization, MDLC: Multi-dimensional Liquid Chromatography, LC: Liquid Chromatography, TOF: Time-of-Flight, CID: Collision-Induced Dissociation, ETD: Electron Transfer Dissociation, LIT: Linear Ion-Trap, FT: Fourier-Transform, Q: Quadrupole, ELISA: Enzyme-Linked Immunosorbent Assay, SISCAPA: Stable Isotope Standards with Capture by Anti-Peptide Antibody, AQUA: Absolute Quantitative Analysis.

Commercial brands are also shown: MARS: Multiple Affinity Removal System (Agilent, Palo Alto, CA, USA), Enchant™: Enchant™ Multi-protein Affinity Separation Kit (Pall Life Sciences, Ann Arbor, MI, USA), Gradiflow™: Gradiflow™ Separation (Life Bioprocess, Frenchs Forest, Australia), FFE™: BD Free Flow Electrophoresis System (BD Diagnostics, Martinsried/Planegg, Germany), Zoom®: Zoom® Benchtop Proteomics System (Invitrogen Corporation, Carlsbad, CA, USA), Rotofor: Bio-Rad Rotofor® Prep IEF Cell (Bio-Rad, Hercules, CA, USA), PF2D: ProteomeLab™ PF2D Protein Fractionation System (Beckman Coulter, Inc., Fullerton, CA, USA), DIGE: Ettan™ DIGE System (GE Healthcare Bio-Sciences AB, Uppsala, Sweden), Deep Purple™: Deep Purple™ Total Protein Stain (GE Healthcare Bio-Sciences AB, Uppsala, Sweden), ICAT™: Isotope-coded affinity tags (Applied Biosystems, Foster City, CA, USA), iTRAQ™: iTRAQ™ Reagents (Applied Biosystems, Foster City, CA, USA), Q-TRAP™: (Applied Biosystems, Foster City, CA, USA).

From: Methods in Molecular Biology, vol. 428: Clinical Proteomics: Methods and Protocols
Edited by: A. Vlahou © Humana Press, Totowa, NJ
1. Overview and Scope of Clinical Proteomics

Clinical proteomics is defined as comprehensive studies of qualitative and quantitative profiling of proteins (and peptides) present in clinical specimens such as body fluids and tissues. The comparison of specimens from healthy and diseased individuals may lead to the discovery of a disease biomarker (1). The biomarker serves as a molecular signature reflecting stages of disease before or after treatment and can also be used for prognostic purposes in monitoring the response to treatment (2). Clinical proteomics consists of a variety of experimental processes, which include the collection of well-phenotyped clinical specimens, analysis of proteins or peptides of interest, data interpretation, and validation of proteomics data in a clinical context (Fig. 1). After successful identification of a few disease biomarker candidates through extensive profiling, translational proteomics involving validation with a cohort study follows. Even after proper identification and verification of a disease biomarker, it takes quite a long time to prove that this biomarker is applicable to clinical diagnosis or prognosis (3,4). There has been a remarkable increase in publication of clinical proteomics papers within a short period of time [more than 800 papers in 2006 (Fig. 2)], coinciding with the rapid growth of proteomics. Reflecting this trend in clinical proteomics, this chapter aims to present a review of core technologies that are used in the field of clinical proteomics with respect to sample specimen processing, protein separation platforms (e.g., gel-based system or liquid-based methods), quantitative labeling, mass spectrometry (MS), and proteome informatics tools. It is noteworthy that despite the advent of new technologies, there remain several bottlenecks in the proteomics field such as lack of dataset standardization, quantification of the proteins of interest, verification of protein or peptides identified, and an overall strategy for tackling biomarker postidentification. Thus, the pace of biomarker discovery, one of the key agendas of clinical proteomics, will depend on how well these obstacles or bottlenecks are resolved by technical advancement (4). The following sections address these issues in the context of clinical proteomics.

Fig. 1. Clinical and translational proteomics. The key components of experimental methods are included in each box.
Fig. 2. Recent trends in clinical proteomics publications. The distribution of the articles related to clinical proteomics listed in PubMed is shown here. The key words used for searching articles are as follows: query (clinical[All Fields] OR ((“biological markers”[TIAB] NOT Medline[SB]) OR “biological markers”[MeSH Terms] OR biomarker[Text Word])) AND (“proteomics”[MeSH Terms] OR proteomics[Text Word] OR proteomic[All Fields] OR “proteome”[MeSH Terms] OR proteome[Text Word]).
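A search of this kind can be outlined programmatically. The following is a minimal sketch, assuming the NCBI E-utilities ESearch endpoint is used to obtain one record count per publication year; the year range (2000 to 2006) and the helper name `yearly_count_url` are illustrative assumptions, not the procedure the authors report. The block only constructs the request URLs and does not contact the server.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint (public PubMed search interface).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Query string as given in the legend of Fig. 2.
QUERY = (
    '(clinical[All Fields] OR (("biological markers"[TIAB] NOT Medline[SB]) '
    'OR "biological markers"[MeSH Terms] OR biomarker[Text Word])) '
    'AND ("proteomics"[MeSH Terms] OR proteomics[Text Word] '
    'OR proteomic[All Fields] OR "proteome"[MeSH Terms] OR proteome[Text Word])'
)


def yearly_count_url(year):
    """Build an ESearch URL that asks only for the number of hits in one year.

    Hypothetical helper for illustration; it restricts the search to the
    publication date (pdat) of the given year.
    """
    params = {
        "db": "pubmed",
        "term": QUERY,
        "datetype": "pdat",    # filter on publication date
        "mindate": str(year),
        "maxdate": str(year),
        "rettype": "count",    # return the hit count rather than the ID list
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


# Assumed year range, chosen only to illustrate a per-year tally.
urls = {year: yearly_count_url(year) for year in range(2000, 2007)}
```

Fetching each URL returns a small XML document whose `Count` element holds the number of matching records for that year. Counts obtained today may differ from those plotted in Fig. 2, because PubMed indexing changes over time.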
2. Sample Specimens and Processing Techniques Used for Clinical Proteomics

2.1. General Considerations

Because clinical proteomics relies heavily on patient specimens, three important factors need to be considered before the selection and preparation of clinical specimens: (1) selection of the correct clinical samples according to the type of research, (2) isolation of the appropriate component from the clinical samples, and (3) establishment of optimal experimental conditions for each sample (5,6,7,8). For the selection of correct clinical samples, the relationship between clinical samples and the specific disease should also be considered. For example, although cancer tissue represents a specific cancer, several types of body fluids from patients may also have a relationship to the cancer. If the selected clinical samples specifically represent the disease, the next step is to evaluate which components are related to the specific disease. For instance, tumor cells in cancerous tissues are surrounded by many types of stromal cells, inflammatory cells, and connective tissues that are directly related to changes in protein expression in the cancer. If the purpose of proteomic analysis is to identify characteristic changes of specific proteins in tumor cells, then it is necessary to determine precisely the percentage of tumor cells, which can be increased by tissue microdissection (5,6,7). As sample specimen conditions directly impact the results of biomarker discovery, well-defined clinical specimens should be used, since the discovery of disease biomarkers is much easier when the samples have clear anatomical and pathophysiological definitions. Because clinical specimens are heterogeneous, sophisticated pathological discrimination is required for the isolation of specific diseased tissue or body fluids. Without the expertise of a pathologist at the earliest stage, it may be difficult to isolate a specifically defined specimen for clinical proteomics.
Generally, clinical samples contain variable factors and components originating from the microenvironment of specific tissues. For instance, liver tissues usually contain a large amount of blood in the sinusoid, and this amount is increased in tissues with dilated sinusoids (9). Lung tissues usually contain deposited exogenous materials, and this amount is increased in heavy smokers (10). Note that the amount of blood present in isolated tissues may directly influence the relative proportion of proteins found in clinical specimens. Deposited materials and other chemicals, such as the stain dyes and fixatives used in microdissection, may also influence the experimental conditions (11). In the analysis of clinical samples, suitable buffer conditions, minimal lysis time, and high-yield protein precipitation are highly recommended. To avoid substantial variations between experiments using clinical specimens, a large set of specimens is also necessary because, unlike cultured cell lines, clinical specimens have high component variability (12). More details on specific disease types are also described throughout this volume.

2.2. Body Fluids

Surveying the literature, there appear to be five to six different types of clinical specimens. Body fluids (e.g., plasma, urine, tears, cerebrospinal fluid, lymph, and ascites), tissues (e.g., liver, heart, muscle, brain, and lung), cells, bone, and hair have all been used for clinical proteomics (Table 1) (13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33). Each has its own merits and limitations for biomarker discovery via proteomic analysis. Among those sample specimens, the number of publications using body fluids has increased recently, perhaps because of their convenience and ease of use for noninvasive diagnosis. Since proteins secreted into body fluids during or after disease may reflect a broad range of pathophysiological conditions, much emphasis has been given to identification of prominent protein/peptide biomarkers that exhibit differential expression at different stages. In the literature, the terms “body fluids” and “biofluids” are used interchangeably, although the former indicates a greater likelihood of being obtained directly from patients, while the latter is applied more broadly, referring to liquid or liquid-like samples obtained from living organisms, including model animals and plants. Throughout this chapter we will use “body fluids” for clarity. Given the large dynamic range of protein and peptide sources, plasma (a complex liquid interface between tissues) and extracellular fluids may be the best body fluids to use for clinical proteomics and biomarker discovery (34,35,36,37,38). In addition to plasma, more than a dozen additional body fluids are currently used for biomarker discovery, ranging from urine to peritoneal fluids (Table 1).
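The dynamic-range problem can be made concrete with a small calculation. Table 1 notes that albumin accounts for 50% of plasma proteins; the sketch below shows how the relative share of all remaining proteins changes when most of the albumin is removed. The two-component model, the 99% removal efficiency, and the function name are illustrative assumptions, not measured values or a method described in this chapter.

```python
def fractions_after_depletion(fractions, target, removal_efficiency):
    """Return renormalized mass fractions after partially removing one component.

    fractions: mapping of component name to its mass fraction (summing to 1).
    target: name of the component being depleted.
    removal_efficiency: fraction of the target removed (0.0 to 1.0); assumed value.
    """
    remaining = dict(fractions)
    remaining[target] = fractions[target] * (1.0 - removal_efficiency)
    total = sum(remaining.values())
    return {name: amount / total for name, amount in remaining.items()}


# Simplified two-component plasma model using the 50% albumin figure from Table 1.
plasma = {"albumin": 0.50, "other_proteins": 0.50}

# Hypothetical depletion step that removes 99% of the albumin.
depleted = fractions_after_depletion(plasma, "albumin", removal_efficiency=0.99)

for name, share in depleted.items():
    print(f"{name}: {share:.3f}")
```

Under these assumptions the non-albumin proteins rise from half of the total protein mass to roughly 99% of it, which is why a fixed loading amount carries about twice as much of every remaining protein after depletion.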
However, the biggest challenge in body fluids proteomics may be the multiple pretreatment processes including depletion of high-abundance proteins (in the case of plasma) (34,35,36) and/or their enrichment (in the case of urine) (15,39) prior to analysis (Table 1). Thus, the outcome of clinical proteomics may depend on proper sample processing since the quality of selection and handling of the most specific type of specimen will affect the overall pattern of profiling. Because the details of body fluid proteomics have been well described by Shen Hu et al. (38), we would like to focus on only a few essential points. First, standard measures need to be introduced to protect specimens from nonspecific proteolysis, lysis, and modification during collection and preparation (11). For the standardization of blood sample collection, Tammen emphasizes many useful considerations of preanalytical variables in plasma proteomics, which can be applied to processes involved with blood specimens [(40) and see Chapter 2]. The more specific problems involved in sample
Table 1 Types of Biological Specimens Used in Clinical Proteomics
Type | Specimen | Disease | Reference
Fluid | Plasma/serum | | (13,14)
Fluid (secretions) | Urine | Prostate cancer | (15)
Fluid (secretions) | Nasal discharge | Seasonal allergic rhinitis | (16)
Fluid (secretions) | Tears | Blepharitis and dry eye | (17,18)
Fluid (secretions) | Saliva | Oral and breast cancer | (19)
Fluid (secretions) | Amniotic-/cervical fluid | Fetal aneuploidy and intra-amniotic inflammation | (20,21)
Fluid (proximal fluid) | Follicular fluid | Recurrent spontaneous abortion | (22)
Fluid (proximal fluid) | Seminal fluid | Male infertility | (23)
Fluid (proximal fluid) | Nipple aspirate fluid | Breast cancer | (24)
Fluid (proximal fluid) | Cerebrospinal fluid | Brain tumor | (25)
Fluid (body cavity fluid) | Synovial fluid | Rheumatoid arthritis | (26)
Fluid (body cavity fluid) | Ascites | Ovarian cancer | (13)
Fluid (body cavity fluid) | Bronchial lavage fluid | Chronic obstructive pulmonary disease, asthmatics, and lung disease | (27,28)
Fluid (body cavity fluid) | Pleural fluid | Lung cancer | (29)
Fluid (body cavity fluid) | Peritoneal fluid | Ovarian cancer | (14)
Tissue | LCM or LMPC isolated; formalin fixed; paraffin embedded | Any type of disease | (30)
Cell | Cell lines or primary tissue culture | Any type of disease | (31)
Bone | Cartilage | Rheumatoid arthritis | (32)
Hair | | | (33)
Characteristics of the samples
Fluid:
• Routinely accessible body fluids
• Very important in the discovery of biomarkers of diseases (systemic vs. organ specific/local)
• Important for early detection, disease severity, prognosis, and monitoring of response to therapy
• Can reflect disease perturbations in the organs or tissues from which they are secreted
• Procedure of synovial biopsy is not very difficult
Tissue:
• Very important for the development of novel in situ biomarkers
• Immunofluorescence, immunocytochemistry, imaging mass spectrometry
Cell:
• Very important in the discovery of biomarker candidates
• Validation should be performed using primary tumor samples (e.g., immunohistologic methods, imaging MS)
Bone (cartilage):
• Cartilage consists mainly of extracellular matrix, mostly made of collagens and proteoglycans
Hair:
• Over 300 proteins were found to constitute the insoluble complex formed by transglutaminase crosslinking
Pretreatment required for proteomics
Fluid:
• Considerations for sample adequacy: storage, hemolysis, influence of anticoagulants, consistent results
• Consider whether to pool samples or analyze individual samples
• Depletion of high-abundance proteins (albumin makes up 50% of plasma proteins)
• Mucosa and salt must be removed
Tissue:
• Considerations for sample adequacy
• Integrity, degradation of protein
• Contamination (microorganisms, extraneous material)
Cell:
• Desalting and removal of media components
Bone (cartilage):
• Cetylpyridinium chloride effectively aggregates with proteoglycan
Hair:
• Sufficient extraction of protein from the insoluble complex is needed
Paik et al.
handling are also addressed by Rai et al. (41). Second, to increase the dynamic range of detection and reduce sample heterogeneity, pretreatments such as depletion of high-abundance proteins appear to be required (34,35,36), and several such depletion steps may be needed during initial sample processing. Multiple fractionations of clinical samples prior to the major separation work would reduce sample complexity. Note, however, that co-removal of low-abundance proteins during this type of multiple depletion (36,42) and modification of proteins of interest during or after isolation (43) should be considered as well. To address several problems encountered with specimen collection, Xiao et al. (Chapter 13 in this volume) describe different methods to isolate extracellular matrix (ECM) and analyze the proteome of secreted vesicles. These methods will be useful for studying ECM and secreted vesicles in samples ranging from primary cultured cells to tissue specimens. One must therefore consider the best options for sample processing before starting the main experiment.
2.3. Tissues and Other Samples
Tissues are usually used as primary screening samples to find direct causes of disease from the lesion present in the corresponding organ, for example, liver tissue in hepatocellular carcinoma (HCC) (44,45). Tissues are widely used for clinical proteomics, although there are no standard operating procedures for specimen fractionation and the detection limit of current instrumentation remains borderline. As listed in Table 1, many cancer tissues can be prepared in different ways, such as laser capture microdissection (LCM) (5,6), laser microdissection and pressure catapulting (LMPC) (30,46), and formalin-fixed paraffin-embedded sample preparation (11). These techniques are well described in Chapters 3, 5, 9, and 11 in this volume.
It is desirable, however, that proteomics studies of disease tissues should also be coupled with parallel analysis of the corresponding body fluids. For example, for the study of cancer biomarkers, paired cancer tissue sets (tumor vs. nontumor) and the same patient’s plasma were used, which led to a more comprehensive analysis (47,48). Experiments on tissue samples may mostly be suitable for pathophysiological studies rather than biomarker discovery due to the complexity of the sample. In specimen processing for proteomics studies, there are usually several unwanted problems such as artifacts created during sample collection, processing, and storage. Other matters arise in the handling of patient information regarding sex, age, and race (49). To minimize those problems associated with systematic sample handling, it is plausible to establish a specimen bank (50,51,52). In fact, the collection of many clinical samples in a biorepository would have enormous
Overview and Introduction to Clinical Proteomics
benefits for proteomic research. This enables the selection of homogeneous clinical samples according to the research purposes and isolation of specific components from clinical samples. Additionally, large scale collection of clinical specimens in a biorepository is essential for the validation of specific markers after biomarker candidate discovery. Ideally, the clinical samples stored in the biorepository should be (1) collected and stored immediately because dead cells and altered proteins affect proteomic analysis, (2) subjected to accurate quality control, and (3) catalogued by reliable and secure clinical data. The quality control of clinical samples includes trimming of specimens and confirmation of diagnosis by pathologists; information gained (such as the confirmation of tumor cell and stromal cell ratio, percentage of necrosis, percentage of fibrosis, proportion of infiltrated inflammatory cells, etc.) should be stored in a database of clinical samples. It is also essential to store clinical and follow-up data for each sample and each patient’s written informed consent form in the biorepository network. This clinical specimen banking network provides convenience, reduced budget, and reliability for researchers involved in clinical proteomic research (50,51,52). For representative tissue sample collection for proteomics studies, Diaz et al. (Chapter 3) address a practical experimental strategy for storage and handling of sample specimens that are used in surface-enhanced laser desorption/ionization (SELDI), 2D gel, and liquid chromatography (LC)-based proteomics. Emphasis should be given to the primary responsibility of pathologists in the whole process of tissue proteomics in addition to morphological analysis at the molecular level.
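The cataloguing and quality-control requirements listed above can be pictured as a simple record structure. The following is a minimal sketch only: the `SpecimenRecord` class, its field names, and the thresholds (30 min to storage, 50% necrosis) are hypothetical illustrations and are not taken from any biorepository standard or from the chapters of this volume.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SpecimenRecord:
    """Hypothetical biorepository entry; all field names are illustrative."""
    specimen_id: str
    specimen_type: str          # e.g., "tissue" or "plasma"
    minutes_to_storage: int     # delay between collection and storage
    diagnosis_confirmed: bool   # confirmed by a pathologist
    tumor_cell_pct: float
    necrosis_pct: float
    fibrosis_pct: float
    informed_consent: bool      # written consent form is on file
    follow_up: List[str] = field(default_factory=list)

    def qc_issues(self, max_minutes: int = 30,
                  max_necrosis: float = 50.0) -> List[str]:
        """Return QC problems; the default thresholds are assumptions."""
        issues = []
        if self.minutes_to_storage > max_minutes:
            issues.append("delayed storage")
        if not self.diagnosis_confirmed:
            issues.append("diagnosis not confirmed")
        if self.necrosis_pct > max_necrosis:
            issues.append("high necrosis")
        if not self.informed_consent:
            issues.append("missing informed consent")
        return issues


# Two made-up records: one that passes and one that fails several checks.
good = SpecimenRecord("HCC-001", "tissue", 15, True, 80.0, 5.0, 10.0, True)
poor = SpecimenRecord("HCC-002", "tissue", 90, False, 40.0, 60.0, 20.0, True)
print(good.qc_issues())
print(poor.qc_issues())
```

In practice the acceptance criteria would be set by the pathologists and the biorepository network; the point of the sketch is only that each stored sample carries its QC findings and consent status alongside the specimen itself.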
3. Biomarker Discovery and Clinical Proteomics
Given that one of the central issues of clinical proteomics is biomarker discovery and its application, a brief account of this subject is appropriate here. Excellent reviews of the whole arena of biomarker development can be found elsewhere (53,54,55). Until now, the conventional concept of a disease biomarker has been a single protein/peptide with high specificity, which is usually present in low abundance, is expressed in a disease in a stage-specific manner, and serves as a major fingerprint of the body’s response to drugs or other treatments. Although many examples of broad biomarkers for various diseases are known (56,57,58,59,60), identification of more specific and selective biomarkers is urgently needed. Accordingly, we may also need to change the current biomarker concept and eliminate the inherent bias toward individual disease biomarkers. Recently, the idea has been introduced that an ensemble of different proteins would be more efficient than a single protein/peptide in the diagnosis of disease (61,62,63). To solve
this problem we propose a general strategy of clinical proteomics leading to disease biomarker discovery, as outlined in Fig. 3. Since biomarker candidate proteins could come from many different cellular processes, they could be of either low or high abundance, and would directly or indirectly reflect the physiological condition of the body. They may also be present at different concentrations depending on the disease stage or tissue type. For example, common proteins such as Hsp 27 (64,65), 14-3-3 proteins (66,67), apoA-I (68,69), and serum amyloid precursor A (70) appear in most disease samples from lung cancer, gastric cancer, pancreatic cancer, prostate cancer, neuroblastoma, and inflammation. A number of questions then arise: should they be treated as disease-specific or disease-nonspecific proteins? What would be the criterion for making this decision? Is this due to the fact that the number and type of proteins secreted from a specific
Fig. 3. The concept of the creation of a protein biomarker panel for a specific disease. Each white, gray, dark-gray, and black circle represents a putative protein biomarker of a specific disease at that clinical stage. A group of slash-lined circles symbolizes the biomarker panel of liver disease as an example.
physiological condition of many different types of diseases might be similar? How can one distinguish one type of disease from another simply by looking at their protein profiles? As outlined in Fig. 3, at the onset of a disease, signals may be limited to only a few easily counted molecules. As the disease progresses, more signal molecules may be produced, resulting in mixed types of biomarkers representing multiple disease phenomena. Although this assumption may be oversimplified, more noise is created at a certain stage, where it becomes more difficult to identify those molecules for two reasons: (1) they are present in amounts too small to be detected using current technology and (2) it may be too early for the molecules to be specific for a particular disease. Presumably, proteins appearing in stage 3 or 4 may have higher specificity for a particular disease, but the sensitivity might be low. It may be that this noise interferes with the signaling pathway of a certain disease, and we may end up having no decisive marker. To circumvent this problem, it may be desirable to identify a set of biomarker candidate proteins, termed a “biomarker panel,” which ideally contains candidate proteins or peptides that, as a group, represent specific stages of the disease. Given this panel, extensive validation may then be pursued using a large cohort. Following this strategy, many biomarker candidates at stage 1 can be included in the panel, which can have higher specificity and sensitivity than a single-molecule biomarker. Such a biomarker panel can be used not only as a diagnostic marker but also as a prognostic indicator in monitoring treatment effectiveness. For example, Linkov et al.
(61) reported that sensitivity and specificity improved to 84.5 and 98%, respectively, when they used a panel of 25 markers in the early diagnosis of squamous cell cancer of the head and neck. In the diagnosis of prostate cancer, specificity increased from 5–15 to 84–95% when a biomarker panel containing six marker proteins was used instead of a single marker. In HCC, studies have been carried out on a biomarker panel consisting of a protein array that can be used as a diagnostic kit (62,63). A general strategy for biomarker discovery is outlined in Fig. 4. In typical clinical proteomics work, sample collection is the first step, followed by pretreatment of the sample to reduce its complexity and enable the search for low-abundance proteins (e.g., disease biomarkers) using various fractionation tools. This multidimensional fractionation is well described elsewhere (34,35,36), and depends on the properties and concentration of the sample. Typically the prefractionated samples go to either a two-dimensional electrophoresis (2DE) or an LC-based proteomics separation system, followed by single or multiple steps of mass spectrometric analysis depending on the sample
quantity and experimental goal. The data obtained from this series of analyses are integrated into the proteome informatics system, where protein/peptide identification, quantification, modification analysis, and verification of peak lists are carried out [(71) and also Chapter 19]. This step usually becomes rate limiting, since major profiling data are constructed and analyzed at this point. The clinical relevance of those proteins (and changes in their expression level) in a specific disease state is largely determined here, which eventually leads to identification of biomarker candidates. In addition, SELDI, molecular imaging, and protein microarrays can also be applied before or after this step. Once major biomarker candidates are identified, those proteins are subjected to further verification via sophisticated analytical arrays and translational proteomics, which involves cohort studies, pre-evaluation, and a robust analytical system (4,72). Throughout the process of translational proteomics, one may be able to judge whether the identified panel or single proteins are suitable as biomarkers of a specific disease. A recent comprehensive review by Zolg (73) addressed several considerations in the biomarker development pipeline from discovery to validation. Three critical challenges within the pipeline are reduction of clinical sample complexity, proof of principle of biomarker function, and the detection limit of unique proteins present in the samples. In the search for biomarker panels, reliable statistical tools and bioinformatics resources are needed, and these are now available on the web (Table 2; see also Chapters 16 and 17). As the number of biomarker panel candidates increases, more cases are being examined, which requires statistical learning methods. These methods include neural networks, genetic algorithms, k-means nearest-neighbor analysis, Euclidean distance-based nonlinear methods, fuzzy pattern matching, self-organizing mapping, and support vector machines (74,75,76,77,78). They are very useful for classification of proteins according to the specific disease state (see also Chapters 16 and 20). Once biomarker candidates are identified, it is necessary to predict the function of these proteins in silico and validate them in the context of clinical application. Table 3 provides web resources that can be used for clinical data management, in silico functional annotation (see Chapter 18), prediction, and identification of modified forms of proteins. Thus, by combining experimental methods (Fig. 4) and informatics tools (Tables 2 and 3), one can obtain a set of biomarker candidate proteins (a panel) for further validation through translational proteomics (Fig. 1).
Fig. 4. A typical experimental strategy for clinical proteomics and translational proteomics. In clinical proteomics research, various experimental techniques are included: specimen collection, prefractionation, 2DE, non-2DE (liquid-based separation), mass spectrometry, informatics, and others. The course of each section, as marked (squares and circles in different colors), is determined by the investigators, depending on the experimental goal. At the bottom, experimental procedures for the verification and validation of biomarker candidates are schematically outlined, leading to clinical screening and applications. The squares indicate the separation systems based on the specific characteristics of proteins and the general prefractionation system. The open circles and open triangle represent analytical modules at the protein and peptide level, respectively. The arrows and junction points indicate the options at each selection. The bottom part indicates the verification procedure employing multiple reaction monitoring and quantitative mass analysis. Biomarker candidates identified from typical clinical proteomics would be subject to translational proteomics for validation, where a large-scale cohort study and evaluation would then proceed.
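As a rough illustration of how a marker panel might be evaluated with one of the simpler statistical learning methods mentioned in this section, the sketch below classifies samples by nearest neighbors under Euclidean distance and then computes sensitivity and specificity. It is a toy example under stated assumptions: the three-marker intensities, the labels, and the function names are invented for illustration and do not come from any study cited here.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length marker profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train, query, k=3):
    """Majority vote among the k training profiles closest to the query."""
    nearest = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def sensitivity_specificity(truth, predicted, positive="disease"):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(truth, predicted))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical intensities for a three-protein panel (arbitrary units).
train = [
    ((5.0, 4.8, 5.2), "disease"), ((4.6, 5.1, 4.9), "disease"),
    ((5.3, 4.7, 5.0), "disease"), ((1.0, 1.2, 0.9), "control"),
    ((1.3, 0.8, 1.1), "control"), ((0.9, 1.1, 1.0), "control"),
]
test = [
    ((4.9, 5.0, 5.1), "disease"), ((1.1, 1.0, 1.2), "control"),
    ((4.7, 4.9, 4.8), "disease"), ((1.2, 0.9, 1.0), "control"),
]

predictions = [knn_predict(train, profile) for profile, _ in test]
sens, spec = sensitivity_specificity([label for _, label in test], predictions)
print(predictions, sens, spec)
```

Real panels are far noisier than this well-separated toy data, so sensitivity and specificity should be estimated on an independent cohort rather than on the samples used to build the classifier.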
4. Introduction of the Experimental Strategy Described in This Volume
For protein profiling and identification, proteomics platform technologies are moving forward in many areas, not only in clinical proteomics but also in the general biological field. In this volume, leading scientists in the field of proteomics outline core techniques and their application to studies in clinical proteomics. For example, in plasma proteome analysis, it is necessary to deplete high-abundance proteins using techniques such as multidimensional fractionation by immunoaffinity column, gel permeation, and beads (Fig. 4). Cho et al. (Chapter 4) address this in relation to 2D gel analysis of plasma, describing the technical details of sample preparation, gel electrophoresis, and quantification of proteins on the gel. Zhang and Koay (Chapter 5) describe methods of 2D gel analysis for cells prepared by LCM, including the application of LCM in dissecting tumor cells in breast cancer for macromolecular extraction and 2D gels. This approach can be used to prepare samples from paraffin-embedded tissue blocks by microdissecting the cells of interest. Building on this procedure, Mustafa et al. (Chapter 9) review the application of LCM for proteomics analysis and demonstrate that combining LCM and MS would facilitate identification of specific proteins for each sample type. For urine sample analysis, Zerefos et al. (Chapter 8) provide simple protocols for protein analysis by 2D gel or direct matrix-assisted laser desorption/ionization time-of-flight mass spectrometry. These techniques include protein enrichment through protein precipitation and ultrafiltration. Combining these methods with the above profiling technologies allows reproducible and sensitive analysis of one of the most significant and complex biological samples (77).
Table 2 Clinical Proteomics Initiatives and Resources
Name | Details | Website
Institute
CPTI | National Cancer Institute’s Clinical Proteomics Technologies Initiative for Cancer | http://proteomics.cancer.gov
ABRF | The Association of Biomolecular Resource Facilities, an international society dedicated to advancing core and research biotechnology laboratories through research, communication, and education | http://www.abrf.org/
PPI | Plasma Proteome Institute; the PPI is working to facilitate clinical adoption of advanced diagnostic tests using proteins in plasma and serum | http://www.plasmaproteome.org/plasmaframes.htm
EDRN | The Early Detection Research Network; the EDRN provides up-to-date information on biomarker research through this website and scientific publications | http://edrn.nci.nih.gov
Web resources
ExPASy | Expert Protein Analysis System; proteomics-related information and databases | http://www.expasy.org/
NCBI | National Center for Biotechnology Information; the protein entries in the Entrez search and retrieval system have been compiled from a variety of sources, including SwissProt, PIR, PRF, PDB, and translations from annotated coding regions in GenBank and RefSeq | http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Protein&itool=toolbar
CPRMap | Clinical Proteomics Research Map; updated research articles on disease and clinical proteomics | http://www.cprmap.com/
Database
MedGene | MedGene can make a list of human genes associated with a particular human disease in ranking order | http://hipseq.med.harvard.edu/MEDGENE
Table 3 Available Bioinformatic Resources for the Analysis of Proteomics Data Name
Description
Clinical proteome data management system Proteus LIMS for proteomics pipeline CPAS LIMS for identification and quantification using LC-MS/MS data Systems biology experiment analysis management system A management system for collecting, storing, and accessing data produced by microarray, proteomics, and immunohistochemistry GPM database Open source system for analyzing, validating, and storing protein identification data SpectrumMill MS/MS data analysis and management system Phosphorylation Group-based phosphorylation scoring method KinasePhos
NetPhos
NetPhosK
Prediction of kinase-specific phosphorylation sites A web tool for identifying protein kinase-specific phosphorylation sites using a hidden Markov model Sequence and structure-based prediction of eukaryotic protein phosphorylation sites Prediction of post-translational glycosylation and phosphorylation of proteins from the amino acid sequence
Website URL
PMID
http://www.genologics.com 16396501
http://www.sbeams.org/
16756676
http://www.thegpm.org/
15595733
http://www.chem.agilent.com/
http://973proteinweb.ustc.edu.cn/gps/gps_web/ http://kinasePhos.mbc.nctu.edu.tw
15980451
http://www.cbs.dtu.dk/services/NetPhos/
10600390
http://www.cbs.dtu.dk/services/NetPhosK/
15174133
15980458
PredPhospho
PREDIKIN
Prosite
Scansite
Phospho.ELM
Human protein reference database (HPRD)
PhosphoSite
Glycosylation NetOGlyc 2.0
Prediction of phosphorylation sites using support vector machine A prediction of substrates for serine/threonine protein kinases based on the primary sequence of a protein kinase catalytic domain A prediction of substrates for protein kinases-based conserved motif search Prediction of PK-specific phosphorylation site with Bayesian decision theory A database of experimentally verified phosphorylation sites in eukaryotic proteins A database of known kinase/phosphatase substrate as well as binding motifs that are curated from the published literature A bioinformatics resource dedicated to physiological protein phosphorylation
http://pred.ngri.re.kr/PredPhospho.htm http://florey.biosci.uq.edu.au/kinsub/home.htm
15231530
http://kr.expasy.org/prosite
17237102
http://scansite.mit.edu
16549034
http://phospho.elm.eu.org/
15212693
http://www.phosphosite.org/Login.jsp
15174125
Predicts O-glycosylation sites in mucin-type proteins
http://www.cbs.dtu.dk/services/NetOGlyc/ http://www.cbs.dtu.dk/services/DictyOGlyc/ http://www.cbs.dtu.dk/services/YinOYang/ http://www.cbs.dtu.dk/services/NetNGlyc/ http://www.expasy.ch/tools/glycomod/
9557871
DictyOGlyc 1.1
Predicts O-GlcNAc sites in eukaryotic proteins
YinOYang 1.2
Predicts O-GlcNAc sites in eukaryotic proteins
NetNGlyc 1.0
Predicting N-glycosylation sites
GlycoMod
Web software for prediction of the possible oligosaccharide structures in glycoproteins from their experimentally determined masses
16445868
http://www.hprd.org/PhosphoMotif_finder
10521537
16316981
11680880
Glyco-fragment
A web tool to support the interpretation of mass spectra of complex carbohydrates Compares each peak of a measured mass spectrum with the calculated fragments of all structures contained in the SweetDB Based on the matching of experimental MS2 data with the theoretical fragmentation of glycan structures in GlycoSuiteDB A web-based computational program that can quickly extract sequence information from a set of MSn spectra for an oligosaccharide of up to 10 residues To determine simultaneously the glycosylation sites and oligosaccharide heterogeneity of glycoproteins using MATLAB A web server for identifying multiple post-translational peptide modifications from tandem mass spectra An attempt to create annotated data collections for carbohydrates
http://www.dkfz.de/spec/projekte/fragments/
14625865
GlycoSearchMS
GlycosidIQ
Saccharide topology analysis tool
GlycoX
MODi
SWEET-DB
Protein–protein interaction Munich The database of mammalian information protein–protein interactions center for protein sequence’s MPPI
http://www.dkfz.de/spec/glycosciences.de/sweetdb/ms/ 15215392
https://tmat.proteomesystems.com/glyco/glycosuite/glycodb 15174134 10857602
17022651
http://www.unimod.org
16845006
http://www.dkfz.de/spec2/sweetdb/
11752350
http://mips.gsf.de
16381839
Database of interacting proteins Molecular interaction network database
Protein–protein interactions of cancer proteins
IntAct
Biomolecular interaction network database Metabolic and signal pathway BioCarta KEGG
Cancer cell map
HPRD
A database that documents experimentally determined protein–protein interactions A database of storing, in a structured format, information about molecular interactions by extracting experimental details from work published in peer-reviewed journals Predicts interactions, which are derived from homology with experimentally known protein–protein interactions from various species IntAct provides a freely available, open source database system and analysis tools for protein interaction data A database designed to store full descriptions of interactions, molecular complexes and pathways
http://dip.doe-mbi.ucla.edu/
11752321
http://mint.bio.uniroma2.it/mint
17135203
http://bmm.cancerresearchuk.org/~pip
16398927
http://www.ebi.ac.uk/intact/
17145710
http://www.bind.ca
12519993
A pathway database
http://www.biocarta.com http://www.genome.jp/kegg
A pathway database with genomical, chemical, and biological network information The cancer cell map is a selected set of human cancer focused pathways A database with data pertaining to post-translational modifications, protein–protein interactions, tissue expression,
16381885
http://cancer.cellmap.org/cellmap/ http://www.hprd.org/
subcellular localization, and enzyme–substrate relationships Proteomic data resource The cancer cell A database of clinical data map from SELDI-TOF
Proteomics identifications database PeptideAtlas
Disease resource Online mendelian inheritance in man GeneCards
Cancer gene census
A database of protein and peptide identifications that have been described in the scientific literature A multiorganism, publicly accessible compendium of peptides identified in a large set of tandem mass spectrometry proteomics experiments
http://home.ccr.cancer.gov/ncifdaproteomics/ppatterns.asp http://www.ebi.ac.uk/pride/
16381953
http://www.peptideatlas.org
16381952
A database of human genes and genetic disorders
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM
17170002
An integrated database of human genes that includes automatically mined genomic, proteomic, and transcriptomic information A catalogue of those genes for which mutations have been causally implicated in cancer
http://www.genecards.org/index.shtml 15608261
http://www.sanger.ac.uk/genetics/CGP/Census/
14993899
Two-dimensional electrophoresis is perhaps the most popular start-up tool for proteome analysis. In clinical proteomics, 2DE has been the traditional workhorse for the analysis of clinical specimens ranging from plasma to urine (Table 1). Quantification problems in 2DE are now addressed by employing fluorescent dyes (Cy3 and Cy5), which allow normalization
of data obtained from two different clinical specimens (79). Friedman and Lilley (Chapter 6) present general optimization conditions for difference gel electrophoresis (DIGE) in the quantitative analysis of clinical samples. They address the usefulness of differential labeling dyes (Cy2, Cy3, and Cy5). The essence of any DIGE system is to minimize potential human error in the identification and quantification of protein spots in a 2D gel (79). The difficulties in 2D map analysis are introduced by Marengo et al. (Chapter 16). They describe methods for comparing protein spots using image analysis technology and related informatics tools to minimize variation between measurements of spot volume, a key to successful 2D map construction. There are many variations of LC in protein profiling, including mass detection methods, column types, data mining through search engines, mass accuracy, and running conditions (80,81,82). These are all related to quantification of proteins or peptides in the sample, one of the major bottlenecks in proteomics (83,84,85,86,87). Among the available techniques are isotope-coded affinity tags (ICAT), mass-coded affinity tagging, and non-isotope-labeled methods. Xiao and Veenstra (Chapter 10) present the application of ICAT to the study of COX-2 inhibitor-regulated proteins in a colon cancer cell line. With emphasis on sample preparation, they provide details on ICAT procedures for quantitative proteomics (88). In addition to this approach, Li et al. (Chapter 11) employ a strategy that combines LCM for sample preparation of HCC with cleavable isotope-coded affinity tags in order to identify markers quantitatively. It should be mentioned, however, that further measures are needed to increase the efficiency of ICAT, since sample recovery during or after the labeling steps is inefficient (87).
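The relative quantification that isotope-labeling methods such as ICAT aim at can be illustrated with a short calculation. The sketch below is an assumption-laden toy, not the procedure of any chapter in this volume: it assumes that light- and heavy-labeled peptide pairs have already been matched and their intensities extracted, that the heavy label marks the disease sample, and that a protein-level value is taken as the median log2 ratio of its peptides. The function name and the numbers are hypothetical.

```python
import math
from statistics import median


def protein_log2_ratios(peptide_pairs):
    """Summarize peptide pairs as a median log2(heavy/light) per protein.

    peptide_pairs: iterable of (protein, light_intensity, heavy_intensity).
    Pairs with a non-positive intensity are skipped because no ratio
    can be formed for them.
    """
    by_protein = {}
    for protein, light, heavy in peptide_pairs:
        if light <= 0 or heavy <= 0:
            continue
        by_protein.setdefault(protein, []).append(math.log2(heavy / light))
    return {name: median(values) for name, values in by_protein.items()}


# Hypothetical intensities (arbitrary units); "P1" and "P2" are placeholders.
pairs = [
    ("P1", 100.0, 200.0),
    ("P1", 50.0, 100.0),
    ("P1", 10.0, 40.0),
    ("P2", 80.0, 20.0),
    ("P2", 0.0, 10.0),   # missing light signal, ignored
]
ratios = protein_log2_ratios(pairs)
print(ratios)
```

Using the median rather than the mean makes the protein-level value less sensitive to a single outlying peptide, which is one common design choice; other summaries are equally possible.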
A label-free serum quantification method has also been introduced recently (48) (see Chapter 12 by Higgs et al.). The use of antibody arrays in clinical proteomics has increased recently in the context of high-throughput detection of cancer specimens where the identities of the proteins of interest are known (89,90). The evaluation of antibody cross-reactivity and specificity is crucial in these assays. This matter is addressed by Sanchez-Carbayo (Chapter 15), who describes technical aspects and applications of planar antibody arrays in the quantification of serum proteins, and by Hsu et al. (Chapter 14), who describe the development and use of bead-based miniaturized multiplexed sandwich immunoassays for focused protein profiling in various body fluids. The latter method, using bead-based protein arrays or suspension microarrays, allows the simultaneous analysis of a variety of parameters within a single experiment. Given the versatility of suspension microarrays in the analysis of proteins of interest in body fluids ranging from serum to synovial fluid, this multiplexed protein profiling technology seems to hold great promise in clinical proteomics. Similarly, in combination with
tissue microarray technology (91) it would also be possible to perform parallel molecular profiling of clinical samples together with immunohistochemistry, fluorescence in situ hybridization, or RNA in situ hybridization. SELDI is another arena of high-throughput profiling of clinical samples in the course of disease marker discovery [(92,93), Chapter 7]. It is expected that profiling approaches in proteomics, such as SELDI-MS, will be frequently used in disease marker discovery, but only if the identification technologies coupled with SELDI are improved. During the course of biomarker discovery, large data sets are usually generated and deposited in a coordinated fashion (Tables 2 and 3) (94,95). Indeed, statistical analysis of 2DE proteomics data, which comprise several hundred protein spots, is complex. To circumvent some of the inconsistency in 2D gel proteomics data, Friedman and Lilley (Chapter 6) and Carpentier et al. (Chapter 17) point out available statistical tools and suggest case-specific guidelines for 2D gel spot analysis. Fitzgibbon et al. (Chapter 19) describe an open source platform for LC-MS spectra in which the msInspect program is used to lower false positives and guide normalization of the dataset. They also demonstrate that msInspect can analyze data from quantitative studies with and without isotopic labels. Paliakasis et al. (Chapter 18) introduce web-based tools for protein classification, which lead to prediction of potential protein function and family clustering of related proteins, and provide some guidelines for classifying protein data into more meaningful families. Finally, Somorjai (Chapter 20) addresses important filtering criteria for the application of protein pattern recognition to biomarker discovery using statistical tools.
5. Concluding Remarks
Although there are several bottlenecks in clinical proteomics (such as lack of standardization of specimen processing, quantification, and overall strategy for tackling post-identification of biomarkers), we believe that the field holds great promise in biomarker discovery. The success of clinical proteomics depends on the availability and selection of well-phenotyped specimens, reduction of sample complexity, development of good informatics tools, and efficient data management. Therefore, sample handling techniques including microdissection for tissue samples, multidimensional fractionation for body fluids, and pretreatment of other clinical specimens (e.g., urine, tears, and cells) should be developed in this context. Since there is no gold standard for sample collection and handling, one needs to find the best options available for processing samples without damage. In addition, establishment of a biorepository system would systematically minimize some artifacts and variation between samples during or after identification of biomarkers.
It is now generally accepted that an ensemble (or panel) of different proteins would be more efficient than a single protein/peptide in the diagnosis of disease, an idea which is poised to replace the conventional concept of a biomarker. As a high-throughput means of protein profiling, antibody arrays have recently seen increased use in clinical proteomics for the detection of cancer specimens. However, when antibody arrays are used to profile serum autoantibodies, issues of cross-reactivity and specificity have to be resolved. Although not covered here due to space limitations, with the advent of proteomics techniques one can further analyze networks of protein–protein interactions as well as post-translational modifications of the proteins involved in a specific disease (Table 3). It is now highly recommended that common reagents such as antibodies and standard proteins, which are very useful for spiking, quantification, and normalization of sensitivity from one instrument to another, be used in worldwide efforts such as the Human Proteome Organization (HUPO) Plasma Proteome Project (96,97). Finally, clinical proteomics needs the integration of biochemistry, pathology, analytical technology, bioinformatics, and proteome informatics to develop highly sensitive diagnostic tools for routine clinical care in the future (71,98).
Acknowledgments
This study was supported by a grant from the Korea Health 21 R&D project, Ministry of Health & Welfare, Republic of Korea (A030003 to YKP).
References
1. Etzioni, R., Urban, N., Ramsey, S., McIntosh, M., Schwartz, S., Reid, B., Radich, J., Anderson, G., and Hartwell, L. (2003) The case for early detection. Nat. Rev. Cancer 3, 1–10. 2. Ludwig, J. A. and Weinstein, J. N. (2005) Biomarkers in cancer staging, prognosis and treatment selection. Nat. Rev. Cancer 5, 845–856. 3. Xiao, Z., Prieto, D., Conrads, T. P., Veenstra, T. D., and Issaq, H. J. (2005) Proteomic patterns: their potential for disease diagnosis. Mol. Cell Endocrinol. 
230, 95–106. 4. Rifai, N., Gillette, M. A., and Carr, S. A. (2006) Protein biomarker discovery and validation: the long and uncertain path to clinical utility. Nat. Biotechnol. 24, 971–983. 5. Emmert-Buck, M. R., Bonner, R. F., Smith, P. D., Chuaqui, R. F., Zhuang, Z., Goldstein, S. R., Weiss, R. A., and Liotta, L. A. (1996) Laser capture microdissection. Science 274, 998–1001. 6. Gillespie, J. W., Ahram, M., Best, C. J., Swalwell, J. I., Krizman, D. B., Petricoin, E. F., Liotta, L. A., and Emmert-Buck, M. R. (2001) The role of tissue microdissection in cancer research. Cancer J. 7, 32–39.
7. Craven, R. A. and Banks, R. E. (2002) Use of laser capture microdissection to selectively obtain distinct populations of cells for proteomic analysis. Methods Enzymol. 356, 33–49. 8. Vincourt, J. B., Lionneton, F., Kratassiouk, G., Guillemin, F., Netter, P., Mainard, D., and Magdalou, J. (2006) Establishment of a reliable method for direct proteome characterization of human articular cartilage. Mol. Cell Proteomics 5, 1984–1995. 9. Platt, M. S., Agamanolis, D. P., Krill, C. E. Jr., Boeckman, C., Potter, J. L., Robinson, H., and Lloyd, J. (1983) Occult hepatic sinusoid tumor of infancy simulating neuroblastoma. Cancer 52, 1183–1189. 10. Mahadevia, P. J., Fleisher, L. A., Frick, K. D., Eng, J., Goodman, S. N., and Powe, N. R. (2003) Lung cancer screening with helical computed tomography in older adult smokers: a decision and cost-effectiveness analysis. JAMA 289, 313–322. 11. Hood, B. L., Darfler, M. M., Guiel, T. G., Furusato, B., Lucas, D. A., Ringeisen, B. R., Sesterhenn, I. A., Conrads, T. P., Veenstra, T. D., and Krizman, D. B. (2005) Proteomic analysis of formalin-fixed prostate cancer tissue. Mol. Cell Proteomics 4, 1741–1753. 12. Alaiya, A., Al-Mohanna, M., and Linder, S. (2005) Clinical cancer proteomics: promises and pitfalls. J. Proteome Res. 4, 1213–1222. 13. Gericke, B., Raila, J., Sehouli, J., Haebel, S., Konsgen, D., Mustea, A., and Schweigert, F. J. (2005) Microheterogeneity of transthyretin in serum and ascitic fluid of ovarian cancer patients. BMC Cancer 17, 133–141. 14. Swisher, E. M., Wollan, M., Mahtani, S. M., Willner, J. B., Garcia, R., Goff, B. A., and King, M. C. (2005) Tumor-specific p53 sequences in blood and peritoneal fluid of women with epithelial ovarian cancer. Am. J. Obstet. Gynecol. 193, 662–667. 15. Pisitkun, T., Johnstone, R., and Knepper, M. A. (2006) Discovery of urinary biomarkers. Mol. Cell Proteomics 5, 1760–1771. 16. Ghafouri, B., Irander, K., Lindbom, J., Tagesson, C., and Lindahl, M. 
(2006) Comparative proteomics of nasal fluid in seasonal allergic rhinitis. J. Proteome Res. 5, 330–338. 17. Koo, B. S., Lee, D. Y., Ha, H. S., Kim, J. C., and Kim, C. W. (2005) Comparative analysis of the tear protein expression in blepharitis patients using two-dimensional electrophoresis. J. Proteome Res. 4, 719–724. 18. Grus, F. H., Podust, V. N., Bruns, K., Lackner, K., Fu, S., Dalmasso, E. A., Wirthlin, A., and Pfeiffer, N. (2005) SELDI-TOF-MS ProteinChip array profiling of tears from patients with dry eye. Invest. Ophthalmol. Vis. Sci. 46, 863–876. 19. Amado, F. M., Vitorino, R. M., Domingues, P. M., Lobo, M. J., and Duarte, J. A. (2005) Analysis of the human saliva proteome. Expert Rev. Proteomics 2, 521–539. 20. Wang, T. H., Chang, Y. L., Peng, H. H., Wang, S. T., Lu, H. W., Teng, S. H., Chang, S. D., and Wang, H. S. (2005) Rapid detection of fetal aneuploidy using proteomics approaches on amniotic fluid supernatant. Prenat. Diagn. 25, 559–566. 21. Ruetschi, U., Rosen, A., Karlsson, G., Zetterberg, H., Rymo, L., Hagberg, H., and Jacobsson, B. (2005) Proteomic analysis using protein chips to detect
biomarkers in cervical and amniotic fluid in women with intra-amniotic inflammation. J. Proteome Res. 4, 2236–2242. 22. Kim, Y. S., Kim, M. S., Lee, S. H., Choi, B. C., Lim, J. M., Cha, K. Y., and Baek, K. H. (2006) Proteomic analysis of recurrent spontaneous abortion: identification of an inadequately expressed set of proteins in human follicular fluid. Proteomics 6, 3445–3454. 23. Pilch, B. and Mann, M. (2006) Large-scale and high-confidence proteomic analysis of human seminal plasma. Genome Biol. 7, R40. 24. Varnum, S. M., Covington, C. C., Woodbury, R. L., Petritis, K., Kangas, L. J., Abdullah, M. S., Pounds, J. G., Smith, R. D., and Zangar, R. C. (2003) Proteomic characterization of nipple aspirate fluid: identification of potential biomarkers of breast cancer. Breast Cancer Res. Treat. 80, 87–97. 25. Zheng, P. P., Luider, T. M., Pieters, R., Avezaat, C. J., van den Bent, M. J., Sillevis Smitt, P. A., and Kros, J. M. (2003) Identification of tumor-related proteins by proteomic analysis of cerebrospinal fluid from patients with primary brain tumors. J. Neuropathol. Exp. Neurol. 62, 855–862. 26. Gibson, D. S., Blelock, S., Brockbank, S., Curry, J., Healy, A., McAllister, C., and Rooney, M. E. (2006) Proteomic analysis of recurrent joint inflammation in juvenile idiopathic arthritis. J. Proteome Res. 5, 1988–1995. 27. Merkel, D., Rist, W., Seither, P., Weith, A., and Lenter, M. C. (2005) Proteomic study of human bronchoalveolar lavage fluids from smokers with chronic obstructive pulmonary disease by combining surface-enhanced laser desorption/ionization-mass spectrometry profiling with mass spectrometric protein identification. Proteomics 5, 2972–2980. 28. Wu, J., Kobayashi, M., Sousa, E. A., Liu, W., Cai, J., Goldman, S. J., Dorner, A. J., Projan, S. J., Kavuru, M. S., Qiu, Y., and Thomassen, M. J. (2005) Differential proteomic analysis of bronchoalveolar lavage fluid in asthmatics following segmental antigen challenge. Mol. Cell Proteomics 4, 1251–1264. 29. Tyan, Y. C., Wu, H. Y., Lai, W. W., Su, W. C., and Liao, P. C. (2005) Proteomic profiling of human pleural effusion using two-dimensional nano liquid chromatography tandem mass spectrometry. J. Proteome Res. 4, 1274–1286. 30. Khalil, A. A. and James, P. (2007) Biomarker discovery: a proteomic approach for brain cancer profiling. Cancer Sci. 98, 201–213. 31. Khodavirdi, A. C., Song, Z., Yang, S., Zhong, C., Wang, S., Wu, H., Pritchard, C., Nelson, P. S., and Roy-Burman, P. (2006) Increased expression of osteopontin contributes to the progression of prostate cancer. Cancer Res. 66, 883–888. 32. Vincourt, J. B., Lionneton, F., Kratassiouk, G., Guillemin, F., Netter, P., Mainard, D., and Magdalou, J. (2006) Establishment of a reliable method for direct proteome characterization of human articular cartilage. Mol. Cell Proteomics 5, 1984–1995. 33. Lee, Y. J., Rice, R. H., and Lee, Y. M. (2006) Proteome analysis of human hair shaft: from protein identification to post-translational modification. Mol. Cell Proteomics 5, 789–800. 34. Cho, S. Y., Lee, E. Y., Lee, J. S., Kim, H. Y., Park, J. M., Kwon, M. S., Park, Y. K., Lee, H. J., Kang, M. J., Kim, J. Y., Yoo, J. S., Park, S. J., Cho, J. W., Kim, H. S., and
Paik, Y. K. (2005) Efficient prefractionation of low-abundance proteins in human plasma and construction of a two-dimensional map. Proteomics 5, 3386–3396. 35. Lathrop, J. T., Hayes, T. K., Carrick, K., and Hammond, D. J. (2005) Rarity gives a charm: evaluation of trace proteins in plasma and serum. Expert Rev. Proteomics 2, 393–406. 36. Lee, H. J., Lee, E. Y., Kwon, M. S., and Paik, Y. K. (2006) Biomarker discovery from the plasma proteome using multidimensional fractionation proteomics. Curr. Opin. Chem. Biol. 10, 42–49. 37. Anderson, N. L. and Anderson, N. G. (2002) The human plasma proteome: history, character, and diagnostic prospects. Mol. Cell Proteomics 1, 845–867. 38. Hu, S., Loo, J. A., and Wong, D. T. (2006) Human body fluid proteome analysis. Proteomics 6, 6326–6353. 39. Park, M. R., Wang, E. H., Jin, D. C., Cha, J. H., Lee, K. H., Yang, C. W., Kang, C. S., and Choi, Y. J. (2006) Establishment of a 2-D human urinary proteomic map in IgA nephropathy. Proteomics 6, 1066–1076. 40. Tammen, H., Schulte, I., Hess, R., Menzel, C., Kellmann, M., and Schulz-Knappe, P. (2005) Prerequisites for peptidomic analysis of blood samples: I. Evaluation of blood specimen qualities and determination of technical performance characteristics. Comb. Chem. High Throughput Screen 8, 725–733. 41. Rai, A. J., Gelfand, C. A., Haywood, B. C., Warunek, D. J., Yi, J., Schuchard, M. D., Mehigh, R. J., Cockrill, S. L., Scott, G. B., Tammen, H., Schulz-Knappe, P., Speicher, D. W., Vitzthum, F., Haab, B. B., Siest, G., and Chan, D. W. (2005) HUPO plasma proteome project specimen collection and handling: towards the standardization of parameters for plasma proteome samples. Proteomics 5, 3262–3277. 42. Zhou, M., Lucas, D. A., Chan, K. C., Issaq, H. J., Petricoin, E. F. 3rd, Liotta, L. A., Veenstra, T. D., and Conrads, T. P. (2004) An investigation into the human serum “interactome”. Electrophoresis 25, 1289–1298. 43. Findeisen, P., Sismanidis, D., Riedl, M., Costina, V., and Neumaier, M. (2005) Preanalytical impact of sample handling on proteome profiling experiments with matrix-assisted laser desorption/ionization time-of-flight mass spectrometry. Clin. Chem. 51, 2409–2411. 44. Park, K. S., Kim, H., Kim, N. G., Cho, S. Y., Choi, K. H., Seong, J. K., and Paik, Y. K. (2002) Proteomic analysis and molecular characterization of tissue ferritin light chain in hepatocellular carcinoma. Hepatology 35, 1459–1466. 45. Park, K. S., Cho, S. Y., Kim, H., and Paik, Y. K. (2002) Proteomic alterations of the variants of human aldehyde dehydrogenase isozymes correlate with hepatocellular carcinoma. Int. J. Cancer 97, 261–265. 46. Marko-Varga, G., Berglund, M., Malmstrom, J., Lindberg, H., and Fehniger, T. E. (2003) Targeting hepatocytes from liver tissue by laser capture microdissection and proteomics expression profiling. Electrophoresis 24, 3800–3805. 47. Paradis, V., Degos, F., Dargere, D., Pham, N., Belghiti, J., Degott, C., Janeau, J. L., Bezeaud, A., Delforge, D., Cubizolles, M., Laurendeau, I., and Bedossa, P. (2005) Identification of a new biomarker of hepatocellular carcinoma by serum protein profiling of patients with chronic liver diseases. Hepatology 41, 40–47.
48. Ru, Q. C., Zhu, L. A., Silberman, J., and Shriver, C. D. (2006) Label-free semiquantitative peptide feature profiling of human breast cancer and breast disease sera via two-dimensional liquid chromatography–mass spectrometry. Mol. Cell Proteomics 5, 1095–1104. 49. Azad, N. S., Rasool, N., Annuziata, C. M., Minasian, L., Whiteley, G., and Kohn, E. C. (2006) Proteomics in clinical trials and practice: present uses and future promise. Mol. Cell Proteomics 5, 1819–1829. 50. Gunter, E. W. (1997) Biological and environmental specimen banking at the Centers for Disease Control and Prevention. Chemosphere 34, 1945–1953. 51. Strauss, G. H. and Kelly, S. J. (1990) The development of the U.S. EPA health effects research laboratory frozen blood cell repository program. Mutat. Res. 234, 349–354. 52. Romeo, M. J., Espina, V., Lowenthal, M., Espina, B. H., Petricoin, E. F. 3rd, and Liotta, L. A. (2005) CSF proteome: a protein repository for potential biomarker identification. Expert Rev. Proteomics 2, 57–70. 53. Conrads, T. P., Hood, B. L., Petricoin, E. F. 3rd, Liotta, L. A., and Veenstra, T. D. (2005) Cancer proteomics: many technologies, one goal. Expert Rev. Proteomics 2, 693–703. 54. Schrader, M. and Selle, H. (2006) The process chain for peptidomic biomarker discovery. Dis. Markers 22, 27–37. 55. Danna, E. A. and Nolan, G. P. (2006) Transcending the biomarker mindset: deciphering disease mechanisms at the single cell level. Curr. Opin. Chem. Biol. 10, 20–27. 56. De Masi, S., Tosti, M. E., and Mele, A. (2005) Screening for hepatocellular carcinoma. Dig. Liver Dis. 37, 260–268. 57. Yamaguchi, K., Nagano, M., Torada, N. Hamasaki, N., Kawakita, M., and Tanaka, M. (2004) Urine diacetylspermine as a novel tumor marker for pancreatobiliary carcinomas. Rinsho. Byori. 52, 336–339 58. Dabrowska, M., Grubek-Jaworska, H., Domagala-Kulawik, J., Bartoszewicz, Z., Kondracka, A., Krenke, R., Nejman, P., and Chazan, R. 
(2004) Diagnostic usefulness of selected tumor markers (CA125, CEA, CYFRA 21–1) in bronchoalveolar lavage fluid in patients with non-small cell lung cancer. Pol. Arch. Med. Wewn 111, 659–665. 59. Gann, P. H., Hennekens, C. H., and Stampfer, M. J. (1995) A prospective evaluation of plasma prostate-specific antigen for detection of prostatic cancer. JAMA 273, 289–294 60. Ciambellotti, E., Coda, C., and Lanza, E. (1993) Determination of CA 15–3 in the control of primary and metastatic breast carcinoma. Minerva Med. 84, 107–112. 61. Linkov, F., Lisovich, A., Yurkovetsky, Z., Marrangoni, A., Velikokhatnaya, L., Nolen, B., Winans, M., Bigbee, W., Siegfried, J., Lokshin, A., and Ferris, R. L. (2007) Early detection of head and neck cancer: development of a novel screening tool using multiplexed immunobead-based biomarker profiling. Cancer Epidemiol. Biomarkers Prev. 16, 102–107. 62. Casiano, C. A., Mediavilla-Varela, M., and Tan, E. M. (2006) Tumor-associated antigen arrays for the serological diagnosis of cancer. Mol. Cell Proteomics 5, 1745–1759.
63. Nissom, P. M., Lo, S. L., Lo, J. C., Ong, P. F., Lim, J. W., Ou, K., Liang, R. C., Seow, T. K., and Chung, M. C. (2006) Hcc-2, a novel mammalian ER thioredoxin that is differentially expressed in hepatocellular carcinoma. FEBS Lett. 580, 2216–2226. 64. Feng, J. T., Liu, Y. K., Song, H. Y., Dai, Z., Qin, L. X., Almofti, M. R., Fang, C. Y., Lu, H. J., Yang, P. Y., and Tang, Z. Y. (2005) Heat-shock protein 27: a potential biomarker for hepatocellular carcinoma identified by serum proteome analysis. Proteomics 5, 4581–4588. 65. Li, D. Q., Wang, L., Fei, F., Hou, Y. F., Luo, J. M., Wei-Chen, Zeng, R., Wu, J., Lu, J. S., Di, G. H., Ou, Z. L., Xia, Q. C., Shen, Z. Z., and Shao, Z. M. (2006) Identification of breast cancer metastasis-associated proteins in an isogenic tumor metastasis model using two-dimensional gel electrophoresis and liquid chromatography-ion trap-mass spectrometry. Proteomics 6, 3352–3368. 66. Lee, I. N., Chen, C. H., Sheu, J. C., Lee, H. S., Huang, G. T., Yu, C. Y., Lu, F. J., and Chow, L. P. (2005) Identification of human hepatocellular carcinoma-related biomarkers by two-dimensional difference gel electrophoresis and mass spectrometry. J. Proteome Res. 4, 2062–2069. 67. Righetti, P. G., Castagna, A., Antonucci, F., Piubelli, C., Cecconi, D., Campostrini, N., Rustichelli, C., Antonioli, P., Zanusso, G., Monaco, S., Lomas, L., and Boschetti, E. (2005) Proteome analysis in the clinical chemistry laboratory: myth or reality? Clin. Chim. Acta 357, 123–139. 68. Jang, J. S., Cho, H. Y., Lee, Y. J., Ha, W. S., and Kim, H. W. (2004) The differential proteome profile of stomach cancer: identification of the biomarker candidates. Oncol. Res. 14, 491–499. 69. Steel, L. F., Shumpert, D., Trotter, M., Seeholzer, S. H., Evans, A. A., London, W. T., Dwek, R., and Block, T. M. (2003) A strategy for the comparative analysis of serum proteomes for the discovery of biomarkers for hepatocellular carcinoma. Proteomics 3, 601–609. 70. Yip, T. T., Chan, J. W., Cho, W. 
C., Yip, T. T., Wang, Z., Kwan, T. L., Law, S. C., Tsang, D. N., Chan, J. K., Lee, K. C., Cheng, W. W., Ma, V. W., Yip, C., Lim, C. K., Ngan, R. K., Au, J. S., Chan, A., Lim, W. W., and Ciphergen SARS Proteomics Study Group (2005) Protein chip array profiling analysis in patients with severe acute respiratory syndrome identified serum amyloid a protein as a biomarker potentially useful in monitoring the extent of pneumonia. Clin. Chem. 51, 47–55. 71. Anderson, L. and Hunter, C. L. (2005) Quantitative mass spectrometric multiple reaction monitoring assays for major plasma proteins. Mol. Cell Proteomics 5, 573–588. 72. Lee, J. W., Figeys, D., and Vasilescu, J. (2007) Biomarker assay translation from discovery to clinical studies in cancer drug development: quantification of emerging protein biomarkers. Adv. Cancer Res. 96, 269–298. 73. Zolg, W. (2006) The proteomic search for diagnostic biomarkers: lost in translation? Mol. Cell Proteomics 5, 1720–1726.
74. Bensmail, H., Golek, J., Moody, M. M., Semmes, J. O., and Haoudi, A. (2005) A novel approach for clustering proteomics data using Bayesian fast Fourier transform. Bioinformatics 21, 2210–2224. 75. Ward, D. G., Cheng, Y., N’Kontchou, G., Thar, T. T., Barget, N., Wei, W., Billingham, L. J., Martin, A., Beaugrand, M., and Johnson, P. J. (2006) Changes in the serum proteome associated with the development of hepatocellular carcinoma in hepatitis C-related cirrhosis. Br. J. Cancer 94, 287–292. 76. Lin, N. and Zhao, H. (2005) Are scale-free networks robust to measurement errors? BMC Bioinformatics 6, 119. 77. Castagna, A., Cecconi, D., Sennels, L., Rappsilber, J., Guerrier, L., Fortis, F., Boschetti, E., Lomas, L., and Righetti, P. G. (2005) Exploring the hidden human urinary proteome via ligand library beads. J. Proteome Res. 4, 1917–1930. 78. Rauch, A., Bellew, M., Eng, J., Fitzgibbon, M., Holzman, T., Hussey, P., Igra, M., Maclean, B., Lin, C. W., Detter, A., Fang, R., Faca, V., Gafken, P., Zhang, H., Whiteaker, J., States, D., Hanash, S., Paulovich, A., and McIntosh, M. W. (2006) Computational proteomics analysis system (CPAS): an extensible open source analytic system for evaluating and publishing proteomic data and high throughput biological experiments. J. Proteome Res. 5, 112–121. 79. Lilley, K. S. and Friedman, D. B. (2004) All about DIGE: quantification technology for differential-display 2D-gel proteomics. Expert Rev. Proteomics 1, 401–409. 80. Qian, W. J., Jacobs, J. M., Liu, T., Camp, D. G. 2nd, and Smith, R. D. (2006) Advances and challenges in liquid chromatography-mass spectrometrybased proteomics profiling for clinical applications. Mol. Cell Proteomics 5, 1727–1744. 81. Powell, D. W., Merchant, M. L., and Link, A. J. (2006) Discovery of regulatory molecular events and biomarkers using 2D capillary chromatography and mass spectrometry. Expert Rev. Proteomics 3, 63–74. 82. Andre, M., Le Caer, J. 
P., Greco, C., Planchon, S., El Nemer, W., Boucheix, C., Rubinstein, E., Chamot-Rooke, J., and Le Naour, F. (2006) Proteomic analysis of the tetraspanin web using LC-ESI-MS/MS and MALDI-FTICR-MS. Proteomics 6, 1437–1449. 83. Greengauz-Roberts, O., Stoppler, H., Nomura, S., Yamaguchi, H., Goldenring, J. R., Podolsky, R. H., Lee, J. R., and Dynan, W. S. (2005) Saturation labeling with cysteine-reactive cyanine fluorescent dyes provides increased sensitivity for protein expression profiling of laser-microdissected clinical specimens. Proteomics 5, 1746–1757. 84. Heck, A. J. and Krijgsveld, J. (2004) Mass spectrometry-based quantitative proteomics. Expert Rev. Proteomics 1, 317–326. 85. Schneider, L. V. and Hall, M. P. (2005) Stable isotope methods for high-precision proteomics. Drug Discov. Today 10, 353–363. 86. Zhang, J., Goodlett, D. R., Peskind, E. R., Quinn, J. F., Zhou, Y., Wang, Q., Pan, C., Yi, E., Eng, J., Aebersold, R. H., and Montine, T. J. (2005) Quantitative proteomic analysis of age-related changes in human cerebrospinal fluid. Neurobiol Aging 26, 207–227.
87. Liu, T., Qian, W. J., Strittmatter, E. F., Camp, D. G. 2nd, Anderson, G. A., Thrall. B. D., and Smith, R. D. (2004) High-throughput comparative proteome analysis using a quantitative cysteinyl-peptide enrichment technology. Anal. Chem. 76, 5345–5353. 88. Li, C., Hong, Y., Tan, Y. X., Zhou, H., Ai, J. H., Li, S. J., Zhang, L., Xia, Q. C., Wu, J. R., Wang, H. Y., and Zeng, R. (2004) Accurate qualitative and quantitative proteomic analysis of clinical hepatocellular carcinoma using laser capture microdissection coupled with isotope-coded affinity tag and two-dimensional liquid chromatography mass spectrometry. Mol. Cell Proteomics 3, 399–409. 89. Sheehan, K. M., Calvert, V. S., Kay, E. W., Lu, Y., Fishman, D., Espina, V., Aquino. J., Speer, R., Araujo, R., Mills, G. B., Liotta, L. A., Petricoin, E. F. 3rd, and Wulfkuhle, J. D. (2005) Use of reverse phase protein microarrays and reference standard development for molecular network analysis of metastatic ovarian carcinoma. Mol. Cell Proteomics 4, 346–355. 90. Knezevic, V., Leethanakul, C., Bichsel, V. E., Worth, J. M., Prabhu, V. V., Gutkind, J. S., Liotta, L. A., Munson, P. J., Petricoin, E. F. 3rd, and Krizman, D. B. (2001) Proteomic profiling of the cancer microenvironment by antibody arrays. Proteomics 1, 1271–1278. 91. Sharma-Oates, A., Quirke, P., Westhead, D. R. (2005) TmaDB: a repository for tissue microarray data. BMC Bioinformatics 6, 218. 92. Rai, A. J., Stemmer, P. M., Zhang, Z., Adam, B. L., Morgan, W. T., Caffrey, R. E., Podust, V. N., Patel, M., Lim, L. Y., Shipulina, N. V., Chan, D. W., Semmes, O. J., and Leung, H. C. (2005) Analysis of human proteome organization plasma proteome project (HUPO PPP) reference specimens using surface enhanced laser desorption/ionization-time of flight (SELDI-TOF) mass spectrometry: multiinstitution correlation of spectra and identification of biomarkers. Proteomics 5, 3467–3474. 93. Engwegen, J. Y., Gast, M. C., Schellens, J. H., and Beijnen, J. H. 
(2006) Clinical proteomics: searching for better tumour markers with SELDI-TOF mass spectrometry. Trends Pharmacol. Sci. 27, 251–259. 94. Domon, B. and Aebersold, R. (2006) Mass spectrometry and protein analysis. Science 312, 212–217. 95. Domon, B. and Aebersold, R. (2006) Challenges and opportunities in proteomics data analysis. Mol. Cell Proteomics 5, 1921–1926. 96. Uhlen, M. and Ponten, F. (2005) Antibody-based proteomics for human tissue profiling. Mol. Cell Proteomics 4, 384–393. 97. Taussig, M. J., Stoevesandt, O., Borrebaeck, C. A., Bradbury, A. R., Cahill, D., Cambillau, C., de Daruvar, A., Dubel, S., Eichler, J., Frank, R., Gibson, T. J., Gloriam, D., Gold, L., Herberg, F. W., Hermjakob, H., Hoheisel, J. D., Joos, T. O., Kallioniemi, O., Koegll, M., Konthur, Z., Korn, B., Kremmer, E., Krobitsch, S., Landegren, U., van der Maarel, S., McCafferty, J., Muyldermans, S., Nygren, P. A., Palcy, S., Pluckthun, A., Polic, B., Przybylski, M., Saviranta, P., Sawyer, A., Sherman, D. J., Skerra, A., Templin, M., Ueffing, M., and Uhlen, M. (2007)
ProteomeBinders: planning a European resource of affinity reagents for analysis of the human proteome. Nat. Methods 4, 13–17. 98. Ilyin, S. E., Belkowski, S. M., and Plata-Salaman, C. R. (2004) Biomarker discovery and validation: technologies and integrative approaches. Trends Biotechnol. 22, 411–416.
I. Specimen Collection for Clinical Proteomics

2. Specimen Collection and Handling: Standardization of Blood Sample Collection
Harald Tammen
Summary
Preanalytical variables can alter the analysis of blood-derived samples. Before a blood sample can be analyzed, multiple steps are necessary to generate the desired specimen. The choice of blood specimen and its collection, handling, processing, and storage are important because each can have a tremendous impact on the results of the analysis. Awareness of clinical practices in medical laboratories, together with current knowledge, allows identification of specific variables that affect the results of a proteomic study. Knowledge of preanalytical variables is a prerequisite for understanding and controlling their impact.
Key Words: blood; plasma; serum; proteomics; specimen; preanalytical variables.
From: Methods in Molecular Biology, vol. 428: Clinical Proteomics: Methods and Protocols. Edited by: A. Vlahou © Humana Press, Totowa, NJ
1. Introduction
Proteomic analysis of blood specimens by semi-quantitative multiplex techniques offers a valuable approach for the discovery of disease- or therapy-related biomarkers (1,2). Based on reproducible separation of proteins by their physicochemical properties in combination with semi-quantitative detection methods and bioinformatic data analysis, proteomics allows for sensitive measurement of proteins in blood specimens (3). Blood can be regarded as a complex liquid tissue that comprises cells and extracellular fluid (4). The choice of a suitable specimen-collection protocol is crucial to minimize artificial processes (e.g., cell lysis, proteolysis) occurring during specimen collection and preparation (5). Preanalytic procedures can alter the analysis of blood-derived
samples. These procedures comprise the processes prior to the actual analysis of the sample and include the steps needed to obtain the primary sample (e.g., blood) and the analytical specimen (e.g., plasma, serum, cells). Legal or ethical issues (e.g., the importance of informed consent) and potential risks of phlebotomy (e.g., bleeding) are not covered in this chapter.
1.1. Collection of Blood Samples
It has been reported that the most frequent faults in the preanalytical phase result from erroneous sample collection procedures (e.g., drawing blood from an infusion line, resulting in sample dilution) (6). The design of blood collection devices may aid correct sampling: evacuated containers draw an accurate quantity of blood, which ensures the correct concentration of additives or the correct dilution of the blood, as in the case of citrated plasma. The speed of the blood draw is also controlled, which restricts mechanical stress. The favored site of collection is the median cubital vein, which is generally easily found and accessed. As such, it is the most comfortable for the patient and should not evoke additional stress. Preparation of the collection site includes proper cleaning of the skin with alcohol (2-propanol). The alcohol must be allowed to evaporate, since mixing of residual alcohol with the blood sample may result in hemolysis, raise the levels of distinct analytes, and cause interferences. The position of the patient (standing, lying, sitting) can affect the hematocrit (7) and hence may change the concentration of the analytes. The tourniquet should be applied 3–4 inches above the site of venipuncture and released as soon as blood begins flowing into the collection device. Venous occlusion lasting longer than 1 min can affect the sample composition. Prolonged occlusion may result in hemoconcentration and subsequently increase the concentration of various analytes, e.g., total protein levels.
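The collection recommendations above lend themselves to a simple documentation check at the time of the draw. The following Python sketch is illustrative only: the record type `VenipunctureRecord`, its field names, and the function `preanalytical_warnings` are hypothetical and not part of any laboratory information system, and the 60 s limit simply encodes the "longer than 1 min" occlusion remark made above.

```python
from dataclasses import dataclass
from typing import List

# Assumption: encodes the remark that occlusion longer than about 1 min
# can affect sample composition.
MAX_TOURNIQUET_SECONDS = 60

KNOWN_POSITIONS = {"standing", "lying", "sitting"}


@dataclass
class VenipunctureRecord:
    """Hypothetical record of a single blood draw (illustrative field names)."""
    site: str                       # e.g. "median cubital vein"
    drawn_from_infusion_line: bool  # True if blood was taken from an infusion line
    alcohol_evaporated: bool        # True if the skin was dry before puncture
    patient_position: str           # "standing", "lying", or "sitting"
    tourniquet_seconds: float       # duration of venous occlusion
    evacuated_tube_used: bool       # True if an evacuated container was used


def preanalytical_warnings(rec: VenipunctureRecord) -> List[str]:
    """Return a list of preanalytical concerns for one draw (empty if none)."""
    warnings: List[str] = []
    if rec.drawn_from_infusion_line:
        warnings.append("drawn from infusion line: sample may be diluted")
    if not rec.alcohol_evaporated:
        warnings.append("alcohol not evaporated: risk of hemolysis and interference")
    if rec.tourniquet_seconds > MAX_TOURNIQUET_SECONDS:
        warnings.append("venous occlusion > 1 min: possible hemoconcentration")
    if rec.patient_position not in KNOWN_POSITIONS:
        warnings.append("patient position not recorded: hematocrit may differ")
    if not rec.evacuated_tube_used:
        warnings.append("no evacuated container: draw volume and additive ratio uncontrolled")
    return warnings


if __name__ == "__main__":
    example = VenipunctureRecord(
        site="median cubital vein",
        drawn_from_infusion_line=False,
        alcohol_evaporated=True,
        patient_position="sitting",
        tourniquet_seconds=90,
        evacuated_tube_used=True,
    )
    for message in preanalytical_warnings(example):
        print(message)
```

Recording such flags alongside each specimen does not correct a compromised sample, but it may help to exclude or stratify affected samples during later data analysis.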
Blood should be collected from fasting patients in the morning between 7 and 9 a.m., because food intake and circadian rhythms can alter the concentration of analytes considerably (e.g., total protein, hemoglobin, myoglobin).
1.2. Characteristics of Serum and Plasma Specimens
Serum is one of the most frequently analyzed blood specimens. The generation of serum is time consuming and associated with activation of the coagulation cascade and the complement system. These processes influence the composition of the samples, because they result in cell lysis (e.g., of thrombocytes and erythrocytes). As a consequence, the concentrations of components in the extracellular fluid, such as aspartate aminotransferase, serotonin, neuron-specific enolase, and lactate dehydrogenase, are increased (8). On the other hand, degradation of the analytes (e.g., hormones) may occur faster (9). On the
proteomic level, more peptides and fewer proteins are observed in serum than in plasma (10,11). Consequently, the activation of clotting cascades necessary to generate serum can lead to artefacts. One reason to use serum as a specimen is the notion that the proteome or peptidome of serum may reflect biological events (12). Post-sampling proteolytic cleavage products have been proposed as biomarkers, and it has been further suggested that the serum peptidome is of particular diagnostic value for the detection of cancer (13). However, it has been reported that more protein changes occur in serum than in plasma (14). Thus, it can be expected that the reproducibility of such ex vivo proteolytic events is comparatively low. In plasma, by contrast, the anticoagulants citrate and EDTA inhibit coagulation and other enzymatic processes by chelating ions, thereby inhibiting ion-dependent enzymes. Heparin acts differently, through the activation of antithrombin III. The main concern with heparinized plasma in proteomic studies is that heparin is a polydisperse charged molecule that binds many proteins non-specifically (15,16) and, because its molecular weight is similar, may also influence separation procedures and mass spectrometric detection of peptides and small proteins (17). The sampling of plasma is less time consuming than the acquisition of serum: cells and the liquid phase can be separated directly after sample collection, since the clotting time needed for serum (30–60 min) is not required. Compared with serum, the amount of plasma generated from blood is approximately 10 to 20% higher. The protein content of plasma is also higher than that of serum because of the presence of clotting factors and associated components; in addition, proteins may bind to the clot, which decreases the protein concentration of serum.
1.3. Processing of Blood Samples
A quick separation of cells from the plasma is favorable, since cellular constituents may liberate substances that alter the composition of the sample. Generally, it is recommended that plasma and serum be centrifuged at 1300–2000×g for 10 min within 30 min of sample collection. The temperature should generally be 15–24°C (18), unless recommended otherwise for distinct analytes such as gastrin or A-type natriuretic peptide. Processing at 4°C appears attractive, because enzymatic degradation processes are reduced at low temperatures. However, platelets become activated at low temperatures (19) and release intracellular proteins and enzymes, which affect the sample composition. Thus, processing at low temperatures is safe only after thrombocytes have been removed. Since one centrifugation step may be insufficient for
depletion of platelets below 10 cells/nL, a second centrifugation step (2500×g for 15 min at room temperature) or filtration step may be required to obtain platelet-poor plasma. This procedure is applicable only to plasma since the platelets in serum are already activated.
1.4. Protease Inhibitors
The use of protease inhibitors would be attractive, but commonly used protease inhibitor cocktails may introduce difficulties due to interference with mass spectrometry and formation of covalent bonds with proteins, which would result in shifting the isoform pattern (20). Protease inhibitors have been considered and investigated as additives in proteome research to prevent or slow down proteolytic processes and thereby provide a means of more sensitive detection of markers in blood (21). Even though protein integrity has been shown to be maintained by the addition of 15 commercially available protease inhibitors, the usefulness of protease inhibitors in overall protein stabilization of blood samples remains to be investigated in more detail (22). Certain protease inhibitors are toxic to live cells when present in whole blood. Stressed, apoptotic, or necrotic cells release substances, and it may be argued that this affects the composition of serum or plasma until the cellular and soluble fractions of blood are separated. However, careful selection of an appropriate protease inhibitor may solve this problem.
2. Materials
1. Twenty-gauge needles and an appropriate adapter (e.g., Sarstedt, Nümbrecht, Germany) or a Vacutainer system (BD Bioscience, Franklin Lakes, USA).
2. Alcohol (2-propanol) in spray flask.
3. Swabs.
4. Examination gloves.
5. Tourniquet or sphygmomanometer.
6. Blood collection tubes (e.g., Sarstedt).
7. Centrifuge with a swinging bucket rotor (e.g., Sigma 4K15, Sigma Laborzentrifugen, Osterode, Harz).
8. A 10-mL syringe equipped with a cellulose acetate filter unit with 0.2 μm pore size and 5 cm² filtration area (e.g., Sartorius Minisart, Sarstedt).
9. 2 mL cryo-vials.
10. Pipette and tips.
3. Methods
1. Venipuncture of a cubital vein is performed using a 20-gauge needle (diameter: 0.9 mm; e.g., butterfly system, max. tubing length: 6 cm). If a tourniquet is applied, it should not remain in place for longer than 1 min (risk of falsifying results due to hemoconcentration). As soon as the blood flows into the container, the tourniquet has to be released at least partially. If more time is required, the tourniquet has to be released so that circulation resumes and normal skin color returns to the extremity.
• Prior to blood collection for proteomic analysis, blood is aspirated into the first container (e.g., 2.7 mL S-Monovette, Sarstedt, Nümbrecht, Germany). This is done to flush the surface and remove initial traces of contact-induced coagulation. This sample is not useful for analysis.
• Afterward, blood is drawn into a standard EDTA or citrate-containing syringe (e.g., 9 mL EDTA-Monovette, Sarstedt, Nümbrecht, Germany). Depending on ease of blood flow, several samples can be collected. Free flow with mild aspiration should be assured to avoid haemolysis.
2. After venipuncture, plasma is obtained by centrifugation for 10 min at 2000×g at room temperature. Centrifugation should start within 30 min after blood collection. The resulting plasma sample may now be separated from red and white blood cells in an efficient and gentle way. Nevertheless, a significant number of platelets (∼25%) are still present in the sample. This requires an additional preparation step.
3. For platelet depletion, one of the following procedures has to be undertaken directly after step 2:
• Platelet removal by centrifugation: The plasma sample is transferred into a second vial for another centrifugation for 15 min at 2500×g at room temperature. After centrifugation, the supernatant is transferred in aliquots of 1.5 mL into cryo vials.
• Platelet removal by filtration: Plasma aliquots of 1.5 mL resulting from step 2 are transferred into 2-mL cryo vials using a 10-mL syringe equipped with a cellulose acetate filter unit with 0.2 μm pore size and 5 cm² filtration area (e.g., Sartorius Minisart®, Sartorius, Göttingen, Germany). Filtration requires only gentle pressure.
4. Samples are transferred to a –80°C freezer within 30 min. Storage is at –80°C. Transport of samples is done on dry ice.
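The time limits in the steps above lend themselves to a simple automated check when sample timestamps are recorded. The following Python sketch is illustrative only: the function name and the four timestamp arguments are hypothetical, and it assumes that the 30-min freezer limit in step 4 is counted from the end of platelet depletion, which the protocol does not state explicitly.

```python
from datetime import datetime, timedelta

# Limits taken from the Methods steps above.
MAX_DRAW_TO_SPIN = timedelta(minutes=30)  # step 2: centrifugation starts within 30 min
MAX_TO_FREEZER = timedelta(minutes=30)    # step 4: assumed to count from end of processing


def check_plasma_timeline(drawn, spin_started, processing_done, frozen):
    """Return a list of timing deviations; an empty list means none were found."""
    problems = []
    if spin_started - drawn > MAX_DRAW_TO_SPIN:
        problems.append("first centrifugation started more than 30 min after collection")
    if frozen - processing_done > MAX_TO_FREEZER:
        problems.append("sample reached the freezer more than 30 min after processing")
    return problems


# Example: blood drawn at 08:00, spun at 08:20, processed by 08:50, frozen at 09:10.
t0 = datetime(2008, 1, 1, 8, 0)
print(check_plasma_timeline(t0,
                            t0 + timedelta(minutes=20),
                            t0 + timedelta(minutes=50),
                            t0 + timedelta(minutes=70)))
```

A check of this kind only flags timing deviations; it says nothing about temperature, tube filling, or the other handling errors listed in the Notes.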
4. Notes
4.1. Frequently Made Mistakes
4.1.1. Blood Withdrawal
• The patient was not fasting (i.e., had taken food prior to sampling).
• The blood was drawn from an infusion line.
• The blood was drawn in a wrong position (e.g., supine, upright).
• The consumables used were different than those recommended.
• The expiry date of consumables was already reached.
• The tubes were not properly filled.
• The tubes were agitated vigorously (instead of gentle shaking to dissolve the anticoagulant).
• The blood sample tubes were not consistently kept at room temperature.
• The sample tubes were put on ice or in a refrigerator.
4.1.2. Lab Handling
• Centrifugation was delayed more than 30 min after blood withdrawal.
• A cooling centrifuge was adjusted below room temperature.
• The centrifugation speed was wrong (e.g., revolutions per minute were set instead of g-force).
• The centrifugation time was wrong.
• The removal of blood plasma by pipetting was done without proper caution. Consequently, the buffy coat or the red blood cells were churned up.
• The second centrifugation of recovered plasma samples was delayed after first centrifugation.
4.1.3. Storage of Samples
• The storage of samples was delayed.
• The storage temperatures were above –80°C.
• The labeling of sample containers was unreadable or confusable.
• The labels were not properly attached to the sample containers, or storage or handling resulted in loss of labels.
4.1.4. General Recommendations
• A proper first centrifugation should produce a visible white blood cell layer (buffy coat) between red blood cells and plasma. If not, centrifugation speed or time may be wrong.
• One should discard plasma that is icteric or exhibits signs of haemolysis. One should check with an expert whether this was due to the particular disease.
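The speed-setting mistake listed under Lab Handling (entering a value as rpm when g-force is meant) can be caught with the standard relation between relative centrifugal force and rotor speed, RCF = 1.118 × 10⁻⁵ × r × rpm², with the rotor radius r in cm. The sketch below is a minimal illustration; the 15-cm radius is an assumed example value, not a specification of any centrifuge named in this chapter, and the actual radius must be taken from the rotor documentation.

```python
import math

RCF_CONSTANT = 1.118e-5  # standard factor for radius in cm and speed in rpm


def rcf_to_rpm(rcf_g, radius_cm):
    """Rotor speed (rpm) needed to reach a given relative centrifugal force (x g)."""
    if rcf_g <= 0 or radius_cm <= 0:
        raise ValueError("rcf_g and radius_cm must be positive")
    return math.sqrt(rcf_g / (RCF_CONSTANT * radius_cm))


def rpm_to_rcf(rpm, radius_cm):
    """Relative centrifugal force (x g) produced at a given rotor speed (rpm)."""
    if rpm < 0 or radius_cm <= 0:
        raise ValueError("rpm must be non-negative and radius_cm positive")
    return RCF_CONSTANT * radius_cm * rpm ** 2


radius = 15.0  # cm, hypothetical rotor radius used only for this example
print(f"2000 x g needs about {rcf_to_rpm(2000, radius):.0f} rpm")
print(f"2000 rpm gives only about {rpm_to_rcf(2000, radius):.0f} x g")
```

With this assumed radius, setting 2000 rpm instead of 2000×g delivers roughly one third of the intended force, which would be consistent with the missing buffy coat described under General Recommendations.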
References 1. Vitzthum F, Behrens F, Anderson NL, Shaw JH. (2005) Proteomics: from basic research to diagnostic application. A review of requirements and needs. J. Proteome Res. 4, 1086–97. 2. Lathrop JT, Anderson NL, Anderson NG, Hammond DJ. (2003) Therapeutic potential of the plasma proteome. Curr. Opin. Mol. Ther. 5, 250–7.
3. Wang W, Zhou H, Lin H, Roy S, Shaler TA, Hill LR et al. (2003) Quantification of proteins and metabolites by mass spectrometry without isotopic labeling or spiked standards. Anal. Chem. 75, 4818–26. 4. Anderson NL, Anderson NG. (2002) The human plasma proteome: history, character, and diagnostic prospects. Mol. Cell. Proteomics 1, 845–67. 5. Omenn GS. (2004) The Human Proteome Organization Plasma Proteome Project pilot phase: reference specimens, technology platform comparisons, and standardized data submissions and analyses. Proteomics 4, 1235–40. 6. Plebani M, Carraro P. (1997) Mistakes in a stat laboratory: types and frequency. Clin. Chem. 43, 1348–51. 7. Burtis CA, Ashwood E. (eds) (2001) Fundamentals of Clinical Chemistry. Saunders, Philadelphia. 8. Guder WG, Narayanan S, Wisser H, Zawata B. (2003) Samples: From the Patient to the Laboratory. The Impact of Preanalytical Variables on the Quality of Laboratory Results. GIT Verlag, Darmstadt, Germany. 9. Evans MJ, Livesey JH, Ellis MJ, Yandle TG. (2001) Effect of anticoagulants and storage temperatures on stability of plasma and serum hormones. Clin. Biochem 34, 107–12. 10. Omenn GS, States DJ, Adamski M, Blackwell TW, Menon R, Hermjakob H et al. (2005) Overview of the HUPO Plasma Proteome Project: results from the pilot phase with 35 collaborating laboratories and multiple analytical groups, generating a core dataset of 3020 proteins and a publicly-available database. Proteomics 5, 3226–45. 11. Rai AJ, Gelfand CA, Haywood BC, Warunek DJ, Yi J, Schuchard MD et al. (2005) HUPO Plasma Proteome Project specimen collection and handling: towards the standardization of parameters for plasma proteome samples. Proteomics 5, 3262–77. 12. Villanueva J, Shaffer DR, Philip J, Chaparro CA, Erdjument-Bromage H, Olshen AB et al. (2006) Differential exoprotease activities confer tumor-specific serum peptidome patterns. J. Clin. Invest. 116, 271–84. 13. Liotta LA, Petricoin EF. 
(2006) Serum peptidome for cancer detection: spinning biologic trash into diagnostic gold. J. Clin. Invest. 116, 26–30. 14. Tammen H, Schulte I, Hess R, Menzel C, Kellmann M, Schulz-Knappe P. (2005) Prerequisites for peptidomic analysis of blood samples: I. Evaluation of blood specimen qualities and determination of technical performance characteristics. Comb. Chem. High Throughput Screen. 8, 725–33. 15. Holland NT, Smith MT, Eskenazi B, Bastaki M. (2003) Biological sample collection and processing for molecular epidemiological studies. Mutat. Res. 543, 217–34. 16. Landi MT, Caporaso N. (1997) Sample collection, processing and storage. IARC Sci. Publ. 223–36. 17. Tammen H, Schulte I, Hess R, Menzel C, Kellmann M, Mohring T, Schulz-Knappe P. (2005) Peptidomic analysis of human blood specimens: comparison between plasma specimens and serum by differential peptide display. Proteomics 13, 3414–22.
18. Favaloro EJ, Soltani S, McDonald J. (2004) Potential laboratory misdiagnosis of hemophilia and von Willebrand disorder owing to cold activation of blood samples for testing. Am. J. Clin. Pathol. 122, 686–92. 19. Mustard JF, Kinlough-Rathbone RL, Packham MA. (1989) Isolation of human platelets from plasma by centrifugation and washing. Methods Enzymol. 169, 3–11. 20. Schuchard MD, Mehigh RJ, Cockrill SL, Lipscomb GT, Stephan JD, Wildsmith J et al. (2005) Artifactual isoform profile modification following treatment of human plasma or serum with protease inhibitor, monitored by 2-dimensional electrophoresis and mass spectrometry. Biotechniques 39, 239–47. 21. Jeffrey DH, Deidra B, Keith H, Shu-Pang H, Deborah LR, Gregory JO, Stanley AH. (2004) An Investigation of Plasma Collection, Stabilization, and Storage Procedures for Proteomic Analysis of Clinical Samples. Humana, Totowa, NJ. 22. Rai AJ, Vitzthum F. (2006) Effects of preanalytical variables on peptide and protein measurements in human serum and plasma: implications for clinical proteomics. Expert Rev. Proteomics 3, 409–26.
3
Tissue Sample Collection for Proteomics Analysis
Jose I. Diaz, Lisa H. Cazares, and O. John Semmes
Summary
Successful collection of tissue samples for molecular analysis requires critical considerations. We describe here our procedure for tissue specimen collection for proteomic purposes with emphasis on the most important steps, including timing issues and the procedures for immediate freezing, storage, and microdissection of the cells of interest or “tissue targets” and the lysates for protein isolation for SELDI, MALDI, and 2DGE applications. The pathologist is a cornerstone of this process and is an invaluable collaborator. In most institutions, pathologists are responsible for “tissue custody,” and they closely supervise the tissue bank. In addition, they are optimally trained in histopathology and can therefore assist investigators in correlating tissue morphology with molecular findings. In recent years, the advent of the laser capture microscope, a tool ideally designed for pathologists, has tremendously facilitated the efficiency of collecting tissue targets for molecular analysis.
Key Words: tissue bank; frozen section; immunofluorescence; laser capture microscope; proteomics.
From: Methods in Molecular Biology, vol. 428: Clinical Proteomics: Methods and Protocols Edited by: A. Vlahou © Humana Press, Totowa, NJ
1. Introduction
From the completion of surgery and the acquisition of tissue sample to protein isolation and performing the various proteomic techniques, a number of challenges must be overcome. The first challenge is time. Surgery is associated with loss of vascular supply, resulting in progressive increase of endogenous protease activity, protein degradation, and tissue autolysis. For this reason, specimens submitted for tissue procurement must be processed without delay. Formalin fixation, a standard processing procedure in pathology,
stops protease activity. However, formalin is a cross-linking fixative that irreversibly alters protein, thus compromising the quality of the extracts for most proteomic techniques. Recent technical developments appear promising and may ultimately enable peptide analysis and protein identification (bottom up proteomics) in formalin-fixed paraffin embedded tissue (1). At present, however, it is imperative to take a representative “fresh” tissue sample immediately after surgery when collecting tissue for proteomic studies, including MALDI TOF MS and 2DGE. The surgical specimen should be transported quickly to pathology, and a representative tissue sample should be obtained under the supervision of a pathologist. The sample should be embedded in OCT and frozen without delay. Ideally, a frozen section should be performed for quality assurance before archiving the sample. Once the pathologist confirms that the expected targets are present in the collected tissue (for instance, tumor and non-tumor tissue), the frozen specimen can be stored in a –80°C freezer for subsequent use. Overcoming time constraints requires appropriate institutional policies and dedicated personnel. From our experience, it is better to delegate the responsibility of transporting the surgical specimen from the operating room to pathology to dedicated tissue procurement personnel, instead of expecting the surgical team to deliver the specimens. When collecting and archiving tissue samples, our policy is to bisect the sample into two halves, one embedded in OCT and stored permanently at –80°C for future molecular studies, and one submitted as a “mirror image” processed in formalin after performing a frozen section for morphologic comparison and cell type mapping after basic hematoxylin and eosin (H&E) staining. This formalin-processed mirror image tissue provides optimal morphological detail, which might be necessary in the future. 
For instance, it is very difficult to identify prostatic intraepithelial neoplasia (PIN) on frozen section slides; however, the formalin-fixed section, which closely mimics the frozen section, can be used for guidance. After archiving the tissue sample, the next challenge is to ensure that the proteomic findings are representative of the tissue targets under investigation, given the cellular heterogeneity present in most tissues. For instance, if one would like to determine the differential protein expression in tumor versus non-tumor, one must ensure that proteins are separately and reliably extracted from normal and tumor cells. Certainly, many solid tumors are visible to the naked eye, and both tumor and non-tumor tissues can be collected by gross inspection. However, under a microscope, the tumor bed contains not only tumor cells but many other tumor-associated, non-tumoral elements, such as supporting stromal cells, blood vessels, infiltrating lymphocytes, etc. Moreover, microscopic foci of tumor may infiltrate grossly normal tissue. In the past, various approaches were followed to collect cells from tissue sections, including manual microdissection with a syringe. In recent years, the procedure
of laser-capture microdissection (LCM) (2) has tremendously increased the quality, specificity, and speed of the process, allowing selective capture of cells and various tissue elements while preserving the molecular integrity (3,4,5). The LCM is a special microscope that isolates cells from frozen or formalin-fixed tissues and cytological preparations. Microdissection of single cells or multicellular structures is accomplished by placing a plastic polymer (cap) over the tissue while pulsing an infrared laser for the polymer to melt and adhere to the target cells under the laser ring. When the cap is removed, the cells that adhered to the polymer detach from the surrounding tissue without any molecular damage, becoming suitable for the extraction of high-quality nucleic acids and proteins, and for a wide range of downstream molecular analyses, such as gene expression microarrays or proteomics. The use of a microscope can be coupled with special immunostaining procedures if one wishes to capture specific cell types not easily identified by morphology alone. This so-called immunocapture procedure (6,7) further enhances the specificity of tissue procurement for molecular analysis. For example, in a former study (8), we were able to selectively capture basal cells from benign prostate glands, which are extremely difficult to recognize morphologically but easily identifiable after immunostaining for high-molecular-weight cytokeratin (Fig. 1). We obtained excellent protein quality results and were able to identify several protein peaks preferentially expressed in these cells using SELDI-TOF-MS. When we compared the protein spectra from the same tissue sample sections routinely stained with hematoxylin with those immunostained for high-molecular-weight cytokeratins, there was no difference in the spectra, arguing against any significant protein deterioration due to the immunostaining procedure.
Fig. 1. Selective immunofluorescent LCM of prostate gland’s basal cells by immunocapture: (A) immunofluorescent staining of basal cells with a mAb against high-molecular-weight keratins, which are highly expressed on basal cells, (B) selection of immunofluorescent-positive basal cells for subsequent LCM, (C) captured immunofluorescent-positive cells after LCM photographed from the plastic cap, (D) remainder of the gland after removing the basal cell layer by LCM.
2. Materials
2.1. Tissue Collection and Storage
1. Tissue-Tek Cryomold-standard (Sakura, Torrance, CA)
2. Tissue-Tek OCT (Sakura)
3. 2-methylbutane (Mallinckrodt, St. Louis, MO)
4. Shandon Histobath II (Thermo Electron Corp., Waltham, MA)
5. –80°C freezer
2.2. Frozen Tissue Sectioning and Staining
1. Cryostat
2. HistoGene™ LCM Frozen Section Staining Kit (Arcturus Biosciences Inc, Mountain View, CA). The kit contains HistoGene staining solution, ethanol (75, 95, 100%), xylene, nuclease-free distilled water, HistoGene LCM slides, and disposable slide staining jars.
3. 1× PBS made from 10× stock (Fisher Scientific)
4. Acetone (high purity grade)
5. Cy3-Streptavidin (Invitrogen, Carlsbad, CA)
6. Biotinylated mAbs: Any antibody can be biotinylated. We routinely have 1.5 mg of antibody labeled with 0.2 mg biotin (Alpha Diagnostic Intl. Inc. San Antonio, TX).
2.3. LCM
1. PixCell II LCM System (Arcturus Biosciences Inc)
2. AutoPix™ Automated LCM System (Arcturus Biosciences Inc)
3. CapSure® LCM caps (Arcturus Biosciences Inc)
4. Prep Strip (Arcturus Biosciences Inc)
5. Microcentrifuge tubes (0.5 ml) (Eppendorf North America)
2.4. LCM Lysate
1. Micropipet capable of delivering 1 μl accurately
2. 20 mM HEPES (pH to 8.0 with NaOH) with 1% Triton X-100
3. Sonicator (optional)
4. 1× PBS
2.5. SELDI Analysis
1. IMAC3 or WCX2 Protein Array Chips (Ciphergen Biosystems, Palo Alto, CA)
2. HPLC grade water (Fisher Scientific)
3. 100 mM sodium acetate pH 4.0
4. 100 mM ammonium acetate pH 4.0
5. Sinapinic acid (SPA) (Ciphergen Biosystems, Palo Alto, CA)
6. Optima grade acetonitrile (Fisher Scientific)
7. Trifluoroacetic acid, packaged in 1 ml ampules (Pierce Chemical Company, Rockford, IL)
2.6. MALDI Analysis
1. Target plate
2. α-Cyano-4-hydroxycinnamic acid (CHCA) (Bruker Daltonics, Palo Alto, CA)
3. SPA (Fluka)
4. Optima grade acetonitrile (Fisher Scientific)
5. Trifluoroacetic acid, packaged in 1 ml ampules (Pierce Chemical Company)
3. Method
3.1. Tissue Collection and Storage
1. The tissue sample is embedded in OCT using a cryomold and is frozen in the Shandon Histobath, which contains 2-methylbutane (see Note 1).
2. Hold the cryomold against the 2-methylbutane liquid interface and allow the tissue to freeze slowly (3–5 min) (see Note 2).
3. After achieving complete freezing, place the frozen cryomold containing the sample in a plastic bag and transport the sample within a liquid nitrogen container. Store the sample in a –80°C freezer.
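Note 1 gives two time windows between completion of surgery and freezing: about 30 min for most proteomic techniques and about 20 min when protein phosphorylation is studied. A tissue bank that records both times could flag late specimens with a check like the one sketched below. The function name and the use of the two figures as hard cut-offs are assumptions made for illustration; the Note describes them as experience-based guidance rather than strict limits.

```python
GENERAL_LIMIT_MIN = 30  # Note 1: good protein quality for most proteomic techniques
PHOSPHO_LIMIT_MIN = 20  # Note 1: phosphorylation begins to decrease after about 20 min


def within_freezing_window(minutes_since_surgery, phospho_study=False):
    """Return True if the specimen was frozen inside the window suggested in Note 1."""
    if minutes_since_surgery < 0:
        raise ValueError("elapsed time cannot be negative")
    limit = PHOSPHO_LIMIT_MIN if phospho_study else GENERAL_LIMIT_MIN
    return minutes_since_surgery <= limit


# A specimen frozen 25 min after surgery passes the general window
# but not the stricter one assumed for phosphorylation studies.
print(within_freezing_window(25))
print(within_freezing_window(25, phospho_study=True))
```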
3.2. Frozen Tissue Sectioning and Staining 3.2.1. Regular Hematoxylin Staining Prior to LCM, cut 8-μm-thick frozen tissue sections from the cryostat (discard folded or wrinkled sections). Keep slides with sections in cryostat after cutting and stain as follows (see Notes 3 and 9; slides may also be frozen at –80°C until stained.):
1. Remove the slides from the freezer or cryostat and place in 70% ethanol (30 s).
2. Place in purified water (5 s).
3. Add the Histogene staining solution (30 s) (see Note 4).
4. Rinse the slides with purified water.
5. Wash with 70% ethanol (60 s).
6. Wash with 95% ethanol twice (60 s each).
7. Wash with 100% ethanol (60 s).
8. Place the slides in xylene to ensure complete dehydration (10 min) (see Note 5).
9. Shake off and drain carefully by touching the corner with a particle-free tissue paper.
10. Air dry the slides to allow xylene to evaporate completely (at least 2 min).
11. The slides are now ready for LCM (they should not be coverslipped) (see Note 12).
3.2.2. Immunofluorescence Staining (see Note 7)
1. Thaw slides (1 min).
2. Place in cold acetone at 4°C (2 min).
3. Air dry (30 s).
4. Wash in filtered pH 7.4 1× PBS.
5. Drain off slides.
6. Add 100 μl of the first biotinylated Ab at optimal dilution (recommended concentration 30–100 μg/ml; optimize for best results) (3 min).
7. Rinse in PBS.
8. Add 100 μl of Cy3-Streptavidin at a dilution of 1:100 (the user may determine the optimal staining concentration of the Cy3-Streptavidin conjugate by performing a serial dilution staining experiment) (1 min).
9. Rinse in PBS.
10. Place slides in 75% ethanol (30 s).
11. Place slides in 95% ethanol (30 s).
12. Place slides in 100% ethanol (30 s).
13. Place slides in xylene (5 min) (see Note 6).
14. Air dry (5 min).
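Note 7 advises keeping the total immunostaining and dehydration time under 30 min. As a quick plausibility check, the stated durations of the steps above can be added up, as in the following sketch. It counts only steps that carry an explicit time; the PBS wash, the rinses, and the draining step have no stated duration and are left out, so the real bench time will be somewhat longer. The variable and function names are illustrative.

```python
# Timed steps of the immunofluorescence protocol (Section 3.2.2.), in seconds.
# Steps without a stated duration (PBS wash, rinses, draining) are omitted.
TIMED_STEPS = [
    ("thaw slides", 60),
    ("cold acetone", 120),
    ("air dry", 30),
    ("biotinylated antibody", 180),
    ("Cy3-streptavidin", 60),
    ("75% ethanol", 30),
    ("95% ethanol", 30),
    ("100% ethanol", 30),
    ("xylene", 300),
    ("final air dry", 300),
]
LIMIT_MIN = 30  # Note 7: total staining and dehydration should not exceed 30 min


def total_minutes(steps):
    """Sum the step durations and convert from seconds to minutes."""
    return sum(seconds for _, seconds in steps) / 60


print(f"{total_minutes(TIMED_STEPS):.1f} min of timed steps, limit {LIMIT_MIN} min")
# prints "19.0 min of timed steps, limit 30 min"
```

The timed steps alone take 19 min, which leaves about 11 min for the untimed washes and slide handling before the 30-min guideline is exceeded.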
3.3. LCM
The newer instruments developed by Arcturus, such as the AutoPix™ and the Veritas™, are enclosed, automated systems operated entirely by a computer. We describe here the LCM procedure using the PixCell II instrument, which is manually operated and the least expensive LCM instrument today and, therefore, more widely used (see Note 8).
1. Turn on the instrument and enter pertinent data such as slide #, case #, cap lot #, and thickness (always 8 μm), then place the stained slide on the mechanical stage (see Note 10).
2. Turn on the vacuum pump to immobilize the slide (small aperture on the left side of the stage) and push in the filter bottom for optimal image quality.
3. Place the caps in the rail on the right side of the stage. Unlock the mechanical arm, move it toward the tissue, and drop it at the top of the tissue. Align the joystick to move the stage to a centered and perpendicular position before beginning the microdissection process.
4. Turn on the key on the right side of the power supply to enable the infrared laser. Focus the laser before beginning microdissection using the smallest ring diameter and adjust to the desired diameter.
5. Select the appropriate energy (mW) and time of exposure (ms) for the desired laser ring diameter and verify its effectiveness in an area of the tissue that is of no interest, using a cap to be discarded (see Note 11).
6. Fire the laser each time the ring is over the desired tissue target. Move the stage supporting the glass slide with the aid of the joystick, which allows fine and precise motion. Check if the tissue is appropriately microdissected and capture the tissue images before and after LCM as well as the image of the target tissue that was captured in the cap (see Note 13).
7. When the cap is filled with the desired amount of tissue, remove the cap and use a 0.5-ml microcentrifuge tube to collect the tissue (the cap is designed to fit and close the tube perfectly) (see Note 14).
8. The microcentrifuge tube can be safely stored in a –80°C freezer without adding any buffer and without lysing the cells, which may be done at a convenient time later.
3.4. LCM Lysate
1. Lyse a total of 1500–2000 laser shots (about 3000 to 6000 microdissected cells) in 4 μl of 20 mM HEPES pH 8.0 with 1% Triton X-100. This is sufficient for one SELDI protein array or one MALDI run. For 2D analysis, a minimum of approximately 25,000 cells is necessary.
2. Add the above lysing buffer to the cap and place in the microfuge tube holding the cap. This is usually done with two additions of 2 μl to the LCM cap. Pipet up and down and scrape the surface of the LCM cap to remove all the cells. A gentle scraping motion with the pipet tip may be necessary to remove the cells, but be careful not to rip the polymer film (see Note 15). Transfer the lysate from the surface of the cap to the microfuge tube. Cells from multiple caps may be combined by using the 4 μl of LCM lysate from one cap to lyse the cells on another cap. In this way the volume will remain small. If 2DGE is to be performed, the lysis procedure is different (see below). Make a 1:10 dilution of each lysate in PBS (for IMAC3 SELDI chips) or 100 mM ammonium acetate pH 4.0 (for WCX2 chips) (i.e., 36 μl added to the 4 μl lysate) and vortex for at least 1 min (see Note 16). Spin down briefly.
3. Prepare the arrays of the IMAC chip with CuSO4 according to the manufacturer’s specifications: 20 μl of 100 mM CuSO4 for 10 min, wash with HPLC water; 20 μl of 100 mM Na acetate pH 4.0 for 5 min, wash with water. Use the MicroMix shaker for all incubations with the following settings: Form-20, Amplitude-5.
4. Assemble the bioprocessor with the desired number of chips and add 2× 200 μl PBS to each well, incubating on the shaker for 5 min each time. Pretreat the WCX2 chip with 100 mM ammonium acetate pH 4.0. This can be done on the BioMek robot.
5. Add the diluted lysate to the spot on the chip(s) in the bioprocessor.
6. Cover the bioprocessor with a plastic seal and incubate overnight on the MicroMix shaker at room temperature, using the same settings as given above.
7. Remove lysates carefully with a pipet; do not touch the surface of the arrays. Save if needed for another experiment.
8. Wash the spots in the bioprocessor 2× with 200 μl PBS (for IMAC) or 100 mM ammonium acetate pH 4.0 (for WCX) for 5 min on the shaker.
9. Wash the arrays with HPLC water 2× for 5 min (on shaker).
10. Remove the chip(s) from the bioprocessor and give them a final rinse with HPLC water.
11. Let the chip dry completely, usually overnight.
12. Add 2× 0.5 μl saturated SPA dissolved in 50% acetonitrile, 0.5% TFA.
13. Read at instrument settings optimized for resolution and intensity for the m/z range of 1000–20,000. Higher laser energy will be required to see higher molecular weight peaks.
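The 1:10 dilution in step 2 (36 μl of diluent added to the 4 μl lysate) follows from the usual convention that a 1:n dilution brings the sample to n times its original volume. The small helper below reproduces that arithmetic for other lysate volumes, for example when lysates from several caps have been combined. The function name is illustrative and the 1:n convention is the assumption stated here.

```python
def diluent_volume_ul(sample_ul, dilution_factor):
    """Diluent to add so that the final volume is sample_ul * dilution_factor."""
    if sample_ul <= 0:
        raise ValueError("sample volume must be positive")
    if dilution_factor < 1:
        raise ValueError("dilution factor must be at least 1")
    return sample_ul * (dilution_factor - 1)


# The case given in step 2: a 4 ul lysate diluted 1:10.
print(diluent_volume_ul(4, 10))  # prints 36
```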
One method of MALDI sample preparation that reduces the complexity of cell lysates while remaining robust and easily amenable to automated high-throughput applications is sample fractionation using magnetic beads (MB) combined with pre-structured MALDI sample supports (AnchorChip Technology). Several magnetic bead types with different surface chemistries can be used to fractionate serum and increase the number of detectable peaks (see the chapter on serum protein profiling for details). For MALDI analysis, dilute the lysate 1:10 with CHCA or SPA matrix (5–10 mg/ml in 50% acetonitrile, 0.1% TFA). Spot on Anchorplate and read in a MALDI instrument. Further dilution and/or fractionation of the lysate may be necessary to achieve optimal spectra. If 2DGE analysis is to be performed, the cells should be lysed as follows: Remove the LCM cap from the tube and add a small volume (10 μl) of 1D focusing rehydration buffer to the tube. The preferred number of laser shots is approximately 100 K. Replace the cap and invert the tube to allow the buffer to come in contact with the cells on the cap and lyse them. Incubate 5 min at room temperature. Sonicate the samples to ensure lysis. Continue with the basic protocol for 1D IEF and 2D analysis.
4. Notes
1. In our experience, a time window of 30 min between completion of surgery and tissue freezing yields good protein quality for most proteomic techniques. However, if one is studying protein phosphorylation, phosphorylation begins to decrease significantly 20 min after completion of surgery (10).
2. When freezing the tissue sample in the Histobath, avoid immediate and complete immersion in 2 methylbutane to preserve optimal tissue morphology. Hold the sample at the liquid interface with minimal immersion and wait until the OCT and the tissue slowly turn white. 3. Use uncoated glass slides for LCM. Coated or electrically-charged glass slides will interfere with the detachment process of the plastic polymer and are not suitable for LCM. 4. Precipitate from Hematoxylin can contaminate the surface of the tissue. Filter these solutions. Add one tablet of protease inhibitor to each staining bath (we use Complete, from BMB). Do not add protease inhibitor to alcohol baths. If using the histogene staining kit (Arcturus) for frozen sections, this is not necessary. 5. Change all the staining and alcohol solutions after staining 20 slides. 6. Poor transfers may result if 100% ethanol has hydrated. Increasing the incubation time in xylene often improves transfer. 7. When specific cells need to be microdissected and these cannot be identified morphologically, the cells of interest can be immunostained with specific mAbs against proteins highly expressed on those cells (immunophenotype). It is critical to expedite the immunostaining procedure because the shorter the immunostaining time, the better the protein quality. One must avoid exceeding 30 min for the total immunostaining and dehydration procedure. In the past, we have used the immunoperoxidase technique with DAB labeling (6), but it was difficult to perform quick enough to preserve optimal protein integrity. Also, manual microdissection of DAB labeled cells with Pixel II is extremely tedious and nonpractical. The immunofluorescence staining method (7) is faster and easier to perform. This method coupled with the Autopix microscope, which has dark field fluorescence and automation capabilities, is the ideal procedure for immunocapture. 
Since Cy3-strepavidin binds to the antibody labeled with biotin, there is no need for a secondary antibody, thereby decreasing the necessary staining time. It is recommended to run negative control staining; use a biotinylated control antibody from the same animal species and of the same isotype as your primary antibody. Dilute to the same working concentration as the primary antibody. 8. Do not forget to wear gloves every time while performing LCM, including when handling the plastic caps. 9. The thickness of the tissue section is a critical parameter for effective LCM. In our experience (using the Pixel II and the Autopix instruments by Arcturus), 8 μm is the optimal thickness for LCM. 10. Smooth out the surface of the tissue section with a Prep-strip before placing the slide on the LCM instrument, which improves the efficiency and uniformity of the microdissection process. 11. The main factors affecting the efficiency of LCM include the energy, the time of exposure, and the diameter of the laser beam. Regarding the diameter, when using Pixel II, the smallest ring is 7 μm, the medium ring is 15 μm, and the widest ring is 30 μm. Very often, we have used the medium (15 μm, which lifts up about three cells with each shot). When trying to microdissect single cells with
Pixel II, one must use the smallest (7 μm) diameter ring, but in our experience this was frustrating. With Autopix, we have observed that microdissection of individual cells is better achieved by setting the laser ring at 10 μm diameter, below which it becomes very difficult to lift up cells efficiently. A 30-μm diameter laser is very effective for microdissection of whole glands and other large tissue structures.
Regarding the other two parameters, the optimization depends on the tissue type. For instance, for prostate tissue, an energy of 80 mW with a duration of 0.5 ms is usually effective for a medium-size ring (15 μm). These parameters are tuned by trial and error, progressively adjusting the energy and the time of exposure for the desired diameter, which in turn depends on the microdissection task (single cells vs. medium- or large-size tissue structures).
12. Another factor that affects the effectiveness of LCM is the time the tissue section has been dry after the staining and dehydration procedure. Ideally, the tissue should be stained and microdissected within 1 h. Avoid keeping the slide under LCM for more than 4 h. If microdissecting many tissues, stain only four slides at a time.
13. When capturing images before and after microdissection for documentation purposes, make sure the image on the monitor is in focus, because that is the image that will be captured. Sometimes the image is in focus in the microscope but out of focus on the monitor. In a typical experiment, you will capture the image before and after firing the laser, which provides a record of how effectively the target cells were removed. You can also capture the image of the microdissected cells on the polymer cap.
14. Avoid allowing the LCM caps to become excessively crowded. With the 15-μm laser ring, microdissection yields about three cells per shot, so one should expect around 3000 cells per 1000 shots, which is about right for a single cap.
15. LCM caps can be viewed under a dissecting microscope to ensure that all cells have been removed from the polymer film after the lysing procedure.
16. Depending on the cell type, vigorous vortexing and sonication may be necessary to completely lyse the cells after they are removed from the cap.
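As a rough planning aid for Note 14, the shot and cap arithmetic can be sketched in Python. The function name and its defaults are illustrative assumptions taken from the figures quoted in the notes (about three cells per shot with the 15-μm ring, about 1000 shots per cap); actual yields depend on the tissue and the instrument settings.

```python
import math


def lcm_capture_plan(target_cells, cells_per_shot=3, shots_per_cap=1000):
    """Estimate laser shots and caps needed to collect a target cell count.

    Defaults are the approximate figures quoted for the 15-um ring; they are
    rough, tissue-dependent values, not instrument specifications.
    """
    shots = math.ceil(target_cells / cells_per_shot)
    caps = math.ceil(shots / shots_per_cap)
    return shots, caps


# Example: plan a capture of about 10,000 cells.
shots, caps = lcm_capture_plan(10_000)
print(f"{shots} shots spread over {caps} caps")
```

Spreading the shots over several caps, rather than overloading one, follows the advice in Note 14 against crowded caps.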
References
1. Prieto, D.A., Hood, B.L., Darfler, M.M., Guiel, T.G., Lucas, D.A., Conrads, T.P., Veenstra, D.T., and Krizman, D.B. (2005) Liquid Tissue™: proteomic profiling of formalin-fixed tissues. Biotechniques 38: 32–5.
2. Emmert-Buck, M.R., Bonner, R.F., Smith, P.D., Chuaqui, R.F., Zhuang, Z., Goldstein, S.R., Weiss, R.A., and Liotta, L.A. (1996) Laser capture microdissection. Science 274: 998–1001.
3. Espina, V., Milia, J., Wu, G., Cowherd, S., and Liotta, L.A. (2006) Laser capture microdissection. Methods Mol Biol 319: 213–29.
4. Best, C.J., and Emmert-Buck, M.R. (2001) Molecular profiling of tissue samples using laser capture microdissection. Expert Rev Mol Diagn 1: 53–60.
5. Ornstein, D.K., Gillespie, J.W., Paweletz, C.P., Duray, P.H., Herring, J., Vocke, C.D., Topalian, S.L., Bostwick, D.G., Linehan, W.M., Petricoin, E.F., III, and Emmert-Buck, M.R. (2000) Proteomic analysis of laser capture microdissected human prostate cancer and in vitro prostate cell lines. Electrophoresis 21: 2235–42.
6. Fend, F., Emmert-Buck, M.R., Chuaqui, R., Cole, K., Lee, J., Liotta, L.A., and Raffeld, M. (1999) Immuno-LCM: laser capture microdissection of immunostained frozen sections for mRNA analysis. Am J Pathol 154: 61–6.
7. Murakami, H., Liotta, L., and Star, R.A. (2000) IF-LCM: laser capture microdissection of immunofluorescently defined cells for mRNA analysis rapid communication. Kidney Int 58(3): 1346–53.
8. Cazares, L.H., Adam, B.L., Ward, M.D., Nasim, S., Schellhammer, P.F., Semmes, O.J., and Wright, G.L., Jr (2002) Normal, benign, preneoplastic, and malignant prostate cells have distinct protein expression profiles resolved by surface enhanced laser desorption/ionization mass spectrometry. Clin Cancer Res 8: 2541–52.
9. Diaz, J., Cazares, L.H., Corica, A., and Semmes, O. (2004) Selective capture of prostatic basal cells and secretory epithelial cells for proteomic and genomic analysis. Urol Oncol 22(4): 329–36.
10. Mora, L., Buettner, R., Seigne, J., Diaz, J., Hamad, N., Garcia, R., Bowman, T., Falcone, R., Faigurth, R., Cantor, A., Muro-Cacho, C., Livistong, S., Levitzki, A., Kraker, A., Karras, J., Pow-Sang, J., and Jove, R. (2002) Constitutive activation of Stat3 in human prostate tumors and cell lines: direct inhibition of Stat3 signaling induces apoptosis of prostate cancer cells. Cancer Res 62: 6659–66.
4 Protein Profiling of Human Plasma Samples by Two-Dimensional Electrophoresis
Sang Yun Cho, Eun-Young Lee, Hye-Young Kim, Min-Jung Kang, Hyoung-Joo Lee, Hoguen Kim, and Young-Ki Paik
Summary
Human plasma is regarded as the most complex and best-known clinical specimen that can be easily obtained; alterations in the levels of plasma proteins or their corresponding enzyme activities may reflect either a healthy or a diseased state. Given that there is no defined genomic information as to the intact protein components in plasma, protein profiling could be the first step toward its molecular characterization. Several problems exist in the analysis of plasma proteins, however. For example, the wide dynamic range of protein concentrations, the presence of high-abundance proteins, and post-translational modifications need to be considered before proteomic studies are undertaken. In particular, efficient depletion or pre-fractionation of high-abundance proteins is crucial for the identification of low-abundance proteins that may include potential biomarkers. After the removal of high-abundance proteins, protein profiling can be initiated using two-dimensional electrophoresis (2DE), which has been widely used for displaying the differential proteome under specific physiological conditions. Here, we describe a typical 2DE procedure for the plasma proteome under either a healthy or a diseased state (e.g., liver cancer) in which pre-fractionation and depletion are integral steps in the search for disease biomarkers.
Key Words: 2-dimensional gel electrophoresis; plasma; HPPP; immunoaffinity column.
Abbreviations: IEF: Isoelectric Focusing, IPG: Immobilized pH Gradient, TCA: Trichloroacetic Acid, FFE: Free Flow Electrophoresis, HPMC: Hydroxypropyl Methylcellulose, TBP: Tributylphosphine, 2DE: 2-dimensional Gel Electrophoresis, BPB: Bromophenol Blue, CHCA: α-cyano-4-hydroxycinnamic acid, LTQ: Linear Ion Trap, MALDI-TOF: Matrix-assisted Laser Desorption Ionization - Time of Flight Mass Spectrometry, HPPP: Human Plasma Proteome Project.
From: Methods in Molecular Biology, vol. 428: Clinical Proteomics: Methods and Protocols. Edited by: A. Vlahou © Humana Press, Totowa, NJ
1. Introduction
Human plasma is an intravascular fluid that serves as a liquid medium for blood proteins that are derived from various cells, tissues, and other biofluids (1). The components of plasma are very heterogeneous, including inorganic ions (e.g., bicarbonate, calcium), metabolic intermediates (e.g., cholesterol, glucose), and plasma proteins (e.g., albumin, globulin), which are important in maintaining body fluid balance, immune response, blood clotting, and other metabolic mechanisms of homeostasis. Plasma contains many different proteins that are primarily synthesized in the liver and are often subjected to post-translational modification (PTM) (2). Since human plasma is the most complex and best-known clinical specimen that can be easily obtained, it has been a central target for many biomedical studies (2). Alterations in the levels of plasma proteins or their corresponding enzyme activities may reflect either a healthy or a diseased state that can be monitored by various analytical tools, including biochemical assays and proteomics. Given that there is no defined genomic information as to the intact protein components in plasma, a proteomic study may be the method of choice (3,4). Recently, plasma protein profiling was conducted as part of the plasma proteome project of HUPO, termed HPPP (5). The pilot phase of HPPP produced 3020 non-redundant proteins that were found to be present in human plasma and serum (5,6). However, several points must be addressed before proteomic studies are undertaken. First, plasma is believed to have the widest dynamic range of concentrations (more than 10 orders of magnitude) among its constituent proteins, creating many technical obstacles in proteomic detection by mass spectrometry (MS) (2,3). For example, the removal of high-abundance proteins (e.g., albumin, IgG, transferrin, fibrinogen, and IgA)
that account for more than 90% of all plasma proteins prior to biochemical analysis may be a big challenge and perhaps even problematic in light of plasma-derived biomarker discovery (3,7). Second, since many plasma proteins have many structural isoforms, a more efficient analytical system is needed to facilitate the analysis of multiple isoforms of plasma proteins (1). Third, since many plasma proteins are synthesized as pre-proteins that are subjected to various PTMs for cellular function, more efficient methods to analyze modified proteins (e.g., glycosylated proteins) are required. For example, since glycopeptides are not easily ionized during MS analysis, which leads to inadequate spectral data and low detection sensitivity due to the attached glycans, a strategy
for the removal of glycans must be considered for protein identification. Taken together, all these factors are important for the proteomic study of plasma (8). Of the problems listed above, the first to address in the protein profiling of plasma is the depletion or pre-fractionation of high-abundance plasma proteins (3,4,7). Without this depletion procedure, the identification of low-abundance proteins (including biomarkers) may not be practical. After the removal of high-abundance proteins, two-dimensional electrophoresis (2DE) may be the first step chosen to analyze plasma proteins because it is easy to perform in the laboratory. Although 2DE has several limitations in terms of reproducibility and the separation of membrane proteins, low-molecular-weight proteins, and proteins with extreme pIs (10), this technique has been widely used as a first analysis of proteins in a particular physiological state when coupled with MS (9). Recently, quantitative 2DE has been performed with a difference gel electrophoresis (DIGE) system (see Chapter by Friedman and Lilley for detail), where two or three differently staining dyes are applied to specific protein populations to determine their quantitative changes in expression levels under a specific physiological condition (10). This chapter is intended to provide the reader with the necessary information on the systematic analysis of the plasma proteome using 2DE in an attempt to search for disease biomarkers among the plasma proteins of patients with hepatocellular carcinoma (HCC) (11,12).
2. Materials
2.1. Preparation of Human Plasma Samples
1. Blood collection tubes: BD Plus Plastic K2 EDTA (BD, 367525; 10 mL), BD Glass Serum with silica clot activator (367820, 10 mL).
2. Protease inhibitor (Complete Protease Inhibitor Cocktail, Roche, 11 697 498 001, 20 tablets): One tablet contains protease inhibitors (antipain, bestatin, chymostatin, leupeptin, pepstatin, aprotinin, phosphoramidon, and EDTA) sufficient for the processing of 100 mL plasma samples. Prepare 25× stock solutions in 2 mL distilled water.
2.2. Depletion of High-Abundance Proteins with an Immunoaffinity Column
1. HPLC system, such as the HP1100 LC system (Agilent).
2. Multiple affinity removal system (MARS): LC column (Agilent, 5185-5984); Buffer A for sample loading, washing, and equilibrating (Agilent, 5185-5987); Buffer B for eluting (Agilent, 5185-5988).
2.3. Isoelectric Focusing (IEF) with Immobilized pH Gradient (IPG) Strip
1. MultiPhor™ (GE Healthcare) or Protean IEF cell (Bio-Rad): Numerous commercially available isoelectric focusing units exist.
2. Re-swelling tray.
3. Mineral oil: Immobiline Dry Strip Cover Fluid (GE Healthcare).
4. Power supply, such as the EPS 3501 XL power supply (GE Healthcare).
5. Thermostatic circulator: Multitemp III thermostatic circulator (GE Healthcare).
6. IPG strip: Immobiline Dry Strip, pH 3–10 nonlinear (NL), or pH 4.0–5.0, and pH 5.5–6.7, 18 cm long, 0.5 mm thick (GE Healthcare) or with the same pH ranges for ReadyStrip IPG strip (Bio-Rad).
7. Carrier ampholyte mixtures: IPG buffer or Pharmalyte, same range as the selected IPG strip.
8. Sample buffer: 7 M urea, 2 M thiourea, 4% (w/v) CHAPS, 0.5% (v/v) ampholyte, 100 mM DTT, 40 mM Tris-HCl, pH 7.5, a trace amount of bromophenol blue (BPB).
2.4. Microscale Solution Isoelectric Focusing: ZOOM®
1. ZOOM® IEF Fractionator (Invitrogen, ZF10001).
2. ZOOM® disks: pHs 3.0, 4.6, 5.4, 6.2, 7.0, and 10.0 [Invitrogen, ZD series (e.g., ZD10030 for pH 3.0)].
3. IEF Anode Buffer (50X) (Novex, LC5300, 100 mL).
4. IEF Cathode Buffer (10X) (Novex, LC5310, 125 mL).
5. Anode buffer: 8.4 g urea, 3.0 g thiourea, 3.3 mL Novex® IEF Anode Buffer (50X). Add water to a final volume of 20 mL.
6. Cathode buffer: 8.4 g urea, 3.0 g thiourea, 3.3 mL Novex® IEF Cathode Buffer (50X). Add water to a final volume of 20 mL.
2.5. Fractionation of Plasma Samples by Free Flow Electrophoresis (FFE)
1. ProTeam™ FFE instrument (Tecan).
2. 1% 2-(4-sulfophenylazo)-1,8-dihydroxy-3,6-naphthalenedisulfonic acid (SPADNS) (Tecan, 517074).
3. 0.8% hydroxypropyl methylcellulose (HPMC) (Tecan, 5170709).
4. pI markers: mixture of pI markers that indicate pHs 4.2, 5.1, 6.3, 7.4, 8.7, and 10.1 (Tecan, 5170705).
5. Prolyte™ 1, Prolyte™ 2, and Prolyte™ 3 (Tecan, 0309081, 0309102, and 0309093).
6. Anodic stabilization medium (Inlet I1): 14.5% (w/w) glycerol, 8 M urea, 0.03% (w/w) HPMC, 100 mM H2SO4.
7. Separation medium 1 (Inlet I2): 14.5% (w/w) glycerol, 8 M urea, 0.03% (w/w) HPMC, 14.5% (w/w) Prolyte™ 1.
8. Separation medium 2 (Inlet I3–5): 14.5% (w/w) glycerol, 8 M urea, 0.03% (w/w) HPMC, 14.5% (w/w) Prolyte™ 2.
9. Separation medium 3 (Inlet I6): 14.5% (w/w) glycerol, 8 M urea, 0.03% (w/w) HPMC, 14.5% (w/w) Prolyte™ 3.
10. Cathodic stabilization medium (Inlet I7): 14.5% (w/w) glycerol, 8 M urea, 0.03% (w/w) HPMC, 100 mM NaOH.
11. Counter flow medium (Inlet I8): 14.5% (w/w) glycerol, 8 M urea.
12. Anodic circuit electrolyte: 100 mM H2SO4.
13. Cathodic circuit electrolyte: 100 mM NaOH.
2.6. Preparation of 2D Gels
1. Gradient former: Either of two Bio-Rad models can be used in this step: Model 385 (30–100 mL capacity) or Model 395 (100–750 mL capacity).
2. Orbital shaker with speed controller.
3. SDS-PAGE: Protean II xi multicell and multicasting chamber (Bio-Rad) or Ettan DALT twelve large vertical system (GE Healthcare).
4. 5× Tris-HCl buffer: Dissolve 227 g Tris in 800 mL distilled water and adjust the buffer to pH 8.8 with HCl (∼30 mL). Add distilled water to a final volume of 1 L.
5. 5× Gel buffer: Dissolve 15 g Tris, 72 g glycine, and 5 g sodium dodecyl sulfate (SDS) in 800 mL distilled water and add distilled water to a final volume of 1 L.
6. SDS equilibration buffer: 6 M urea, 2% (w/v) SDS, 5× gel buffer (pH 8.8), 50% (v/v) glycerol, and 2.5% (w/v) acrylamide monomer.
7. Acrylamide stock solution: Acrylamide/Bis-acrylamide 37.5:1, 40% (w/v) solution (Amresco, M157, 500 mL).
8. Fixing solution: 40% (v/v) methanol and 5% (v/v) phosphoric acid in distilled water.
9. Coomassie blue G-250 staining solution: 17% (w/v) ammonium sulfate, 3% (v/v) phosphoric acid, 34% (v/v) methanol, and 0.1% (w/v) Coomassie blue G-250 in distilled water.
2.7. 2D Gel Image Analysis
1. Scanner with transparency unit, such as Bio-Rad GS710 or GS800.
2. 2D gel image analysis program: Image Master Platinum 5 (GE Healthcare), PDQuest 7.3.0 (Bio-Rad), or Progenesis Discovery (NonLinear Dynamics, Ltd.).
2.8. Destaining, In-gel Deglycosylation, and In-gel Tryptic Digestion
1. Speed Vac (Heto).
2. PNGase F stock solution for in-gel deglycosylation: PNGase F (Glyko, Inc., GKE5010). Dilute 1 μL PNGase F (2 mU) with 2.5 mL 1× N-glycanase incubation buffer (20 mM sodium phosphate, pH 7.5, and 0.02% (w/v) sodium azide).
3. Sequencing-grade modified trypsin (Promega, V5111, 100 μg, 18,100 U/mg).
4. 50 mM ammonium bicarbonate.
2.9. Desalting of Peptides and MALDI Plating
1. GELoader tips (Eppendorf, No. 0030 048.083, 20 μL capacity).
2. Poros 10 R2 resin (PerSeptive Biosystems, 1-1118-02, 0.8 g).
3. Oligo R3 resins (PerSeptive Biosystems, 1-1339-03, 6.3 g).
4. 2% (v/v) formic acid in 70% (v/v) acetonitrile (ACN).
5. 0.1% (v/v) trifluoroacetic acid in 70% (v/v) ACN.
6. 1-mL syringe.
7. Matrix: α-cyano-4-hydroxycinnamic acid (CHCA).
8. Opti-TOF™ 384-well insert (123 × 81 mm, 1016491, Applied Biosystems).
2.10. MALDI-TOF and Peptide Mass Fingerprinting
1. MALDI-TOF and MALDI-TOF/TOF: Voyager DE-Pro and 4800 MALDI TOF/TOF™ Analyzer (Applied Biosystems) equipped with a 355-nm Nd:YAG laser. The pressure in the TOF analyzer is approximately 7.6 × 10⁻⁷ Torr.
3. Methods
3.1. Human Plasma Sample Preparation
The following protocol is conducted according to the HUPO reference sample collection protocol (13).
1. Each sample pool consisted of 400 mL blood from one healthy, fasting male and one healthy, fasting postmenopausal female, and was collected into 10-mL tubes by two venipunctures, 20 tubes per venipuncture (see Note 1).
2. Equal numbers of tubes and aliquots were generated with appropriate concentrations of K2-EDTA, lithium heparin, or sodium citrate for plasma or were permitted to clot at room temperature for 30 min to yield serum (with micronized silica as the clot activator) (see Note 2).
3. The specimens were centrifuged for 10–15 min under refrigerated conditions at 2–6°C.
4. The resultant serum and plasma from 10 spun tubes of the same type from each donor were pooled into one secondary 50-mL conical-bottom BD™ Falcon tube for each tube type.
5. The secondary tube was centrifuged at 2400×g for 15 min to remove residual cellular material from serum and to prepare platelet-poor plasma from the EDTA, heparin, and citrate secondary tubes.
6. Equal volumes of either serum or plasma were pooled from each secondary tube into media bottles (see Note 3).
7. Serum/plasma was mixed gently and kept on ice while being distributed as 20-μL aliquots into cryovials and was then frozen and stored at –70°C.
3.2. Depletion of High-abundance Proteins with an Immunoaffinity Column
For efficient depletion of high-abundance proteins prior to their molecular analysis, many reports have indicated that it is convenient to use commercially available immunoaffinity columns, such as the MARS (Agilent) (2,3) or the prepacked 2-mL Seppro™ MIXED12 affinity LC column (GenWay Biotech.) (14), coupled with an HPLC system. For depletion of the six most abundant proteins (i.e., albumin, transferrin, IgG, IgA, haptoglobin, and antitrypsin) in either serum or plasma, we used MARS, which has been used successfully with a wide variety of sample types, including cerebrospinal fluid (CSF) and follicular fluid (2,3) (see Fig. 1).
1. Dilute human serum or plasma fivefold with Buffer A (for example, 20 μL human plasma with 80 μL Buffer A) containing the protease inhibitor stock solution (40 μL per 1 mL plasma) (see Note 4) (adapted from the manufacturer’s instructions).
2. Remove the particulates with a 0.22-μm spin filter for 1 min at 16,000×g.
3. Inject 75–100 μL of the diluted serum or plasma at a flow rate of 0.5 mL/min.
Fig. 1. The 2DE images of total human plasma proteins that were depleted of the six major abundant proteins through MARS. Proteins were isoelectrically focused with pH 3–10 NL IPG strips in the first dimension and then resolved by 9–16% SDS-PAGE in the second dimension. (A) Whole plasma. (B) Flow-through from MARS. Approximately 800 protein spots are displayed by 2DE and identified by MALDI-TOF mass spectrometry. The names of the major proteins of each gel are marked on the image (5) (from (4) with permission).
4. Collect the flow-through fractions that appear between 1.5 and 4.5 min and store them at –20°C if they are not to be analyzed immediately.
5. Elute bound proteins from the column with Buffer B (elution buffer) at a flow rate of 1 mL/min for 3.5 min.
6. Regenerate the column by equilibrating with Buffer A for an additional 7.4 min at a flow rate of 1 mL/min.
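The volumes in step 1 scale linearly with the amount of plasma, so they can be worked out with a short Python sketch. The function name is hypothetical, and the sketch assumes that the protease inhibitor stock (40 μL per 1 mL plasma) is a small extra volume reported separately rather than counted as part of the Buffer A volume.

```python
def mars_sample_mix(plasma_ul, dilution_factor=5, inhibitor_ul_per_ml=40):
    """Volumes (uL) for diluting plasma with Buffer A before injection.

    Assumption: the inhibitor stock volume is reported separately and is
    not subtracted from the Buffer A volume.
    """
    total_ul = plasma_ul * dilution_factor
    buffer_a_ul = total_ul - plasma_ul
    inhibitor_ul = plasma_ul * inhibitor_ul_per_ml / 1000
    return {
        "buffer_a_ul": buffer_a_ul,
        "inhibitor_stock_ul": inhibitor_ul,
        "diluted_sample_ul": total_ul,
    }


# Example from step 1: 20 uL plasma diluted fivefold.
mix = mars_sample_mix(20)
print(mix)
```

With the 25× stock described in Subheading 2.1, adding 40 μL per 1 mL plasma corresponds to a 1× final concentration relative to the undiluted plasma.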
3.3. TCA/Acetone Precipitation
For 2DE, interfering compounds, such as proteolytic enzymes, salts, lipids, nucleic acids, and any residual high-abundance proteins present after depletion, must be removed or inactivated. In the case of plasma samples, the two most important parameters are salt and proteolysis. TCA/acetone precipitation is the most useful method for desalting the whole plasma and the flow-through fractions of MARS.
1. Add 50% (w/v) trichloroacetic acid (TCA, Sigma, T9159) to reach a final TCA concentration of 5–8%. Mix gently by inverting the tube 5 to 6 times and incubate on ice for 2 h.
2. Centrifuge the sample at 14,000×g for 15 min and discard the supernatant.
3. Add 200 μL cold acetone and resuspend the protein pellet with a pipette.
4. Incubate on ice for 15 min, centrifuge the sample at 14,000×g for 20 min, discard the acetone, and dry the pellet in air (see Note 5).
5. Dissolve the pellet in the sample buffer for 2DE and quantify the protein concentration by the Bradford protein assay.
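Step 1 gives a target final concentration rather than a volume. A minimal Python sketch of the dilution arithmetic follows; the function name is illustrative, and the calculation assumes that volumes are additive and that the sample contains no TCA beforehand.

```python
def tca_stock_volume(sample_ul, final_pct, stock_pct=50.0):
    """Volume (uL) of TCA stock to add to a sample to reach final_pct (w/v).

    Derived from stock_pct * v = final_pct * (sample_ul + v),
    assuming additive volumes.
    """
    if not 0 < final_pct < stock_pct:
        raise ValueError("final_pct must be between 0 and stock_pct")
    return sample_ul * final_pct / (stock_pct - final_pct)


# Example: bounds of the 5-8% range for a 100-uL sample.
for pct in (5.0, 8.0):
    print(f"{pct}% final: add {tca_stock_volume(100, pct):.1f} uL of 50% TCA")
```

For a 100-μL sample this works out to roughly 11 μL of stock for 5% and 19 μL for 8%.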
3.4. Rehydration of the IPG Gel Strip
For analytical purposes, typically 0.3–1.0 mg protein can be loaded onto an 18-cm-long IPG strip with a wide pH range (e.g., pH 3–10), or 0.5–2.0 mg onto an IPG strip with a narrow pH range (e.g., pH 5.5–6.7). A narrow-range IPG strip usually gives higher resolution when the proteins are analyzed by sequential IEF: first, fractionate the proteins over several pI ranges in solution with ZOOM® disks or FFE (see Subheadings 3.6 and 3.7) and then perform IEF with IPG strips [one pH unit range strips are also available (e.g., pH 3.0–4.0 or pH 3.5–4.5 up to pH 6.7)]. Because certain proteins appear to become trapped in the disk membrane partitions, sample loss should be considered.
1. Dilute 1.0 mg protein with the sample buffer to a final volume of 400 μL for 18-cm-long IPG strips (see Note 6).
2. Transfer the entire protein-containing sample buffer into the re-swelling tray.
3. Peel off the protective cover from the IPG strip and slowly slide the IPG strip (gel side down) onto the sample solution. Avoid trapping air bubbles and distribute the sample solution evenly under the strips.
4. Overlay the strip with mineral oil and leave for 12–16 h at room temperature (see Note 7 for cup loading).
3.5. IEF with IPG Strip
1. Remove the rehydrated IPG strips that are carrying the protein samples and place them (gel side up) on the strip tray.
2. Place the 2.5-cm filter papers, wetted with distilled water, on the strips at both the cathodic and the anodic ends. Place the strip tray on the IEF unit.
3. Cover the strips entirely with mineral oil.
4. Program the instrument (e.g., Multiphor II): Increase the voltage stepwise from 100 to 3500 V to reach a total of 80,000 volt-hours (Vh) (e.g., sequentially, 300 Vh at 100 V, 600 Vh at 300 V, 600 Vh at 600 V, 1000 Vh at 1000 V, and 2000 Vh at 2000 V, followed by 3500 V until a total of 80,000 Vh is reached) (see Notes 8 and 9).
5. During IEF, the temperature is set to 20°C with a water circulator.
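The focusing program in step 4 is specified in volt-hours, so the run time follows from dividing each Vh figure by its voltage. The Python sketch below is a rough estimate under two assumptions that the protocol does not state explicitly: the 80,000 Vh is the overall total including the ramp steps, and each voltage is applied as a constant step rather than a gradient. All names are illustrative.

```python
# (volts, volt-hours) for the ramp steps listed in step 4
RAMP_STEPS = [(100, 300), (300, 600), (600, 600), (1000, 1000), (2000, 2000)]


def ief_schedule(total_vh=80_000, final_volts=3500, ramp=RAMP_STEPS):
    """Return ramp Vh, ramp hours, Vh left for the final step, and its hours.

    Assumes constant-voltage steps and that total_vh includes the ramp.
    """
    ramp_vh = sum(vh for _, vh in ramp)
    ramp_hours = sum(vh / volts for volts, vh in ramp)
    final_vh = total_vh - ramp_vh
    final_hours = final_vh / final_volts
    return ramp_vh, ramp_hours, final_vh, final_hours


ramp_vh, ramp_hours, final_vh, final_hours = ief_schedule()
print(f"ramp: {ramp_vh} Vh over {ramp_hours:.1f} h")
print(f"final step: {final_vh} Vh over {final_hours:.1f} h at 3500 V")
print(f"total run time: {ramp_hours + final_hours:.1f} h")
```

Under these assumptions the ramp takes 8 h and the 3500-V step about 21.6 h, which is useful to know when scheduling the second dimension.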
3.6. Microscale Solution IEF: ZOOM®
To reduce typical artifacts that may occur when using narrow-range IPG strips (e.g., streaking, distortion, and loss of protein spots), one may use MicroSol-IEF (e.g., ZOOM®, Invitrogen) prior to running 2D gels (3) (see Fig. 2). MicroSol-IEF is a preparative solution-phase IEF apparatus that is divided into chambers by membrane disks of defined pH (15,16). Using MicroSol-IEF, 2.5–3.0 mg plasma proteins can be loaded and efficiently fractionated into five separate chambers by their pI values.
1. Add 2 μL of 99% N,N-dimethylacrylamide (DMA) to the 400-μL sample (see Subheading 3.4, Step 2) for alkylation and incubate the sample on a rotary shaker for 30 min at room temperature (adapted from the manufacturer’s instructions).
2. Add 4 μL of 2 M DTT to quench any excess DMA. Centrifuge at 16,000×g for 20 min at 4°C.
3. Preparation of protein samples: Dilute 3 mg protein to a 3250-μL volume with sample buffer. The amount of diluted sample per chamber in the ZOOM® IEF Fractionator is 650 μL.
4. Assemble the ZOOM® IEF Fractionator according to the manufacturer’s instructions. Six disks (pHs 3.0, 4.6, 5.4, 6.2, 7.0, and 10.0) are used to create five fractions that together span pH 3.0–10.0.
5. Add each buffer (anode or cathode) to the corresponding blank chamber.
6. Remove the sample chamber cap and add 650 μL of protein sample (step 3) to each chamber.
7. Fractionation can be carried out under the following conditions: 100 V for 20 min, 200 V for 80 min, and 600 V for 80 min (see Note 10). The starting current is approximately 0.6 mA, which increases to approximately 1.2 mA at the beginning of the 200-V step, and the ending current is approximately 0.2 mA.
8. Load the electro-focused samples onto the narrow pH range IPG strips for 2DE.
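The relationship between the six disks, the five fractions, and the loading volumes in steps 3, 4, and 6 can be laid out with a short Python sketch. The names are illustrative, and the only inputs are the numbers given in the steps above.

```python
DISK_PHS = [3.0, 4.6, 5.4, 6.2, 7.0, 10.0]  # disk pH values from step 4


def zoom_fractions(disk_phs=DISK_PHS, total_volume_ul=3250, protein_mg=3.0):
    """Return the pH range of each chamber, the volume per chamber (uL),
    and the protein concentration of the loaded sample (mg/mL)."""
    ranges = list(zip(disk_phs, disk_phs[1:]))  # adjacent disks bound a chamber
    per_chamber_ul = total_volume_ul / len(ranges)
    conc_mg_per_ml = protein_mg / total_volume_ul * 1000
    return ranges, per_chamber_ul, conc_mg_per_ml


ranges, per_chamber_ul, conc = zoom_fractions()
for low, high in ranges:
    print(f"pH {low}-{high}: load {per_chamber_ul:.0f} uL")
print(f"sample concentration: {conc:.2f} mg/mL")
```

Each pair of adjacent disks bounds one sample chamber, which is why six disks give five fractions of 650 μL each.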
Fig. 2. Narrow pH range 2DE images of plasma proteins after depletion of the six major abundant proteins through MARS. After microscale solution IEF (ZOOM®), the pH 5.5–6.2 fraction was separated on pH 5.5–6.7 IPG strips by a second isoelectric focusing step and then resolved on a 9–16% gel. (A) Whole 2DE image of pH 3–10 NL and pH 5.5–6.7. (B) One spot on the pH 3–10 NL gel can be separated into two or more spots in the narrow pH range 2DE. (C) Many hidden spots on the pH 3–10 NL gel appear in the narrow pH range 2DE of normal and HCC plasma.
3.7. Fractionation of the Plasma Samples by Free Flow Electrophoresis
To identify and isolate biomarker candidates from the plasma of patients with HCC using 2DE, a higher resolution is critical, and the analysis can be done by performing narrow pH range IEF. However, for narrow pH range IEF, higher amounts of proteins (e.g., 10-fold or higher) should be loaded onto the IPG strip since the proteins present in other pH ranges will be discarded. Therefore, prefractionation or depletion is required prior to running both IEF and the 2D gel. FFE is useful for prefractionation of plasma samples since it yields specific fractions of interest (e.g., by pI or density). For example, if one knows the pI of certain proteins, fractionation by FFE can be useful for prefractionation of complex plasma. We describe here one of the several procedures for prefractionation of plasma samples using FFE.
1. Dissolve the TCA-precipitated, flow-through fractions of MARS (∼2.0 mg) in 500 μL of separation medium 3 (see below) (adapted from the manufacturer’s instructions).
2. Add traces of the red acidic dye 2-(4-sulfophenylazo)-1,8-dihydroxy-3,6-naphthalenedisulfonic acid (SPADNS, Aldrich) to ease the optical control of the migration of the sample within the separation chamber.
3. FFE is carried out at 10°C using the following media (solutions marked at each inlet are applied): anodic stabilization medium (Inlet I1), separation medium 1 (Inlet I2), separation medium 2 (Inlet I3–5), separation medium 3 (Inlet I6), cathodic stabilization medium (Inlet I7), and counter-flow medium (Inlet I8).
4. Apply the anodic circuit electrolyte to the anode and the cathodic circuit electrolyte to the cathode.
5. Assemble the ProTeam™ FFE instrument (Tecan). Use a 0.4-mm spacer for the separation chamber, a flow rate of approximately 60 mL/h (Inlet I1–7), and a voltage of 1500 V, which results in a current of 20–24 mA.
6. Perfuse the separation chamber with the sample using the cathodal inlet at approximately 0.7 mL/h (4,17). Residence time in the separation chamber is approximately 33 min.
7. Collect each fraction into polypropylene 96 deep-well plates, numbered 1 (anode) through 44 (cathode) (4).
8. Remove glycerol and HPMC by TCA/acetone precipitation and dissolve the proteins with sample buffer.
9. Load the electro-focused, narrow pH range samples onto the IPG strips for 2DE.
3.8. Preparation of 2D Gels
1. Cast the glass plates (separated by two 1.5-mm spacers positioned along the sides) and thin plastic sheets in the multi-casting chamber (20).
2. Prepare gel solution for making 10 gels (20 × 20 cm, 1.5-mm spacer, 9–16% gradient): heavy solution (66.7 mL of 5× Tris-HCl buffer, 75 mL of a 40%
acrylamide stock solution, 0.7 mL of 10% ammonium persulfate (APS), 70 μL TEMED, and 191.7 mL of 50% glycerol), light solution (66.7 mL of 5× Tris-HCl buffer, 141.7 mL of a 40% acrylamide stock solution, 0.7 mL of 10% APS, 70 μL TEMED, and 125 mL distilled water).
3. Assemble the gradient maker and peristaltic pump. Pour the light gel solution into the mixing chamber (close to the casting chamber) and the heavy gel solution into the reservoir chamber of the gradient maker. Operate the magnetic stirrer in the mixing chamber. Run the peristaltic pump until the gel solution reaches 0.5–1.0 cm below the end of the glass plates (∼5 min). Check the flow rate, which should be between 100 and 120 mL/min.
4. After the gel solution is poured, overlay it with distilled water to exclude air and to ensure a level surface on the top of the gel.
5. Allow polymerization to occur overnight at room temperature.
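Before casting, it can be worth checking what final concentrations the volumes listed in step 2 imply. The Python sketch below simply divides the amount contributed by each stock by the total mixed volume; the function and variable names are illustrative, and additive volumes are assumed. The printed percentages can then be compared with the nominal 9–16% gradient range.

```python
def final_percent(stock_ml, stock_pct, component_volumes_ml):
    """Final concentration (%) of one stock after mixing all listed volumes."""
    total_ml = sum(component_volumes_ml)
    return stock_ml * stock_pct / total_ml


# Volumes (mL) as listed in step 2:
# 5x Tris-HCl buffer, 40% acrylamide stock, 10% APS, TEMED, diluent
heavy = [66.7, 75.0, 0.7, 0.07, 191.7]   # diluent is 50% glycerol
light = [66.7, 141.7, 0.7, 0.07, 125.0]  # diluent is distilled water

heavy_acrylamide = final_percent(75.0, 40.0, heavy)
light_acrylamide = final_percent(141.7, 40.0, light)
heavy_glycerol = final_percent(191.7, 50.0, heavy)
volume_per_gel = (sum(heavy) + sum(light)) / 10  # recipe is for 10 gels

print(f"heavy solution: {heavy_acrylamide:.1f}% acrylamide, "
      f"{heavy_glycerol:.1f}% glycerol")
print(f"light solution: {light_acrylamide:.1f}% acrylamide")
print(f"gel solution per gel: {volume_per_gel:.1f} mL")
```

The same function can be reused to rescale the recipe for a different number of gels or a different acrylamide stock concentration.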
3.9. Equilibration of the Sample and Running of the Gel
To solubilize the electro-focused proteins and to allow SDS to bind to them, it is necessary to soak the IPG strips in SDS equilibration buffer. This step is analogous to boiling the sample in SDS buffer prior to SDS-PAGE. The reducing agents dithiothreitol (DTT) and tributylphosphine (TBP) reduce disulfide bonds to sulfhydryls (cysteine residues). Alkylating agents such as iodoacetamide (IAA) prevent reoxidation of the free sulfhydryl groups (21).
1. Prior to use, add approximately 158 μL TBP in 1 mL isopropanol to 100 mL SDS equilibration buffer and sonicate in a bath-type sonicator until the solution becomes transparent (see Note 11) (termed TBP equilibration buffer).
2. Add 15 mL TBP equilibration buffer to each strip (gel side up) and gently shake for 25 min on an orbital shaker (TBP equilibration) (see Note 12).
3. Briefly rinse the IPG strip with 1× gel buffer, load the IPG strip onto the top of the gel, and pour the agarose embedding solution (molten agarose solution with trace amounts of BPB) (see Note 13).
4. Perform SDS-PAGE (40 mA/gel) until the BPB dye reaches the bottom of the gel. Keep the temperature at 10°C. The total run time for 20 × 20 cm gels is approximately 6 h.
3.10. Coomassie Brilliant Blue G-250 Staining
1. Fix the separated proteins in the gel with 200 mL fixing solution for 1 h.
2. Decant the fixing solution and stain the gel in Coomassie brilliant blue G-250 overnight.
3. Decant the staining solution.
4. Wash several times (>3 times) in distilled water for more than 4 h.
5. Scan the gel, then wrap the gel in plastic, and store it at 4°C.
3.11. 2D Gel Image Analysis
1. Import the gel image (recommended 12–16 bit, TIFF format) and convert it into an ImageMaster file (*.mel).
2. Detect the protein spots and determine the volume and percentage volume of each spot. The percentage volume is the normalized value that remains relatively independent of any irrelevant variations between gels, particularly those caused by varying experimental conditions.
3. Select the differentially displayed protein spots (see Fig. 3).
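The normalization in step 2 can be illustrated with a minimal Python sketch. It assumes that percentage volume is a spot's volume divided by the summed volume of all detected spots on the same gel, times 100; this is an assumption about the calculation, not a description of the software's internal code, and the spot names and volumes below are made up for illustration.

```python
def percent_volumes(spot_volumes):
    """Normalize raw spot volumes to percentage of total spot volume per gel."""
    total = sum(spot_volumes.values())
    if total <= 0:
        raise ValueError("total spot volume must be positive")
    return {spot: 100.0 * vol / total for spot, vol in spot_volumes.items()}


# Hypothetical gels: gel B is the same sample stained twice as intensely.
gel_a = {"spot1": 100, "spot2": 300, "spot3": 600}
gel_b = {spot: 2 * vol for spot, vol in gel_a.items()}

print(percent_volumes(gel_a))
print(percent_volumes(gel_b))
```

Both gels give the same percentage volumes even though the raw volumes differ twofold, which is the sense in which the normalized value is independent of gel-to-gel staining variation.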
3.12. Destaining, In-gel Deglycosylation, and In-gel Tryptic Digestion Most plasma proteins are glycosylated, including clotting factors, lipoproteins, and antibodies (22,23). These carbohydrate-containing proteins play major roles in the normal biological functions of plasma. Because glycopeptides are not readily ionized during MS analysis, the attached glycans can lead to inadequate spectral data and low detection sensitivity; a strategy for the removal of glycans is therefore necessary for protein identification. 1. Pick (or excise) the protein spot with an end-cut yellow tip and transfer the gel piece into a 1.5-mL Eppendorf tube. 2. Wash the gel piece with 100 μL distilled water. 3. Add 50 μL of 50 mM NH4HCO3 (pH 7.8) and ACN (6:4), and shake for 10 min. 4. Repeat step 3 until the Coomassie blue G250 dye disappears (2 to 5 times). 5. Decant the supernatant and dry the gel piece in a Speed Vac for 10 min (see Note 14). 6. Add 5 μL trypsin (12.5 ng/μL in 50 mM NH4HCO3) and leave the gel piece on ice for 45 min. 7. Add 10 μL of 50 mM NH4HCO3 to the gel slice. 8. Incubate the gel piece at 37°C for 12 h.
3.13. Desalting of Peptides and MALDI Plating 1. Resin packing: Twist the column body (GELoader tip, Eppendorf) near the end of the tip and push the resin solution [Poros R2:Oligo R3 (2:1) in 70% (v/v) ACN, occasionally in a more efficient ratio of 1:1] with a 1-mL syringe. A packed resin length of 2-3 mm is suitable (18,19). 2. Equilibration of the column: Add 20 μL of 2% (v/v) formic acid and push the solution through the column with the 1-mL syringe. 3. Peptide binding: Add the peptide solution (supernatant of step 9 in Subheading 3.12, approximately 10-12 μL) and push this solution through the column with the syringe. 4. Washing: Add 20 μL of 2% (v/v) formic acid and push this solution through the column with the syringe.
Cho et al.
Fig. 3. Detection of PTMs on the 2DE of plasma proteins. (A) 2DE images of plasma proteins that were depleted of the six major abundant proteins through MARS, untreated (left) and alkaline phosphatase (AP)-treated (right). (B) One of the differentially displayed proteins after treatment with AP. (C) Data-dependent neutral loss scan spectrum of sequence KEPCVESLVSpQYFQTVTDYGKD corresponding to the phosphorylated apolipoprotein A-II precursor.
5. MALDI spotting: Add 1 μL matrix solution [10 mg/mL CHCA in 70% (v/v) ACN and 2% (v/v) formic acid] and directly spot the eluted peptides and matrix mixture onto the MALDI plate (Opti-TOF™ 384-well Insert, Applied Biosystems). 6. Reuse the column: Add 20 μL of 100% ACN and push this solution through the column with the syringe and repeat step 2 for equilibration of the column.
3.14. MALDI-TOF and Peptide Mass Fingerprinting 1. Perform peptide mass fingerprinting (PMF) with the Voyager DE-PRO or 4800 MALDI-TOF/TOF mass spectrometer (Applied Biosystems). 2. Obtain the mass spectra in reflectron/delayed extraction mode with an accelerating voltage of 20 kV and sum data from either 500 laser pulses (4800 MALDI-TOF/TOF) or 100 laser pulses (Voyager DE-PRO). 3. Calibrate the spectrum with trypsin autolysis peaks (m/z 842.5090 and 2211.1046) and obtain monoisotopic peptide masses with Data Explorer 3.5 (PerSeptive Biosystems). 4. Search the Swiss-Prot and NCBInr databases with the Matrix Science search engine (http://www.matrixscience.com).
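The idea behind the internal calibration in step 3 can be sketched in Python. This is a simplified illustration, not the algorithm used by Data Explorer: it assumes a linear correction of the m/z axis fitted through the two trypsin peaks, whereas instrument software normally calibrates on flight time. The observed peak positions and all function names are hypothetical.

```python
# Reference m/z values of the two trypsin peaks named in step 3.
TRYPSIN_REF = (842.5090, 2211.1046)


def two_point_calibration(observed_refs, true_refs=TRYPSIN_REF):
    """Return a function that maps observed m/z to calibrated m/z (linear model)."""
    (o1, o2), (t1, t2) = observed_refs, true_refs
    slope = (t2 - t1) / (o2 - o1)
    intercept = t1 - slope * o1
    return lambda mz: slope * mz + intercept


def ppm_error(measured, theoretical):
    """Relative mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


# Hypothetical positions at which the two trypsin peaks were observed.
calibrate = two_point_calibration((842.5512, 2211.2150))

# A hypothetical sample peak, before and after calibration.
raw_peak = 1500.8000
print(round(calibrate(raw_peak), 4))
```

After calibration, `ppm_error` can be used to judge whether a measured peptide mass lies within the mass tolerance chosen for the database search.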
3.15. Profiling of PTMs on Selected Spots Although shotgun proteomics approaches that use various labeling techniques (e.g., SILAC and iTRAQ) are useful for high-throughput protein identification, they have many limitations for PTM analysis. In contrast, 2D gels usually display PTM-bearing forms or isoforms of a protein as spots at different positions on a single gel, which allows their molecular characteristics to be identified further with the aid of high-resolution LC-MS/MS. For example, in a typical 2D gel of plasma, the phosphorylated forms of a given protein can be easily detected as a ladder of spots that results from their different pIs. Figure 3 shows the localization of the phosphorylation site of the apolipoprotein A-II precursor. As seen in the figure, there is a clear difference between spots that are alkaline phosphatase (AP)-treated and those that are untreated in the 2D gel: the treated group has shifted to a more basic position. The phosphorylation site of these proteins can be determined using multistage MS (MS2 and MS3). Here, we describe the procedure for identification of phosphorylated proteins by 2DE coupled to MS. 1. Desalt the MARS-treated (high-abundance proteins depleted) plasma sample using Amicon Ultra-15 (molecular weight cutoff, 5 kDa; Millipore). 2. Carry out dephosphorylation overnight at 37°C in a solution of 0.4% ammonium carbonate buffer (pH 8.5) with 24 ng/μL calf intestine AP in 0.4% NH4HCO3. 3. Stop the reaction by freeze drying for further analysis.
4. Execute 2DE, picking, extraction, and desalting of peptides under the same conditions (see Subheadings 3.8-3.13). 5. Dissolve the extracted and desalted peptides in 10 μL of LC-MS/MS solution [0.4% (v/v) acetic acid and 0.005% (v/v) heptafluorobutyric acid (HFBA)]. 6. Nano LC-MS/MS analysis is then performed on an Agilent Nano HPLC system (Agilent) and LTQ mass spectrometer (Thermo Electron, San Jose, CA). 7. The capillary column used for LC-MS/MS analysis (150 mm × 0.075 mm) was obtained from Proxeon (Odense M, Denmark), and the slurry was packed in-house with a 5-μm, 100-Å pore size Magic C18 stationary phase (Michrom Bioresources, Auburn, CA). 8. The mobile phase A for LC separation was 0.4% acetic acid and 0.005% HFBA in deionized water (Cascada™, Pall, USA), and the mobile phase B was 0.4% acetic acid and 0.005% HFBA in ACN. 9. The sample obtained from the Oasis HLB (Waters, USA) desalting step and Nanosep (Pall, USA) filtering was loaded onto the LC column. 10. The chromatography gradient was designed to provide a linear increase from 5% B to 35% B over 50 min and from 40% B to 60% B over 20 min and from 60% B to 80% B over 5 min. The flow rate was maintained at 300 nL/min. 11. The mass spectra were acquired using data-dependent acquisition with a full mass scan (400-1800 m/z) followed by MS/MS scans. Each MS/MS scan acquired was an average of three microscans on the LTQ. 12. The temperature of the ion transfer tube was controlled at 200°C, and the spray voltage was 2.0–3.0 kV. The normalized collision energy was set at 35% for MS2. 13. To determine the exact position of the phosphorylation site, the automated neutral loss MS3 scan was employed, which relies on the observed behavior of phosphopeptides subjected to MS/MS analysis in an ion trap. If the MS/MS scan produces a fragment ion corresponding to neutral loss of the phosphate group (a loss of 98 with charge state 1+, 49 with charge state 2+, and 32.7 with charge state 3+), an MS3 scan of the product ion is initiated (see Note 15).
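The neutral loss values quoted in step 13 follow from dividing the mass of the lost phosphoric acid by the precursor charge. The Python sketch below illustrates this arithmetic and a simple trigger test; it assumes the loss is H3PO4 (monoisotopic mass 97.9769 Da), and the function names, example m/z values, and the 0.5 m/z tolerance are hypothetical rather than instrument settings from this protocol.

```python
H3PO4_MONO = 97.9769  # monoisotopic mass of phosphoric acid (Da)


def neutral_loss_mz(charge, loss_mass=H3PO4_MONO):
    """m/z shift caused by a neutral loss for a precursor of the given charge."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return loss_mass / charge


def triggers_ms3(precursor_mz, fragment_mz, charge, tol=0.5):
    """True if the fragment sits one phosphate neutral loss below the precursor."""
    expected = precursor_mz - neutral_loss_mz(charge)
    return abs(fragment_mz - expected) <= tol


for z in (1, 2, 3):
    print(z, round(neutral_loss_mz(z), 1))

# Hypothetical doubly charged precursor at m/z 800.0 with a fragment at m/z 751.0.
print(triggers_ms3(800.0, 751.0, charge=2))
```

The three printed shifts correspond to the nominal values of 98, 49, and approximately 32.7 used to trigger the MS3 scan.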
4. Notes 1. Donors were tested and determined negative for HIV-1 and HIV-2 antibodies, HIV-1 antigen (HIV-1), Hepatitis B surface antigen (HBsAg), antibody to Hepatitis B core antigen (anti-HBc), antibody to Hepatitis C virus (anti-HCV), HTLV-I/II antibody (anti-HTLV-I/II), and syphilis. 2. No protease inhibitor cocktails were used. This procedure required 2 h at 2-6°C. 3. Approximately 10% of the sample was left at the bottom of the secondary tube to ensure that no cellular material was collected. 4. If an excess of protease inhibitors is used, the resolving power of protein spots in the 2D gel will be decreased, and the borders of the spots will be unclear. 5. If protein pellets are dried completely in the Speed Vac, they will not redissolve in sample buffer. Pellets should be air dried for 15–30 min.
6. To ensure complete dissolution of the sample buffer components, it is usually recommended to warm the sample buffer to room temperature. Sample buffer that contains proteins should not be heated, to avoid carbamylation of proteins by isocyanate formed from the decomposition of urea, which may lead to charge heterogeneities. 7. Cup loading: Rehydrate the IPG gel strip with 350 μL sample buffer (proteins are not included), and load the 100-μL protein sample in sample buffer in the sample cup. High salt concentrations are better tolerated by cup loading. 8. Apply low voltages (100 V) at the beginning of the run for 3–5 h. Replace the filter paper (for desalting purposes) at the end of the run. 9. After the 1D (first dimension) run, IPG strips that are not immediately used for the 2D (second dimension) run can be preserved at –80°C for several months. 10. When electrical current passes through the system, BPB dye starts to migrate toward the anode reservoir, which eventually results in a change in the color of the anode buffer (to yellow). 11. Concentrated TBP reacts violently with organic matter. All procedures for preparing TBP stock solutions should be done in a fume hood. Store the TBP stock solution in the dark at 4°C. Do not store it longer than 2 weeks. 12. DTT/IAA equilibration procedure: For reduction and alkylation of proteins, the DTT/IAA equilibration procedure can be used in place of the TBP equilibration procedure. Divide the SDS equilibration buffer into two 50-mL aliquots. Add 1 g DTT to the first aliquot and 1.25 g IAA to the second aliquot. Add 10 mL of the DTT equilibration buffer to each strip and place on a shaker for 10 min. Decant the DTT equilibration buffer and shake with 10 mL of the IAA equilibration buffer for another 10 min. 13. To prepare the agarose embedding solution, dissolve 1 g of agarose in 100 mL of small gel buffer and melt in a microwave on medium power.
For complete melting of the agarose solution, heat the agarose solution in short intervals with occasional swirling to mix the solution. 14. In-gel deglycosylation: After destaining, one may remove the glycan groups of glycoproteins before trypsin digestion to obtain peptides of the highest purity. Rehydrate the dried gel spots (see Subheading 3.12, step 5) with 10 μL of PNGase F stock solution (10 μU) and incubate for 3 h at 37°C. Decant the supernatant including the glycans. Wash the gel piece with 50 μL 50 mM NH4HCO3 (pH 7.8) and ACN (6:4). Dry the gel piece in a Speed Vac. 15. The SEQUEST software was used to identify the peptide sequences: DeltaCn ≥ 0.1 and Rsp ≤ 4; Xcorr ≥ 1.9 with charge state 1+, Xcorr ≥ 2.2 with charge state 2+, and Xcorr ≥ 3.75 with charge state 3+ were used as cutoffs for peptide identification.
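The cutoffs listed in Note 15 can be expressed as a simple filter. The Python sketch below is an illustration only: it does not read SEQUEST output files, and the dictionary keys and example hits are hypothetical. The peptide string is the sequence shown in Fig. 3.

```python
# Minimum Xcorr per precursor charge state, as listed in Note 15.
XCORR_MIN = {1: 1.9, 2: 2.2, 3: 3.75}
DELTA_CN_MIN = 0.1
RSP_MAX = 4


def passes_cutoffs(charge, xcorr, delta_cn, rsp):
    """Apply the Note 15 cutoffs; charge states other than 1+ to 3+ are rejected."""
    threshold = XCORR_MIN.get(charge)
    if threshold is None:
        return False
    return xcorr >= threshold and delta_cn >= DELTA_CN_MIN and rsp <= RSP_MAX


# Hypothetical search hits (values invented for illustration).
hits = [
    {"peptide": "KEPCVESLVSpQYFQTVTDYGKD", "charge": 3, "xcorr": 4.10, "delta_cn": 0.25, "rsp": 1},
    {"peptide": "KEPCVESLVSpQYFQTVTDYGKD", "charge": 3, "xcorr": 3.10, "delta_cn": 0.25, "rsp": 1},
    {"peptide": "KEPCVESLVSpQYFQTVTDYGKD", "charge": 2, "xcorr": 2.50, "delta_cn": 0.05, "rsp": 2},
]

accepted = [
    h for h in hits
    if passes_cutoffs(h["charge"], h["xcorr"], h["delta_cn"], h["rsp"])
]
print(len(accepted))
```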
Acknowledgments This study was supported by a grant from the Korean Health 21 R&D project, Ministry of Health & Welfare, Republic of Korea (A030003 to YKP).
References 1. Putnam, F. W. (ed) (1987) The Plasma Proteins, Academic Press, New York. 2. Anderson, N. L., and Anderson, N. G. (2002) The human plasma proteome: history, character, and diagnostic prospects. Mol. Cell. Proteomics 1, 845–867. 3. Lee, H. J., Lee, E. Y., Kwon, M. S., and Paik, Y. K. (2006) Biomarker discovery from the plasma proteome using multidimensional fractionation proteomics. Curr. Opin. Chem. Biol. 10, 42–49. 4. Cho, S. Y., Lee, E. Y., Lee, J. S., Kim, H. Y., Park, J. M., Kwon, M. S., Park, Y. K., Lee, H. J., Kang, M. J., Kim, J. Y., Yoo, J. S., Park, S. J., Cho, J. W., Kim, H. S., and Paik, Y. K. (2005) Efficient prefractionation of low-abundance proteins in human plasma and construction of a two-dimensional map. Proteomics 5, 3386–396. 5. Omenn, G. S., States, D. J., Adamski, M., and Blackwell, T. W. (2005) Overview of the HUPO Plasma Proteome Project: results from the pilot phase with 35 collaborating laboratories and multiple analytical groups, generating a core dataset of 3020 proteins and a publicly-available database. Proteomics 5, 3226–3245. 6. States, D. J., Omenn, G. S., Blackwell, T. W., Fermin, D., Eng, J., Speicher, D. W., and Hanash, S. M. (2006) Challenges in deriving high-confidence protein identifications from data gathered by a HUPO plasma proteome collaborative study. Nat. Biotechnol. 24, 333–338. 7. Yang, Z., Hancock, W. S., Chew, T. R., and Bonilla, L. (2005) A study of glycoproteins in human serum and plasma reference standards (HUPO) using multilectin affinity chromatography coupled with RPLC-MS/MS. Proteomics 5, 3353–3366. 8. Wang, Y., Wu, S. L., and Hancock, W. S. (2006) Approaches to the study of N-linked glycoproteins in human plasma using lectin affinity chromatography and nano-HPLC coupled to electrospray linear ion trap-Fourier transform mass spectrometry. Glycobiology 16, 514–523. 9. Gorg, A., Boguth, G., Kopf, A., Reil, G., Parlar, H., and Weiss, W. (2002) Sample prefractionation with Sephadex isoelectric focusing prior to narrow pH range two-dimensional gels. Proteomics 2, 1652–1657. 10. Wu, T. L. (2006) Two-dimensional difference gel electrophoresis. Methods Mol. Biol. 328, 71–95. 11. Park, K. S., Kim, H., Kim, N. G., Cho, S. Y., Choi, K. H., Seong, J. K., and Paik, Y. K. (2002) Proteomic analysis and molecular characterization of tissue ferritin light chain in hepatocellular carcinoma. Hepatology 6, 1459–1466. 12. Park, K. S., Cho, S. Y., Kim, H., and Paik, Y. K. (2002) Proteomic alterations of the variants of human aldehyde dehydrogenase isozymes correlate with hepatocellular carcinoma. Int. J. Cancer 2, 261–265. 13. Rai, A. J., Gelfand, C. A., Haywood, B. C., Warunek, D. J., Yi, J., Schuchard, M. D., Mehigh, R. J., Cockrill, S. L., Scott, G. B., Tammen, H., Schulz-Knappe, P., Speicher, D. W., Vitzthum, F., Haab, B. B., Siest, G., and Chan, D. W. (2005) HUPO plasma proteome project specimen collection and handling: towards the standardization of parameters for plasma proteome samples. Proteomics 5, 3262–3277.
14. Huang, L., Harvie, G., Feitelson, J. S., Gramatikoff, K., Herold, D. A., Allen, D. L., Amunngama, R., Hagler, R. A., Pisano, M. R., Zhang, W. W., and Fang, X. (2005) Immunoaffinity separation of plasma proteins by IgY microbeads: meeting the needs of proteomic sample preparation and analysis. Proteomics 5, 3314–3328. 15. Herbert, B. and Righetti, P. G. (2000) A turning point in proteome analysis: sample prefractionation via multicompartment electrolyzers with isoelectric membranes. Electrophoresis 21, 3639–3648. 16. Miklos, G. L. and Maleszka, R. (2001) Integrating molecular medicine with functional proteomics: realities and expectations. Proteomics 1, 30–41. 17. Weber, G., Islinger, M., Weber, P., Eckerskorn, C., and Volkl, A. (2004) Efficient separation and analysis of peroxisomal membrane proteins using free-flow isoelectric focusing. Electrophoresis 25, 1735–1747. 18. Choi, B. K., Cho, Y. M., Bae, S. H., Zouboulis, C. C., and Paik, Y. K. (2003) Single-step perfusion chromatography with a throughput potential for enhanced peptide detection by matrix-assisted laser desorption/ionization-mass spectrometry. Proteomics 3, 1955–1961. 19. Gobom, J., Nordhoff, E., Mirgorodskaya, E., Ekman, R., and Roepstorff, P. (1999) A sample purification and preparation technique based on nano-scale RP-columns for the sensitive analysis of complex peptide mixtures by MALDI-MS. J. Mass Spectrom. 24, 105–116. 20. Walsh, B. J., and Herbert, B. R. (1999) Casting and running vertical slab-gel electrophoresis for 2D-PAGE. Methods Mol. Biol. 112, 245–253. 21. Newhall, W. J. and Jones, R. B. (1983) Disulfide-linked oligomers of the major outer membrane protein of chlamydiae. J. Bacteriol. 154, 998–1001. 22. Kaufman, R. J. (1998) Post-translational modifications required for coagulation factor secretion and function. Thromb. Haemost. 79, 1068–1079. 23. Tabas, I. (1999) Nonoxidative modifications of lipoproteins in atherogenesis. Annu. Rev. Nutr. 19, 123–139.
II Clinical Proteomics by 2DE and Direct MALDI/SELDI MS Profiling
5 Analysis of Laser Capture Microdissected Cells by 2-Dimensional Gel Electrophoresis Daohai Zhang and Evelyn Siew-Chuan Koay
Summary Laser capture microdissection (LCM) is a powerful tool for procuring near-pure populations of targeted cell types from specific microscopic regions of tissue sections, by overcoming problems due to tissue heterogeneity and minimizing intermixture and contamination by other cell types. The combination of LCM with various proteomic technologies has enabled high-throughput molecular analysis of human tumors, and provided critical tools in the search for novel disease markers and therapeutic targets. As an example, we describe the application of LCM in dissecting the tumor cells in breast cancer for macromolecular extraction and subsequent protein separation by 2-dimensional gel electrophoresis (2-D GE). The protocols and the key issues involved in preparing ethanol-fixed paraffin-embedded tissue blocks and microscopic sections, microdissecting the cells of interest using the PixCell II LCM system, extracting and separating the cellular proteins by 2-D GE, and preparing selected proteins for peptide mass analysis by mass spectrometry, are discussed. The aim is to provide a practical guide to performing high-throughput microdissection of target cells and gel-based proteomics, which can be adapted to research in cancer formation and growth.
Key Words: laser capture microdissection; 2-dimensional gel electrophoresis; breast cancer; proteomics; silver staining.
From: Methods in Molecular Biology, vol. 428: Clinical Proteomics: Methods and Protocols Edited by: A. Vlahou © Humana Press, Totowa, NJ
1. Introduction Cellular proteins (collectively known as “proteomes”) are less susceptible than the transcriptome to experimental artifacts arising from the rigors of tissue collection and processing, and advances in global protein expression analysis
(expression proteomics) have been used in mapping cellular pathways, identifying the molecular alterations associated with disease onset and progression, and searching for potential tumor markers or drug targets in human disease, especially in cancer. However, to obtain cell-specific protein profiles, homogeneous or near-pure populations of the cells of interest, free from contamination by adjacent cell types, are prerequisites. Laser capture microdissection (LCM) was developed to enable the procurement of near-pure populations of the target cells with greater speed and precision than is possible with manual dissection methods. LCM permits selective transfer of specific cell types, under direct microscopic visualization, from complex tissues onto a polymer film that is activated by laser pulses, whilst retaining their morphology. The homogeneity of encapsulated cells can be verified microscopically. With these inherent advantages, LCM has become a valuable research tool and has been applied to cellular and molecular studies of various cancers, including breast (1,2), colon (3), and liver (4) cancers. It is equally efficacious in procuring cell populations from both frozen tissues (3,4) and ethanol-fixed, paraffin-embedded tissues (1,5). Protein profiles of the LCM-dissected cells can be obtained by two-dimensional fluorescence difference gel electrophoresis (2-D DIGE) (6), 16O/18O isotopic labeling (7), differential iodine radioisotope detection (2), isotope-coded affinity tag (iCAT) coupled with two-dimensional tandem mass spectrometry (2-D LC-MS/MS) (8), and mass spectrometry compatible silver staining (1,9). Protein samples from LCM-dissected cells can also be applied to reverse-phase protein arrays to analyze the key cellular signaling pathways and metabolic networks (10,11).
In this chapter, the in-house protocols used in the authors’ laboratory for procuring near-pure populations of breast tumor cells from clinical samples, and for the extraction, isolation, and analysis of their protein profiles, are described. These include: (1) preparation of ethanol-fixed paraffin-embedded tissue blocks; (2) microdissection using the PixCell II LCM System and cellular protein extraction; (3) protein separation by 2-D gel electrophoresis (2-D GE), silver staining, and gel image analysis; and (4) preparation of targeted proteins of interest for peptide mass analysis by tandem mass spectrometry and identification of proteins of interest via database search.
2. Materials
2.1. Histology—Tissue Block and Tissue Section Preparation 1. 70% (v/v), 80% (v/v), 95% (v/v), 100% ethanol 2. Deionized or Milli-Q water (Millipore, Bedford, MA, USA) 3. Hematoxylin solution, Mayer’s (Sigma, St. Louis, MO, USA) 4. Eosin Y solution (Sigma)
Combining LCM with 2-D Gel Electrophoresis
5. Complete, mini protease inhibitor cocktail tablets (Roche Applied Science, Pleasanton, CA, USA) 6. Disposable microtome blades (Feather Safety Razor Co., Ltd., Osaka, Japan) 7. Uncharged microscopic glass slides (Paul Marienfeld GmbH & Co, KG, Lauda-Koenigshofen, Germany) 8. Sakura Tissue-Tek® V.I.P.™ 5 Jr tissue processor (Sakura Finetek, Inc. Japan Co., Ltd, Tokyo) 9. Paraffin wax—Paraplast® tissue embedding medium; melting point 56-58°C, store at room temperature (RT) (Structure Probe, Inc., West Chester, PA, USA) 10. Xylenes, Reagent Grade (Sigma) 11. Embedding molds—super metal base molds, 66 mm × 54 mm × 15 mm (Surgipath Medical Industries, Richmond, IL, USA)
2.2. Laser Capture Microdissection and Protein Sample Preparation 1. PixCell II LCM system (Arcturus Engineering, Mountain View, CA, USA) 2. CapSure transparent plastic caps (Arcturus Engineering) 3. Lysis buffer: 7 M urea, 2 M thiourea, 4% (w/v) CHAPS, 1% Nonidet P (NP)-40, 0.5% (v/v) Triton X-100, 50 mM dithiothreitol (DTT), 40 mM Tris-HCl, pH 7.5, 2 mM tributyl phosphine (TBP), and 1% (v/v) IPG buffer (pH 3–10). Store at RT. 4. PlusOne 2-D Clean-up Kit (GE Healthcare, San Francisco, CA, USA) 5. Immobilized pH gradient (IPG) buffer (pH 3–10) (GE Healthcare) 6. PlusOne 2-D Quantitation Kit (GE Healthcare)
2.3. Isoelectric Focusing (IEF) and Sodium Dodecyl Sulfate-Polyacrylamide Gel Electrophoresis (SDS-PAGE) 1. Ettan™ IPGphor™ IEF electrophoresis unit (GE Healthcare) 2. Ceramic strip holders and Ettan™ IPGphor™ Strip Holder Cleaning Solution (GE Healthcare) 3. Immobiline™ IPG DryStrips (18 cm, pH 3–10, NL) (GE Healthcare) 4. DryStrip Cover Fluid (GE Healthcare) 5. Sample rehydration buffer: 7 M urea, 2 M thiourea, 4% (w/v) CHAPS, 1% (w/v) NP-40, 1% (v/v) IPG buffer, 50 mM DTT. Add DTT fresh to the rehydration buffer prior to use. Store at RT. 6. Equilibration buffer A (prepare 10 ml for each strip): 6 M urea, 30% glycerol, 2% SDS, 1% DTT, 50 mM Tris-HCl, pH 8.8. DTT is added to the stock solution before use. 7. Equilibration buffer B (prepare 10 ml for each strip): 6 M urea, 30% glycerol, 2% SDS, 250 mg (2.5%, w/v) iodoacetamide (IAA), 50 mM Tris-HCl, pH 8.8. IAA is added to the stock solution before use. 8. 10% SDS-acrylamide gel: 33 ml acrylamide/bis (30% T, 5% C) (Bio-Rad Laboratories, Hercules, CA, USA), 25 ml Tris (1.5 M, pH 8.8), 1 ml 10% (w/v) SDS, 0.5 ml 10% (w/v) ammonium persulfate (freshly prepared on the day of use), 35 μl TEMED (Bio-Rad). Make up to 100 ml with Milli-Q water.
9. Water-saturated isobutanol: Shake equal volumes of Milli-Q water and isobutanol in a glass bottle and allow the mixture to separate. Transfer the top layer to a new bottle and store at RT. 10. Agarose sealing solution: Dissolve 0.5% low-melting-point agarose and 0.1% (w/v) bromophenol blue in 1× SDS-PAGE running buffer. Store at RT. 11. SDS-PAGE running buffer: 25 mM Tris, 198 mM glycine, 0.2% (w/v) SDS, pH 8.3 12. PROTEAN™ II xi Cell system (Bio-Rad)
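The final concentrations implied by the 10% SDS-acrylamide gel recipe in item 8 of this subheading can be checked with the dilution relation C_final = C_stock × V_stock / V_total. The Python sketch below is illustrative; the function name is hypothetical, and the stock values are those listed in the recipe.

```python
def final_concentration(stock_conc, stock_volume_ml, total_volume_ml):
    """Concentration after diluting a stock to the given total volume."""
    if total_volume_ml <= 0:
        raise ValueError("total volume must be positive")
    return stock_conc * stock_volume_ml / total_volume_ml


TOTAL_ML = 100.0  # the recipe is made up to 100 ml

acrylamide_pct = final_concentration(30.0, 33.0, TOTAL_ML)  # % T
tris_molar = final_concentration(1.5, 25.0, TOTAL_ML)       # M
sds_pct = final_concentration(10.0, 1.0, TOTAL_ML)          # % (w/v)
aps_pct = final_concentration(10.0, 0.5, TOTAL_ML)          # % (w/v)

print(round(acrylamide_pct, 2), round(tris_molar, 3), round(sds_pct, 2), round(aps_pct, 3))
```

The acrylamide value comes out at 9.9% T, which the recipe rounds to a nominal 10% gel. The same function can be used to scale the recipe to a different casting volume.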
2.4. Silver Staining (see Note 1) 1. Fix solution: 5% acetic acid and 50% ethanol per 100 ml 2. Sensitivity-enhancing solution: 30% (v/v) ethanol, 6.8% (w/v) sodium acetate, 100 μl of 2% (w/v) sodium thiosulphate per 100 ml 3. Silver staining solution: 0.25% (w/v) silver nitrate 4. Development solution: 2.5% (w/v) anhydrous potassium carbonate, 20 μl of 2% (w/v) sodium thiosulphate per 100 ml, 40 μl of 37% formaldehyde per 100 ml. 5. Stop solution: 4% (w/v) Tris and 2% (v/v) acetic acid per 100 ml 6. Gel store (soak) solution: 1% (w/v) sodium acetate and 10% (v/v) methanol per 100 ml
2.5. Gel Image Analysis 1. Personal Densitometer SI (Molecular Dynamics, Sunnyvale, CA, USA) 2. ImageMaster 2D Elite (Platinum) software (GE Healthcare)
2.6. In-gel Trypsin Digestion and Preparation for MS Analysis 1. Destaining solution: 30 mM potassium ferricyanide and 100 mM sodium thiosulfate (1:1) 2. 25 mM sodium bicarbonate 3. Dehydrating solution: 50 mM sodium bicarbonate and 50% (v/v) methanol per 100 ml 4. SpeedVac centrifuge (TeleChem International, Inc., Sunnyvale, CA, USA) 5. Digestion solution: 40 ng/μl trypsin sequencing grade (Promega, Madison, WI, USA) in 20 mM ammonium bicarbonate solution 6. Extraction solution (for hydrophobic peptides): 5% (v/v) trifluoroacetic acid (TFA) and 50% (v/v) acetonitrile (ACN) per 100 ml 7. Peptide reconstitution solution: 0.1% (v/v) TFA 8. ZipTip C18 columns (Millipore) 9. Eluant: 70% (v/v) ACN and 0.1% TFA per 100 ml 10. Stainless steel MALDI-TOF sample target plates (Applied Biosystems, Framingham, MA, USA) 11. Alpha-cyano-4-hydroxycinnamic acid (α-CHCA) matrix, 3 mg/ml (Sigma) 12. Applied Biosystems 4700 MALDI-TOF/TOF mass spectrometer
2.7. Database Search for Protein Identification 1. MASCOT software (Matrix Science, London, England) 2. MS-Fit software (http://prospector.ucsf.edu)
3. Methods The methods described below have been successfully used in the authors’ laboratory for proteomics studies in human breast cancer specimens (1,9) and can be applied to other cancer tissues as well. Breast tumors and matched normal tissues were obtained from the Tissue Repository Unit of the National University Hospital, Singapore, after approval by our Institutional Review Board.
3.1. Preparation of Tissue Sections for LCM In this step, frozen tissues can be transferred directly from the –80°C freezer, where they have been stored after surgical excision and trimming, to a pre-cooled tube containing 70% (v/v) ethanol and kept on ice. Ethanol-fixed paraffin-embedded tissue blocks should be prepared as quickly as possible, and the completed blocks stored at or below 4°C. 1. Fix the frozen tissue overnight in 70% ethanol at 4°C. 2. Place each ethanol-fixed tissue piece, trimmed to appropriate dimensions, into a pre-cooled cassette within the tissue processor and dehydrate according to the following procedure: 30 min each in 70% and 80% ethanol at 40°C; 45 min in 95% ethanol at 40°C (twice); 45 min in 100% ethanol at 40°C (twice), and 45 min in xylene at 40°C (twice) (see Note 2). 3. Embed the specimen in paraffin using embedding molds, with four changes of paraffin at 30-min intervals. 4. Store the paraffin blocks at or below 4°C if they are not to be processed immediately for sectioning. 5. Put the block in a –20°C freezer for at least 1 h before cutting sections from it. 6. Cut sections of 8 μm thickness using a standard microtome. Blades should be changed regularly (see Note 3). 7. Collect the tissue sections on uncharged microscopic glass slides, allow the sections to air dry, and store the cut sections at or below 4°C.
3.2. Staining of Paraffin-embedded Sections The staining of sections for LCM is similar to that used in most histology laboratories for morphological assessment. However, using the minimal amount of stain needed to visualize the tissue for microdissection will improve macromolecule recovery (see Note 4). One tablet of protease inhibitor cocktail should be added to every 10 ml of each reagent (except xylene), and all reagents should be prepared using double deionized water or Milli-Q® water. Staining should be performed as close as possible to the scheduled LCM dissection. 1. Deparaffinize the sections in fresh xylene for 5 min, followed by another 5 min with a fresh change of xylene. 2. Rehydrate for 15 s in each step of the following series: 100% ethanol, 95% ethanol, 70% ethanol, and deionized water. 3. Stain with Mayer’s Hematoxylin for 30 s. 4. Rinse off excess stain with deionized water for 15 s; repeat rinse a second time. 5. Dehydrate for 15 s in 70% ethanol. 6. Stain with Eosin Y for 5 s. 7. Dehydrate the sections for 15 s (twice) in 95% ethanol, 15 s (twice) in 100% ethanol, and 60 s in xylene. 8. Air-dry for approximately 2–5 min to allow xylene to evaporate completely (see Note 5). 9. The tissue is now ready for LCM (see Note 6).
3.3. Laser Capture Microdissection and Protein Sample Preparation The PixCell II LCM system (Arcturus Engineering, Mountain View, CA, USA) is used for specific microdissection of tumor cells in our laboratory. Tissue sections are usually mounted on uncoated glass slides to provide support for the CapSure cap during microdissection. LCM utilizes an infrared laser integrated into a standard microscope, and when the desired cells move into the path of the light source, the investigator activates the laser, which in turn activates the membrane (a short laser pulse emitted heats the transparent membrane to ∼90°C for 5 ms). This melts the membrane, with subsequent binding and encapsulation of the cells of interest, segregating them from the surrounding cells and connective tissues. Images of the tissues before and after microdissection and of the captured cells on the cap can be visualized, thus maintaining an accurate record of each dissection. The laser beam diameter may be adjusted from 7.5 to 30 μm to procure either single cells or groups of cells, respectively. 1. Place the slide containing the prepared tissue on the microscope stage. Set the laser parameters as follows: spot diameter at 15 μm, pulse duration at 5 ms, and power at 50 mW. 2. Scan the tissue section to locate the desired cells. Dissect out the target cells of interest and capture all encapsulated cells from each section in quick succession into one cap. Cells dissected from ∼2500 shots can be captured into one cap (see Note 7). Figure 1 shows an example of tumor cells before and after microdissection.
Fig. 1. Laser capture microdissection (LCM) of breast tumor cells. The tissue section on the uncharged glass slide was stained with hematoxylin and eosin and microdissected with the PixCell II LCM system (Arcturus Engineering). (A) section before LCM; (B) section after LCM; (C) microdissected cell.
3. Place the LCM cap on an Eppendorf tube containing 100 μl of lysis buffer with protease inhibitor, invert the tube, and vortex vigorously for 1 min. 4. Place the tube on ice for approximately 20 min and sonicate the microdissected sample in a bath sonicator with 5-s pulses separated by 5-s intervals, for a duration of 1 min. 5. Return the sample to ice immediately after the 1-min sonication. 6. Centrifuge the sample at 16,000 g for 20 min at 4°C and transfer the supernatant to a new Eppendorf tube. 7. Determine the protein concentration using the PlusOne 2-D Quantitation Kit (GE Healthcare) and clean up the sample using the PlusOne 2-D Clean-up Kit (GE Healthcare), following the manufacturer’s instructions closely. 8. Dissolve the protein pellet in the appropriate volume of sample rehydration buffer and aliquot according to experimental plans for immediate and later usage. Store the aliquotted samples at –80°C until analyzed (see Note 8).
3.4. First-dimension Gel Electrophoresis (Isoelectric Focusing) 1. Prepare the strip holder for the 18-cm IPG strip (see Note 9). 2. Squeeze a few drops of Ettan™ IPGphor™ Strip Holder Cleaning Solution (GE Healthcare) into the slot and clean thoroughly. Rinse with Milli-Q water and dry completely. 3. Mix approximately 50 μl of the reconstituted protein samples (∼100–150 μg) with the appropriate volume of rehydration buffer. The total volume should be 340 μl for one 18-cm IPG strip. 4. Transfer the entire volume of the diluted protein sample into the groove of the IPG strip holder. 5. Remove the cover from the IPG strip (18 cm, pH 3–10) and place the IPG strip in the holder such that the gel of the strip is in contact with the sample (i.e., gel
side down). Try to remove any trapped air bubbles by lifting the strip up and down from one side. 6. Overlay the IPG strip with 2–3 ml of DryStrip Cover Fluid to prevent urea crystallization and evaporation, and replace the cover on the strip holder. 7. Rehydrate the IPG strip at 20 V for 12 h at 20°C. 8. Perform IEF under the following conditions: 500 V for 1 h, 2000 V for 1 h, 4000 V for 1 h, and 8000 V for 6 h. 9. Once focusing is complete, pour off the oil. The strips can be stored at –20°C for several weeks, or immediately treated as described below (see Subheading 3.5).
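As a quick check on the loading and focusing figures in Subheading 3.4, the sketch below computes the rehydration buffer volume needed to reach 340 μl and the accumulated volt-hours of the IEF program. It assumes each voltage is held constant for the stated time (no ramping between steps), which the protocol does not specify; the function and constant names are illustrative only.

```python
# (volts, hours) for each focusing step listed in Subheading 3.4.
FOCUSING_STEPS = [(500, 1), (2000, 1), (4000, 1), (8000, 6)]
# Low-voltage rehydration step (20 V for 12 h).
REHYDRATION_STEP = (20, 12)
STRIP_VOLUME_UL = 340  # total load volume for one 18-cm IPG strip


def volt_hours(steps):
    """Accumulated volt-hours, assuming each voltage is held constant."""
    return sum(volts * hours for volts, hours in steps)


def rehydration_buffer_ul(sample_ul, total_ul=STRIP_VOLUME_UL):
    """Volume of rehydration buffer needed to bring the sample to total_ul."""
    if not 0 <= sample_ul <= total_ul:
        raise ValueError("sample volume must lie between 0 and the total volume")
    return total_ul - sample_ul


focusing_vh = volt_hours(FOCUSING_STEPS)
total_vh = focusing_vh + volt_hours([REHYDRATION_STEP])
print(focusing_vh, total_vh, rehydration_buffer_ul(50))
```

Under the step-and-hold assumption, the focusing steps contribute 54,500 Vh (54,740 Vh including rehydration), and a 50-μl sample needs 290 μl of rehydration buffer.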
3.5. IPG Strip Equilibration 1. Place the focused IPG strips in a container with 10 ml of equilibration buffer A and shake for 15 min at RT (see Note 10). 2. Transfer the IPG strip to a container with 10 ml of equilibration buffer B and shake for 15 min at RT (see Note 10). 3. The equilibrated strips can then be processed for second-dimension gel electrophoresis.
3.6. Second-dimensional SDS-PAGE Prepare the SDS-polyacrylamide gels in advance, and make sure that the gels are well polymerized before performing the equilibration of IPG strips. The proteins have to be charged by equilibration with SDS, and be reduced and alkylated to avoid the formation of oligomers. In our laboratory, we use the PROTEAN II xi Cell system (Bio-Rad) for SDS-PAGE. 1. Assemble the gel casting cassette as per the manufacturer’s instructions. 2. Prepare 10% SDS-PAGE (see Note 10) and pour the solution slowly into the cassette (two 16 cm × 20 cm glass plates sandwiched by 1.5-mm thick spacers) until the gel height is approximately 1 cm from the top. 3. Overlay the gel solution with 2 ml of water-saturated isobutanol. It is best to pour 1 ml of water-saturated isobutanol from one side of the gel and 1 ml on the other side. Do not pour it all along the gel meniscus. 4. Allow the gel to polymerize for at least 2 h. 5. When polymerization is completed, remove the water-saturated isobutanol and rinse with water again. 6. With a pair of forceps, carefully place the equilibrated strip on top of the PAGE gel, with the acidic side of the strip at left. Cover the strip with melted agarose sealing solution (see Note 11). 7. Assemble the electrophoresis unit (Bio-Rad) and perform electrophoresis at 15°C as follows: 40 V for 15 min or until the blue dye enters the gel and then raise the voltage to 125 V and run the gel overnight or until the blue dye migrates to the bottom of the gel. 8. Switch off the main power and disassemble the gel cassette.
9. Place the gel in a glass container and wash the gel with Milli-Q water. 10. Stain the gel using the mass spectrometry-compatible silver staining protocol (see Subheading 3.7).
3.7. Silver Staining and Image Analysis 1. The silver staining protocol as described below is used in the authors’ laboratory and is highly compatible with protein identification by MALDI-TOF MS and MALDI-TOF/TOF MS/MS. It should be noted that adequate washing with MilliQ water is essential to reduce the risk of keratin contamination. All the solutions must be prepared with Milli-Q water, and all the chemical reagents should be filtered to remove any particles that may cause interference during MS analysis. All solutions prepared from solid chemicals should be freshly prepared before performing silver staining. Fix the gel with fixing solution for at least 2 h, changing the solution afresh at hourly intervals. 2. Briefly wash with Milli-Q water, with constant shaking for about 15 min. 3. Remove the wash and cover the gel with appropriate sensitivity-enhancing solution and incubate for 1 h, with constant shaking. 4. Wash the gel thoroughly with Milli-Q water for 6 × 15 min, with gentle shaking and replacing with fresh Milli-Q water after each cycle (see Note 12). 5. Stain the gel with silver staining solution for 30 min. 6. Wash off excess stain from the gel with Milli-Q water (twice, for 2 × 1 min). 7. Develop the gel for 5–30 min in a developing solution (see Note 13). 8. Add Stop Solution and shake the gel for approximately 20 min to stop the reaction. 9. Wash the gel using Milli-Q water for 20 min; replace water and repeat the wash. 10. Scan the gel using Personal Densitometer SI, or store the gel in the gel soak solution for analysis at a later time. 11. Capture the image using ImageMaster 2D Elite software (GE Healthcare). The image analysis includes spot detection, quantification and normalization of spot intensity to the background interferences, according to the instructions from the software. An example of images showing the differences between the protein profiles of LCM-microdissected HER-2/neu positive and -negative tumor cells is shown in Fig. 2. 12. 
Analyze the image using the software and identify spots that show significant differences in spot intensities (see Note 14), reflecting differential protein expression in the two subtypes of breast cancer triggered by the presence or suppression of the HER-2/neu oncogene. Only those spots that show a more than threefold increase or decrease in signal intensity, consistently across three replicate sets of gels, are considered to demonstrate differential protein expression and are selected for further analysis by MALDI-TOF MS/MS. Proteins with less convincing evidence of differential expression are unlikely to be potential biomarkers for early detection of tumor growth or therapeutic targets for breast cancer treatment.
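The selection rule in step 12 can be sketched in a few lines. The sketch reads the criterion as a more than threefold increase or decrease, in the same direction, in every one of the three replicate gel sets; this reading, the spot names, and the ratio values are assumptions for illustration and are not output from ImageMaster.

```python
def is_differential(ratios, threshold=3.0):
    """True if every replicate ratio exceeds the fold threshold in one direction.

    Each ratio is (HER-2/neu-positive intensity) / (HER-2/neu-negative intensity).
    """
    consistently_up = all(r > threshold for r in ratios)
    consistently_down = all(r < 1.0 / threshold for r in ratios)
    return consistently_up or consistently_down


def select_spots(spot_ratios, threshold=3.0):
    """Return the names of spots that pass the fold-change rule."""
    return [name for name, ratios in spot_ratios.items()
            if is_differential(ratios, threshold)]


# Hypothetical normalized intensity ratios from three replicate gel sets.
example = {
    "spot_01": [3.4, 4.1, 3.8],    # up in all three replicates
    "spot_02": [0.25, 0.30, 0.20],  # down in all three replicates
    "spot_03": [3.5, 2.1, 4.0],    # one replicate below threshold
    "spot_04": [3.2, 0.2, 3.9],    # inconsistent direction
}
print(select_spots(example))
```

Requiring agreement in every replicate is deliberately strict: a single discordant gel excludes the spot, which matches the emphasis on consistency in the text.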
[Fig. 2 image: two silver-stained 2-D gel panels, HER-2/neu-P and HER-2/neu-N; horizontal axis pI 3 to 10; vertical axis kDa (labels 92, 50, 35, 28); labeled spots NP004095, AAH025396, P04075, P06753-2, P07339, NP001531, AAB49495, NP000627.]
Fig. 2. Silver-stained protein profiles of LCM-dissected cells. Protein samples from HER-2/neu positive and -negative cells were separated using IPG® strips (18 cm, pH 3–10 NL) and homogeneous SDS-PAGE (10%), and then stained with silver nitrate. Silver-stained gels were scanned using the Personal Densitometer SI (Molecular Dynamics) and differentially expressed protein spots were analyzed by ImageMaster 2-D Elite software (GE Healthcare). The accession numbers indicate the proteins identified by MALDI-TOF/TOF tandem mass spectrometry and NCBInr database search using Mascot software (Matrix Science, London, UK).
3.8. Trypsin Digestion and Preparation of Peptides for Mass Spectrometric Analysis 1. Excise the silver-stained protein spots showing significant differential protein expression, as mentioned above, one at a time, taking care not to include adjacent protein spots, and transfer to individual tubes. 2. Wash with 100 μl of Milli-Q water for 5 min. 3. Add 50 μl of the destaining solution to the tubes, and incubate for about 20 min on a platform shaker at RT until the gel pieces become clear. 4. Remove the solution carefully and wash with 100 μl of Milli-Q water. 5. Incubate the gel pieces with 25 mM sodium bicarbonate for 20 min, and then cut them into smaller pieces with the tip of the transfer pipette. Avoid carryover and contamination during repetitive work on consecutive samples. 6. Rinse the gel pieces with Milli-Q water, discard the wash after pulsing down the gel pieces, and repeat the washing process three times. 7. Add 100 μl of dehydrating solution and incubate for 20 min at RT. 8. Dry the gel pieces in a SpeedVac centrifuge. 9. Re-swell the dried gel pieces with 10–20 μl of Digestion Solution and leave overnight at 37°C to ensure complete digestion. 10. Extract the resultant hydrophilic peptides first with 10 μl of Milli-Q water for 1 h.
11. Then extract the hydrophobic peptides with Extraction Solution for 2 h. 12. Pool the extracted hydrophilic and hydrophobic peptides and dry the peptide mixture using the SpeedVac centrifuge. 13. Redissolve the dried peptides in 10 μl of 0.1% (v/v) TFA. 14. Desalt the sample with ZipTip C18 columns (Millipore) and elute the treated and purified peptides with 2.5 μl of Eluant. 15. Mix 0.5 μl of the sample eluate with 0.5 μl of CHCA matrix (3 mg/ml) and spot the mixture onto the stainless steel MALDI-TOF sample target plates. 16. The pretreated peptide samples must be stored on ice during transfer to the core facility for mass spectrometric analysis. In our laboratory, peptide mass spectra are obtained by the Applied Biosystems 4700 Proteomics Analyzer MALDI-TOF/TOF mass spectrometer, set in the positive ion reflector mode. The subsequent MS/MS analyses are performed in a data-dependent manner, and the 10 most abundant ions fulfilling certain preset criteria are subjected to high-energy CID analysis. The collision energy is set to 1 keV, and nitrogen is used as the collision gas.
3.9. Database Search to Match Protein Identities Database searches were conducted using the MASCOT search engine (http://www.matrixscience.com; Matrix Science, London, UK). Known contamination peaks, such as keratin and autoproteolysis peaks, were removed prior to the search. All tandem mass spectra were searched against the NCBInr database, with a mass accuracy of within 200 ppm for mass measurement and a tolerance window of 0.5 Da for MS/MS. Searches were performed without constraining the protein molecular weight (Mr), isoelectric point (pI), or species, and allowing for carbamidomethylation of cysteine and partial oxidation of methionine residues. Up to one missed tryptic cleavage was considered for all tryptic-mass searches. Protein scores greater than 75 are considered to be significant (p < 0.05).
3.10. Experimental Example: Differential Protein Profiles between HER-2/neu Positive and -Negative Breast Tumors We dissected the tumor cells from two different subtypes of breast tumors and compared their protein profiles, based on the protocols described above. Figure 2 shows the LCM-dissected tumor cell protein patterns visualized by silver staining. It should be noted that pooled protein samples from different cases of the same tumor subtype were used for 2-D GE. This gel-based protein visualization technique requires a high amount of protein, and thus more sensitive detecting reagents and protein identification strategies had to be developed to produce meaningful results (see Notes 15 and 16). Using
the silver-staining protocol, we identified 500–600 protein spots in the protein profiles generated by coupling LCM and 2-D GE. Protein spots of interest were excised and digested with trypsin (Promega), desalted with ZipTip C18 columns (Millipore), and analyzed using MALDI-TOF/TOF tandem mass spectrometry. Protein identities, as shown in Fig. 2, were obtained by searching the NCBInr database using the MASCOT software (Matrix Science).
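The two mass tolerances quoted in Subheading 3.9 work differently: the 200 ppm window scales with the mass being measured, whereas the 0.5 Da MS/MS window is absolute. The following minimal sketch illustrates the difference; it is not how MASCOT is implemented, and the function names and example masses are hypothetical.

```python
def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def within_ppm(observed, theoretical, tol_ppm=200.0):
    """Relative tolerance, as used here for peptide mass measurement."""
    return abs(ppm_error(observed, theoretical)) <= tol_ppm


def within_da(observed, theoretical, tol_da=0.5):
    """Absolute tolerance, as used here for MS/MS fragment masses."""
    return abs(observed - theoretical) <= tol_da


# Hypothetical peptide with a theoretical mass of 1500.0 Da.
print(round(ppm_error(1500.2, 1500.0), 1))  # about 133.3 ppm
print(within_ppm(1500.2, 1500.0))           # inside the 200 ppm window
print(within_ppm(1500.4, 1500.0))           # about 266.7 ppm, outside
```

At 1500 Da, 200 ppm corresponds to 0.3 Da, so the relative window is tighter than the 0.5 Da fragment tolerance for a peptide of this size and looser only above 2500 Da.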
4. Notes 1. All the chemical solutions should be filtered by passing them through filter paper (Cat No. 1001 150, Whatman®, Whatman International Limited, Springfield Mill, Maidstone, Kent, England) to minimize precipitates forming on the gels during silver staining. 2. Tissue processors in standard histopathology laboratories generally include formalin fixation as the first step in the paraffin infiltration procedure. It is important to avoid this step when processing tissues intended for molecular gene and proteome profiling. 3. Consistent LCM transfers have been demonstrated from 5–10 μm thick paraffin-embedded tissue sections. For a successful LCM transfer, the bond between polymer film and targeted tissue must be stronger than that between the tissue and the underlying glass slide. Therefore, for most tissue types, sections should be collected on uncharged glass slides. To prevent cross-contamination while sectioning, residual paraffin and tissue fragments should be wiped off from the area of the sectioning blade with xylenes between consecutive slides. If possible, a fresh microtome blade should be used to section a different block. 4. In our hands, hematoxylin and eosin are best reduced to 10% of the standard concentrations used for routine histomorphological work when applied to slides prepared for LCM. With this modification, breast tumor cells can be clearly visualized and distinguished from other cell types without affecting the procurement of tumor cells by LCM. Minimal staining also improves macromolecular recovery during cellular protein extraction. 5. Complete dehydration and air drying of sections are the main factors influencing the efficiency of LCM. Prolonged air drying or the presence of moisture in the sections appears to inhibit, at least partially, the transfer of cells to the plastic film. 6. 
If investigators have little experience in examining cancer tissue sections, we strongly recommend that they consult the pathologists in their institutions for assistance in identifying the target cell types to be microdissected using LCM. It is essential to avoid contamination by other cell types, or dissecting the wrong cells. 7. During microdissection, make sure that there are no irregularities on the tissue surface in or near the area to be microdissected. It should also be noted that wrinkles can elevate the LCM cap away from the tissue surface and decrease the
membrane contact during laser activation. Use an adhesive pad after microdissection to remove cells that may have attached non-specifically to the LCM cap. A cap-alone control is recommended for each experiment to ensure that non-specific transfer is not occurring during microdissection. This cap should be processed together with the tissue-containing caps and serves as a negative control. For protein separation by 2-D GE, 20 to 30 sections from each tissue sample are dissected, depending on the percentage of target cells in the full sections. Generally, 2300–2700 laser pulse shots are used for each cap. Cells from at least 50,000 shots (spot diameter is 15 μm) are required for each 18-cm gel. 8. Up to 15 mg of proteins can be solubilized with 500 μl of the sample rehydration buffer, but with our breast tumor tissue samples, we usually reconstitute 1–2 mg of extracted proteins in 500 μl, i.e., 2–4 mg/ml. It is recommended that the reconstituted proteins be stored in appropriate aliquots, and that only the number of aliquots needed for the experiment at hand be removed at any time, to avoid repeated freezing and thawing of the proteins, which will lead to sample deterioration. 9. IEF is performed using the Ettan™ IPGphor™ IEF electrophoresis unit. Rehydration loading of protein samples is used in the authors’ laboratory. The IPG strips for first-dimensional separation are commercially available, and can be procured from GE Healthcare and other suppliers. IPG strips with various pH gradients and dimensions are available and are chosen according to the resolution needed. The strips should be kept frozen at –20°C, and thawed just before use. The IEF conditions are dependent on the pH range. Reference to the manufacturer’s protocol is recommended. For alkaline pH ranges, cup loading is a must, and DTT in the rehydration buffer should be replaced by other reducing agents, such as hydroxyethyl-disulfide (HED) reagent (Destreak, GE Healthcare). 10. It is essential to equilibrate the strips before they are applied to the second-dimension gel electrophoresis (2-D SDS-PAGE). DTT added to buffer A will reduce the disulfide bonds whereas IAA in buffer B will alkylate the resulting sulfhydryl groups of proteins. This is to prevent re-oxidation of sulfhydryl groups and streaking of spots during 2-D SDS-PAGE. Further, the presence of SDS makes the proteins negatively charged and suitably primed for SDS-PAGE. Use the best quality SDS available for sample and running buffers that include SDS in their formulation. We recommend C12 Grade SDS from Pierce (Rockford, IL, USA). 11. When placing the strips on top of the gel, ensure that the plastic backing of the strips is in contact with the glass wall. If necessary, the strips can be trimmed to fit. When adding agarose sealing solution, make sure that there are no air bubbles trapped between the IEF strip and 2-D gel. 12. Wash the gels thoroughly and repeatedly, as recommended, prior to the development step and during the development step itself, to get clear stained gels. During the development of the gels, formaldehyde should be added prior to use,
and the suggested concentration should be followed strictly to avoid interference during MALDI-TOF analysis. During the developing stage, the gel should be constantly shaken to reduce the background. 13. The developing time depends on the total amount of protein that is used for 2-D separation. With a higher amount of protein, a shorter developing time can be used, without compromising the aim of visualizing the maximum number of protein spots. 14. It is important to manually verify spot detection and matching, as variations in gel resolution, staining, and gel background mean that automatic image analysis may not correctly define the spot contours in every case. This variability and the complexity of 2-D gel patterns hinder the accurate matching of analogous spots in different gels. 15. In our experience, approximately 500 to 600 distinct proteins from the dissected breast tumor cells can be visualized on 2D-PAGE stained with silver. On average, we can extract approximately 4–6 μg of total cellular proteins from 2500 laser pulses. Our experience is that silver staining of LCM-dissected cell proteins is a sufficiently sensitive tool for isolating and identifying the dysregulated cellular proteins of high or moderate abundance. However, for the dysregulated proteins of low abundance, the detection limit of this technology would have to be improved by other techniques such as 125-iodine labeling or biotinylation and fluorescent dye labeling. In addition, the use of scanning immunoblotting with class-specific antibodies, for example, would allow sensitive detection of specific subsets of proteins, e.g., all known proteins involved in cell-cycle regulation. 16. Protein identification by MALDI-TOF, LC-MS/MS, or other techniques is also limited by the requirement of a minimal protein input amount, which is often not attainable from certain types of biopsy samples. 
A useful strategy to improve protein identification is to produce parallel “diagnostic” fingerprints derived from microdissected cells and “sequencing” fingerprints generated from the whole tissue section of each case. Alignment of the diagnostic and sequencing 2D gels permits determination of the proteins of interest for subsequent mass spectrometry or N-terminal sequence analysis.
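The yield figures quoted in the Notes (approximately 4–6 μg of protein per 2500 laser pulses, at least 50,000 shots per 18-cm gel, and 2300–2700 shots per cap) can be combined into a rough planning estimate. The sketch below assumes that protein yield scales linearly with the number of laser shots, which the chapter does not state explicitly; the function names are illustrative.

```python
import math


def expected_yield_ug(shots, ug_per_2500=(4.0, 6.0)):
    """Low and high protein yield estimates, assuming linear scaling with shots."""
    scale = shots / 2500
    low, high = ug_per_2500
    return (low * scale, high * scale)


def caps_needed(total_shots, shots_per_cap):
    """Number of LCM caps required, rounded up to a whole cap."""
    return math.ceil(total_shots / shots_per_cap)


shots_per_gel = 50_000
print(expected_yield_ug(shots_per_gel))   # yield range in micrograms
print(caps_needed(shots_per_gel, 2700))   # fewest caps (most shots per cap)
print(caps_needed(shots_per_gel, 2300))   # most caps (fewest shots per cap)
```

On these assumptions, 50,000 shots yield roughly 80–120 μg of protein from about 19–22 caps, which is broadly in line with the ∼100–150 μg loaded per 18-cm IPG strip in Subheading 3.4.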
Acknowledgments The Tumor Repository of the National University Hospital, Singapore, provided the clinical breast cancer frozen tissues for LCM. The use of the PixCell II LCM system was courtesy of the Department of Pathology, Yong Loo Lin School of Medicine, National University of Singapore (NUS). This work was supported by an Academic Research Fund from the NUS (Grant No. R-179-000-032) to the authors.
References 1. Zhang, D., Tai, L. K., Wong, L. L., Sethi, S. K., Koay, E. S. (2005) Proteomics of breast cancer: enhanced expression of cytokeratin 19 in human epidermal growth factor receptor type 2 positive breast tumors. Proteomics 5, 1797–1805. 2. Neubauer, H., Clare, S. E., Kurek, R., Fehm, T., Wallwiener, D., Sotlar, K., et al. (2006) Breast cancer proteomics by laser capture microdissection, sample pooling, 54-cm IPG IEF, and differential iodine radioisotope detection. Electrophoresis 27, 1840–1852. 3. Lawrie, L. C., Curran, S., McLeod, H. L., Fothergill, J. E., Murray, G. I. (2001) Application of laser capture microdissection and proteomics in colon cancer. J. Clin. Pathol: Mol. Pathol. 54, 253–258. 4. Ai, J., Tan, Y., Ying, W., Hong, Y., Liu, S., Wu, M., et al. (2006) Proteome analysis of hepatocellular carcinoma by laser capture microdissection. Proteomics 6, 538–546. 5. Ahram, M., Flaig, M. J., Gillespie, J. W., Duray, P. H., Linehan, W. M., Ornstein, D. K., et al. (2003) Evaluation of ethanol-fixed, paraffin-embedded tissues for proteomic applications. Proteomics 3, 413–421. 6. Greengauz-Roberts, O., Stoppler, H., Nomura, S., Yamaguchi, H., Goldenring, J. R., Podolsky, R. H., et al. (2005) Saturation labeling with cysteine-reactive cyanine fluorescent dyes provides increased sensitivity for protein expression profiling of laser-microdissected clinical specimens. Proteomics 5, 1746–1757. 7. Zang, L., Palmer-Toy, D., Hancock, W. S., Sgroi, D. C., Karger, B. L. (2004) Proteomic analysis of ductal carcinoma of the breast using laser capture microdissection, LC-MS, and 16O/18O isotopic labeling. J. Proteome Res. 3, 604–612. 8. Li, C., Hong, Y., Tan, Y. X., Zhou, H., Ai, J. H., Li, S. J., et al. (2004) Accurate qualitative and quantitative proteomic analysis of clinical hepatocellular carcinoma using laser capture microdissection coupled with isotope-coded affinity tag and two-dimensional liquid chromatography mass spectrometry. Mol. Cell. Proteomics 3, 399–409. 9. Zhang, D., Tai, L. K., Wong, L. L., Chiu, L. L., Sethi, S. K., Koay, E. S. (2005) Proteomic study reveals that proteins involved in metabolic and detoxification pathways are highly expressed in HER-2/neu-positive breast cancer. Mol. Cell. Proteomics 4, 1686–1696. 10. Cowherd, S. M., Espina, V. A., Petricoin, E. F. III, Liotta, L. A. (2004) Proteomic analysis of human breast cancer tissue with laser-capture microdissection and reverse-phase protein microarrays. Clin. Breast Cancer 5, 385–392. 11. Gulmann, C., Espina, V., Petricoin, E. III, Longo, D. L., Santi, M., Knutsen, T., et al. (2005) Proteomic analysis of apoptotic pathways reveals prognostic factors in follicular lymphoma. Clin. Cancer Res. 11, 5847–5855.
6 Optimizing the Difference Gel Electrophoresis (DIGE) Technology David B. Friedman and Kathryn S. Lilley
Summary Difference gel electrophoresis (DIGE) technology has been used to provide a powerful quantitative component to proteomics experiments involving 2D gel electrophoresis. DIGE combines spectrally resolvable fluorescent dyes (Cy2, Cy3, and Cy5) with sample multiplexing for low technical variation, and uses an internal standard methodology to analyze replicate samples from multiple experimental conditions with unsurpassed statistical confidence for 2D gel-based differential display proteomics. DIGE experiments can readily accommodate sufficient independent (biological) replicate samples to control for the large interpersonal variation expected from clinical samples. Multivariate statistical analyses can then be used to assess the global variation in a complex set of independent samples, filtering out the noise from technical variation and normal biological variation and thereby focusing on the underlying variation that can describe different disease states. This chapter focuses on the design and implementation of the DIGE methodology using a pooled-sample internal standard in conjunction with the minimal CyDye chemistry. Notes are also provided for the use of the alternative saturation labeling chemistry.
Key Words: difference gel electrophoresis; two-dimensional gel electrophoresis; quantification.
From: Methods in Molecular Biology, vol. 428: Clinical Proteomics: Methods and Protocols Edited by: A. Vlahou © Humana Press, Totowa, NJ
1. Introduction Human disease phenotypes are a direct result of protein expression and modification. In many cases, such phenotypes cannot be tied directly to a single alteration in the genome or resulting proteome, but are likely to be the result
of multiple factors. Studying disease at the protein level is challenging, but as proteins are the mediators of phenotype, the study of protein abundance on a global scale is required to gain a more complete understanding of the underlying molecular mechanisms of disease. Proteomics in the clinical setting is rapidly developing and is having a major impact on the way in which diseases will be diagnosed, treated, and monitored (1). It has been estimated that there could be hundreds of thousands of different protein isoforms in a mammalian cell, but the vast dynamic range of protein abundance results in only the most abundant species of proteins being observable by quantitative proteomics approaches unless technically variable biochemical or subcellular fractionation is employed. The repertoire of techniques and associated hardware, which is now applied to this field, is expanding exponentially, and although a complete visualization of the proteome is still beyond reach of any single technique, each technology platform can provide complementary datasets. Difference gel electrophoresis (DIGE) has proven to be a powerful quantitative technology for differential display proteomics on a global level, where the individual abundance changes for thousands of intact proteins can be simultaneously monitored in replicate samples over multiple variables with statistical confidence (see Note 1). This includes quantitative information on protein isoforms that arise due to post-translational modifications (such as acetylation or phosphorylation), which result in a change in the isoelectric point of the protein. This also includes splice variants and the results of protein processing, all of which are resolved for individual quantification and subsequent analysis by MS. 
DIGE is based on conventional 2D gel technology that is capable of resolving several thousands of intact proteins first by charge using isoelectric focusing (IEF) and then by apparent molecular mass using SDS-polyacrylamide gel electrophoresis (PAGE) (6,7) (see Note 2 and Chapters 4 and 5 by Cho et al. and Zhang et al., respectively). Importantly, DIGE overcomes many of the limitations commonly associated with 2D gels such as analytical (gel-to-gel) variation and limited dynamic range that can severely hamper a quantitative differential display study. This is accomplished using up to three spectrally resolvable fluorescent dyes (Cy2, Cy3, and Cy5, referred to as CyDyes) that enable low- to subnanogram sensitivities with >10^4 linear dynamic range, and then by multiplexing the prelabeled samples into the same analytical run (2D gel). Multiplexing in this way allows for direct quantitative measurements between the samples coresolved in the same gel, and is therefore beyond the limitations imposed by between-gel comparisons with conventional 2D gels. The highest statistical power of this multiplexing approach stems from the utilization of a pooled-sample internal standard comprised of an equal aliquot
of every sample in the experiment (see Subheading 1.2.1). With this method, two dyes (Cy3 and Cy5) are used to individually label two independent samples from a much larger experiment, and the Cy2 dye is used to label an internal standard, which is comprised of an equal aliquot of proteins from every sample in the experiment. This pooled-sample internal standard is labeled only once in bulk to avoid additional technical variation, and enough is made and labeled to allow for an equal aliquot to be coresolved on each gel. The three differentially labeled samples are then coresolved on the same 2D gel, after which direct measurements can be made for each resolved protein using the spectrally exclusive dye channels without interference from technical variation of the separation (gel-to-gel variation). Rather than making direct quantitative measurements between the two samples in the gel, the measurements are instead made relative to the Cy2 signal for each resolved protein. The Cy2 signal should be the same for a given protein across different gels because it came from the same bulk mixture/labeling; therefore, any difference represents gel-to-gel variation, which can be effectively neutralized by normalizing all Cy2 values for a given protein across all gels. Using the Cy2 signal to normalize ratios between gels then allows for the Cy3:Cy2 and Cy5:Cy2 ratios for each protein within each gel to be normalized to the cognate ratios from the other gels, encompassing all samples. Each gel may contain different (and/or replicate) samples in the Cy3 and Cy5 channels, but all samples can be quantified relative to each other because each protein from each sample is measured to the cognate Cy2 signal from the internal standard present on each gel. With the use of sufficient replicates, a plethora of advanced statistical tests can be applied, which can highlight proteins of interest whose change in expression is related to the disease state under investigation. 
Since the technical noise is low, these vital replicates should be independent (biological) replicates, as most of the observed variation will be related to the clinical samples rather than to technical or experimental factors. In a final step, specific proteins of interest are then identified using standard mass spectrometry (MS) approaches on gel-resolved proteins that have been excised and proteolyzed into a discrete set of peptides. Briefly, excised proteins are subjected to in-gel digestion with a protease (typically trypsin), and MS is used to acquire accurate mass determinations on the resulting peptides, as well as fragmentation data on individual peptides. The mass spectral data are then used to identify statistically significant candidate protein matches through sophisticated computer search algorithms that compare the observed MS data with theoretical peptide masses (using data generated by peptide mass fingerprinting) or collision-induced fragmentation patterns (obtained from tandem MS) generated in silico from protein sequences present in databases (see Chapter 19 by Fitzgibbon et al.).
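The internal-standard normalization described above (expressing each Cy3 and Cy5 measurement relative to the cognate Cy2 signal on the same gel) can be illustrated with a minimal numerical sketch. The spot volumes, the assignment of Cy3 to disease and Cy5 to control samples, and the function names are hypothetical, and dedicated DIGE analysis software may apply further normalization steps that are not shown here.

```python
import math

# Hypothetical spot volumes for ONE protein on two gels (arbitrary units).
# Cy2 is the pooled internal standard; the second gel has a weaker overall
# signal, mimicking gel-to-gel variation.
gels = [
    {"Cy2": 1000.0, "Cy3": 2000.0, "Cy5": 1000.0},
    {"Cy2": 500.0, "Cy3": 1000.0, "Cy5": 500.0},
]


def log_standardized(sample_volume, standard_volume):
    """log2 of the sample signal relative to the Cy2 standard on the same gel."""
    return math.log2(sample_volume / standard_volume)


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Illustrative assignment: Cy3 = disease samples, Cy5 = control samples.
disease = [log_standardized(g["Cy3"], g["Cy2"]) for g in gels]
control = [log_standardized(g["Cy5"], g["Cy2"]) for g in gels]

log2_fold_change = mean(disease) - mean(control)
print(disease, control, 2 ** log2_fold_change)
```

Although the raw Cy3 volumes differ twofold between the two gels, the Cy2-standardized values agree, so samples run on different gels can be compared directly; in this toy example the disease samples show a twofold higher abundance than the controls.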
1.1. Optimizing Sensitivity and Resolution There are currently two forms of CyDye labeling chemistries available: minimal labeling, involving the use of N-hydroxy succinimidyl (NHS) ester reagents for low-stoichiometry labeling of proteins largely via lysine residues, and saturation labeling, which utilizes maleimide reagents for the stoichiometric labeling of cysteine sulfhydryls. The most established DIGE chemistry is the “minimal labeling” method, which has been commercially available since July 2002. Here the CyDye DIGE fluors are supplied as NHS esters, which react with the ε-amine groups of lysine side chains. The three fluors are mass matched (ca. 500 Da), and carry an intrinsic +1 charge to compensate for the loss of each proton-accepting site that becomes labeled (thereby maintaining the pI of the labeled protein). Each dye molecule also adds a hydrophobic component to proteins, which along with MW influences how proteins migrate in SDS-PAGE. Minimal labeling reactions are optimized so that only 2–5% of the total number of lysine residues are labeled, and on average a given labeled protein contains only one dye molecule. This is necessary because lysine is an abundant amino acid, and multiple labeling events may affect the hydrophobicity of some proteins such that they may no longer remain soluble under 2DE conditions. Although a given protein form may exhibit specific labeling efficiencies, these will be the same for labeling with all three dyes, allowing for direct relative quantification. Minimal labeling with CyDye DIGE fluors is very sensitive, comparable to silver staining or postelectrophoretic fluorescent stains such as Sypro Ruby, Deep Purple, or Flamingo Pink (ca. 1 ng), but with a linear response in protein concentration over five orders of magnitude (8) (see Note 3).
For maleimide labeling of the cysteine sulfhydryls, the overall lower cysteine content in proteins allows for labeling of these residues to saturation without increasing the overall hydrophobicity of the proteins enough to cause insolubility problems. Saturation labeling is ultimately more sensitive (150–500 picograms, and even more so for proteins with high cysteine content). Its use is not as commonplace, most likely due to the availability of only Cy3 and Cy5 with this chemistry (see Note 4), the fact that it is blind to the small but significant population of noncysteine-containing proteins, and the additional optimization of complete cysteine reduction necessary for reproducible labeling. For these reasons, saturation DIGE is usually reserved for experiments where samples are limited and the advantage of the increased sensitivity outweighs these additional considerations. To maximize the information that can be gained from DIGE experiments, it is imperative that resolution of protein species within gels is optimized. Although single 2DE runs can resolve proteins with pI ranges between pH 3 and 11, and
Optimizing DIGE Technology
apparent molecular mass ranges between 10 and 200 kDa, higher resolution and sensitivity can be obtained by running a series of medium-range (e.g., pH 4–7, 7–11) and narrow-range (e.g., pH 5–6) IEF gradients with increasing protein loads, leading to an overall more comprehensive proteomic analysis (6,7,10) (see Note 5). This is analogous to gaining increased resolution and sensitivity in an LC/MS-based strategy by using multiple high performance liquid chromatography columns with different affinity chemistries [e.g., MudPIT (12)]. Much of the sensitivity limitation associated with 2D gels can be attributed to the analysis of unfractionated, whole-cell and whole-tissue extracts. Additional sensitivity can be gained via enrichment for the proteins of interest, such as by analyzing prefractionated or subcellular samples, or immune complexes. However, the additional experimental manipulations required for prefractionation introduce more technical variation into the samples and necessitate increased independent (biological) replicates (which can be accommodated with the DIGE internal standard methodology). The identification of proteins of interest using MS can be performed directly from the DIGE gels when protein amounts have been optimized in this way (see Subheading 3.5). Alternatively, some experimental approaches perform DIGE analysis using “analytical” gels with lower protein amounts, followed by protein excision from a secondary, “preparative” gel with higher protein amounts. This approach has its advantages when dealing with small sample amounts, as is often the case using the saturation dye chemistries, but is also prone to uncertainties that arise due to the disproportionate amount of protein loading (see Note 6). The methods presented in this protocol are for optimization of both the DIGE data and the material for subsequent MS using high protein loads.
1.2. Optimizing Statistical Significance
1.2.1. Using the Internal Standard
The ability to coresolve and compare two or three samples in a single gel is attractive, because it allows for direct relative quantification for a given protein without any interference from gel-to-gel variations in migration and resolution, removing the need for running replicate gels for each sample (similar to stable isotope LC/MS-based strategies, see Chapter 10). This approach has limited statistical power, however, since confidence intervals are determined based on the overall variation within a population (see Subheading 3.6.2). Many researchers new to DIGE technology are not immediately aware of the increased statistical advantage and multiplexing capabilities of DIGE when combining this approach with a pooled-sample mixture as an internal standard for a series of coordinated DIGE gels (13). This design will allow for repetitive measurements (vital to any type of experimental investigation), and in
Friedman and Lilley
such a way as both to control for gel-to-gel variation and to provide increased statistical confidence. In this way, statistical confidence can be measured for each individual protein based on the variance of repetitive measurements, independent of the variation in the population. Incorporating independently prepared replicate samples into the experimental design also controls for unexpected variation introduced into the samples during sample preparation. This more complex and statistically powerful experimental design is accomplished by using one of the three dyes (usually Cy2) to label an internal standard, which is composed of equal aliquots of protein from all of the samples in an experiment. The total amount of the Cy2-labeled internal standard is such that an equal aliquot can be coresolved within each DIGE gel that also contains an individual Cy3- and Cy5-labeled sample from the experiment. Since this standard is composed of all of the samples in a coordinated experiment, each protein in a given sample should be represented in the standard and thus have its own unique internal standard (see Note 7). Direct quantitative comparisons are made individually for each resolved protein between the Cy3- or Cy5-labeled samples and the cognate protein signal from the Cy2-labeled standard for that gel (without interference from gel-to-gel variation), which results in the calculation of a standardized abundance for every spot matched across all gels within a multigel experiment. The individual signals from the internal standard are also used to normalize and compare between each in-gel direct quantitative comparison for that particular protein from the other gels. Using the Cy2-labeled standard in this fashion, therefore, allows for more precise and complex quantitative comparisons between gels, including independent (biological) sample repetition (Fig. 1).
Importantly, the internal standard experimental design allows for the identification of significant changes that would not have been identified if the analyses were performed separately, even when using Cy3- and Cy5-labeled samples on the same DIGE gel (14). This experimental design also allows for multivariable analyses to be performed in one coordinated experiment, whereby statistically significant abundance changes can be quantitatively measured simultaneously between several sample types (e.g., different genotypes, drug treatments, or disease states), with repetition and without the necessity for every pairwise comparison to be made within a single DIGE gel (15,16) (see Note 8 and Chapter 17 by Carpentier et al.).
1.2.2. Assessing Intersample Variation
Clinical proteomics is hampered by the significant variation associated with patient samples. The largest proportion of this variation comes from biological diversity, but a significant amount may also come from variable collection
Fig. 1. Illustration of DIGE and experimental design using the mixed-sample internal standard. (A) Representative gel from a six-gel set containing three differentially labeled samples: Cy2-labeled internal standard, Cy3-labeled sample #1, and Cy5-labeled sample #2. The individual protein forms all coresolve in this one gel, but these three independently labeled populations of proteins can be individually imaged using mutually exclusive excitation/emission properties of the CyDyes. (B) Schematic of the sample loading matrix indicating gel number, CyDye labeling and three replicates (indicated as “1, 2, and 3”) of the four conditions being tested (A, B, C, D). Within the boxed regions representing each labeled sample is depicted a theoretical protein that is upregulated in condition D. Dotted lines illustrate how the protein signals from each sample are directly quantified relative to the Cy2 internal standard signal for that protein without interference from gel-to-gel variation, and how the Cy3:Cy2 and Cy5:Cy2 intragel ratios are normalized between the six gels. (C) A graphical representation of the normalized abundance ratios for this theoretical protein change. Adapted from (10).
and storage of biological samples. It is of vital importance to identify changes in protein abundance that are disease specific rather than patient or sample specific. In order to gain the more robust data sets necessary to be able to draw accurate conclusions from clinical proteomics studies, it is, therefore, necessary to collect and store samples using very stringent and closely adhered to
protocols. It is also necessary to assess the biological variation within the population being tested and also within a single individual. Interindividual variation has been the focus of several studies (17,18), and determining the typical diversity within a single patient (i.e., taking longitudinal samples and assessing variability in protein abundance) and between patients will determine the minimum number of patient samples required for an experiment. This is an essential step before embarking on any large-scale and potentially costly DIGE experiment. Without this type of pretest, the results of underpowered experiments run the risk of being peppered with false information (both false positives and negatives). As with all complex technologies, the DIGE technique itself is subject to technical variation, which will be laboratory specific to a greater or lesser extent. However, the amplitude of this variation is generally outweighed by the biological variation associated with a typical sample set (19).
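The pretest described above can be turned into a rough estimate of the number of patient samples needed. The following is a minimal sketch, assuming a two-group comparison of log-transformed standardized abundances and the usual normal approximation; the function name, the example standard deviations, and the 1.5-fold target are illustrative assumptions and are not values taken from this chapter.

```python
import math
from statistics import NormalDist

def samples_per_group(sd_log2, fold_change, alpha=0.05, power=0.8):
    """Approximate number of samples per group for a two-sided comparison.

    sd_log2:     standard deviation of log2 abundance between individuals,
                 estimated from the pretest (assumed equal in both groups).
    fold_change: smallest abundance ratio that should be detectable.
    """
    delta = abs(math.log2(fold_change))            # effect size on the log2 scale
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_beta = NormalDist().inv_cdf(power)           # desired statistical power
    n = 2 * ((z_alpha + z_beta) * sd_log2 / delta) ** 2
    return math.ceil(n)

# Hypothetical pretest results: a tight and a more variable patient population.
n_low_variation = samples_per_group(sd_log2=0.3, fold_change=1.5)
n_high_variation = samples_per_group(sd_log2=0.6, fold_change=1.5)
print(n_low_variation, n_high_variation)
```

Because the normal approximation is optimistic for small groups, the result is best read as a lower bound; doubling the between-patient standard deviation roughly quadruples the required number of samples.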
1.2.3. Univariate Statistical Analyses
To date, the majority of published quantitative proteomics studies using the DIGE technology have applied a univariate test, such as a Student’s t-test or analysis of variance (ANOVA), to identify protein species with significant changes in expression [(20) and Chapter 17 by Carpentier et al.]. These tests calculate the probability (p) of observing a difference at least as large as the one measured if the samples being compared were in fact the same, that is, if any apparent change in expression occurred by chance alone. Typically an expression change is considered significant if the calculated p-value falls below a prescribed significance threshold, usually 0.05 (whereby, on average, 1 in 20 tests of an unchanged protein will appear significant by chance). For more stringent analyses, a p-value of 0.01 is often used as the significance threshold. When employing these tests on DIGE datasets, there are several factors that must be considered if correct conclusions are to be drawn from ensuing analyses. Student’s t-tests and ANOVA assume that the data are normally distributed and that any variance is homogeneous. The measurement and correction of systematic bias within DIGE experiments have been the subject of several studies, which chart methods to optimize normalization of data sets (21,22,23). Another important consideration is the false discovery rate (FDR) that arises when statistical tests such as the ones described above are applied. These tests involve the simultaneous and independent testing of thousands of spots. The probability of a false positive being recorded for each test is such that a substantial number of false positives may accumulate. There are several approaches to determine the FDR and adjust p-values to compensate for this,
the most widely used to date being the Benjamini and Hochberg method, whose use in conjunction with DIGE data has been described by Fodor et al. (21).
1.2.4. Multivariate Statistical Analyses
Discovery-phase proteomics experiments often produce large lists of proteins that are identified as changing significantly in the experiment, many of which may well be false positives. Another approach to overcome this is the application of additional multivariate statistical analyses to these datasets, which can help to filter out false positives that result from whole sample outliers (i.e., sample misclassification and/or poor sample preparation technique). These analyses, such as principal components analysis (PCA), partial least squares discriminant analysis, and unsupervised hierarchical clustering (HC) (see Figs. 2 and 3 and Chapter 16 by Marengo et al.) have recently been applied to DIGE datasets [(10,24,25,26,27,28,29,30,31,32)]. Raw and normalized data can be exported from most DIGE software solutions (e.g., DeCyder, Progenesis), and several multivariate analyses are now part of an extended data analysis (EDA) software module as part of the DeCyder suite of software tools (GE Healthcare), which was specifically developed for DIGE analysis (see Subheading 3.6). These multivariate analyses work essentially by comparing the expression patterns of all (or a subset of) proteins across all samples, using the variation of expression patterns to group or cluster individual samples. Technical noise (poor sample prep, run-to-run variation) and biological noise (normal differences between samples, especially present in clinical samples) are almost always
Fig. 2. Illustration of the use of principal component analysis (plot of PC1 versus PC2; sample groups labeled control, Heme, –Fe, and Δfur). DIGE was used to analyze changes in Staphylococcus proteins in response to genetic and chemical alterations affecting iron utilization. Adapted from (24).
Fig. 3. Hierarchical clustering (by average distance correlation) of representative novel circadian proteins detected by 2D DIGE of soluble protein extracts from mouse liver. Pale gray represents low levels of protein expression, black represents intermediate levels, and dark gray represents high levels of expression. Adapted from (32).
associated with any analytical dataset of this nature, and may well override any variation that arises due to actual differences related to the biological questions being tested. Unsupervised clustering of related samples, therefore, adds additional confidence that a “list of proteins” changing in a DIGE experiment are not arising stochastically (10).
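The Benjamini and Hochberg adjustment referred to in Subheading 1.2.3 is commonly applied to the per-spot p-values before any multivariate filtering. The following is a minimal sketch of the step-up procedure, assuming a plain list of p-values exported from the analysis software; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values never
    # decrease as the raw p-values increase.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical p-values for five matched spots.
spot_p = [0.001, 0.008, 0.039, 0.041, 0.30]
spot_q = benjamini_hochberg(spot_p)
significant = [q <= 0.05 for q in spot_q]
print(spot_q, significant)
```

In this example four spots pass an unadjusted threshold of 0.05, but only two remain once the FDR is controlled at 5%.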
1.3. DIGE in the Clinical Setting
Although the potential of DIGE in clinical studies is only beginning to be explored [for example, see (29,30)], many studies have been published demonstrating the feasibility and benefit of DIGE/MS using small patient cohorts for preliminary studies in colon (14), liver (33,34,35), breast (36,37), esophageal (38,39), and pancreatic cancers (40), as well as other important clinical studies such as Severe Acute Respiratory Syndrome (SARS) (41). Many studies also explore the important benefit of procuring samples using laser capture microdissection (LCM; see Chapters 3, 5, and 9 by Diaz et al., Zhang et al., and Mustafa et al., respectively) for a highly enriched population of the cells under study (16,30,42,43,44). These LCM studies necessitate the use of the saturation chemistry, which offers increased sensitivity but limited multiplexing power, and typically require secondary preparative gels with higher protein loads to enable protein identification by MS. The study of Suehara et al. (29) illustrates the utility of a multivariable DIGE/MS analysis with an extended sample set pertinent for a clinical study. Eighty soft tissue sarcoma samples comprising seven different histological backgrounds were analyzed. Using the saturation DIGE fluors, individual samples were labeled with Cy5 and multiplexed with a pooled-sample internal standard (labeled in bulk with Cy3) for each DIGE gel. Using high-resolution 2D gel separations and a combination of multivariate statistical tools (support vector machines, leave-one-out cross-validation, PCA, and HC), these studies identified a small subset of proteins including tropomyosin and HSP27 that were able to discriminate between the different classes of tumors.
HSP27 in particular was part of a subclass of discriminating proteins that could distinguish between leiomyosarcoma and malignant fibrous histiocytoma (MFH), as well as correlate with patient survival between low-risk and high-risk groups. HSP27 has long been associated with prognosis in MFH as well as in other human carcinomas (45).
2. Materials
This chapter assumes a solid understanding of 2D gel electrophoresis and will focus on the design and implementation of the DIGE method using the pooled-sample internal standard methodology and the minimal dye chemistry for Cy2, Cy3, and Cy5, with notes provided for saturation labeling chemistry.
2.1. Cell Lysis Buffers
1. TNE: 50 mM Tris–HCl pH 7.6, 150 mM NaCl, 2 mM EDTA pH 8.0, 2 mM DTT, 1% (v/v) NP-40.
2. RIPA buffer: 50 mM Tris–HCl pH 8.0, 150 mM NaCl, 1% NP-40, 0.5% deoxycholic acid, 0.1% SDS.
3. Two-dimensional gel electrophoresis lysis buffer: 7 M urea, 2 M thiourea, 4% CHAPS, 2 mg/mL DTT, 50 mM Tris–HCl pH 8.0.
4. ASB14 lysis buffer: 7 M urea, 2 M thiourea, 2% amidosulfobetaine 14, 50 mM Tris–HCl pH 8.0.
NB: depending on the sample, it may also be necessary to add protease inhibitors and phosphatase inhibitors [sodium pyrophosphate (1 mM), sodium orthovanadate (1 mM), beta-glycerophosphate (10 mM) and sodium fluoride (50 mM)] to the chosen lysis buffer (see Subheading 3.1).
2.2. SDS-Polyacrylamide Gel Electrophoresis
1. Immobilized pH gradient (IPG) strips and accompanying ampholyte mixtures can be purchased from a number of commercial vendors. Strip lengths vary from 7 cm to high-resolution 24 cm strips, and pH ranges vary from wide-range (e.g., pH 3–11) to high-resolution narrow-range (e.g., pH 5–6) strips.
2. Bind silane working solution (50 mL): 40 mL ethanol, 1 mL acetic acid, 50 μL bind silane solution (GE Healthcare), 9 mL water (see Note 9).
3. 4× separating gel buffer: 1.5 M Tris-base pH 8.8.
4. 30% acrylamide:bis-acrylamide (37.5:1), N,N,N′,N′-tetramethylethylenediamine, and ammonium persulfate.
5. 10× SDS-PAGE running buffer (1 L): 30.25 g Tris-base, 144.13 g glycine, 10 g SDS (0.1% at 1× working strength).
6. Fixing solution for SyproRuby staining (1 L): 100 mL methanol, 70 mL acetic acid, 830 mL water. SyproRuby stain is available from several commercial sources and can be substituted by other total protein stains, such as Deep Purple (GE Healthcare) or Flamingo Pink (BioRad).
7. Two-dimensional equilibration buffer: 6 M urea, 50 mM Tris-base pH 8.8, 30% glycerol, 2% SDS, trace bromophenol blue.
8. Water-saturated butanol (see Note 10).
9. Dithiothreitol (store desiccated).
10. Iodoacetamide (store desiccated, keep in the dark).
2.3. DIGE Labeling Materials
1. N,N-dimethyl formamide (DMF) (see Note 11).
2. Labeling (L) buffer: 7 M urea, 2 M thiourea, 4% CHAPS, 30 mM Tris-base (do not adjust the pH, but ensure that the pH of the final solution is between 8.0 and 9.0), 5 mM magnesium acetate (see Note 12). Alternatively, 4% CHAPS can be replaced with 2% ASB14, especially in cases where membrane-rich samples are being utilized.
3. Rehydration (R) buffer: 7 M urea, 2 M thiourea, 4% CHAPS, 2 mg/mL DTT (13 mM; 0.2%).
4. Cyanine dyes with NHS-ester chemistry for minimal labeling (Cy2, Cy3, and Cy5), and with maleimide chemistry for saturation labeling (Cy3 and Cy5) are available from GE Healthcare as dry solids.
5. Quenching solution (for minimal labeling): 10 mM lysine.
6. Dithiothreitol reduction stock solution: 200 mg/mL DTT.
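The molar and percentage concentrations listed in Subheadings 2.1–2.3 can be converted into weighed amounts for any batch size. The following is a minimal sketch, assuming standard molecular weights for the anhydrous reagents and reading percentages as weight per volume; the helper names and the 50 mL batch size are illustrative, and magnesium acetate is left out because its molecular weight depends on the hydrate form supplied.

```python
# Molecular weights in g/mol (standard values; confirm against the reagent label).
MW = {"urea": 60.06, "thiourea": 76.12, "Tris base": 121.14, "DTT": 154.25}

def grams_for_molar(name, molar, volume_ml):
    """Grams of reagent needed for a given molarity and final volume."""
    return MW[name] * molar * volume_ml / 1000.0

def grams_for_percent_wv(percent, volume_ml):
    """Grams of solid needed for a % (w/v) solution."""
    return percent / 100.0 * volume_ml

def millimolar_from_mg_per_ml(name, mg_per_ml):
    """Convert a mg/mL concentration into mM."""
    return mg_per_ml / MW[name] * 1000.0

batch_ml = 50.0  # hypothetical batch of L-buffer
l_buffer_grams = {
    "urea": grams_for_molar("urea", 7.0, batch_ml),
    "thiourea": grams_for_molar("thiourea", 2.0, batch_ml),
    "CHAPS": grams_for_percent_wv(4.0, batch_ml),
    "Tris base": grams_for_molar("Tris base", 0.030, batch_ml),
}
dtt_mm = millimolar_from_mg_per_ml("DTT", 2.0)  # DTT level in R-buffer

for reagent, grams in l_buffer_grams.items():
    print(f"{reagent}: {grams:.3f} g")
print(f"2 mg/mL DTT = {dtt_mm:.1f} mM")
```

Urea at 7 M occupies a large fraction of the final volume, so the solids should be dissolved in less than the target volume of water and then brought up to volume.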
3. Methods
DIGE is a powerful technique for quantitative multivariable differential display proteomics. However, the quality of the data will only be as good as the quality of the underlying 2D gel electrophoresis technology upon which it is based. The main focus of this chapter is to provide detailed notes on the DIGE technology; however, some key considerations for successful high-resolution 2D gel electrophoresis are also provided. This section describes methods associated with labeling using minimal CyDyes.
3.1. Sample Preparation
The key to success for any analytical measurement begins with robust sample preparation. This not only includes the buffers and materials used, but also the nature of the samples and the way in which they are procured. The addition of exogenous materials (such as DNAse, RNAse), or allowing for uncontrolled manipulation of the sample (such as conditions that may lead to proteolysis) can severely hamper and sometimes completely prevent an analysis. Care should be taken to guard against common laboratory contaminants (e.g., mycoplasma for tissue culture) that if present may be detected as significant changes using DIGE, either due to their presence in a subset of samples, or by responding to the experimental perturbation.
1. Prepare protein extracts using any method of preference. The appropriate amount of protein can be subsequently precipitated prior to resuspension in the CyDye labeling buffer (see Subheading 3.2). Guarding against proteolysis and loss of post-translational modifications (e.g., phosphorylation) is critically important. Care should be taken not to use reagents that will resolve on the 2D gel, such as soybean trypsin inhibitor. Small molecule inhibitors such as aprotinin, leupeptin, pepstatin A, antipain, 4-(2-aminoethyl)benzenesulfonyl fluoride hydrochloride (AEBSF), sodium orthovanadate, okadaic acid, and microcystin, among others, are far better choices.
2. Lyse cells using standard lysis buffers such as TNE and RIPA buffers, or even the buffers used for 2D gel electrophoresis. All of these buffers have the capability of producing high-resolution samples for 2DE. In most cases, reagents that would otherwise interfere with CyDye labeling (such as those that contain primary amines) will be removed prior to labeling by protein precipitation (see Subheading 3.2).
3. Sonicate cells if necessary to improve sample quality. Sonication improves sample quality by disrupting nucleic acids, which are subsequently removed by sample cleanup (see Subheading 3.2) along with phospholipids. Both of these nonproteinaceous ionic components can obliterate the resolution during IEF. Short bursts with a tip-sonicator are suggested. It is important to keep the system chilled, especially in the presence of urea-containing samples that should never be heated (see Note 12).
4. Determine the protein concentration of the sample using a system that is compatible with the buffer in which the proteins are extracted. CHAPS and thiourea in the buffers used for DIGE, although adequately chaotropic, interfere with both the Bradford and bicinchoninic acid assays, making the data inaccurate and unreliable. In these cases, aliquots should be precipitated prior to quantification in a suitable buffer, or a detergent-compatible assay should be used.
5. Aim to use a protein concentration between 1 and 10 mg/mL. Too dilute and it will be difficult to quantitatively recover proteins following precipitation cleanup (see Subheading 3.2); too concentrated and it will be difficult to accurately dispense the appropriate volume for the experiment. Freeze/thawing should also be kept to a minimum; freezing samples in 1 mL aliquots or less will usually suffice.
3.2. Sample Cleanup
The desired amount of sample to be used in the experiment should be precipitated prior to labeling. This both removes nonproteinaceous ions from the sample (e.g., nucleic acids, phospholipids) that can interfere with IEF and transfers the proteins into a labeling buffer optimized for CyDye labeling and subsequent IEF. Determine how much total protein will be on each gel, and precipitate ½ of that amount for each sample to be run on that gel. This is straightforward for a two-component separation, but also works out for the multigel experiments where 1/3 of the total protein amount on each gel comes from the pooled-sample internal standard (see Table 1). Precipitate only what is needed for each sample for the experiment; too much material may create pellets that are difficult to resolubilize completely.
Table 1
Experimental Design for CyDye Labeling Using a Pooled-Sample Internal Standard

Samples: Control-1, Control-2, Control-3, Treated-1, Treated-2, and Treated-3, run two per gel on Gels 1–3, plus the Pool.

                       Each individual sample     Pool
Precipitated amount    150 μg
L-buffer               24 μL
Aliquot                16 μL                      8 μL (×6)
Cy2                                               6 μL
Cy3 or Cy5             2 μL
                       30 min on ice in the dark
Lysine (quench)        2 μL                       6 μL
                       10 min on ice in the dark
Total volume           20 μL                      60 μL

For each gel, combine the quenched Cy3- and Cy5-labeled samples and add 1/3 of the quenched Cy2-labeled pooled mixture.

                       Gel 1              Gel 2              Gel 3
Combined volume        20 + 20 + 20 μL    20 + 20 + 20 μL    20 + 20 + 20 μL
2× R-buffer            60 μL              60 μL              60 μL
Total                  120 μL             120 μL             120 μL
R-buffer               to Vf              to Vf              to Vf

This table illustrates a typical DIGE labeling experiment, as described in Subheadings 3.2 and 3.3.
Many precipitation methods are available; the following is a MeOH/CHCl3 protocol that works well for DIGE and can be easily performed in 1.5 mL tubes [adapted from (46)]:
1. Bring up predetermined amount of protein extract to 100 μL with water.
2. Add 300 μL (3 volumes) water.
3. Add 400 μL (4 volumes) methanol.
4. Add 100 μL (1 volume) chloroform.
5. Vortex vigorously and centrifuge; the protein precipitate should appear at the interface.
6. Remove the water/MeOH mix on top of the interface, being careful not to disturb the interface. Often the precipitated proteins do not make a visibly white interface, and care should be taken not to disturb the interface.
7. Add another 400 μL methanol to wash the precipitate.
8. Vortex vigorously and centrifuge; the protein precipitate should now pellet to the bottom of the tube.
9. Remove the supernatant and briefly dry the pellets in a vacuum centrifuge.
10. Resuspend the pellets in a suitable amount of CyDye labeling buffer (L-buffer, see Table 1).
An alternative widely used precipitation method is as follows:
1. Add 5 volumes of cold 0.1 M ammonium acetate in methanol.
2. Leave at –20°C for 12 h or overnight.
3. Centrifuge at ∼3000 rpm (1400×g) for 10 min at 4°C and remove the supernatant.
4. A pellet of protein should be visible at this stage.
5. To wash the pellet, add 80% 0.1 M ammonium acetate in methanol and mix to resuspend the protein.
6. Centrifuge at 3000 rpm (1400×g) for 10 min at 4°C and remove the supernatant.
7. To dehydrate the pellet add 80% acetone and resuspend the pellet by mixing.
8. Centrifuge at 3000 rpm (1400×g) for 10 min at 4°C and remove the supernatant.
9. Dry pellet for 15 min by leaving open tube in a laminar flow cabinet.
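The centrifugation steps above quote both rpm and relative centrifugal force, and the two only correspond for one rotor radius. The following is a minimal sketch of the standard conversion, RCF = 1.118 × 10^-5 × r(cm) × rpm^2; the function names and the 8 cm rotor radius are illustrative assumptions, so the radius of the rotor actually in use should be substituted.

```python
import math

RCF_CONSTANT = 1.118e-5  # standard factor for radius in cm and speed in rpm

def rcf_from_rpm(rpm, radius_cm):
    """Relative centrifugal force (x g) for a given speed and rotor radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2

def rpm_from_rcf(rcf, radius_cm):
    """Speed (rpm) needed to reach a given RCF on a rotor of this radius."""
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))

# Rotor radius implied by the quoted pairing of 3000 rpm with 1400 x g.
implied_radius_cm = 1400 / (RCF_CONSTANT * 3000 ** 2)

# Speed needed for 1400 x g on a hypothetical rotor with an 8 cm radius.
rpm_small_rotor = rpm_from_rcf(1400, radius_cm=8.0)

print(f"implied radius: {implied_radius_cm:.1f} cm")
print(f"speed on 8 cm rotor: {rpm_small_rotor:.0f} rpm")
```

The quoted pairing corresponds to a radius of roughly 14 cm, so a smaller microcentrifuge rotor needs a higher speed to reach the same force.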
3.3. DIGE Experimental Design
1. Start with a preliminary gel. All experiments should start with a preliminary gel on representative samples to ensure equivalent protein amounts between samples, and that the highest resolution and sensitivity are obtained before embarking on a multigel DIGE experiment (see Notes 6 and 13). The preliminary gel will also show any problems with the sample preparation that may be corrected by adjusting the procurement methods (see Subheading 3.1). This step can also be used to determine the maximal amount of protein that can be loaded without adversely affecting resolution.
The preliminary gel needs only to test one or two of the samples of a much larger experiment. This gel can simply be stained with a total protein stain (e.g., Sypro Ruby or Deep Purple) to visually inspect the resolution and sensitivity.
Alternatively, the gel can contain two different samples prelabeled with Cy3 and Cy5 and coresolved (see Note 14).
2. Choose a suitable pH gradient for the IEF. Precast IEF strips are commercially available from several vendors. The longest strip length is currently 24 cm, providing the highest resolving power for a given pH range. Medium-range IEF gradients (e.g., pH 4–7) offer the best trade-off between overall resolution and sensitivity. Subsequent experiments can then be designed to resolve proteins in the basic range (pH 7–11) and in narrow pI ranges with commensurate increases in protein loading to gain access to the lower abundant proteins in a given sample (see Note 5). In this way a more comprehensive picture of the proteomes under study can be obtained.
3. Incorporate a pooled-sample mixture internal standard on every DIGE gel in a coordinated experiment. This internal standard, usually labeled with Cy2, is composed of an equal aliquot of every sample in the entire experiment, and therefore represents every protein present across all samples in an experiment. The use of this pooled-sample internal standard on every DIGE gel in a coordinated experiment allows for the facile comparison of independent sample replicates with increased statistical confidence. This experimental design also enables the simultaneous quantitative comparison between multiple variables in a coordinated experiment (Fig. 1).
4. Plan out which samples will be labeled with which dyes ahead of time. For minimal dye labeling chemistry (see Subheading 3.4), each gel will contain two individual samples labeled with either Cy3 or Cy5, and an equal amount of the pooled-sample internal standard. The example outlined in Table 1 is for a two-component comparison repeated in triplicate, with 300 μg total protein loaded onto each of three gels. In this case, 150 μg of each sample should be precipitated (see Subheading 3.2), resuspended in L-buffer and then split 2:1.
Two-thirds of each sample (100 μg) will be individually labeled with either Cy3 or Cy5. The remaining 1/3 of each sample will be pooled together and labeled with Cy2 to serve as an internal standard. By following this, there will be enough of the Cy2-labeled internal standard to have an equal amount as the Cy3 or Cy5 samples loaded onto each gel. (see Note 15).
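The protein bookkeeping in step 4 generalizes to other numbers of samples and gel loads. The following is a minimal sketch, assuming two individually labeled samples plus the pooled standard on every gel and equal protein in each dye channel; the function and key names are hypothetical.

```python
def plan_protein_amounts(n_samples, protein_per_gel_ug, samples_per_gel=2):
    """Protein bookkeeping (in micrograms) for a pooled-standard DIGE design."""
    if n_samples % samples_per_gel:
        raise ValueError("n_samples must fill every gel completely")
    n_gels = n_samples // samples_per_gel
    channels = samples_per_gel + 1               # Cy3 and Cy5 samples plus Cy2 pool
    per_channel = protein_per_gel_ug / channels  # equal protein in each channel
    pool_total = per_channel * n_gels            # standard needed across all gels
    pool_share = pool_total / n_samples          # contribution from each sample
    return {
        "gels": n_gels,
        "labeled_per_sample": per_channel,
        "pooled_per_sample": pool_share,
        "precipitate_per_sample": per_channel + pool_share,
        "pool_total": pool_total,
    }

# The example of Table 1: six samples, 300 micrograms of total protein per gel.
plan = plan_protein_amounts(n_samples=6, protein_per_gel_ug=300)
print(plan)
```

With two samples per gel, each sample always contributes half of one channel to the pool, which is why the amount to precipitate works out to ½ of the total gel load regardless of the number of gels.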
3.4. CyDye Labeling
All steps are performed on ice. The following protocol is for sample loading via rehydration of IPG strips, and assumes incorporation of a pooled-sample internal standard to coordinate many samples across multiple DIGE gels simultaneously. The steps are summarized in Table 1 (see Note 16).
1. Resuspend precipitated sample in 24 μL labeling (L) buffer. Remove 8 μL (1/3 of sample) and place into a new tube that will contain the pooled-sample internal standard (8 μL from all of the other individual samples will be pooled into this tube) (see Note 17).
2. CyDyes are purchased as dry solids and should be reconstituted to 10× stock solutions (1 nmol/μL) in fresh DMF. Dilute stock solutions of CyDyes 1:10 in fresh DMF to a final working concentration of 100 pmol/μL (see Note 11).
3. Label each sample (50–250 μg) with 2–4 μL (200–400 pmol) of either Cy3 or Cy5 working dilution for 30 min on ice in the dark. Label the pooled-sample mixture with 2–4 μL (200–400 pmol) of Cy2 working dilution for every equivalent amount of sample present in the pooled standard as compared with the individually labeled samples. That is, if 100 μg of each sample is labeled with 200 pmol of Cy3 or Cy5, then 50 μg of each of these samples is present in the pooled standard, and 200 pmol of Cy2 is used for every 100 μg of pooled standard (see Table 1 and Note 18).
4. Quench reactions with 2 μL of 10 mM lysine for 10 min on ice in the dark.
5. For each gel, combine the quenched Cy3- and Cy5-labeled samples and add 1/3 of the quenched Cy2-labeled pooled mixture.
6. To each tripartite mixture, add an equal volume of 2× R-buffer and incubate on ice for 10 min. 2× R-buffer is R-buffer supplemented with an additional 2 mg/mL DTT using the 200 mg/mL DTT stock solution. DTT is omitted from the L-buffer to prevent unfavorable interaction with the CyDyes. Adding an equal volume of 2× R-buffer to the quenched reactions provides the reducing agents to the total reaction volume at a 1× final concentration.
7. Add R-buffer (1× DTT concentration) to a final volume suggested by the manufacturer for the given IPG strip length (e.g., 450 μL for 24 cm strips). Add the appropriate volume of IPG buffer ampholines to 0.5% final (v/v) for IEF. Proceed with rehydration of dehydrated IPG strips for >16 h and proceed with IEF (see Subheading 3.5.3 and Note 19).
3.5. 2D Gel Electrophoresis and Poststaining
As a result of the minimal labeling, quantification with the CyDyes is carried out on only 2–5% of the proteins that are labeled, and the labeled portion of the protein may migrate at a higher apparent molecular mass than the majority of the unlabeled protein due to the added mass and hydrophobicity of the dyes (exacerbated in lower Mr species). To ensure that the maximum amount of protein is excised for subsequent in-gel digestion and MS, minimally labeled 2D DIGE gels are poststained with a total protein stain such as SyproRuby or Deep Purple. Accurate excision is also ensured by preferentially affixing the second dimension gel to a presilanized glass plate during gel casting so that the gel dimensions do not change during the analysis (see Notes 20 and 21). These methods assume the use of the Ettan 2D electrophoresis system (GE Healthcare), but are easily adaptable to other commercially available systems. They also assume the use of high-resolution 24 cm × 20 cm gels.
1. Special gels for second dimension SDS-PAGE. Using low-fluorescence glass plates, pretreat one plate for each gel with 3–5 mL bind silane working solution,
Optimizing DIGE Technology
carefully wiping the entire surface of the plate with a lint-free wipe. Leave treated plates covered with lint-free wipes for several hours to allow for sufficient outgassing of fumes (that may contain bind silane) before assembling gel plates and casting of second dimensional SDS-PAGE gels (see Note 22). 2. Assemble plates and pour 12% homogeneous SDS-PAGE gel(s) using the appropriate amount of 30% stock acrylamide and 4× separating gel buffer for the volumes needed for the number of gels being poured (see Note 23). 3. Overlay the gels with water-saturated butanol for several hours to provide a straight and level surface to place the focused IPG strip (see Note 10). 4. Perform IEF using an IPGphor II IEF unit (GE Healthcare) of the combined tripartite-labeled samples, brought up to final volume with 1× R-buffer and passively rehydrated into IPG strips for >16 h (see Subheading 3.4.7) (see Note 24). 5. Equilibrate the focused IPG strips into the second dimensional equilibration buffer. During this step, the cysteine sulfhydryls in the focused proteins are reduced and carbamidomethylated by supplementing the equilibration buffer with 1% DTT for 20 min at room temperature, followed by 2.5% iodoacetamide in fresh equilibration buffer for an additional 20 min room temperature incubation (see Note 25). 6. Place equilibrated IPG strip on top of the SDS-PAGE gels that were precast with low-fluorescence glass plates. 7. Use a thin card or ruler to carefully tamp down the IPG strip to the SDS-PAGE gel, removing air bubbles at the interface (see Notes 26 and 27). 8. Perform second dimensional SDS-PAGE at constant wattage, using 1 W/gel for at least 1 h prior to ramping up to 3) are handled by the biological variation analysis (BVA) module of DeCyder. 
In a BVA experiment, the signals emanating from the internal standard are used both for direct quantification within each DIGE gel in a coordinated set (using the Differential In-gel Analysis (DIA) module), as well as for normalization and protein spot pattern matching between gels (see Note 31). This allows for the calculation of Student’s t-test and ANOVA statistics for individual abundance changes (see Subheading 3.6.2, and Table 2). BVA is also used to match patterns between SyproRuby- and CyDye-stained images to facilitate protein excision for subsequent MS (see Notes 20, 21, and 30). 3.6.2. Experimental Design and Statistical Confidence In the simplest form of a DIGE experiment, two or three samples are separately labeled with one of the three dyes and separated in the same gel for direct pairwise comparisons. In this case, the software first normalizes the entire signal for each CyDye channel and then calculates the protein spot volume ratio for each protein pair. A normal distribution is modeled over the actual distribution of protein pair volume ratios, and ratios lying more than two standard deviations from the mean of this normal distribution are taken as significant abundance changes at the 95% confidence level. This N = 1 type of experiment has limited statistical power, since the 95% confidence interval is determined from the overall distribution of changes within the population (see Note 32). Many more changes in abundance of much lesser magnitude can be detected with much greater statistical confidence (Student’s t-test and ANOVA, Table 2) by incorporating independent
Table 2 Statistical Applications of DeCyder Biological Variation Analysis and Extended Data Analysis (EDA) Modules
Average ratio: Calculated for each protein spot feature between two groups or experimental conditions. Derived from the log standardized protein abundance changes that were directly quantified within each DIGE gel relative to the internal standard for the protein spot feature.
Student’s t-test: Univariate test of statistical significance for an abundance change between two groups or experimental conditions. p-values reflect the probability that the observed change has occurred due to stochastic chance alone. With DIGE, p-values of 10
One-way ANOVA
Two-way ANOVA
Principal component analysis (EDA only)
Hierarchical clustering (EDA only)
K-means (EDA only)
Self organizing maps (EDA only)
Gene shaving (EDA only)
Discriminant analysis (EDA only)
Fig. 3. Quantitative analysis results of 261 proteins from LCM-ICAT-2D-LCMS/MS. A total of 149 differentially expressed proteins with at least twofold quantitative alterations in HCC and non-HCC hepatocytes were detected, including 55 upregulated proteins (32 with 2∼5 folds, 13 with 5∼10 folds, 10 with >10 folds) and 94 downregulated spots in HCC hepatocytes (62 with 2∼5 folds, 17 with 5∼10 folds, 15 with >10 folds). Reprinted with permission from (34).
Proteomic Analysis of Clinical HCC Using LCM
Table 1 Summary of Total Proteins Identified in HCC-NESP-1D-LC-MS/MS, HCC-NESP-2D-LC-MS/MS and HCC-LCM-2D-LC-MS/MS
                                      HCC-NESP-1D-LC-MS/MS   HCC-NESP-2D-LC-MS/MS   HCC-LCM-2D-LC-MS/MS
Protein quantity                      200 μg                 200 μg                 200 μg
Total proteins identified             208                    626                    644
Hydrophobic proteins                  25 (12.0%)             64 (10.2%)             80 (12.4%)
Trans-membrane proteins               8 (3.9%)               30 (4.8%)              54 (8.4%)
Proteins with Mr >100 kDa or <10 kDa  19 (9.1%)              77 (12.3%)             75 (11.6%)
Proteins with pI >9                   21 (10.1%)             78 (12.5%)             126 (19.6%)
2. The grand average of hydropathicity (GRAVY) score is calculated as the sum of the hydropathic indices of all amino acids in the sequence divided by the sequence length (32). Examples of the results produced are shown in Table 1 and Fig. 5C. 3. The trans-membrane prediction is conducted using the TMHMM Server v. 2.0, which can be accessed from the CBS (http://www.cbs.dtu.dk/services/TMHMM/). Examples of the results produced are shown in Table 1 and Fig. 5D. 4. All identified proteins are classified by their molecular function, cellular component, and biological process with the tools on http://www.geneontology.org. An example of the results produced is shown in Fig. 4.
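The GRAVY calculation in step 2 can be sketched in a few lines. The example assumes that the hydropathic indices are those of the Kyte-Doolittle scale, which is the scale commonly used for GRAVY; the chapter itself only cites reference (32), and the function name `gravy` is hypothetical.

```python
# Kyte-Doolittle hydropathy indices (assumed scale; the chapter cites ref. 32).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Sum of hydropathy indices divided by sequence length."""
    seq = sequence.upper()
    unknown = set(seq) - set(KYTE_DOOLITTLE)
    if not seq or unknown:
        raise ValueError(f"empty sequence or unsupported residues: {sorted(unknown)}")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


hydrophobic_example = gravy("AILV")   # all strongly hydrophobic residues
hydrophilic_example = gravy("KDE")    # all charged residues
```

Positive scores indicate an overall hydrophobic sequence and negative scores an overall hydrophilic one; the cutoff used to call a protein hydrophobic in Table 1 is not stated here, so none is applied.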
4. Notes 1. Glutamine-free RPMI 1640 medium must be cold (4°C) before use. Washing should be done as quickly as possible, until no contaminants (blood, etc.) remain on the tissues. Glutamine-free RPMI 1640 medium can be replaced by PBS (pH 7.4), 0.9% NaCl solution, or any other isotonic buffer. 2. Store the lysis buffer in small aliquots at –8°C to avoid multiple freeze-thaw cycles. Protease inhibitor tablet mixture (Roche Molecular Biochemicals) should be dissolved in lysis buffer. 3. Store the samples in small aliquots at –8°C to avoid multiple freeze-thaw cycles. Protein concentrations of the samples should be about 10 μg/μl for subsequent experiments. 4. The sections should be stained only very lightly with toluidine blue, just enough to distinguish hepatocytes during microdissection; excess stain can interfere with follow-up experiments. 5. To reduce microdissection time, the operator can choose either to capture hepatocytes or to remove other cells, depending on the condition of each section.
Li et al.
Fig. 4. Classification of differentially expressed proteins obtained by LCM-ICAT-2D-LC-MS/MS. (A) Proteins with at least twofold increased expression levels in HCC hepatocytes. (B) Proteins with at least twofold decreased expression levels in HCC hepatocytes. Reprinted with permission from (34).
6. Precipitation solution, acetone, and ethanol must be chilled to –20°C before use. 7. Ultrafiltration is very important for removing excess salts, stain, and other impurities that would interfere with follow-up steps. 8. TBP is a much stronger but more toxic reducing agent than DTT for the ICAT labeling reaction.
Fig. 5. A. Mr distribution (protein number vs. Mr range). Protein number vs. pI range. C. Hydrophile and hydrophobicity distribution (protein number). D. Trans-membrane protein distribution.
Reprinted with permission from (10). 4. Obtain an initial noise level estimate by taking a percentile of the MS/MS peak intensities at each of the 200 design points, where the percentile used at each point is derived from step 3 above (see Note 11). 5. Smooth the initial noise estimates with a Gaussian kernel smoother (150 m/z bandwidth) and interpolate between the 200 design points to obtain the final MS/MS noise estimate at each measured m/z value. Subtract this estimate from the measured MS/MS peak intensities and set any negative values to zero. An example of a high and low signal-to-noise MS/MS spectrum and the resulting estimated noise levels is shown in Fig. 3.
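Steps 4 and 5 can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the percentile is passed in as a single value (the chapter derives it per design point in step 3, which is not reproduced here), the local window used to collect intensities around each design point is a guess, and the 150 m/z bandwidth is interpreted as the standard deviation of the Gaussian kernel. The function name `estimate_noise` is hypothetical.

```python
import numpy as np


def estimate_noise(mz, intensity, percentile, n_design=200, window=150.0, bandwidth=150.0):
    """Return (noise estimate at each m/z, noise-subtracted intensities)."""
    mz = np.asarray(mz, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    design = np.linspace(mz.min(), mz.max(), n_design)

    # Step 4: initial noise level = percentile of intensities near each design point.
    initial = np.empty(n_design)
    for i, d in enumerate(design):
        local = intensity[np.abs(mz - d) <= window]  # window size is an assumption
        initial[i] = np.percentile(local, percentile) if local.size else 0.0

    # Step 5a: Gaussian kernel smoothing across the design points.
    diff = design[:, None] - design[None, :]
    kernel = np.exp(-0.5 * (diff / bandwidth) ** 2)
    smoothed = (kernel @ initial) / kernel.sum(axis=1)

    # Step 5b: interpolate to every measured m/z, subtract, and clamp at zero.
    noise = np.interp(mz, design, smoothed)
    cleaned = np.clip(intensity - noise, 0.0, None)
    return noise, cleaned


# Synthetic spectrum: exponential background with a few strong peaks.
rng = np.random.default_rng(0)
mz = np.sort(rng.uniform(200.0, 1400.0, 500))
intensity = rng.exponential(100.0, 500)
intensity[::50] += 5000.0
noise, cleaned = estimate_noise(mz, intensity, percentile=75)
```

Peaks that remain above zero after subtraction are the ones carried forward; in this synthetic spectrum the strong peaks survive while much of the background is set to zero.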
3.7. Peptide Identification A detailed description of peptide identification is beyond the scope of this chapter, but some general discussion is warranted given the importance of the subject and its linkage to quantification with the proposed method. The primary problem with peptide identification is controlling for false-positive identifications while maintaining a reasonable sensitivity to detect correct identifications. Our approach utilizes the outputs of two search engines, Sequest (19) and X! Tandem (20), along with other descriptive features of identification (e.g., charge state, peptide length, etc.) as inputs to a classifier that has been trained
Higgs et al.
Fig. 3. Example MS/MS spectra (intensity vs. m/z) and their estimated noise levels. (A) High-noise spectrum: 443 original peaks reduced to 118 peaks above the estimated noise level. (B) Lower noise spectrum: 589 original peaks reduced to 173 peaks above the estimated noise level. Reprinted with permission from (10).
to identify correct identifications (21). The output of the classifier provides a unit-less score indicative of the likelihood of a correct identification. False-positive identifications are controlled by running the searches against reversed versions of the protein databases and estimating p-values: the probability of observing a model score from the reversed database search that exceeds the observed score from the correct database. P-values alone are insufficient due to the large number of tests (identifications) being done (i.e., with a 0.05 p-value cutoff, 5% of all spectra would still be declared identified in the null condition where there are truly no matches to any MS/MS spectra). To account for multiple testing, false discovery rates (FDRs) (q-values) for
Label-Free Biomarker Identification
peptide identifications are estimated from p-values using the method described by Benjamini and Hochberg (22). Peptides with identification q-values less than a threshold, say 0.10, are retained for quantification. Proteins identified by only one peptide are visually examined to eliminate obviously incorrect identifications (e.g., fewer than four consecutive y- or b-ions). We estimate that the proportion of false identifications using such a procedure is less than or equal to 2%. Overall, the method is similar in strategy to PeptideProphet (23) with the following extensions: multiple search engines are employed, a more flexible classifier (e.g., Random Forests) is used, and statistical significance is estimated from a null distribution of classifier scores derived from reversed database searching instead of fitting a mixture model to the distribution of classifier output scores. The method is described in detail in Higgs et al. (11). We typically restrict biomarker hypothesis generation to identified peptides. The same relative quantification method can be used with unidentified peptides (MS features), although in practice these features need to be identified to be of practical use to clinicians and biologists. To maximize the coverage of proteins identified in a study, identifications from all samples in the study are pooled and used to create a list of peptides to quantify in each sample. Thus, a confident identification needs to be made in only one sample in order for the associated peptide ion current to be quantified in all study samples. Pooling the identifications across all samples in a study significantly increases the number of identifications relative to the number of identifications from any single sample. 3.8. Chromatographic Alignment Variability in the abundance of individual peptides between different samples may result in a peptide triggering an MS/MS scan in one sample and not in another. 
The area of this peptide may still be extracted from the primary mass spectrum in each sample. However, doing so requires high-quality chromatographic alignment between the samples so that a consistent region in the extracted ion chromatogram (XIC) is used for integration across all samples in a study. Large biomarker studies can produce chromatographic retention time shifts greater than 1 min between pairs of samples run several days and many samples apart. Simply expanding the integration window by 1 or 2 min to account for chromatographic variability is not an option in our experience as we are analyzing complex samples with multiple co-eluting peaks at most XIC masses. An expanded integration window that includes multiple peaks masks the quantification of individual peptides, produces results that are confounded with multiple peptides contributing to a value, and increases variability. Peak picking is another option, but was not applied here due to the computational
cost as well as the inherent heuristic nature of peak picking algorithms with an associated variability in what is being integrated. We have found a simple pair-wise alignment between all samples and a select reference sample in the study to work well for numerous biomarker discovery projects. This approach to alignment is founded on the following assumptions: (a) the samples included in the study are generally quite similar to each other with respect to their peptide content (i.e., there are many peptides or landmarks in common between the samples), (b) the same chromatographic conditions are used for each sample in the study, and (c) in a local region of retention time, the retention time offset between any two samples is approximately constant (see Note 12). 1. Identify the landmarks in the reference sample by taking all triple-play scan events with a zoom scan cross-correlation score of 0.65 or greater. This set of reference sample landmarks will be matched against other samples in the study. 2. Identify the matching landmarks in a study sample by declaring a landmark match if, for the sample and reference triple-play events: (a) the retention times of the triple-play events are within a user-specified amount (5 min) of each other, (b) the charge states of the peptides match, (c) the m/z values of the monoisotopic peaks from the zoom scans are within a user-specified amount (0.7 Da) of each other, (d) the zoom scan cross-correlation coefficients of both peptides to their respective theoretical isotope patterns exceed a threshold (0.65), and (e) the similarity between the corresponding MS/MS spectra exceeds a threshold (e.g., 0.75). The MS/MS similarity metric has been implemented as a cross-correlation coefficient between two MS/MS spectra following a convolution of each MS/MS stick-spectrum with a Gaussian peak shape. 3. 
For each matching pair of landmarks identified in step 2 above, generate the XIC for the feature in a local retention time window (e.g., ±5 min of scan event time in each sample). Convolve the two XICs to identify the time shift value that maximizes the convolution result between the landmark XICs in both samples. Record the time shift and cross-correlation at the optimal shift value for each landmark. The cross-correlation value will be used as a weighting factor in the subsequent smoothing step below. 4. The optimal time shift values for each pair of landmarks between a sample and the reference define a warping function that can be used to transform the retention time values of a sample to the reference. Estimate a smooth warping function by fitting a weighted loess (24) to the time shift versus retention time values for each sample. The loess should be done in a weighted manner using the XIC cross-correlation values from step 3 above as weights. The result is a smooth function that can be used to transform a sample’s retention time to a common time defined by the reference sample (Fig. 4). 5. The loess warping function for a sample is then applied to all the retention times in the chromatogram (landmark or not). Thus, all samples in a study are projected onto the same retention time scale. The warping function between two samples is generally not monotonic over the entire retention time range, and no restriction
on overall monotonicity is used in our estimate of the warping function. We do, however, preserve the overall rank order of the retention times following alignment by constraining the bandwidth (span = 0.5) used in the loess fitting (24) (see Note 13).
Fig. 4. Example chromatographic alignment (“warping”) function between two rat serum samples. Retention time shift (min) vs. retention time (min) for 462 landmark peptides is plotted with the resulting loess fit. Reprinted with permission from (10).
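Steps 3 to 5 of the alignment can be illustrated with the sketch below. It makes several assumptions that should be read as such: XICs are sampled on a common, evenly spaced time grid; the time shift is found by a plain discrete cross-correlation (`numpy.correlate`) without baseline removal; and the weighted loess (24) is replaced by a simple weighted local-linear smoother with tricube weights, which is a stand-in rather than the cited loess implementation. The rank-order constraint described in the text is not enforced. Function names are hypothetical.

```python
import numpy as np


def best_shift(xic_ref, xic_sample, dt):
    """Shift (min) to add to sample time to match the reference, plus the
    normalized cross-correlation at that shift (used later as a weight)."""
    a = np.asarray(xic_ref, dtype=float)
    b = np.asarray(xic_sample, dtype=float)
    cc = np.correlate(a, b, mode="full")
    lags = np.arange(-(len(b) - 1), len(a))  # lag of each entry in cc
    k = int(np.argmax(cc))
    norm = np.sqrt(np.sum(a ** 2) * np.sum(b ** 2))
    return lags[k] * dt, (cc[k] / norm if norm > 0 else 0.0)


def weighted_local_linear(x, y, w, x_eval, span=0.5):
    """Weighted local-linear smoother (stand-in for a weighted loess)."""
    x, y, w = (np.asarray(v, dtype=float) for v in (x, y, w))
    n_local = max(3, int(np.ceil(span * len(x))))
    fitted = np.empty(len(x_eval))
    for i, x0 in enumerate(x_eval):
        d = np.abs(x - x0)
        h = np.sort(d)[n_local - 1]          # local bandwidth from the span
        h = h if h > 0 else 1.0
        tricube = np.clip(1.0 - (d / h) ** 3, 0.0, None) ** 3
        sw = np.sqrt(tricube * w)            # combine kernel and landmark weights
        design = np.column_stack([np.ones_like(x), x - x0]) * sw[:, None]
        beta, *_ = np.linalg.lstsq(design, y * sw, rcond=None)
        fitted[i] = beta[0]                  # fitted shift at x0
    return fitted


# One landmark: the sample peak elutes 0.6 min later than the reference peak.
idx = np.arange(101)
ref_xic = np.exp(-0.5 * ((idx - 50) / 3.0) ** 2)
smp_xic = np.exp(-0.5 * ((idx - 56) / 3.0) ** 2)
shift, weight = best_shift(ref_xic, smp_xic, dt=0.1)

# Warping function from many landmarks (synthetic, linear drift for illustration).
rt = np.linspace(0.0, 120.0, 25)
shifts = 0.01 * rt - 0.3
warp = weighted_local_linear(rt, shifts, np.ones_like(rt), rt)
aligned_rt = rt + warp
```

In practice `best_shift` would be called once per matched landmark, and the resulting shifts and weights passed to the smoother to obtain the warping function applied to every retention time in the sample.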
3.9. Peptide Quantification Relative quantification of peptides is carried out by integration of the XIC peak (using normalized retention times from the chromatographic alignment) from the primary mass spectrum within each sample. A list of peptides to integrate within each sample is constructed by pooling together all triple-play events across all the samples. This pooling can be done with or without the use of peptide identification. As previously noted, we typically restrict the analyses to identified peptides. For each identified peptide, perform the following steps: 1. For each sample in which the peptide was identified, extract the XIC for the peptide and compute the centroid (weighted average of retention time values where weighting factor is the XIC ion current) of the XIC in a small retention time neighborhood (–0.5 min to +1.0 min from triple-play trigger time) using the aligned time values in the XIC. Compute the mean centroid time for the peptide over all samples in which the peptide was identified. Also compute the mean average m/z value estimated from the zoom scan spectrum for each sample in which the peptide was identified.
2. For each sample in the study, create an XIC for the peptide using the mean zoom scan average m/z value determined in step 1. 3. Estimate a local XIC baseline level and subtract the baseline from the XIC intensity values from each sample. A local linear baseline can be estimated by fitting a line between the lowest intensity XIC point before the peak and the lowest intensity XIC point following the peak in a local neighborhood (e.g., 5 min). This simple local linear baseline estimate always results in a baseline estimate below the signal intensity in the local neighborhood, leading to a low bias in the estimated baseline. For large peaks, this bias is negligible but for small peaks the bias may have a more pronounced effect on quantification. Alternatively, an asymmetric least squares smoothing approach may be used to estimate the baseline XIC values in order to reduce the potential bias with the simple local linear approach (25). 4. A fixed retention time window (±0.5 min for the chromatography described) around the mean centroid time value described in step 1 is used for integration. The width of this window is dependent on the chromatography method used. For the chromatography method reported here, the peak width remains relatively constant across the HPLC gradient (i.e., no band-broadening is observed). If band-broadening is observed, then the integration window width should be modeled as a function of the retention time (e.g., integration window width = intercept + slope × retention time). 5. Integrate the baseline corrected XIC values within the fixed retention time window for each sample in the study using a numerical integration algorithm such as the trapezoid rule. Record the XIC area values for each peptide in each sample. An example of XIC integration for a small study is shown in Fig. 5.
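The baseline correction and integration in steps 3 to 5 can be sketched as follows. The code assumes the simple local linear baseline (a line through the lowest point before and the lowest point after the peak center within the local neighborhood), a fixed ±0.5 min window, and the trapezoid rule written out explicitly; the interpretation of the 5 min neighborhood as ±5 min around the center, and the function name `integrate_xic`, are assumptions.

```python
import numpy as np


def integrate_xic(t, y, center, half_width=0.5, neighborhood=5.0):
    """Baseline-corrected XIC area in [center - half_width, center + half_width]."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)

    # Lowest-intensity points before and after the peak center (local neighborhood).
    before = np.where((t >= center - neighborhood) & (t < center))[0]
    after = np.where((t > center) & (t <= center + neighborhood))[0]
    i0 = before[np.argmin(y[before])]
    i1 = after[np.argmin(y[after])]

    # Local linear baseline through those two points.
    slope = (y[i1] - y[i0]) / (t[i1] - t[i0])
    baseline = y[i0] + slope * (t - t[i0])
    corrected = y - baseline

    # Trapezoid rule over the fixed integration window.
    win = (t >= center - half_width) & (t <= center + half_width)
    tw, cw = t[win], corrected[win]
    return float(np.sum(0.5 * (cw[1:] + cw[:-1]) * np.diff(tw)))


# Synthetic XIC: Gaussian peak (sigma 0.1 min) at 50 min on a constant background.
t = np.arange(4500, 5501) / 100.0
y = 10.0 + 100.0 * np.exp(-0.5 * ((t - 50.0) / 0.1) ** 2)
area = integrate_xic(t, y, center=50.0)
```

For this synthetic peak the recovered area is close to the analytical value 100 × 0.1 × √(2π) ≈ 25.07, because the constant background is removed before integration.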
3.10. Data Transformation and Normalization Following the integration of peptide-specific XIC peaks in all study samples, we have a rectangular data table with N rows corresponding to N samples in the study, and P columns corresponding to peptides detected in the study. The cell values in this data table are the peptide peak areas. With this table in hand, the usual operations of transformation and normalization may be applied prior to any statistical analysis. 1. Peptide peak areas are approximately log-normal distributed. Apply a log2 transformation to all peak area values (see Note 14). 2. Normalize the log2 transformed peak areas using a quantile normalization procedure (26) (see Note 15). 3. Normalized log2 peptide areas may be used directly as input to the statistical analysis for the study (peptide level analysis). Additionally, the average of normalized log2 peptide areas for all the peptides identified from a protein can be used as an overall estimate of the protein level (protein level analysis, see Note 16).
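The three steps above can be sketched on a small N × P table. This is a minimal illustration assuming a complete table (no missing areas) and no tied values within a sample; the quantile normalization shown is the basic rank-based version, which may differ in detail from the procedure in (26). Function names and the peptide-to-protein mapping are hypothetical.

```python
import numpy as np


def quantile_normalize(log_areas):
    """Give every sample (row) the same distribution: the mean of the sorted rows."""
    x = np.asarray(log_areas, dtype=float)
    ranks = np.argsort(np.argsort(x, axis=1), axis=1)  # rank of each peptide per sample
    reference = np.sort(x, axis=1).mean(axis=0)        # mean quantile across samples
    return reference[ranks]


def protein_levels(normalized, peptide_to_protein):
    """Average normalized log2 peptide areas per protein (protein level analysis)."""
    proteins = sorted(set(peptide_to_protein))
    columns = [
        normalized[:, [j for j, p in enumerate(peptide_to_protein) if p == prot]].mean(axis=1)
        for prot in proteins
    ]
    return proteins, np.column_stack(columns)


# Rows are samples, columns are peptides (peak areas).
areas = np.array([
    [100.0, 400.0, 1600.0, 800.0],
    [200.0, 800.0, 50.0, 400.0],
    [150.0, 300.0, 900.0, 1200.0],
])
log_areas = np.log2(areas)                      # step 1
normalized = quantile_normalize(log_areas)      # step 2
proteins, levels = protein_levels(normalized, ["P1", "P1", "P2", "P2"])  # step 3
```

After normalization every sample has the same set of values, and only the assignment of those values to peptides differs between samples.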
Fig. 5. XICs from the 2+ –1 macroglobulin peptide ATPLSLCALTAVDQSVLLLKPEAK for eight rat serum samples following chromatographic alignment. Note that the peak from all samples fits within the highlighted [83.2, 84.2] integration region. Reprinted with permission from (10).
3.11. Study Design, Power, Sample Size, and Analysis Our strategy of producing an N × P table of relative peptide levels allows the flexibility for the analysis to be done in a manner consistent with the study design. Note that no part of the described method imposes any limitation on the final study statistical analysis (e.g., pooling of samples, subtractive or difference-based methods, etc.). In general, the statistical analysis used for identifying potential protein biomarkers in a study should follow the same approach as a primary clinical endpoint analysis would take (i.e., a simple paired design should be analyzed with a paired t-test, a crossover design with repeated measures within period should be analyzed as a crossover study with repeated measures within period, etc.). An analysis of a single clinical endpoint may use the familiar type I error threshold of 0.05 as a measure of statistical significance. This approach does not work well when testing hundreds or thousands of proteins in a study because, by definition, 5% of all tests from a null experiment (an experiment in which there is truly no treatment or group effect) will have a p-value less than 0.05. The Bonferroni approach to control the family-wise type I error (controlling for no errors in the set of declared changes) has been commonly employed as a means to control false-positive findings (27). However, many investigators doing proteomic hypothesis generation are willing to tolerate some level of false-positive findings in a declared set as long as it is relatively low and estimated. The use of FDR as a means to identify a set of declared findings with a specified proportion of false-positives has been widely applied in genomics (22) and is the current recommendation for proteomic hypothesis generating experiments. There are numerous estimators of FDR (28,29) with the original method described by Benjamini and Hochberg used in the work presented here (22). 
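For readers who want to see the Benjamini and Hochberg adjustment in concrete form, the sketch below converts a vector of p-values into q-values using the standard step-up formulation. It is a generic illustration written with NumPy, not the authors' code, and the function name `bh_qvalues` is hypothetical.

```python
import numpy as np


def bh_qvalues(pvalues):
    """Benjamini-Hochberg q-values: p * m / rank, made monotone from the largest p down."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity: each q-value is the minimum over all larger ranks.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(scaled, 1.0)
    return q


p = [0.01, 0.04, 0.03, 0.20]
q = bh_qvalues(p)
declared = q < 0.10  # findings declared at a 0.10 target FDR
```

In this small example the first three tests are declared at a 0.10 target FDR, whereas a Bonferroni threshold of 0.05/4 would retain only the first.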
Just as multiple comparisons should be considered in the analysis of study data, these should also be considered at the design stage of a new study aimed at generating hypotheses from highly multiplexed measurements like proteomics. This is a relatively new field of research with several methods recently reported (30,31,32,33). A simple approach originally suggested by Benjamini and Hochberg (22), and adapted by Bemis (34), uses traditional sample-size calculations with the following expression for average type I error ($\alpha_{ave}$) over a set of tested hypotheses: $\alpha_{ave} = f_{ave}\,\frac{q^{*}\, m_1}{(m_0 + m_1)(1 - q^{*})}$, where $f_{ave}$ is the average power of hypothesis tests conducted in a study, $q^{*}$ is the rate at which FDR is to be controlled, $m_0$ is the number of true null hypotheses tested, and $m_1$ is the number of true alternative hypotheses tested. Sample-size estimates are made by first estimating $\alpha_{ave}$ using the desired values for $f_{ave}$ and $q^{*}$, assumed values for $m_0$ and $m_1$, and existing sample size calculators using $\alpha_{ave}$ for a given study design. An example set of sample-size curves using this approach for the two-sample t-test design is given in Fig. 6.
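The sample-size procedure can be sketched numerically. Two assumptions are made and should be checked against the original sources: the average type I error is taken to be $\alpha_{ave} = f_{ave}\, q^{*} m_1 / [(m_0 + m_1)(1 - q^{*})]$, and the two-sample t-test is replaced by its normal approximation on the log scale, with the CV converted to a log-scale variance as ln(1 + CV²). The function names are hypothetical and the numbers are not guaranteed to reproduce Fig. 6.

```python
from math import ceil, log
from statistics import NormalDist


def average_alpha(power, q, m0, m1):
    """Average type I error for a target FDR q (assumed form of the expression)."""
    return power * q * m1 / ((m0 + m1) * (1.0 - q))


def n_per_group(fold_change, cv, alpha, power):
    """Normal-approximation sample size per group for a two-sample comparison
    of log-transformed values."""
    sigma2 = log(1.0 + cv ** 2)   # log-scale variance from the CV (log-normal assumption)
    delta = log(fold_change)      # effect size on the log scale
    z = NormalDist().inv_cdf
    n = 2.0 * sigma2 * (z(1.0 - alpha / 2.0) + z(power)) ** 2 / delta ** 2
    return max(2, ceil(n))


# Settings quoted for Fig. 6: 85% power, 0.10 target FDR, 98% true nulls.
alpha_ave = average_alpha(power=0.85, q=0.10, m0=980, m1=20)
n_small_effect = n_per_group(1.5, cv=0.20, alpha=alpha_ave, power=0.85)
n_large_effect = n_per_group(2.0, cv=0.20, alpha=alpha_ave, power=0.85)
n_noisy = n_per_group(1.5, cv=0.40, alpha=alpha_ave, power=0.85)
```

The qualitative behavior matches what the sample-size curves are meant to convey: fewer subjects are needed for larger fold-changes, and more are needed as total variability increases.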
Fig. 6. Estimated sample size required to detect protein changes in a two-sample t-test design. Number of subjects in each of the two groups is plotted against the detectable effect size expressed as a fold-change. Four different levels of total variability are shown (10% CV, 20% CV, 30% CV, and 40% CV). Sample size estimates were made using 85% power, a 0.10 target FDR for declaring significance, and an estimated proportion of true null hypotheses, $m_0/(m_0 + m_1)$, set to 0.98.
4. Notes 1. We find that plasma total protein concentration, as measured by a Bradford assay, has a total coefficient of variation (CV) of approximately 11% (includes inter-subject, intra-subject, and assay error) and ranges between approximately 48 and 68 mg/mL (12). Due to the apparent highly regulated plasma total protein concentration, it is not generally necessary to measure total protein concentration for each sample in a study in order to load a consistent amount of protein. 2. The depletion material used is based on a dye affinity removal method for albumin. There are commercially available antibody-based depletion kits that may improve albumin removal at a reasonable cost. Abundant protein depletion is an open and active research area at the time of this writing. 3. Chicken lysozyme is added as a spiked internal standard at this stage in order to qualitatively assess the digestion efficiency as well as to quantitatively assess the measurement error across the samples in a study. Other internal standard(s) could also be used. 4. The reduction/alkylation solution should be prepared just before use. Triethylphosphine is pyrophoric and should be handled in a fume hood in accordance with the material safety data sheet. The use of volatile reagents for this step
reduces the variability in the sample prep by minimizing sample handling steps and removing the majority of reduction and alkylating reagents. The digestion is performed with trypsin, which is sensitive to the presence of reducing reagents. We find that CSF total protein concentration, as measured by a Bradford assay, has a total CV of approximately 27% (includes inter-subject, intra-subject, and assay error with the additional total variability relative to plasma total protein attributed to a higher CSF inter-subject variance) with a range between approximately 0.12 and 0.41 μg/mL (12). The higher overall variability is attributed to a significantly higher inter-subject variability relative to plasma total protein (12). Due to the higher variability with CSF total protein, we use the results of Bradford total protein assay to process a consistent total CSF protein amount in the proteomics assay. The HPLC pumps must be capable of producing a smooth gradient at 50 μL/min. The gradient formation should be verified by using water in A and 1% acetone in water for B and running the gradient with UV monitoring at 254 nm. New HPLC columns should be conditioned with at least four runs of digested serum before use in the method. The mass spectrometer’s source should be carefully cleaned to minimize chemical noise. Monitor above 300 m/z and try to maximize the injection time as this is directly proportional to achievable dynamic range in an ion trap mass spectrometer. The spray conditions should be optimized for a peptide of about 1700 Da. Alternatively, a design could be used to balance various study factors (e.g., treatment, gender, age, etc.) with injection order. This approach may be most appropriate for small studies (e.g., Find All Features. This will bring up the Extract Features dialog box as shown in Fig. 3. 3. In the “Save Features to File” field, enter (or browse for) a path and add a name for the new Feature Set file. 4. 
Specify a scan range in the “Start Scan” and “End Scan” fields to limit feature finding to a subset of scans. By default, msInspect will attempt to find peptides in all scans (see Note 9). 5. Left click the Find Features button to begin the feature finding process. As the file is processed, the status bar at the bottom of the msInspect window will display progress. For a large input file, processing may take upwards of 20–30 min. 6. When processing is complete, features will be written to the specified output file and highlighted as colored crosses in the Image and Detail Panes. The status bar will display “Finding features complete. See file yourfilepath\yourfile.peptides.tsv.” Place the mouse cursor over one of the detected features to display a summary of its properties. Left click on the feature to view details in the Properties Pane (display by Windows > Show/hide Properties). 7. Select Tools > Display Peptides… to open the Display Features dialog box as shown in Fig. 4A for customization:
Fig. 3. Extract Features dialog box.
Open LC-MS Analysis Platform
Fitzgibbon et al. a. Display or hide the colored crosses by checking or unchecking the box under the “Display” field. b. Change the color of the crosses by left clicking on the colored box under the “Color” field. A new color can be selected from a color palette. c. View the Feature Set browser by left clicking on the “…” button. This browser lists details of all peptides in the Feature Set. This list can be sorted and edited, comments can be added to a feature, features can be deleted, and the modified Feature Set file may be saved (see Note 10).
3.3. Filtering to Eliminate Low-quality Peptides Low-quality peptides can be removed in msInspect by applying user-specified filtering criteria (e.g., a minimum number of isotopic peaks detected). Removing low-quality peptides is particularly helpful when peptide arrays are to be generated (described in Subheading 3.4.1). 1. Select Tools > Display Peptides…. 2. Left click the Filter tab at the bottom of the Display Features dialog box. This tab displays several parameters by which features can be filtered. 3. Set Min Charge = 1, Min Scans = 3, Min Intensity = 5, Max KL = 1.0, and Min Peaks = 2 as shown in Fig. 4A (see Note 11). 4. Left click the Apply button. The Detail Pane now shows only the features that meet these filtering criteria. 5. Save the filtered Feature Set file over the original file by left clicking on the “…” button at the top right of the Display Features dialog box, then left clicking on the Save button.
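The same criteria can also be applied outside the graphical interface to a saved tab-delimited feature file. The sketch below is only an illustration: the column names (`charge`, `scanCount`, `intensity`, `kl`, `peaks`) and the use of `#` for comment lines are assumptions about the file layout and should be checked against the header of an actual Feature Set file; the function names are hypothetical and are not part of msInspect.

```python
import csv


def passes_filter(row, min_charge=1, min_scans=3, min_intensity=5.0, max_kl=1.0, min_peaks=2):
    """Apply the filter settings from step 3 to one feature (a dict of strings)."""
    return (
        int(row["charge"]) >= min_charge
        and int(row["scanCount"]) >= min_scans
        and float(row["intensity"]) >= min_intensity
        and float(row["kl"]) <= max_kl
        and int(row["peaks"]) >= min_peaks
    )


def filter_features(tsv_text, **criteria):
    """Return the features in a tab-delimited text that meet the criteria."""
    lines = [ln for ln in tsv_text.splitlines() if ln.strip() and not ln.startswith("#")]
    return [row for row in csv.DictReader(lines, delimiter="\t") if passes_filter(row, **criteria)]


# Hypothetical four-feature file built in memory for illustration.
header = "\t".join(["mz", "charge", "scanCount", "intensity", "kl", "peaks"])
rows = [
    "\t".join(["500.25", "2", "5", "120.0", "0.3", "4"]),  # passes all criteria
    "\t".join(["612.80", "0", "4", "80.0", "0.2", "3"]),   # charge below minimum
    "\t".join(["733.10", "1", "2", "15.0", "0.5", "2"]),   # too few scans
    "\t".join(["410.70", "3", "6", "9.0", "1.8", "5"]),    # KL score too high
]
example = "\n".join(["# hypothetical feature file", header] + rows)
kept = filter_features(example)
```

Only the first feature survives the default settings; tightening or loosening a criterion is a matter of passing a different keyword value, for example `filter_features(example, max_kl=2.0)`.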
3.4. Quantitation of Peptide Features 3.4.1. Quantitation Using Label-free Approaches Features from multiple experiments can be compared in msInspect by simultaneously opening Feature Set files from multiple LC-MS runs, displaying them together, and generating a peptide array. Below are directions for multiple LC-MS run comparisons after Feature Set files have been produced (as described above in Subheadings 3.1–3.3) for all LC-MS runs to be compared. 1. Select Tools > Display Peptides…. 2. Left click on the Add Files button (Fig. 4A).
Fig. 4. (A) Display Features dialog box with one file loaded and the Filter tab selected. (B) Display Features dialog box with two files loaded and the Peptide Array tab selected.
3. Browse to find another Feature Set file (with file extension .peptide.tsv) and open it. A different colored cross is assigned in the Image Pane to the features from each newly opened file. In this way, multiple Feature Set files can be opened and overlaid in the Image Pane (see Note 12).
4. Left click on the Filter tab (Fig. 4A) at the bottom of the Display Features dialog box and make sure the filter criteria are still set to the values entered in Subheading 3.3 (Min Charge = 1, Min Scans = 3, Min Intensity = 5, Max KL = 1.0, and Min Peaks = 2). Left click on the Apply button if any changes are made.
5. Left click on the Peptide Array tab (Fig. 4B) to set criteria for the peptide array to be generated:
a. Enter a name for the peptide array file that will be generated. By convention, this file name should end with “.pepArray.tsv.”
b. Click the Optimize button to have msInspect search for reasonable tolerances for matching features across runs (see Note 13).
c. Check the Normalization box if normalization of features is desired (2).
d. Click the Calculate button to compute the peptide array.
6. The generated peptide array file consists of one column of intensities for each run and one row for each matched feature. The file is stored in a simple tab-delimited format, which can be exported (to Excel and other programs) and analyzed using tools traditionally applied to genomic arrays (see Note 14).
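Because the peptide array is a plain tab-delimited table, it can be read with standard tools. The sketch below is illustrative only: the column names (`id`, `run1`, `run2`), the sample contents, and the use of an empty cell for a feature not matched in a run are all assumptions, not the documented msInspect layout.

```python
import csv
import io
import math

# Hypothetical peptide array contents: one row per matched feature,
# one intensity column per run. An empty cell means "not matched".
PEP_ARRAY_TSV = (
    "id\trun1\trun2\n"
    "1\t100.0\t200.0\n"
    "2\t50.0\t\n"
    "3\t80.0\t20.0\n"
)


def log2_ratios(tsv_text, col_a, col_b):
    """Return {feature id: log2(col_b / col_a)} for rows present in both runs."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    ratios = {}
    for row in reader:
        a, b = row[col_a], row[col_b]
        if not a or not b:
            continue  # skip features missing in either run
        ratios[row["id"]] = math.log2(float(b) / float(a))
    return ratios


ratios = log2_ratios(PEP_ARRAY_TSV, "run1", "run2")
print(ratios)
```

For a real file, the same function could be fed the file contents after checking the header line for the actual column names.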
3.4.2. Quantitation Using Isotopic Labeling A common method of relative quantitation of peptides involves applying heavy and light isotopic labels separately to two samples, then mixing them prior to collecting LC-MS data. Typically, tandem MS/MS (or MS2) experiments are used to analyze these labeled samples. Peptide sequencing in MS/MS can detect the number of labeled residues in each peptide and therefore determine the expected mass difference between light and heavy forms of each peptide. msInspect can perform relative quantitation even in the absence of MS/MS information. Provided with the mass of the light and heavy reagents and with a threshold on the number of labeled residues to consider, msInspect will search for pairs of features consistent with isotopic labeling. 1. Open the file to be analyzed as described in Subheading 3.1. 2. Select Tools > Find All Features. 3. This will again bring up the Extract Features dialog box as shown in Fig. 3. Enter a new output file name and select a scan range of interest as described in Subheading 3.2.3–3.2.4. 4. Note the “Quantitate” check box in this dialog. Selecting this box will enable several options for relative quantitation.
5. Select one of several common isotopic labeling strategies (e.g., Cleavable ICAT and O16/O18) from the pull-down menu. Details can be entered including masses for light and heavy label reagents, the particular amino acid labeled, and the maximum number of labeled residues to consider.
6. Left click on the “Find Features” button to locate all features in the specified scan range. Display features from the Feature Set file as described in Subheading 3.2.7. An additional matching step is performed to locate isotopically labeled pairs. A pair is indicated by a vertical bar connecting the light and heavy partners in the Detail Pane. Selecting a pair by left clicking in the Detail Pane will display feature properties including the light and heavy intensities, the ratio of light to heavy, and the number of isotopic labels detected.
7. The results of this quantitation process are stored in a tab separated value (TSV) file specified in step 3.4.2.3. One record is written for each isotopically labeled pair and for each unlabeled peptide (see Note 15).
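The idea behind the pair search (features whose mass difference equals a whole number of label mass shifts) can be sketched as follows. This is a simplified illustration, not msInspect's matching algorithm: the label mass difference, the tolerances, and the feature tuples are hypothetical values chosen for the example.

```python
def find_labeled_pairs(features, label_delta, max_labels=3,
                       mass_tol=0.01, scan_tol=5):
    """Find light/heavy candidates among (mass, scan, intensity) tuples.

    A pair is reported when the mass difference is within mass_tol of
    n * label_delta for some n in 1..max_labels and the two features
    elute within scan_tol scans of each other.
    Returns (light_mass, heavy_mass, n_labels, light_to_heavy_ratio) tuples.
    """
    pairs = []
    feats = sorted(features)  # ascending mass
    for i, (m_light, s_light, i_light) in enumerate(feats):
        for m_heavy, s_heavy, i_heavy in feats[i + 1:]:
            diff = m_heavy - m_light
            if diff > max_labels * label_delta + mass_tol:
                break  # all later features are even heavier
            if abs(s_heavy - s_light) > scan_tol:
                continue
            n = round(diff / label_delta)
            if 1 <= n <= max_labels and abs(diff - n * label_delta) <= mass_tol:
                pairs.append((m_light, m_heavy, n, i_light / i_heavy))
    return pairs


LABEL_DELTA = 9.0  # hypothetical heavy-minus-light mass shift per label, Da
features = [
    (1000.0, 100, 200.0),
    (1009.0, 101, 100.0),    # one label heavier than the feature above
    (1500.0, 300, 50.0),     # no partner
    (2000.0, 400, 30.0),
    (2018.005, 402, 60.0),   # two labels heavier, within tolerance
]
pairs = find_labeled_pairs(features, LABEL_DELTA)
print(pairs)
```

As Note 15 cautions, a pair found purely from mass and retention time is only suggestive, so tight tolerances and prior filtering matter in dense runs.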
4. Notes
1. More information on the mzXML file format, as well as utilities to convert native acquisition files from many common MS instruments to mzXML, can be found on the Sashimi website at http://sashimi.sourceforge.net.
2. Running msInspect via Java Web Start is highly recommended for casual use, as it greatly simplifies installation and update of the software. msInspect’s major features, such as feature finding and peptide array creation, are available from the command line as well, and command-line use is more appropriate for batch processing of large numbers of mzXML files. To use msInspect from the command line, the stand-alone JAR file can be downloaded from http://proteomics.fhcrc.org/CPL/msinspect.html. This web page also allows download of the msInspect user’s guide, which contains detailed instructions on installation, using msInspect’s features from the command line, and full source code for the released version (5).
3. Feature extraction can require a great deal of memory since it operates on several scans at a time. By default the Java Web Start version of msInspect allows up to 384 MB of memory to be allocated so that a number of scans and intermediate results may be cached. If additional memory is available on the computer, the amount of memory accessible by msInspect may be increased when running msInspect from the command line with the “-Xmx” option when invoking Java. For example, “java -Xmx512M -jar viewerApp.jar”.
4. Sample data files are available at https://proteomics.fhcrc.org/CPAS. From that website, follow the “Published Experiments” link on the lower left side and then left click on the “MiMB Clinical Proteomics” link on the left side. Because LC-MS files can be quite large, the samples provided for download are only small subregions of the files used as figures in Section 3. Some browsers, such as Internet Explorer, may add a “.mzXML.xml” suffix when downloading these
files. This should not affect msInspect’s ability to read the files and may be safely modified to “.mzXML” if desired.
5. The first time a particular mzXML file is loaded, msInspect will write a “.inspect” file in the same directory where the mzXML file is located. This file contains an index of each scan in the original file, which will speed subsequent file access. Construction of this index file can take some time for larger input files; the status bar at the bottom of the msInspect window will indicate progress.
6. The area shown in the Detail Pane is indicated in the main Image pane by a blue rectangle. Several aspects of Detail Pane behavior can be adjusted by selecting Detail Pane Settings from the Tools menu. There, feature detection can be turned on or off, background noise that falls below a threshold can be hidden, and the color scheme of the Detail Pane can be modified.
7. Note that in Fig. 1 the Chart Pane clearly shows individual isotopic peaks because the data is from a high-resolution instrument (in this case a Waters LCT Premier). msInspect depends on resolving individual isotopes to infer the charge state of the peptide and therefore its mass. The charge is derived from the reciprocal of the distance between adjacent peaks. In Fig. 1 the peaks of the peptide on the left side of the Chart Pane are 0.5 m/z units apart, therefore msInspect infers that this peptide has a charge of 2. It is not possible to infer a charge for a single peak, so “stray peaks” that cannot be grouped into an isotopic cluster are assigned a charge of zero.
8. msInspect includes a number of feature extraction algorithms, which can be selected in the Tools menu. The default, two dimensional (2D) peak alignment, is recommended for most purposes. The single scan algorithm may be useful if there is little or no scan-to-scan coherence. The feature extraction algorithms in msInspect have been designed to work on high-resolution profile mode data. The algorithms have been successfully applied to centroided data, but performance will depend on the particular centroiding algorithm used and on the noise characteristics of the run under consideration. For such data, the centroided scan algorithm may be appropriate.
9. Once peptides have been located, some amount of visual curation is recommended. The Heat Map view (accessed from the Tools menu) can provide a global view of features grouped by charge state and sorted by various metrics such as mass or intensity. Each column in the Heat Map view consists of a small intensity window around each feature, colored from low intensity (red) to high intensity (yellow). Clicking on a feature in the Heat Map will highlight it in the other windows. By sorting on KL score or intensity and inspecting a few features, one can gain a sense of what filtering criteria might be appropriate for a given data set. When new filter settings are applied, as described in Subheading 3.3, the Heat Map view is automatically updated.
10. A typical example of editing a Feature Set file:
a. Sort by ascending KL score (Left click on the “KL” column header).
b. Find a feature with KL < 1 that was misidentified by examining its spectrum in msInspect window’s Chart Pane.
c. Double click in the Description field for the feature to add a comment to the Feature Set List noting that this feature is “questionable.”
d. Click “Save” to save changes by overwriting the old Feature Set file.
11. Filtering peptide features can improve the performance of subsequent steps such as construction of peptide arrays. Specific filtering criteria will depend on instrumentation and the experiment goals. The most frequently used filtering criteria include: a. Minimum charge – msInspect locates features by first finding peaks and then grouping them into isotopic distributions consistent with individual peptides. Some peaks will not group with any others and are referred to as “stray peaks.” As described in Note 7, it is not possible to infer the charge state of these stray peaks, so they are assigned a charge of zero. Setting the minimum charge to 1 when filtering will remove these stray peaks, which are often due to noise or chemical contaminants. b. Minimum number of peaks – confidence in the location and charge state assignment of a peptide feature may be greater if it is supported by more isotopic peaks. Setting the minimum number of peaks to 2 will also eliminate the stray peaks described above. c. Minimum number of scans – set the minimum number of scans that a peptide must span in order to be considered. This has the effect of eliminating peptide features that persist for only a brief time. d. Minimum intensity – setting a minimum intensity threshold is often appropriate, although the specific value used will depend on the instrument. e. Maximum KL score – peaks are grouped by how well they match a model of the isotopic distribution of a peptide with a given mass. The KL score described in Bellew, et al. (1) measures how much an extracted group of peaks deviates from this model; in general, a lower KL score indicates a better match. 12. When multiple feature sets are loaded, it is often useful to hide particular sets or to change the colors of the crosses that mark features in a given set. Both of these can be accomplished in the Display Features dialog box as shown in Fig. 4A (select Tools > Display Peptides). 
For each feature set, this dialog box provides a checkbox to control visibility and a color palette to select colors for the crosses. 13. After optimization, the mass and scan window values that give the best alignment results automatically populate the Peptide Array tab. 14. A number of high-quality open source tools are available for microarray analysis. To analyze peptide arrays produced by msInspect, tools from the Bioconductor project (http://www.bioconductor.org) and from the TM4 microarray software suite (http://www.tm4.org) have been used. 15. Results from isotopic labeling should be treated as suggestive rather than authoritative. Without peptide sequence information, the mass difference between heavy and light partners cannot be definitively ascertained. The quality of the
matching is therefore dependent on the quality of feature filtering and the density of features in each run.
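The charge inference described in Note 7 (charge taken as the reciprocal of the spacing between adjacent isotopic peaks, with stray peaks assigned zero) can be written as a short sketch. The function names are hypothetical and this is not msInspect's implementation; the neutral-mass conversion additionally assumes that every charge is carried by a proton.

```python
PROTON_MASS = 1.007276  # Da


def infer_charge(peak_mzs):
    """Infer charge from the mean spacing of an isotopic cluster.

    A single peak cannot be grouped into a cluster, so it is given
    charge 0, as described for stray peaks in Note 7.
    """
    if len(peak_mzs) < 2:
        return 0
    peaks = sorted(peak_mzs)
    spacings = [b - a for a, b in zip(peaks, peaks[1:])]
    mean_spacing = sum(spacings) / len(spacings)
    return round(1.0 / mean_spacing)


def neutral_mass(mz, charge):
    """Neutral mass of a protonated ion observed at mz (assumption: [M + zH]z+)."""
    return (mz - PROTON_MASS) * charge


cluster = [500.25, 500.75, 501.25]  # peaks 0.5 m/z units apart
z = infer_charge(cluster)
print(z, neutral_mass(cluster[0], z))
```

With peaks 0.5 m/z units apart the sketch returns a charge of 2, matching the example given for Fig. 1.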
Acknowledgments
The authors would like to thank Matthew Bellew, Marc Coram, Jimmy Eng, Ruihua Fang, Mark Igra, and Tim Randolph for their intellectual contributions to the development of msInspect. This work was supported by contract # 23XS144A from the National Cancer Institute.
References
1. Bellew, M., Coram, M., Fitzgibbon, M., Igra, M., Randolph, T., Wang, P., May, D., Eng, J., Fang, R., Lin, C.W., Chen, J., Goodlet, D., Whiteaker, J., Paulovich, A., and McIntosh, M. (2006) A suite of algorithms for the comprehensive analysis of complex protein mixtures using high-resolution LC-MS. Bioinformatics Advance Access published on June 9, 2006 http://bioinformatics.oxfordjournals.org/cgi/reprint/btl276v1.
2. Wang, P., Tang, H., Zhang, H., Whiteaker, J., Paulovich, A.G., and McIntosh, M. (2006) Normalization regarding non-random missing values in high-throughput mass spectrometry data. Proceedings of the Pacific Symposium on Biocomputing 11, 315–326.
3. May, D., Fitzgibbon, M., Liu, Y., Holzman, T., Eng, J., Kemp, C.J., Whiteaker, J., Paulovich, A., and McIntosh, M. (2007) A Platform for Accurate Mass and Time Analyses of Mass Spectrometry Data. Journal of Proteome Research 6(7), 2685–2694.
4. Pedrioli, P.G., Eng, J.K., Hubley, R., Vogelzang, M., Deutsch, E.W., Raught, B., Pratt, B., Nilsson, E., Angeletti, R.H., Apweiler, R., Cheung, K., Costello, C.E., Hermjakob, H., Huang, S., Julian, R.K., Kapp, E., McComb, M.E., Oliver, S.G., Omenn, G., Paton, N.W., Simpson, R., Smith, R., Taylor, C.F., Zhu, W., and Aebersold, R. (2004) A common open representation of mass spectrometry data and its application to proteomics research. Nature Biotechnology 22(11), 1459–1466.
5. Computational Proteomics Laboratory. msInspect website. Accessed on June 28, 2006 at http://proteomics.fhcrc.org/CPL/msinspect.html.
20 Pattern Recognition Approaches for Classifying Proteomic Mass Spectra of Biofluids Ray L. Somorjai
Summary The statistical classification strategy we have developed for magnetic resonance, infrared, and Raman spectra for the analysis of biomedical data is discussed, particularly as it applies to proteomic mass spectra. A general discussion of the current use of pattern recognition methods is given, with caveats and suggestions relevant for clinical applicability.
Key Words: visualization; preprocessing; feature selection/extraction; robust classifier; classifier aggregation; proteomics; mass spectroscopy; magnetic resonance spectroscopy; biodiagnostics.
From: Methods in Molecular Biology, vol. 428: Clinical Proteomics: Methods and Protocols Edited by: A. Vlahou © Humana Press, Totowa, NJ

1. Introduction
Unlike magnetic resonance spectroscopy (MRS), infrared spectroscopy (IRS), and Raman spectroscopy (RS) (1,2,3), proteomic mass spectroscopy (PMS) is a relative newcomer to the field of biodiagnostics. However, with the goal of discriminating various diseases and disease states, it is a welcome complementary technique that provides yet another means of analyzing biofluids. In particular, this complementarity extends the range of characterizing biofluids, from vibrational states of specific chemical groups (IRS, RS), through the identification of small molecules (MRS), to proteins and protein fragments (PMS). Being an emerging field, PMS suffers from growing pains. In particular, there are experimental difficulties specific to PMS that have yet to be addressed (see Note 1) (in the following, the author assumes that the spectra, for which classifiers are to be developed, have been properly “processed”).
Typically, biomedical data consist of relatively few (of the order 10–100) samples (patterns) that are initially presented in a very high-dimensional feature space (feature ≡ m/z intensity), with dimensionality L (dimension ≡ features) of order 1000–10,000. Unfortunately, these two characteristics lead to two curses that impede the development of robust classifiers: the curse of dimensionality and the curse of dataset sparsity (3). The consequence of the two curses is that the sample to feature ratio (SFR) is 1/10–1/1000, instead of the minimum of 5–10 that the machine learning community generally accepts as required for robust classification.
In this chapter, the author presents the specific strategy [dubbed statistical classification strategy (SCS)] that we have developed over the last dozen years to deal with such problems, particularly as they apply to MR, IR, and Raman spectra. We have been adapting this strategy and applying it with success to biomedical data derived from both proteomics mass spectra and microarrays (see Note 2). The author compares the differences and similarities of the SCS with the proteomics data analysts’ current tools and, wherever possible, makes recommendations.
2. The Statistical Classification Strategy
Lifting the twin curses of high dimensionality and dataset sparsity requires special approaches. The “strategy” part of the SCS reflects the fact that no single approach is, or can be, optimal [“there are no panaceas in data analysis” (4)], and that a data-driven, multistage strategy is essential. Using a divide-and-conquer philosophy, the SCS consists of five stages:
1. Data visualization
2. Preprocessing
3. Feature selection/extraction
4. Robust classifier development
5. Classifier aggregation (ensembles)
The five stages are, of course, intimately interrelated; in particular, we use the visualization stage to constantly monitor how well the other stages of the strategy are working. Figure 1 provides a flowchart of the SCS. A more detailed description of the SCS can be found in (5) (see Note 3).
2.1. Visualization of High-Dimensional Data
Proper data visualization is an essential first step that requires dimensionality-reducing mapping/projection from typically a very large, L-dimensional feature
Fig. 1. Flowchart for the five stages of the SCS: data visualization, preprocessing, feature selection/extraction, classifier development, and classifier aggregation.
space to one to three dimensions. Of course, mapping from high dimensions to lower ones cannot preserve all distances exactly, because most of the original degrees of freedom are lost. However, if only class separability is required, exact visualization, our primary goal, is both achievable and sufficient. In fact, we recently proposed such an approach (6). It involves mapping high-dimensional patterns to a special plane, the relative distance plane (RDP). The mapping procedure starts with the selection of a distance measure. This can range from Euclidean, city block, and maximum norm to Mahalanobis and its generalization (Anderson-Bahadur, AB) (7). Next, two reference patterns are chosen, one from each class. The critical observation, on which the RDP mapping relies, is that the distance of any other pattern to these two reference points is preserved exactly even after the mapping. This is because a triangle remains a triangle in any dimension and for any distance metric. Hence, the three distances of any such triangle can be displayed in two dimensions, without distortion. By cycling through all possible reference pairs, we can display and visualize the data with respect to these sets, i.e., from a large number of possible “perspectives” (as an analogy, consider looking at a sculpture from every angle to assess its shape and form). This is a very powerful approach for detecting outliers (e.g., poor quality spectra), discovering additional subgroups within a class (clustering), assessing whether training and test sets derive from the same distributions, etc.; in short, for establishing and ensuring quality control.
2.2. Preprocessing
Preprocessing enables the user to adapt, or “tune,” the data, so that the subsequent stages of the SCS are optimized.
For spectra, whether MS or MR, we found that the most useful preprocessing approaches, alone or in combination, are normalization (“whitening,” or scaling to unit area), smoothing (filtering), and/or peak alignment (with respect to some internal or external
reference). Various transformations of the spectra lead frequently to better classification. Examples of such transformations include replacing the spectra by their (numerical) derivatives or by rank-ordered variants (the nonlinear rank-ordering replaces the original features by their ranks, thus minimizing the influence of accidentally large or small feature values) and combinations of these. Furthermore, creating differently preprocessed versions of the same dataset, selecting different sets of features from these (stage 3), and developing different classifiers using these feature sets (stage 4) facilitates the aggregation of these multiple classifiers for possibly increased accuracy (stage 5). The achieved classifier’s accuracy and reliability are also assessed by visualization of the results (stage 1). This demonstrates how the strategy uses the stages in an interactive, feedback fashion.
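Three of the transformations named in this subsection (scaling to unit area, numerical derivatives, and rank ordering) can be sketched in a few lines. The function names and the toy spectrum are illustrative assumptions, not the SCS software; the spectrum is assumed to be sampled on a uniform grid with a positive total intensity, and the derivative is approximated by first differences.

```python
def normalize_unit_area(spectrum):
    """Scale intensities to sum to 1 (area up to the constant grid spacing)."""
    total = sum(spectrum)
    return [x / total for x in spectrum]


def first_difference(spectrum):
    """Simple numerical derivative: difference of adjacent intensities."""
    return [b - a for a, b in zip(spectrum, spectrum[1:])]


def rank_transform(spectrum):
    """Replace each intensity by its rank (1 = smallest).

    Ties are broken by position; a production version might average them.
    """
    order = sorted(range(len(spectrum)), key=lambda i: spectrum[i])
    ranks = [0] * len(spectrum)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


spectrum = [2.0, 10.0, 4.0, 1000.0, 4.5]  # one accidentally huge value
unit_area = normalize_unit_area(spectrum)
ranks = rank_transform(spectrum)
print(ranks)
```

In the rank-ordered version the value 1000.0 simply becomes rank 5, which illustrates how rank ordering limits the influence of accidentally large feature values.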
2.3. Feature Selection/Extraction
In general, this stage is one of the two most important components of the SCS. It is essential not only for dimensionality reduction (which helps lift the curse of dimensionality) but, when done properly, also for arriving at biologically relevant and transparent interpretations of the data (“biomarker” identification). The driving force behind feature selection/extraction (FSE) is the goal of satisfying one of the two critical requirements for any reliable classifier development, lifting the curse of dimensionality. Spectra, whether mass or MR, are peculiar: their “intrinsic dimensionality,” the number of independent, relevant features they possess, is generally much smaller than their original dimensionality. This is because spectra have many irrelevant features (“noise”), and adjacent features are strongly correlated. Some of these correlated features correspond to spectral peaks, representing small molecules (MRS), or small proteins, protein fragments, or peptides (PMS). Thus, it is clearly beneficial to eliminate irrelevant features and identify discriminatory peaks (potential “biomarkers”). For spectra, principal component analysis, a frequently used dimension reduction method (often the principal tool of many PMS data analysts), is doubly dangerous. First, it “scrambles” the original features, making discriminatory feature identification and selection problematic; second, since the principal components (PCs) are ordered according to the maximum variance explained in the data, there is no guarantee that the first few PCs are discriminatory for classification. Even if one were to choose the first M ≪ L PCs from the original, total L-term set, these are rarely the best discriminators. One could try selecting m < M PCs as optimal for classification (e.g., by exhaustive search); our early experience indicates that some of the good discriminators are among the remaining k = M + 1,…,L
subset of PCs. All these difficulties point to the need for a feature selection method specific to spectral data, one that preserves spectral interpretability. There are two generic approaches to feature selection (8). The filter method selects features without consideration of the classifiers to be used with these features. The wrapper (embedding) method finds optimal features, while using the eventual classifier to guide the selection method. We have developed a genetic algorithm-based optimal region selection (GA-ORS) method that finds discriminatory features without losing spectral interpretability (9). The GA-ORS is based on the wrapper approach and is an example of feature extraction. It has the advantage that the spectral ranges found are averaged over adjacent data points (thus equivalent to peak area determination). Such averaging increases the signal to noise ratio, a bonus. Within the GA-ORS suite of programs, one can also control the widths of the selected spectral subregions (discriminatory peaks); this helps to eliminate those regions that appear to be discriminatory simply because of accidental differences in the “noise” regions due to the limited sample size (9,10). The GA-ORS has been very successful in identifying discriminatory subregions of MR, IR, and Raman spectra of biofluids and tissues, obtained for distinguishing between various diseases and disease states (1). In the context of feature selection, many proteomic mass spectroscopists first identify “relevant” peaks, sometimes in an ad hoc fashion, as possible contributors to discrimination. Although using all available “domain knowledge” is very important and should always be considered when available, it can also introduce bias, because of possible preconceived notions of what is relevant for discrimination.
Our feature selection approach, sketched above, removes most of such bias, by identifying hitherto unsuspected, novel discriminatory “peaks,” or more accurately, discriminatory spectral subregions. Furthermore, by its explicit multivariate nature, GA-ORS tends to identify a “fingerprint,” a “panel” of peaks whose simultaneous interaction is necessary for discrimination. When the multidimensional feature space does not arise from spectra, e.g., microarray data or preselected discrete peaks in PMS, for which averaging adjacent features is not meaningful, direct application of the GA-ORS methodology may not be appropriate [although we have used it as a preliminary, clustering-type feature selection “trick” (5)]. However, when possible, exhaustive, or when not, a dynamic programming-based search for optimal or near-optimal discriminatory feature subsets is still feasible and is one of the options available in GA-ORS. Figure 2 demonstrates the importance of feature selection, and the relevance of an interactive, feedback-mode visualization of data. For the two-class, prostate cancer vs. healthy proteomic (mass spectral) dataset (11), we display a Euclidean distance-based mapping, either directly from the original 15,154
[Fig. 2 panel titles: “Prostate Cancer: L2 Mapping from 15,154 Dimensions” (left) and “5 Dimensions” (right).]
Fig. 2. Mapping from the original 15,154 dimensions (left panel) misclassified eight samples from the training set (TS; class 1, black disks, class 2, black crosses) and nine from the independent validation (test) set (VS; class 1, grey triangles, class 2, grey squares). The mapping from five dimensions (right panel), classified correctly all TS and the VS samples. The dashed lines shown are the optimal LDA separators.
dimensions (left panel) or from five dimensions, reduced via GA-ORS (right panel). Clearly, the success of class separation depends on the dimensionality of the feature space. When mapping from the original 15,154 dimensions, the optimal two-dimensional separation of training sets (TS; black disks for class 1, black crosses for class 2) and test sets (VS; grey triangles for class 1, grey squares for class 2) misclassifies eight samples from the training set and nine from the independent test set. For the mapping from five dimensions, all samples are classified correctly (see Note 4).
2.4. Robust Classifier Development
There are two, generally interrelated goals for supervised classifiers. First, we want robust classifiers, i.e., with high generalization power. This is realized when the classifier classifies new, unknown “patterns” correctly and reliably. Second, we want to identify the smallest subset of maximally discriminatory features. Eventual disease management/treatment would benefit from having only a few, biologically relevant and interpretable features. Ideally, both classification goals should be achieved, especially in clinically relevant studies. Unfortunately, achieving the first goal is frequently at the expense of the second. A good example is the recent use of support vector machines (SVMs) for classification. These have become particularly popular because of their
persuasive theoretical foundations (12,13) (see Note 5). However, because the SVMs project the data into even higher dimensional feature spaces to achieve linear separability of the classes, relevant, discriminatory feature identification becomes more difficult. The technical complexity and sophistication of the classifiers used range from the simplest correlation techniques, through k nearest neighbors, linear and quadratic discriminant analysis, decision trees, neural nets, etc., to (nonlinear) SVMs. However, the choice of classifier seems not to be dictated by the data to be classified, but rather by “expert” recommendation (usually based on other types of data), personal experience or preference, or simply software availability. The maxim “simpler is better” has mostly been ignored [see however (14)]. In general, no specific effort has been expended on choosing the most appropriate, optimal type of classifier for a given dataset. With a few exceptions, the proteomics (mass spectroscopy) community tends to use the “best” (i.e., the most sophisticated) classifier, whether appropriate or not! If the dataset size is sufficiently large, then the optimum approach for developing a robust classifier is to partition the data into training set, monitoring set and a completely independent test (validation) set. Such partitioning is required to prevent overfitting. This occurs when the classifier adapts itself too closely to the peculiarities of a training set that comprises a limited number of samples. Using a monitoring set helps decide when to stop training. The ultimate assessment of the classifier’s generalization capability is how well it does on the independent test set that was in no way involved in creating the classifier. Unfortunately, a sufficiently large sample size is a luxury rarely available to the data analysts of biomedical data. The only recourse is to use some version of crossvalidation (CV) (15). 
CV comes in different flavors, each with its advantages and disadvantages. All of them are designed to deal with the bias introduced by using the entire dataset both to develop the “optimal” classifier and to estimate the classification error (see Note 6). It is important to re-emphasize that because of the typical small sample size of biomedical data, the best approach to robust classifier development is to select the simplest classifier possible. This suggests linear classifiers. Complex classifiers have too many parameters that need optimization, inevitably raising the specter of overfitting (see Note 7). Dimensionality reduction (FSE) is, of course, essential for obtaining an appropriate SFR. Realizing the role of the SFR is important when developing classifiers. However, an essential caveat is that data sparsity can render any classification result statistically suspect, even if the SFR is satisfied (3). The importance of guaranteeing the appropriate SFR is being recognized. However, the consequences of data set sparsity are still not appreciated (16).
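As a minimal sketch of these ideas, the following computes the SFR and estimates accuracy by leave-one-out cross-validation, one of the CV flavors, using a nearest-class-mean rule. That rule is used here only as a stand-in for a simple linear classifier (its two-class decision boundary is a hyperplane); it is not the classifier of the SCS, and the toy data are invented. In practice any feature selection would also have to be repeated inside each fold to avoid the bias discussed above.

```python
def sample_to_feature_ratio(n_samples, n_features):
    """SFR as defined in the Introduction: samples divided by features."""
    return n_samples / n_features


def nearest_mean_predict(train_X, train_y, x):
    """Assign x to the class whose training mean is closest (squared Euclidean)."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, y in zip(train_X, train_y) if y == label]
        mean = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, mean))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def leave_one_out_accuracy(X, y):
    """Hold out each sample once; train on the rest; return fraction correct."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if nearest_mean_predict(train_X, train_y, X[i]) == y[i]:
            correct += 1
    return correct / len(X)


# Invented two-feature data for two well-separated classes.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [1.0, 0.9], [0.9, 1.1], [1.1, 1.0]]
y = [0, 0, 0, 1, 1, 1]
accuracy = leave_one_out_accuracy(X, y)
print(sample_to_feature_ratio(len(X), len(X[0])), accuracy)
```

Even a perfect score on six samples illustrates the sparsity caveat: an SFR of 3 with so few samples says little about performance on new data.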
The control of disparate sensitivities and specificities produced by classifiers when the dataset is imbalanced has particular clinical relevance (typically, there are many more samples from normal subjects than from patients with particular diseases) and tuning methods are needed for the classifiers developed. The standard method in the pattern recognition literature is either oversampling (taking multiple samples from the sparser class), or undersampling (taking a subset of the samples from the larger class), such that the sample sizes in the two classes become balanced (sensitivity, SE ≈ specificity, SP). However, this approach fails quite frequently. Our approach is based on penalizing misclassification of members of the smaller class until SE ≈ SP (note that the penalty weight is generally not equal to the ratio of the class sizes).
2.5. Classifier Aggregation
Clinically relevant classifiers require statistically significant class assignments for the samples. Thus, when a classifier’s assignment probability for a sample is “fuzzy” (e.g., less than 75% for a two-class problem), that assignment is not really useful from a clinical point of view. If the overall accuracy of a classifier is low and the assignments are fuzzy, a multiple classifier strategy (classifier aggregation) can frequently be beneficial. The idea is to combine the outputs of several classifiers, with the expectation that the new classifier thus formed will be more accurate and less fuzzy than the best of the individual constituents. One of the requirements for accurate ensemble-based classifiers is diversity. It is believed that the component classifiers should be as different as possible. This can be achieved in several ways. One of these approaches used conceptually and methodologically very different classifiers (Linear Discriminant Analysis (LDA), neural nets, and dynamic programming) on the same, unmodified data (17).
However, our more recent experiments and experience suggest that classifier diversity is not necessarily required. Comparable accuracy can be achieved in a simpler way, by employing a single, simple classifier (e.g., LDA) and producing diversity using different transformations of the data (we have already discussed some of these in the context of feature selection). How are we to combine the outcomes of the various classifiers? The combination rules range from the simple majority rule to more complex, trainable rules, e.g., stacked generalization (SG) (18). SG uses the output probabilities of the constituent classifiers as input features for a new classifier. Boosting (19) is a very powerful version of a learnable classifier combination rule (see Note 8). It was used for identifying proteomic biomarkers for cancer detection (20). There are many classifier combination rules. When choosing such a rule, it is important to take into account both sample size and classifier complexity.
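As an illustration of the simplest, non-trainable end of this range, the sketch below combines the outputs of several constituent classifiers for a single sample of a two-class problem, once by majority rule and once by averaging the class probabilities. The function names are hypothetical, and the 75% crispness threshold is taken from the example given at the start of this section; trainable rules such as SG or boosting are not shown here.

```python
def majority_vote(labels):
    """Return the label chosen by most constituent classifiers.

    Ties are broken in favor of the label that was seen first.
    """
    counts = {}
    for lab in labels:
        counts[lab] = counts.get(lab, 0) + 1
    return max(counts, key=counts.get)


def combine_probabilities(probs, crisp=0.75):
    """Average the class-1 probabilities reported by the constituents.

    Returns (label, confidence, fuzzy), where `fuzzy` is True when the
    winning class does not reach the `crisp` threshold.
    """
    p1 = sum(probs) / len(probs)
    label = 1 if p1 >= 0.5 else 0
    confidence = max(p1, 1.0 - p1)
    return label, confidence, confidence < crisp
```

For example, three classifiers reporting class-1 probabilities of 0.6, 0.4, and 0.7 would yield class 1 by either rule, but the averaged assignment would still be flagged as fuzzy.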
3. Discussion
Of course, experimental quality control is essential for good classifiers, i.e., those that have useful generalization properties. Much has been made of the “surprising” observation that different (or even the same) experimental groups, using different classifiers, end up with totally different sets of discriminatory features (21). These discrepancies are ascribed to various possible experimental differences in spectral acquisition, etc. (22,23,24). Although these are indeed significant contributing factors, and must be considered and corrected, sight is lost of the important fact that when nonunique discriminatory sets are found, they are as likely caused by dataset sparsity (3) as by differences in experimental protocols. The initial euphoria is over: one cannot (or should not be able to) publish in prestigious journals (e.g., Science, Nature, Lancet, PNAS) proteomic results based on very limited sample sizes. Furthermore, even when there are enough data to produce a respectable classifier, high-impact journals are unlikely to accept a manuscript unless the results are independently validated. In particular, the chemical/biological identification of the discriminatory proteins, protein fragments, or peptides must accompany the classification results. This increased focus on establishing the clinical relevance of putative biomarkers is definitely a good sign. However, at this stage of the game, it is possibly premature, and one would prefer first to have a quick, noninvasive, reliable diagnostic/prognostic tool. For such a tool (i.e., a sufficiently robust classifier) to be clinically relevant, many more samples are required to develop it; this requirement will likely rule out the reliable detection of rare diseases. Unfortunately, currently available sample sizes preclude the discovery of unique biomarker “fingerprints” of a disease.
This nonuniqueness due to data sparsity leads inevitably to expensive, onerous, and unnecessary laboratory investigations to sift out medically relevant, unique subsets from the plethora of putative biomarkers found and suggested for various diseases. Understanding the biochemical causes is, of course, essential for, say, finding a possible cure, but should follow the diagnostic/prognostic stage. Despite such caveats, the proteomics field is maturing and, once the technical problems are successfully resolved, will undoubtedly provide important medical/clinical insights. The author further suggests that the power of proteomic spectroscopy can be enhanced by the simultaneous consideration of other experimental modalities that complement PMS, especially MRS, which could identify smaller discriminatory compounds also present in biofluids.
4. Notes
1. Amongst these are correcting the nonflat baselines arising from the matrix material, peak alignment of the spectra, reconciling data acquisition at different times, in different laboratories, with mass spectrometers of different sensitivity,
correcting high-frequency noise, etc. Proper experimental design, including rigorous quality assessment and control, is essential before any classifier development is attempted. Good discussions and summaries are given in (21,22,23,24).
2. The realization that some classification strategy is essential for the analysis of proteomic data is recent. That these strategies differ emphasizes not only that there is no best classifier, but also that no unique, best strategy exists either; different groups discovered different strategies that worked well for the data they analyzed (20,25). What is common is that all strategies are multistage.
3. The data-driven nature of the SCS emphasizes the fact that there is no simple, universal prescription for creating an optimal classifier (4), i.e., no simple, ready “recipe” is, or is likely to be, available.
4. This much-improved result strengthens the importance of feature selection. Note that both mappings were done using the Euclidean distance. This was necessary because, as long as the features outnumber the samples, one cannot use any other distance measure (e.g., Mahalanobis) that involves matrix inversion. After feature selection, when the number of features is smaller than the number of samples, much more powerful and relevant distance measures can be used. For a fair comparison, the Euclidean distance is used for both cases presented in Fig. 2 [for further possible improvements obtainable using other distance measures, see (6)].
5. In practice, SVMs are not nearly as effective as suggested by theory. In fact, we have found (26) that a simple LDA classifier with wrapper-driven feature selection, when applied to several publicly available proteomic mass spectra and to six microarray datasets, generally outperformed a linear SVM, even when the latter was used with feature selection. Furthermore, SVM-based classifiers frequently produce classification results that are distinctly out of balance. The accuracy obtained for one of the classes is, most of the time, considerably better than for the other.
This imbalance between sensitivity and specificity is of clinical relevance when trying to minimize false negatives and/or false positives.
6. Different variants of CV deal differently with the so-called bias-variance dilemma, which is particularly acute for datasets with limited sample size. The simplest version, the leave-one-out (LOO) method, removes one of the N samples, develops a classifier with the remaining N – 1 samples, and tests its prediction accuracy on the left-out sample. By cycling through all N samples, N accuracy assessments are obtained. For small N (for which the data partition, as described in the main text, is not possible), LOO suffers from large variance, even though it minimizes the bias. K-fold CV is frequently used to balance bias and variance. The samples are partitioned into K roughly equal subsets. K – 1 subsets are used for training the classifier, while the left-out subset is the current test set. Cycling through the K partitions and then calculating the mean and standard deviation of the accuracies over the K test sets assesses how well and how reliably one can expect to classify new, unknown samples. K is typically chosen to be 5 or 10, whether or not the sample size warrants this choice. A more reasonable approach is to determine the best K via CV. Particularly powerful is Efron’s bootstrapping approach (15). This involves the entire dataset, but uses a random resampling-with-replacement strategy. A large number of artificial datasets of the same size as the original are thus produced. A classifier is created for each of these, and the outcomes are averaged. Bootstrapping is supposed to reduce both large bias and variance. Inspired by the bootstrapping concept, we have been using, with some success, its generalization (27).
7. Instead of the direct use of nonlinear classifiers, with the attendant optimization problems, a simple trick is to use nonlinear terms but retain the simplicity of a linear classifier. One approach we found useful is to first develop a linear classifier (with feature selection) and then augment the linear features by constructing from them nonlinear functions, say, quadratic terms. This, of course, increases the number of parameters to be determined. However, the problem remains linear in the augmented feature space, and linear classifiers can be developed. Furthermore, our explicit approach produces new features that remain interpretable as interaction terms. This is unlike the SVM classifiers, which map implicitly into a much higher dimensional linear feature space, without interpretability. In addition, we can reduce the dimensionality of our augmented feature space by additional feature selection via exhaustive search, optimized by CV.
8. Boosting requires “weak” base classifiers, $C_j$, $j = 1, 2, \ldots, J$, that are combined into a more accurate composite classifier, $D_J = C_1 + C_2 + \cdots + C_J$. At stage $m$, the boosting algorithm carries out a weighted selection of a base classifier, given all previously chosen base classifiers. For the new base classifier $C_m$, larger weights are given to samples that are incorrectly classified by the current composite classifier $D_{m-1}$, so that $C_m$ will be chosen with a tendency to classify correctly the previously misclassified samples.
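The reweighting scheme outlined in Note 8 can be sketched as follows. This is a hedged illustration only: it uses discrete AdaBoost with one-feature threshold classifiers ("stumps") as the weak base classifiers, which is one common boosting variant and not necessarily the one used in (19) or (20). The function names are hypothetical, and class labels are assumed to be coded as -1 and +1.

```python
import math


def stump_output(x, feature, threshold, sign):
    """Weak base classifier: +sign if x[feature] >= threshold, else -sign."""
    return sign if x[feature] >= threshold else -sign


def best_stump(X, y, w):
    """Weighted selection: the stump with the lowest weighted error."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            for sign in (1, -1):
                err = sum(wi for wi, x, yi in zip(w, X, y)
                          if stump_output(x, f, t, sign) != yi)
                if best is None or err < best[0]:
                    best = (err, f, t, sign)
    return best


def adaboost(X, y, rounds=10):
    """Return the composite classifier as a list of (alpha, feature, threshold, sign)."""
    n = len(y)
    w = [1.0 / n] * n  # start with equal sample weights
    model = []
    for _ in range(rounds):
        err, f, t, sign = best_stump(X, y, w)
        err = min(max(err, 1e-12), 1.0 - 1e-12)  # keep the logarithm finite
        alpha = 0.5 * math.log((1.0 - err) / err)
        model.append((alpha, f, t, sign))
        # Misclassified samples gain weight, correctly classified ones lose it.
        w = [wi * math.exp(-alpha * yi * stump_output(x, f, t, sign))
             for wi, x, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def boosted_predict(model, x):
    """Sign of the weighted sum of the base classifier outputs."""
    score = sum(a * stump_output(x, f, t, s) for a, f, t, s in model)
    return 1 if score >= 0 else -1
```

On a one-dimensional toy problem in which the middle samples belong to one class and the outer samples to the other, no single stump can separate the classes, whereas a composite of three stumps built in this way can.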
Acknowledgments
The author thanks the entire Biomedical Informatics Group for their decade-long, essential contributions to the development of the algorithms and software described.
References
1. Lean, C. L., Somorjai, R. L., Smith, I. C. P., Russell, P., Mountford, C. E. (2002) Accurate diagnosis and prognosis of human cancers by proton MRS and a three-stage classification strategy. Annual Reports on NMR Spectroscopy 48, 71–111.
2. Somorjai, R. L., Dolenko, B., Nikulin, A., Nickerson, P., Rush, D., Shaw, A. et al. (2002) Distinguishing normal from rejecting renal allografts: application of a three-stage classification strategy to MR and IR spectra of urine. Vibrational Spectroscopy 28, 97–102.
3. Somorjai, R. L., Dolenko, B., Baumgartner, R. (2003) Class prediction and discovery using gene microarray and proteomics mass spectroscopy data: curses, caveats, cautions. Bioinformatics 19, 1484–1491.
4. Huber, P. J. (1985) Projection pursuit. Annals of Statistics 13, 435–475.
5. Somorjai, R. L., Alexander, M., Baumgartner, R., Booth, S., Bowman, C., Demko, A., Dolenko, B., Mandelzweig, M., Nikulin, A. E., Pizzi, N., Pranckeviciene, E., Summers, R., Zhilkin, P. (2004) A data-driven, flexible machine learning strategy for the classification of biomedical data. In: Dubitzky, W. and Azuaje, F. (eds.) Artificial Intelligence Methods and Tools for Systems Biology, Chapter 5. Computational Biology Series, Vol. 5. Springer, pp. 67–85.
6. Somorjai, R. L., Demko, A., Mandelzweig, M., Dolenko, B., Nikulin, A. E., Baumgartner, R. et al. (2004) Mapping high-dimensional data onto a relative distance plane – a novel, exact method for visualizing and characterizing high-dimensional patterns. Journal of Biomedical Informatics 37, 366–379.
7. Anderson, T. W., Bahadur, R. R. (1962) Classification into two multivariate normal distributions with different covariance matrices. Annals of Mathematical Statistics 33, 420–431.
8. Kohavi, R., John, G. H. (1997) Wrappers for feature subset selection. Artificial Intelligence 97, 273–324.
9. Nikulin, A. E., Dolenko, B., Bezabeh, T., Somorjai, R. L. (1998) Near-optimal region selection for feature space reduction: novel preprocessing methods for classifying MR spectra. NMR in Biomedicine 11, 209–217.
10. Li, J., Zhang, Zh., Rosenzweig, J., Wang, Y. Y., Chan, D. W. (2002) Proteomics and bioinformatics approaches for identification of serum biomarkers to detect breast cancer. Clinical Chemistry 48, 1296–1304.
11. Dataset “JNCI-7-3-02,” downloaded from the NIH/FDA Clinical Proteomics Program Databank (http://clinicalproteomics.steem.com).
12. Vapnik, V. N. (2000) The Nature of Statistical Learning Theory, 2nd edition, Statistics for Engineering and Information Science. Springer, New York.
13. Schölkopf, B., Smola, A. J. (2002) Learning with Kernels. Support Vector Machines, Regularization, and Beyond. The MIT Press, Cambridge, Mass.
14. Lee, K. R., Lin, X., Park, D. C., Eslava, S. (2003) Megavariate data analysis of mass spectrometric proteomics data using latent variable projection method. Proteomics 3, 1680–1686.
15. Efron, B. (1982) The Jackknife, the Bootstrap and Other Resampling Plans. SIAM, Philadelphia.
16. Diamandis, E. P. (2003) Proteomic patterns in biological fluids: do they represent the future of cancer diagnostics? Clinical Chemistry 49(8), 1272–1278.
17. Somorjai, R. L., Nikulin, A. E., Pizzi, N., Jackson, D., Scarth, G., Dolenko, B., Gordon, H., Russell, P., Lean, C. L., Delbridge, L., Mountford, C. E., Smith, I. C. P. (1995) Computerized consensus diagnosis: a classification strategy for the robust analysis of MR spectra. I. Application to ¹H spectra of thyroid neoplasms. Magnetic Resonance in Medicine 33, 257–263.
18. Wolpert, D. H. (1992) Stacked generalization. Neural Networks 5, 241–259.
19. Schapire, R. E. (1990) The strength of weak learnability. Machine Learning 5, 197–227.
20. Yasui, Y., Pepe, M., Thomson, M. L., Adam, B.-L., Wright Jr., G. L., Qu, Y., Potter, J. D., Winget, M., Thornquist, M., Feng, Z. (2003) A data-analytic strategy
for protein biomarker discovery: profiling of high-dimensional data for cancer detection. Biostatistics 3, 449–463.
21. Diamandis, E. P. (2004) Mass spectrometry as a diagnostic and a cancer biomarker discovery tool. Molecular and Cellular Proteomics 3(4), 367–378.
22. Baggerly, K. A., Morris, J. S., Coombes, K. (2004) Cautions about reproducibility in mass spectrometry patterns: joint analysis of several proteomic data sets. Bioinformatics 20, 777–785.
23. Hu, J., Coombes, K. R., Morris, J. S., Baggerly, K. A. (2005) The importance of experimental design in mass spectrometry experiments: some cautionary tales. Briefings in Functional Genomics and Proteomics 3(4), 322–331.
24. Shin, H. and Markey, M. K. (2006) A machine learning perspective on the development of clinical decision support systems utilizing mass spectra of blood samples. Journal of Biomedical Informatics 39, 2237–2248.
25. Zhu, W., Wang, X., Ma, Y., Rao, M., Glimm, J., Kovach, J. S. (2003) Detection of cancer-specific markers amid massive mass spectral data. Proceedings of the National Academy of Sciences USA 100(25), 14666–14671.
26. Somorjai, R. L. and Pranckeviciene, E. (2006) (Unpublished).
27. Somorjai, R. L., Dolenko, B., Nikulin, A., Nickerson, P., Rush, D., Shaw, A., De Glogowski, M., Rendell, J., Deslauriers, R. (2002) Distinguishing normal from rejecting renal allografts: application of a three-stage classification strategy to MR and IR spectra of urine. Vibrational Spectroscopy 28, 97–102.
Index
Affi-gel Protein A MAPS II kit, 277 Aflatoxin B1 (AFB1), 194 Alkaline phosphatase (ALP) assay, 233, 237 Alpha-fetoprotein, 194 Alzheimer’s disease, 310 Annexin V, 172 ANOVA, analysis of variance, 100, 112, 114, 259, 330, 335, 344 Antibody arrays construction, 270–272 direct labeling methods, for cancer diagnostics, 268–269 formats for, 264–266 labeling and hybridization, of serum samples, 269–270, 272–274 and other proteomic strategies, 263–264 planar, labeling-hybridization methods and, 266–268 printing, 269 scanning and data analysis, 274 Anti-SAPE antibody, 267 ArrayQuant scanners, 281 AutoPixTM , 48. See also Laser-capture microdissection Axon scanners, 281
Bayesian classification methods. See Linear Discriminant Analysis Bayes’s rule, 300 BCA 200 Protein Assay Kit, 277 Bead-based multiplex assays. See also Suspension antibody microarrays detection antibody, 254 diluents, 254 general protocol for, 254–255 sample preparation, 252–254 screening protocol, 255–256 Biological variation analysis (BVA) module, of DeCyder, 112–113 “Biomarker panel,” 11 Bio-Rad Micro Bio-Spin P30 column, 277 Biotinyl-tyramide, 275
BLAST, 352, 358 Blood samples, preanalytical phase collection of, 36 processing of, 37–38 protease inhibitors, 38 serum and plasma specimens, characteristics of, 36–37 Bradford assay, 225
Carboxylated beads, 249. See also Suspension antibody microarrays activation, 251 antibodies coupling to activated, 251 cell-counting chamber and, 252 washing and storage of coupled, 251 1-(5-Carboxypentyl)-1-methylindodi-carbocyanine halide (Cy5) N-hydroxy-succinimidyl ester, 163 1-(5-Carboxypentyl)-1-propylindocarbocyanine halide (Cy3) N-hydroxy-succinimidyl ester, 163 CAST. See Clustering Affinity Search Technique Celecoxib, and cyclooxygenase-2 (COX-2), 183 Charge-couple device (CCD) camera-based imaging system, 268, 293, 332 CIMminer (Clustered Image Maps), 259 Cleavable isotope-coded affinity tag (cICAT) labeling technology, 195, 197, 200–201 Clinical proteomics, 1 biological specimens, 6–7 biomarker discovery and, 9–14 overview and scope of, 2–3 sample specimens and processing techniques, 4–9 Cluster analysis techniques, 297–299, 306 gene expression-based, 307 Clustering Affinity Search Technique, 259 Coomassie brilliant blue (CBB) staining, 68, 332, 339 Creatinine assay, 142 Cyanines (Cy3/Cy5), 264, 333 Cyclooxygenase-2 (COX-2) and celecoxib, 183
397
398 CyDye labeling, 95, 105–106, 109–110. See also Difference gel electrophoresis (DIGE) technology Cy2-labeled internal standard, 98–99 minimal labeling method, 96 pooled-sample internal standard for, 107 saturation labeling, 96 Cy3-labeled streptavidin, 267 Cytokeratin 19 (CK19), 163 DA-PLS method. See Discriminant analysis–partial least squares method DeCyder software, 101, 112–113, 342. See also Difference gel electrophoresis (DIGE) technology Delayed extraction-matrix assisted laser desorption/ionization time-of-flight mass spectrometry (DE-MALDI-TOF-MS), 194 Dendrogram, 297, 299 Dialysis, 150. See also Urine protein profiling, by 2DE and MALDI-TOF-MS Difference gel electrophoresis (DIGE) technology, 78, 93, 330, 332–333, 342–345 ANOVA, 100, 112, 114 in clinical setting, 103 CyDye labeling, 95, 105–106, 109–110 Cy2-labeled internal standard, 98–99 minimal labeling method, 96 pooled-sample internal standard for, 107 saturation labeling, 96 DeCyder suite of software tools, 101, 112–113 2D gel electrophoresis and poststaining, 94, 110–111 experimental design, 108–109 and statistical confidence, 112–114 extended data analysis (EDA) software module, 101, 113 false discovery rate (FDR), 100 hierarchical clustering (HC), 102 labeling materials, 104–105 LCM and, 163–170 MeOH/CHCl3 protocol, 106 MuDPIT, 97 multivariate statistical analysis, 114–115 principle component analysis, 101 SDS-polyacrylamide gel electrophoresis, 104 software algorithms, 111–112 Student’s t-test, 100, 112, 114 DIGE/MS analysis, 103, 115 Direct labeling, 264, 268 protocol for, 272–274 Discriminant analysis–partial least squares method, 306, 309–311
Index Discrimination power (DP), 303–305 Dithiothreitol (DTT), 68 Dot-plot style alignment, of protein sequence, 358–359 DTT/IAA equilibration procedure, 73 ECM. See Extracellular matrix EDA software. See Extended data analysis software EDC/Sulfo-NHS, 249. See also Suspension antibody microarrays 2DE-MALDI-TOF-MS assay, 194 EnsEmbl, 352, 356 Escherichia coli, 307 Ethylene vinyl acetate (EVA) polymer, 161 Ettan 2D electrophoresis system, 110 Exosomes, 142 ExPASy proteomics tools, 202, 352 Expressed sequence tags (ESTs), 357 Extended data analysis software, 101, 113 Extracellular matrix, 8 and matrix vesicles (MVs) proteomes, MS and, 231–232 alkaline phosphatase assay, 234, 237 immunofluorescence staining and, 235, 239 MC3T3-E1, osteoblast cell line, 233, 236–237, 239 nanoRPLC-MS/MS, 235, 238–239 strong cation exchange liquid chromatography, of peptides, 234–235, 238 Extracted ion chromatogram, 219, 221–222, 224 Fetal bovine serum (FBS), 254 Fisher’s F-test, 302 Flow cytometric analysis, 160 Fluorophores, 264, 267 photobleaching and quenching of, 274–275 Fourier transformer mass spectrometry (FTMS), 172–174 Free flow electrophoresis (FFE), plasma samples fractionation and, 60–61, 67 Frontotemporal dementia, 310 GAORS method. See Genetic algorithm-based optimal region selection method 2D Gaussian function, 312 Gaussian multivariate probability distribution, 300 2-D Gel-electrophoresis (2-D GE), 292. See also 2D-PAGE maps analysis LCM cells analysis by, 77 HER-2/neu positive and -negative breast tumors, 87–88
Index isoelectric focusing (IEF), 79–80, 83–84 MASCOT search engine, 87 paraffin-embedded sections staining, 81–82 preparation and analysis, 61, 67–69 protein sample preparation, 79, 82–83 SDS-PAGE, 79–80, 84–85 silver staining and image analysis, 80, 85–86 tissue block and tissue section preparation, 78–79, 81 trypsin digestion and MS analysis, 80, 86–87 Gel-free mass spectrometry and LCM, 171–172 Gene expression microarrays, 45 GenePix Pro 3.0 software program, 280–281 GeneScan program, 356 Genetic algorithm-based optimal region selection method, 387–388. See also Proteomic mass spectroscopy gp96, tumor rejection antigen, 169 GRANTA-519, 308
HCC. See Hepatocellular carcinoma HCL. See Hierarchical clustering Hematoxylin and eosin (H&E) staining, tissue sample collection, 44, 47–48 Hepatitis B/C virus (HBV/HCV), 194 Hepatocellular carcinoma, 8, 11, 59, 67, 163, 170, 193 qualitative and quantitative proteomic analysis of cICAT labeling technology, 195, 197, 200–201 2DE-MALDI-TOF-MS assay, 194 2D-LC-MS/MS for, 195–197, 201–202 ExPASy proteomics tools, 202 LCM for, 194–196, 199 nonenzymatic method (NESP), 196, 198–199 toludine blue removal and protein mixture digestion, 197, 199–200 HERMeS software package, PCA and, 306 HER-2/neu oncogene, 85–86, 163 Hierarchical clustering, 259, 299. See also Cluster analysis techniques High performance liquid chromatography, 169, 171, 183, 212–214 Horseradish peroxidase (HRP), 267 HPLC. See High performance liquid chromatography HSP27 protein, 103 HT-29, COX-2 expressing colon cancer cell line, 183 Human Proteome Organization, 143 Hydrogels, 271. See also Antibody arrays
399 ICAT labeling. See Isotope-coded affinity tag labeling IMAC-Cu2+ ProteinChips, 134, 136 Image analysis. See also 2D-PAGE maps analysis by fuzzy logic principles image defuzzyfication, 312 image digitalization, 311–312 multi-dimensional scaling (MDS), 315–317 PCA and classification methods, 315 refuzzyfication, 312–313 moment functions, 317 Legendre moments, 318–319 Image Master Platinum software, 339, 341 Immobilized pH gradient strip. See also Two-dimensional electrophoresis (2DE) isoelectric focusing (IEF) with, 60, 65 rehydration of, 64–65 Immunofluorescence staining, 235 InterPro, 352, 361 Iodoacetamide (IAA), 68 IPG strip. See Immobilized pH gradient strip Isotope-coded affinity tag labeling, 78, 195 mass spectrometry (MS) and, 181 celecoxib, cyclooxygenase-2 (COX-2) and, 183 cell culture and harvest, 183, 186 cell lysis, desalting, and protein quantitation, 184–187 cleavable reagents, 182, 185, 187–188 cleaving biotin, 186, 189 labeled peptides purification, 185–186, 188–189 proteins, denaturation and reduction of, 185, 187 quantitative proteomic analysis and, 184 Java Runtime Environment, 370. See also msInspect, for LC-MS data analysis KMC (K-Means/K-Medians Clustering), 259 Kolmogorov–Smirnov test, 335, 339, 341 Kruskal–Wallis test, 335 Laser-capture microdissection, 8, 44–45, 160. See also Tissue sample collection, for proteomics analysis AutoPixTM , 48 cells analysis, by 2-D GE, 77 HER-2/neu positive and -negative breast tumors, 87–88 isoelectric focusing (IEF), 79–80, 83–84
400 MASCOT search engine, 87 paraffin-embedded sections staining, 81–82 protein sample preparation, 79, 82–83 SDS-PAGE, 79–80, 84–85 silver staining and image analysis, 80, 85–86 tissue block and tissue section preparation, 78–79, 81 trypsin digestion and MS analysis, 80, 86–87 development, 161 different labeling techniques and, 170 DIGE and, 163–170 and 2-D GE, 162–163 gel-free mass spectrometry and, 171–172 for HCC and non-HCC hepatocytes isolation, 194–195, 199 LCM lysate, 49–50 and mass spectrometry analysis, 172–174 PixCell II instrument, 48–49, 161 and protein chip technology, 172 separation methods and, 171 for tissue sample collection, 44–45 VeritasTM , 48 Laser microdissection and pressure catapulting, 8 LC-ESI-MS/MS. See Liquid chromatography-electrospray ionization tandem mass spectrometry LCM. See Laser-capture microdissection LC-MS data. See Liquid chromatography-mass spectrometry data LC-MS/MS. See Liquid chromatography-tandem mass spectrometry LDA. See Linear Discriminant Analysis Legendre moments, 317–319 Levene’s test, 334 Linear Discriminant Analysis, 300–301, 315–316 Liquid chromatography-mass spectrometry data, 370, 374–376, 377 Liquid chromatography-mass spectrometry data analysis, msInspect for, 369 data viewing and navigation, 371–373 locating peptides in, 373–376 low-quality peptides, elimination of, 376 peptide quantitation, 376–378 software installation for, 370 Liquid chromatography-tandem mass spectrometry, 170, 171 label-free, for biomarker identification, 209–210 albumin/IgG depletion, 211–213 chromatographic alignment, 218–221 data transformation and normalization, 222 HPLC, 212–214 mass spectrometer, 212, 214
Index MS/MS spectral filtering, 216–217 peptide identification, 217–218 peptide quantification, 221–222 statistical analysis, 223 zoom scan data processing, 214–216 LMPC. See Laser microdissection and pressure catapulting two-dimensional (2D-LC/MS/MS), 78 Lysine labeling, 169 MALDI/SELDI protein profiling, of serum, 125–126 on MALDI-TOF–TOF data collection, 131–132 MB fractionation, of human serum, 131 protein identification by, 132–133 MB-based fractionation, 127, 128, 131 SELDI and MALDI spectra acquisition, 129 SELDI ProteinChip, 130 (Magnetic bead based) on SELDI-TOF, 133 ProteinChip arrays, 134–135 SPA matrix addition, 135 spectra collection on, 135–138 MALDI-TOF-MS. See Matrix-assisted laser desorption time of flight mass spectrometry MALDI-TOF, peptide mass fingerprinting (PMF) and, 62, 71 MALDI-TOF–TOF, serum protein profiling on data collection, 131–132 MB fractionation, of human serum, 131 protein identification by, 132–133 Maleimide labeling, of cysteine sulfhydryls, 96 MARS. See Multiple affinity removal system MASCOT software, 81, 87–88 Mass spectrometry, 58–59, 214 ICAT labeling and, 181 celecoxib, cyclooxygenase-2 (COX-2) and, 183 cell culture and harvest, 183, 186 cell lysis, desalting, and protein quantitation, 184–187 cleavable reagents, 182, 185, 187–188 cleaving biotin, 186, 189 labeled peptides purification, 185–186, 188–189 proteins, denaturation and reduction of, 185, 187 quantitative proteomic analysis and, 184 LCM and, 172–174
Index Matrix-assisted laser desorption time of flight mass spectrometry, 125–126, 142, 163, 194 LCM and, 171 for urine protein profiling. See Urine protein profiling, by 2DE and MALDI-TOF-MS MAVER-1 cell lines, 308 MC3T3-E1, osteoblast cell line, 233, 236–237, 239 MDS technique. See Multi-dimensional scaling techniques MeOH/CHCl3 protocol, 106 Metalloproteins, 350 MicroSol-IEF, ZOOM® , 60, 65–66 Miniaturized parallelized sandwich immunoassays. See Suspension antibody microarrays MS. See Mass spectrometry MS-Fit software, 81 msInspect, for LC-MS data analysis, 369 data viewing and navigation, 371–373 locating peptides in, 373–376 low-quality peptides, elimination, 376 peptide quantitation, 376–378 software installation for, 370 MS/MS spectral filtering, 216–217 Multi-dimensional scaling techniques, 313, 315–317 MultiExperiment Viewer (MeV), 259 Multiple affinity removal system, 59, 63–64 Multiplexed bead-based flow-cytometry assays, 266 Nanoflow reversed-phase LC-tandem mass spectrometry (nanoRPLC-MS/MS), 233, 235, 238–239 Non-enzymatic sample preparation (NESP), 194, 196, 198–199 One-antibody label-based assays, 264–266 One-dimensional liquid chromatography coupled with tandem mass spectrometry (1D-LC-MS/MS), 201–202. See also Hepatocellular carcinoma 16 O/18 O isotopic labeling, 78 Osteoblasts, 232. See also Extracellular matrix MC3T3-E1, 233, 236–237, 239 2D-PAGE maps analysis, 291 dedicated software packages and, 292–294 image analysis fuzzy logic, 311–317 moment functions, 317–319 spot volume datasets, analysis of, 294 cluster analysis, 297–299 DA-PLS method, 309–311
401 linear discriminant analysis, 300–301 pattern recognition methods, 306–309 PLS regression and DA-PLS regression, 306 principal component analysis, 294–297 SIMCA method, 301–305 PALM microlaser dissector, 161 Parkinson’s disease, 310 Partial least squares regression, 306, 308, 338 Pattern recognition methods cluster analysis. See Cluster analysis techniques PCA. See Principle component analysis proteomic mass spectroscopy and. See Proteomic mass spectroscopy SIMCA classification. See Soft-independent model of class analogy method PCA. See Principle component analysis PCa-24 protein, in epithelial cells, 172 PDB. See Protein data bank PDQuest system, 293, 308 Peptide mass fingerprinting, MALDI-TOF and, 62, 71 Peptide/protein separation system, 171 PerkinElmer scanners, 281 Pfam, 352, 360 PIN. See Prostatic intraepithelial neoplasia PIVKA-II, 194 PixCell II system, 48–49, 77, 82–83, 161. See also Laser-capture microdissection Planar antibody arrays, 248, 264. See also Antibody arrays main formats of, 265 types of, labeling-hybridization methods and, 266–268 10plex soluble receptor assay, 255–256, 258. See also Bead-based multiplex assays PLS regression. See Partial least squares regression PMF. See Peptide mass fingerprinting PMS. See Proteomic mass spectroscopy Position-specific scoring matrix, 361 Post-translational modification (PTM) profiling, on selected spots, 71–72 Principle component analysis, 101, 259, 294–297, 308, 315–316, 343. See also 2D-PAGE maps analysis Escherichia coli, 307 for explorative data analysis, 336–338 in HERMeS software package, 306 U937 human lymphoma cell line and, 307 Prostatic intraepithelial neoplasia, 44 Protein chip technology and LCM, 172 Protein data bank, 352, 360–361 Protein precipitation, 143–144
402 Protein profiling of human plasma samples , by two-dimensional electrophoresis, 57 coomassie brilliant blue G-250 staining, 68 destaining, in-gel deglycosylation and in-gel tryptic digestion, 61–62, 69 2D gels preparation and analysis, 61, 67–69 difference in gel electrophoresis (DIGE) system, 59 free flow electrophoresis (FFE), samples fractionation by, 60–61, 67 high-abundance proteins depletion, by immunoaffinity column, 59, 63–64 HPPP, 58 IPG gel strip rehydration, 64–65 isoelectric focusing (IEF), with IPG strip, 60, 65 MALDI plating and peptides desalting, 62, 69–71 mass spectrometry (MS), 58–59 microscale solution isoelectric focusing, ZOOM® , 60, 65–66 peptide mass fingerprinting, MALDI-TOF and, 62, 71 PTMs profiling, on selected spots, 71–72 samples preparation, 59, 62 TCA/acetone precipitation, 64 Proteomic data, statistical analysis, 327 classical dyes, 339–342 confirmatory univariate data analysis, 333–335 DIGE approach, 342–345 experimental design for, 328 data processing, 330–333 pooling, 330 replicates, 329–330 exploratory multivariate data analysis, 335 marker selection, 338–339 principal component analysis, 336–338 Proteomic mass spectroscopy, 383 statistical classification strategy (SCS) for classifier aggregation, 390 data visualization, 384–385 feature selection/extraction (FSE), 386–388 preprocessing, 385–386 robust classifier development, 388–390 Proteomics analysis, for tissue sample collection formalin fixation, 43–44 hematoxylin staining, 47–48 immunocapture procedure, 46 immunofluorescence staining, 48 laser-capture microdissection (LCM), 44–45 AutoPixTM , 48 PixCell II instrument, 48–49
Index VeritasTM , 48 LCM lysate, 49–50 SELDI-TOF-MS, 46 PSSM. See Position-specific scoring matrix QTC (QT CLUST), 260 Resonance light scattering (RLS), 268 Reverse protein arrays, 268 Rolling-circle amplification (RCA), 268 SCX-LC. See Strong cation exchange liquid chromatography SDS-PAGE. See Sodium dodecyl sulfate-polyacrylamide gel electrophoresis SELDI. See Surface-enhanced laser desorption/ionization SELDI-TOF. See Surface-enhanced laser desorption/ionization time-of-flight Self Organizing Maps (SOM), 259 Self Organizing Tree Algorithm (SOTA), 259 Shapiro-Wilk test, 334, 339 Significance Analysis of Microarrays (SAM), 259 Silver staining, 80, 332–333. See also Laser-capture microdissection and image analysis, 85–86 SIMCA method. See Soft-independent model of class analogy method SKBR-3, breast cancer cell line, 171 Sodium dodecyl sulfate-polyacrylamide gel electrophoresis, 84–85, 94, 96, 104, 110–111 isoelectric focusing (IEF) and, 79–80 PROTEAN II xi Cell system (Bio-Rad) for, 84 Soft-independent model of class analogy method, 301–305, 307–308 Streptavidin-R-Phycoerythrin (SAPE), 267 Strong cation exchange liquid chromatography, 234–235, 238 Strong cation exchange liquid chromatography, of peptides, 233, 234–235, 238 Student’s T-test, 334 2-(4-Sulfophenylazo)-1,8-dihydroxy-3,6naphthalenedisulfonic acid (SPADNS), 60, 67 Support vector machines, 388–389. See also Proteomic mass spectroscopy Surface-enhanced laser desorption/ionization, 9, 13, 125–126, 142, 172, 194 serum protein profiling on, 133 ProteinChip arrays, 134–135 SPA matrix addition, 135 spectra collection on, 135–138
Suspension antibody microarrays, 247–248
  bead-based multiplex assays processing, 252–256
  limit of detection (LOD), 257
  miniaturized multiplexed protein assays, analytical performance, 256–259
  pattern generation, 259–260
  principle of, 249
  production, coupling to carboxylated microspheres, 249–252
SVMs. See Support vector machines
TAAs arrays. See Tumor-associated antigen arrays
TCA/acetone precipitation, 2DE and, 64
Tissue sample collection, for proteomics analysis
  formalin fixation, 43–44
  hematoxylin staining, 47–48
  immunocapture procedure, 46
  immunofluorescence staining, 48
  laser-capture microdissection (LCM), 44–45
    AutoPix™, 48
    PixCell II instrument, 48–49
    Veritas™, 48
  LCM lysate, 49–50
  SELDI-TOF-MS, 46
Tributylphosphine (TBP), 68
Trichloroacetic acid (TCA) precipitation, 143–144, 146–147, 151
Trifluoroacetic acid (TFA), 182
Tris buffer, 277
TTEST (T-tests), 259
Tumor-associated antigen arrays, 266, 269
Two-dimensional electrophoresis (2DE), 11, 194, 328
  biological replicates, 329–330
  LCM and, 162–163
  for protein profiling of human plasma samples, 57
    coomassie brilliant blue G-250 staining, 68
    destaining, in-gel deglycosylation and in-gel tryptic digestion, 61–62, 69
    2D gels preparation and analysis, 61, 67–69
    difference in gel electrophoresis (DIGE) system, 59
    free flow electrophoresis (FFE), samples fractionation by, 60–61, 67
    high-abundance proteins depletion, by immunoaffinity column, 59, 63–64
    HPPP, 58
    IPG gel strip rehydration, 64–65
    isoelectric focusing (IEF), with IPG strip, 60, 65
    MALDI plating and peptides desalting, 62, 69–71
    mass spectrometry (MS), 58–59
    microscale solution isoelectric focusing, ZOOM®, 60, 65–66
    peptide mass fingerprinting, MALDI-TOF and, 62, 71
    PTMs profiling, on selected spots, 71–72
    samples preparation, 59, 62
    TCA/acetone precipitation, 64
  technical replicates, 329–330
  for urine protein profiling. See Urine protein profiling, by 2DE and MALDI-TOF-MS
Two-dimensional fluorescence difference gel electrophoresis (2-D DIGE), 78. See also Difference Gel electrophoresis (DIGE) technology
Two-dimensional liquid chromatography tandem mass spectrometry (2D-LC-MS/MS), 78, 170. See also liquid chromatography tandem mass spectrometry
  for HCC and non-HCC hepatocytes isolation, 195–197, 201–202
Two-dimensional polyacrylamide gel electrophoresis (2D PAGE), 162–163, 174. See also 2D gel electrophoresis, 2D gels
Two-factor ANOVA (TFA), 259
Ultrafiltration technique, 144
Urine protein profiling, by 2DE and MALDI-TOF-MS, 141–142
  analytical/profiling techniques, 145–146
  organic solvent precipitation protocol, 145, 147–148
  protein precipitation, 143–144
  TCA/acetone precipitation protocol, 145–147
  ultrafiltration-SPE, 144–145, 148–149
  urine SPE, 149
Veritas™, 48. See also Laser-capture microdissection
Web-based tools, for protein classification, 349
  BLAST, 352, 358
  dot-plot style alignment, of protein sequence, 358–359
  EnsEmbl, 352, 356
  evolution-based classification schemes, 351
  ExPASy, 352
  expressed sequence tags (ESTs), 357
  GeneScan program, 356
  InterPro, 352, 361
  MEROPS, 361
  metalloproteins, 350
  PDB, 352, 360–361
  Pfam, 352, 360
  PRINTS, 361
  PROSITE, 361
  sequence and structure of proteins and, 352–356
  SMART, 360
Western blotting protocols, 275
XIC. See Extracted ion chromatogram
ZOOM®, MicroSol-IEF, 60, 65–66
Zoom scan triple-play experiment, 214