Methods in Molecular Biology™
Series Editor John M. Walker School of Life Sciences University of Hertfordshire Hatfield, Hertfordshire, AL10 9AB, UK
For other titles published in this series, go to www.springer.com/series/7651
Cancer Gene Profiling Methods and Protocols
Edited by
Robert Grützmann and Christian Pilarsky Department of Surgery, University Hospital Carl Gustav Carus, University Dresden, Dresden, Germany
Editors
Robert Grützmann, Department of Surgery, University Hospital Carl Gustav Carus, University Dresden, Dresden, Germany
[email protected]
Christian Pilarsky, Department of Surgery, University Hospital Carl Gustav Carus, University Dresden, Dresden, Germany
[email protected]
ISSN 1064-3745
e-ISSN 1940-6029
ISBN 978-1-934115-76-3
e-ISBN 978-1-59745-545-9
DOI 10.1007/978-1-59745-545-9
Springer Dordrecht Heidelberg London New York
Library of Congress Control Number: 2009930638
© Humana Press, a part of Springer Science+Business Media, LLC 2010
All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Humana Press, c/o Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights. While the advice and information in this book are believed to be true and accurate at the date of going to press, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.
Printed on acid-free paper
Humana is part of Springer Science+Business Media (www.springer.com)
Preface
Science is the agent between imagination and reality (Anonymous)
During the last few years, the methods for analysing cancer-related genes on a molecular level have changed rapidly. With the advent of automated sequencing, new and faster investigations have become possible. This has led to the collection of a large number of DNA samples, such as Expressed Sequence Tag (EST) libraries whose entries run into millions. The advances in DNA sequencing technologies resulted in rapid improvements in oligonucleotide synthesising technologies, which have allowed researchers to produce oligonucleotides for each and every imaginable sequence at a low cost. Finally, the method of polymerase chain reaction (PCR), and other improvements in enzymatic in vitro amplification of nucleic acids, gave researchers the opportunity to use low amounts of nucleic acids for analysis. This enabled the research community to investigate the populations of cells in a given tissue.
Sometimes it takes only a few advances for a technology to be successful – for example, the concept of arraying biological probes in a reproducible manner, and the use of these arrays instead of a single probe, greatly advanced biomedical research, especially as it was discovered that everything is arrayable. It has also changed the landscape of science in another way: a reduction of costs. The costs of generating such an array are high, but the costs of replicating such arrays are not. This key fact has led to a growth in the number of biotech companies that design and produce arrays, and today more and more researchers have access to these arrays and use them. Such unrestricted access to these resources has really been the key to the biomedical research revolution we see today.
In this book, we have brought together the experiences of leading scientists in the discipline of cancer gene profiling.
We have included several microarray techniques, as well as methods for arraying tissues and proteomics, because cancer genes can be profiled in different ways. Such different approaches are needed to understand the key stages of cancer development, because using only one technique would be insufficient. Therefore, we attempt to give an overview of the state-of-the-art methods that will enable the reader to perform these experiments successfully. The book is written for any student or practitioner with an interest in cancer gene profiling, and can be used in any well-equipped research laboratory. It may also serve as a demonstration of the kind of analysis that is possible today and will be complementary to other textbooks in the area of biomedical research.
This book has been divided into five main sections. The first section covers techniques to get clinically relevant cancer material through the best methods of sample collection and storage. The second part begins with an overview of gene expression technology and gives an introduction to the latest cancer gene profiling technologies. Because cancer gene profiling is more than just the profiling of cancer gene expression, we have also included techniques for comparative genomic hybridisation (CGH) arraying, single-nucleotide polymorphism (SNP) analysis, and proteomics.
The third section contains real-life examples for the different technologies, and shows the full potential of cancer gene profiling today. This potential can only be utilised by the use of adequate bioinformatics tools. These tools are covered in the fourth part of the book. Because a cancer gene profiling experiment will most often lead to numerous candidate genes, which, in turn, have to be further validated and analysed, examples of performing post-profiling experiments can be found in the final section of the book. It should be noted that all of the chapters in the book are linked by the description of particular successful experiments that were performed within the field of gene expression profiling.
We offer our gratitude to all of the contributing authors and the staff of Humana Press – without their help, this book would not have been possible. We also thank our families for their love and patience. Finally, we are indebted to our mentor Hans Detlev Saeger for his unwavering support.
Science is not just a profession – it should also be fun. This fun comes from the inception of an idea that goes on to be proven through experimentation, or, as we found in a Chinese fortune cookie: “The impossible is only the untried.” We hope that you will not only be successful, but will also have fun using our book in your research.
Dresden, Germany
Robert Grützmann Christian Pilarsky
Contents
Preface
Contributors
1. Organizational Issues in Providing High-Quality Human Tissues and Clinical Information for the Support of Biomedical Research
Walter C. Bell, Katherine C. Sexton, and William E. Grizzle
2. Manual Microdissection
Glen Kristiansen
3. Laser Microdissection
Anja Rabien
4. Tissue Microarrays
Ana-Maria Dancau, Ronald Simon, Martina Mirlacher, and Guido Sauter
5. A Decade of Cancer Gene Profiling: From Molecular Portraits to Molecular Function
Henri Sara, Olli Kallioniemi, and Matthias Nees
6. Mining Expressed Sequence Tag (EST) Libraries for Cancer-Associated Genes
Armin O. Schmitt
7. Automated Fluorescent Differential Display for Cancer Gene Profiling
Jonathan D. Meade, Yong-jig Cho, Blake R. Shester, Jamie C. Walden, Zhen Guo, and Peng Liang
8. Manual Microdissection Combined with Antisense RNA–LongSAGE for the Analysis of Limited Cell Numbers
Jutta Lüttges, Stephan A. Hahn, and Anna M. Heidenblut
9. Quantitative DNA Methylation Profiling on a High-Density Oligonucleotide Microarray
Anne Fassbender, Jörn Lewin, Thomas König, Tamas Rujan, Cecile Pelet, Ralf Lesche, Jürgen Distler, and Matthias Schuster
10. Single-Nucleotide Polymorphism (SNP) Analysis to Associate Cancer Risk
Julie Earl and William Greenhalf
11. Application of Proteomics in Cancer Gene Profiling: Two-Dimensional Difference in Gel Electrophoresis (2D-DIGE)
Deepak Hariharan, Mark E. Weeks, and Tatjana Crnogorac-Jurcevic
12. Search for and Identification of Novel Tumor-Associated Autoantigens
Karsten Conrad, Holger Bartsch, Ulrich Canzler, Christian Pilarsky, Robert Grützmann, and Michael Bachmann
13. Integrative Oncogenomic Analysis of Microarray Data in Hematologic Malignancies
Jose A. Martínez-Climent, Lorena Fontan, Vicente Fresquet, Eloy Robles, María Ortiz, and Angel Rubio
14. Cancer Gene Profiling in Pancreatic Cancer
Felip Vilardell and Christine A. Iacobuzio-Donahue
15. Cancer Gene Profiling in Prostate Cancer
Adam Foye and Phillip G. Febbo
16. Cancer Gene Profiling for Response Prediction
B. Michael Ghadimi and Marian Grade
17. The EGFR Pathway as an Example for Genotype: Phenotype Correlation in Tumor Genes
Ulrike Mogck, Eray Goekkurt, and Jan Stoehlmacher
18. Quantitation of CD39 Gene Expression in Pancreatic Tissue by Real-Time Polymerase Chain Reaction
Martin Loos, Beat Künzli, and Helmut Friess
19. Functional Profiling Methods in Cancer
Joaquín Dopazo
20. Calibration of Microarray Gene-Expression Data
Hans Binder, Stephan Preibisch, and Hilmar Berger
21. Meta-analysis of Cancer Gene-Profiling Data
Xinan Yang and Xiao Sun
22. Target Gene Discovery for Novel Therapeutic Agents in Cancer Treatment
Ole Ammerpohl, Sanjay Tiwari, and Holger Kalthoff
Index
Contributors
Ole Ammerpohl • Clinic for General Surgery and Thoracic Surgery, Division Molecular Oncology, University Hospital of Schleswig-Holstein, Kiel, Germany
Michael Bachmann • Institute for Immunology, Technical University Dresden, Dresden, Germany
Holger Bartsch • Institute for Immunology, Technical University Dresden, Dresden, Germany
Walter C. Bell • Department of Pathology, University of Alabama at Birmingham, Birmingham, AL, USA
Hilmar Berger • Institute for Medical Informatics, Statistics and Epidemiology, University of Leipzig, Leipzig, Germany
Hans Binder • Interdisciplinary Centre for Bioinformatics, University of Leipzig, Leipzig, Germany
Ulrich Canzler • Institute for Immunology, Technical University Dresden, Dresden, Germany
Yong-jig Cho • Department of Cell Biology, Vanderbilt-Ingram Cancer Center, School of Medicine, Vanderbilt University, Nashville, TN, USA
Karsten Conrad • Institute for Immunology, Technical University Dresden, Dresden, Germany
Tatjana Crnogorac-Jurcevic • Cancer Research UK Molecular Oncology Unit, Barts and The London Queen Mary’s School of Medicine and Dentistry, John Vane Science Centre, London, UK
Ana-Maria Dancau • Institute of Pathology, University of Hamburg, Hamburg, Germany
Jürgen Distler • Science Department, Epigenomics AG, Berlin, Germany
Joaquín Dopazo • Bioinformatics Department, Centro de Investigación Príncipe Felipe, Valencia, Spain
Julie Earl • Division of Surgery and Oncology, University of Liverpool, Liverpool, UK
Anne Fassbender • Science Department, Epigenomics AG, Berlin, Germany
Phillip G. Febbo • Departments of Medicine and Molecular Genetics and Microbiology, Duke Institute for Genome Science and Policy, Duke University, Durham, NC, USA
Lorena Fontan • Division of Oncology, Center for Applied Medical Research, University of Navarra, Pamplona, Spain
Adam Foye • Departments of Medicine and Molecular Genetics and Microbiology, Duke Institute for Genome Science and Policy, Duke University, Durham, NC, USA
Vicente Fresquet • Division of Oncology, Center for Applied Medical Research, University of Navarra, Pamplona, Spain
Helmut Friess • Department of Surgery, Technische Universität München, Munich, Germany
B. Michael Ghadimi • Department of General and Visceral Surgery, University Medical Center Göttingen, Georg-August-University, Göttingen, Germany
Eray Goekkurt • Department of Internal Medicine I, University Hospital Carl Gustav Carus, University Dresden, Dresden, Germany
Marian Grade • Department of General and Visceral Surgery, University Medical Center Göttingen, Georg-August-University, Göttingen, Germany
William Greenhalf • Division of Surgery and Oncology, University of Liverpool, Liverpool, UK
William E. Grizzle • Department of Pathology, University of Alabama at Birmingham, Zeigler Research Building, Birmingham, AL, USA
Robert Grützmann • Department of Surgery, University Hospital Carl Gustav Carus, University Dresden, Dresden, Germany
Zhen Guo • GenHunter Corporation, Nashville, TN, USA
Stephan A. Hahn • Molecular GI-Oncology (MGO), Center for Clinical Research (ZKF), Ruhr-University Bochum, Bochum, Germany
Deepak Hariharan • Cancer Research UK Molecular Oncology Unit, Barts and The London Queen Mary’s School of Medicine and Dentistry, John Vane Science Centre, London, UK
Anna M. Heidenblut • Molecular GI-Oncology (MGO), Center for Clinical Research (ZKF), Ruhr-University Bochum, Bochum, Germany
Christine A. Iacobuzio-Donahue • Department of Pathology, GI/Liver Division, Johns Hopkins Medical Institutions, The Sol Goldman Pancreatic Cancer Research Center, Baltimore, MD, USA
Olli Kallioniemi • VTT Medical Biotechnology, Turku, Finland
Holger Kalthoff • Clinic for General Surgery and Thoracic Surgery, Division Molecular Oncology, University Hospital of Schleswig-Holstein, Kiel, Germany
Thomas König • Science Department, Epigenomics AG, Berlin, Germany
Glen Kristiansen • Department of Pathology, University Hospital Zurich, Zurich, Switzerland
Beat Künzli • Department of Surgery, Technische Universität München, Munich, Germany
Ralf Lesche • Science Department, Epigenomics AG, Berlin, Germany
Jörn Lewin • Science Department, Epigenomics AG, Berlin, Germany
Peng Liang • Department of Cell Biology, Vanderbilt-Ingram Cancer Center, School of Medicine, Vanderbilt University, Nashville, TN, USA
Martin Loos • Department of Surgery, Technische Universität München, Munich, Germany
Jutta Lüttges • Institute for Pathology, Saarbrücken Hospital, Saarbrücken, Germany
Jose A. Martínez-Climent • Division of Oncology, Center for Applied Medical Research, University of Navarra, Pamplona, Spain
Jonathan D. Meade • GenHunter Corporation, Nashville, TN, USA
Martina Mirlacher • Institute of Pathology, University of Hamburg, Hamburg, Germany
Ulrike Mogck • Department of Internal Medicine I, University Hospital Carl Gustav Carus, University Dresden, Dresden, Germany
Matthias Nees • VTT Medical Biotechnology, Turku, Finland
María Ortiz • CEIT and TECNUN, University of Navarra, San Sebastián, Spain
Cecile Pelet • Science Department, Epigenomics AG, Berlin, Germany
Christian Pilarsky • Department of Surgery, University Hospital Carl Gustav Carus, University Dresden, Dresden, Germany
Stephan Preibisch • Max-Planck-Institute for Molecular Cell Biology and Genetics, Dresden, Germany
Anja Rabien • Research Division, Department of Urology, Charité – Universitätsmedizin Berlin, Campus Charité Mitte, Berlin, Germany
Eloy Robles • Division of Oncology, Center for Applied Medical Research, University of Navarra, Pamplona, Spain
Angel Rubio • CEIT and TECNUN, University of Navarra, San Sebastián, Spain
Tamas Rujan • Science Department, Epigenomics AG, Berlin, Germany
Henri Sara • VTT Medical Biotechnology, Turku, Finland
Guido Sauter • Institute of Pathology, University of Hamburg, Hamburg, Germany
Armin O. Schmitt • Institute for Animal Sciences, Humboldt-Universität zu Berlin, Berlin, Germany
Matthias Schuster • Science Department, Epigenomics AG, Berlin, Germany
Katherine C. Sexton • Comprehensive Cancer Center, University of Alabama at Birmingham, Birmingham, AL, USA
Blake R. Shester • GenHunter Corporation, Nashville, TN, USA
Ronald Simon • Institute of Pathology, University of Hamburg, Hamburg, Germany
Jan Stoehlmacher • Department of Internal Medicine I, University Hospital Carl Gustav Carus, University Dresden, Dresden, Germany
Xiao Sun • Division of Bioinformatics, State Key Laboratory of Bioelectronics (Chien-Shiung Wu Laboratory), Southeast University, Nanjing, China
Sanjay Tiwari • Division Molecular Oncology, Clinic for General Surgery and Thoracic Surgery, University Hospital of Schleswig-Holstein, Kiel, Germany
Felip Vilardell • Department of Pathology, GI/Liver Division, Johns Hopkins Medical Institutions, The Sol Goldman Pancreatic Cancer Research Center, Baltimore, MD, USA
Jamie C. Walden • GenHunter Corporation, Nashville, TN, USA
Mark E. Weeks • Cancer Research UK Molecular Oncology Unit, Barts and The London Queen Mary’s School of Medicine and Dentistry, John Vane Science Centre, London, UK
Xinan Yang • Division of Bioinformatics, State Key Laboratory of Bioelectronics (Chien-Shiung Wu Laboratory), Southeast University, Nanjing, China
Chapter 1
Organizational Issues in Providing High-Quality Human Tissues and Clinical Information for the Support of Biomedical Research
Walter C. Bell, Katherine C. Sexton, and William E. Grizzle
Summary
Superior-quality human tissues are required to support many types of biomedical research. To be optimally useful in supporting research, not only must these tissues be accurately diagnosed, but also the specific aliquots of tissue supplied to investigators must be accurately described as part of the quality control analysis of the tissue. Tissues should be collected, processed, and stored uniformly. Some tissues are provided to investigators from tissue banks for which tissues have been collected and processed according to standard operating procedures (SOPs) of the tissue bank. Other tissues provided to support research are collected and processed according to SOPs modified to meet investigator needs and requirements, i.e., prospective collection/processing. These different models of tissue collection require different goals, designs, and SOPs. The objectives of tissue repositories also vary based on the types of tissues provided (e.g., fresh tissue aliquots, fixed paraffin-embedded tissue, paraffin tissue sections, etc.) and how the tissues are to be used in research. For example, the potential use of tissues affects the need for extensive annotation of the specimen including both clinical information (e.g., clinical outcomes) and demographics. Specifically, if the tissues are to be used for extraction of proteins or basic studies of disease processes, less clinical information, if any, may be needed than if the tissues are to be used for the correlation of an aspect of the disease process with clinical outcome or response to a specific therapy. In this review, we describe, based on our experience, the major issues that should be addressed in designing and establishing a tissue repository.
Key words: Human tissue, Tissue banking, Tissue repository, Research infrastructure, IRB, HIPAA
Abbreviations
CHTN: Cooperative Human Tissue Network
DCIS: Ductal carcinoma in situ
DMSO: Dimethyl sulfoxide
GMP: Good manufacturing practices
HIPAA: Health Insurance Portability and Accountability Act
ISBER: International Society for Biological and Environmental Repositories
ISO: International Organization for Standardization
Robert Grützmann and Christian Pilarsky (eds.), Cancer Gene Profiling: Methods and Protocols, Methods in Molecular Biology, vol. 576, DOI 10.1007/978-1-59745-545-9_1, © Humana Press, a part of Springer Science + Business Media, LLC 2010
LCIS: Lobular carcinoma in situ
LN2: Liquid nitrogen
NCI: National Cancer Institute
OCT: Optimal cutting temperature (compound for embedding specimens prior to cryosectioning)
PHI: Protected health care information
PSA: Prostate-specific antigen
QA: Quality assurance/management
QC: Quality control
SOP: Standard operating procedure
1. Introduction
Modern biomedical research requires access to superior-quality specimens of human tissue and bodily fluids with or without extensive clinical annotation (1–2). Different types of organizations devoted to supplying tissues for research have varying goals selected to meet different tissue and informational needs (3). In this review, we discuss multiple models of tissue repositories and, based on our experience, several of the more important issues affecting the design and operations of a tissue repository. A detailed discussion of most of these issues is beyond the scope of this chapter. Thus, we have referenced articles that provide additional information on these topics (1–19). We have also provided examples of standard operating procedures (SOPs): one for the processing of blood and another for the processing of tissue.
2. Models of Tissue Collection
Obtaining human tissues and bodily fluids to support biomedical research may utilize an organized or disorganized approach. “Catch as catch can” is the best designation of the approach in which a surgeon, pathologist, or other medical personnel provides tissues to investigators via an unorganized approach. Such specimens characteristically have not been collected, processed, or stored using SOPs and usually are not associated with quality control; thus, these specimens may be of poor quality and their diagnosis may be incorrect. More worrisome is that such specimens may be obtained without oversight of Privacy Boards or Institutional Review Boards (IRBs) and may violate the Common Rule and/or the Health Insurance Portability and Accountability Act (HIPAA) regulations.
2.1. Prospective Collection
An organized approach in which investigators specify exactly the tissue specimens they need, as well as how the specimens are to be processed and stored, is designated the “prospective collection model.” The clear disadvantage of prospective collection versus a banking model is that large numbers of specimens are not immediately available when requested. In addition, outcome data are not available because the specimens are collected when requested, so the patients’ clinical outcomes may take several years to develop after collection. The advantage of the prospective collection model is that the investigator receives exactly what is requested (e.g., fresh uninvolved kidney minced in RPMI media). The investigator must, however, wait for specimen availability (weeks to months) and, when needed, for clinical outcome data, which may take years.
2.2. Banking
Another approach to obtaining superior human tissues to support biomedical research is to utilize a “banking model.” In a banking model, SOPs are followed for obtaining, processing, and storing human tissues. For example, a bank may store only frozen tissues and/or paraffin blocks. “Specially processed” tissues as well as fresh, unfrozen samples usually will not be available from such a bank. One of the major disadvantages is that the specimens may not meet specific requirements of the investigator for specific parameters such as aliquot size, percentage of tumor, and processing and/or storage methods (3). Advantages of the “banking model” are that large numbers of specimens may be immediately available and clinical and demographic information including clinical outcome also are readily available.
2.3. Specimens Associated with Clinical Trials
A “clinical trial model” is a type of banking model in which the remnants of the tissues/bodily fluids collected from one or more clinical trials are banked to support future studies. The problems with this specific banking model compared with a general banking model may be magnified in that the original consent form of the clinical trial may not clearly state that the specimens can be used for research in addition to the clinical trial. Similarly, the institution’s IRB may prohibit the utilization of specimens for a different type of research. In addition, the remnants of the clinical study may not meet the needs of a wide range of investigators, and remnant tissues may not be available from all original patients of the clinical trial.
2.4. Combination
The “tissue repository model” uses a combination of the approaches of the prospective and banking models, including the advantages of each of the models. The main potential problem in the operation of a tissue repository is that there are complex and numerous administrative requirements as well as the need for a more complex bioinformatics system. This chapter focuses primarily on a tissue repository model.
3. Matching Tissues to Tissue Requirements
3.1. Identification of Specimens
Correct identification of specimens is of critical importance to providing superior-quality tissues to support biomedical research (3, 6, 12, 16, 17). A labeling method should be used that (1) minimizes the risk of the label separating from the specimen, (2) prevents mislabeling due to errors by personnel, and (3) avoids problems with reading the specimen identification (e.g., poor handwriting). For most tissue repositories, this is best accomplished by the use of bar codes that link the specimen to a database containing pertinent clinical, demographic, and historical information regarding the individual who was the source of the specimen. It should be understood that a bar code is only a “number” and this number contains no other information; it is only by linking the bar code number to software that information about the bar-coded specimen is retrieved. Thus, unless the identical software is used at a second site, the second site can only read the specimen number and does not have access to a link with the information in the database. Any other information on the printed label, e.g., race, age, etc., comes from the software via the bar code and not directly from the bar code number.
3.2. Difficult to Fulfill Requests
Requests for very specific tissues become more difficult to meet as more requirements are placed on the request (3, 16). Obviously, a request for any breast carcinoma is less difficult to supply than a request for well-differentiated breast carcinoma from an African American man younger than 40 years of age, because breast cancers are rare in males and in relatively young individuals. Another investigator requirement that makes a request difficult to meet is a request for very large amounts of tissue (e.g., 5 g) from tumors that are typically small. This includes tumors of the breast and prostate, which are usually small due to screening methods.
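As a rough illustration of why each added requirement makes a request harder to fill, the following Python sketch filters a small, entirely made-up inventory one criterion at a time and reports how many specimens remain. The Specimen fields, the bar code values, and the narrow helper are hypothetical names chosen for this example; they are not taken from any repository's software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Specimen:
    barcode: str          # opaque identifier; carries no clinical meaning by itself
    site: str
    differentiation: str
    sex: str
    age: int
    weight_g: float


# Entirely made-up inventory, for illustration only.
INVENTORY = [
    Specimen("000101", "breast", "well", "F", 62, 0.4),
    Specimen("000102", "breast", "poor", "F", 47, 0.2),
    Specimen("000103", "breast", "well", "M", 71, 0.3),
    Specimen("000104", "breast", "well", "F", 38, 0.1),
    Specimen("000105", "prostate", "moderate", "M", 66, 0.2),
    Specimen("000106", "breast", "moderate", "F", 55, 0.6),
]


def narrow(specimens, criteria):
    """Apply criteria one at a time; record how many specimens remain after each."""
    remaining = list(specimens)
    counts = []
    for label, predicate in criteria:
        remaining = [s for s in remaining if predicate(s)]
        counts.append((label, len(remaining)))
    return remaining, counts


# A hypothetical request that grows more specific with every line.
REQUEST = [
    ("breast carcinoma", lambda s: s.site == "breast"),
    ("well differentiated", lambda s: s.differentiation == "well"),
    ("male donor", lambda s: s.sex == "M"),
    ("younger than 40", lambda s: s.age < 40),
]

matches, counts = narrow(INVENTORY, REQUEST)
for label, n in counts:
    print(f"{label}: {n} specimen(s) remain")
```

With this toy inventory the pool shrinks from five specimens to none by the fourth criterion, which is the kind of feedback a repository could share with an investigator when discussing which requirements might be relaxed.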
Because some cancers (e.g., breast, prostate, pancreas) are in great demand by investigators, requests from multiple investigators, each requiring a small amount of tissue (e.g., 0.1 g), will more likely be filled than will a request for a large amount of tissue from these tumors (3, 16). Similarly, for tumors in high demand, such as prostate or pancreas tumors, which tend to be small, requests for large numbers of cases within 1 year (e.g., 100) are unlikely to be met. Requests for large numbers of relatively rare tumors, or of tumors or other tissues that are not typically removed surgically, cannot be filled (see Subheading 4.2.4). Most tissue repositories try to provide tissues equitably among those investigators requesting the same tissues. Because efforts (time) devoted to supplying specific tissue requests must also be divided equitably, tissue requests requiring extensive effort (e.g., removal of a vertebral column from a body or processing hundreds of specimens by a complex protocol) cannot easily be met by a busy tissue repository. As discussed in Subheading 3.4, it is important for tissue repositories to educate investigators regarding which of their requirements make their requests difficult or impossible to meet. This includes unreasonable times for processing and freezing after the specimen is removed surgically (as discussed in Subheading 3.3).
3.3. Time Interval Between Surgery and Tissue Storage
In general, remnant diagnostic tissues should be reviewed by a pathologist or their designate to assure that the diagnostic integrity of the specimen is uncompromised. Unreasonable requirements of investigators for rapid processing and freezing of samples after surgery will reduce the availability of tissue to investigators for several reasons. Freezing samples in the operating room within minutes of removal from the patient may jeopardize the ability of a pathologist to review the material to ensure that it is not required for diagnosis. In addition, although some aliquots of tissue may be collected, processed, and frozen within 15 min of surgical removal, this usually requires special dedicated personnel and resources that many tissue repositories may not have. It is, however, important to record the time intervals between the removal of operative specimens or the transfer of these specimens from the operating room to the tissue repository, and these intervals should be minimized as much as possible. After specimens are removed from patients, they should be maintained unfrozen, at approximately 4°C rather than at room temperature, while awaiting diagnostic review. Specimens provided for research should then be processed as rapidly as practicable; however, delays in processing may occur when multiple specimens must be processed simultaneously. In such cases, one or two aliquots from each specimen could be rapidly snap frozen in liquid nitrogen (LN2) vapor and other aliquots could be collected and subsequently frozen. The scientific importance of rapid collection and processing of tissue after a long period of warm ischemia in vivo (i.e., while blood vessels are compromised during surgery, see Subheading 4.2.3) is controversial. This is because many molecules will be affected by enzymes that function optimally at body temperature of warm ischemia. Thus, numerous molecular changes may occur before operative tissues are removed from the body. Huang et al. 
(20) evaluated the effects of in vitro ischemia on 2,400 genes in human tissue specimens using spotted arrays and found that less than 14% of the genes changed by more than 50%. Most genes at the messenger RNA (mRNA) level showed relatively modest increases (5 min after surgery versus 60 min after surgery). Similarly, Dash et al. (21) reported that less than 1% of genes demonstrated changes after 1 h of removal of prostate tissue from the body. In addition, Spruessel et al. (22) reported that 80% of genes
Bell, Sexton, and Grizzle
changed less than twofold within 30 min after removal from the body. Based on these studies of mRNA, very rapid removal (30 of cells
Some of the arrayed tissues may show falsely negative or inappropriately weak IHC staining intensity due to variations in tissue processing (e.g., fixation medium and time). The large number of tissues included in a TMA will often compensate for this phenomenon, which is also encountered in large-section IHC analyses. At least a fraction of tissue spots yielding false-negative IHC staining results can be identified in control experiments assessing the antigen integrity of the samples, e.g., IHC detection of tissue type-specific antigens like cytokeratins or vimentin. For tissues with a reasonable proliferative activity, Ki67 (MIB1) is an optimal quality control antibody (see Note 7). It is highly recommended to use freshly cut sections for IHC analysis. The time span between sectioning and immunostaining should be less than 2 weeks. Studies have shown that staining intensity decreases significantly with time for many antibodies (7, 8).
3.2.3. FISH
Because biopsies are all treated individually at the time when they are removed, fixed, and subsequently paraffin embedded, one must expect a certain degree of heterogeneity with respect to protein and nucleic acid preservation.
Tissue Microarrays
The proof of this assumption is best illustrated in the outcome of FISH analyses. Similar to results seen with large-section studies, TMA FISH analyses yield interpretable results in only approximately 60–90% of the analyzed tumors (depending on the quality and size of the FISH probe) at the first attempt. Again, similar to the case with large-section studies, it is possible to achieve interpretability in a fraction of initially non-informative cases by changing experimental conditions. For example, an increased proteinase concentration for slide pretreatment will result in interpretable signals in some initially non-informative cases at the cost of overdigestion of some previously interpretable samples. In general, we do not attempt to improve the fraction of FISH-informative cases by changing experimental conditions. Because of the high number of tumors on our TMAs (usually >500), we prefer to tolerate a fraction of non-interpretable tumors rather than use too many precious TMA sections for additional experiments.
3.2.4. Summary
The TMA methodology is now an established and frequently used tool for tissue analysis. The equipment is affordable and easy to use in places where pathology expertise is available. Basically all kinds of in situ analyses, such as IHC, in situ hybridization, and in situ polymerase chain reaction (PCR) assays may be adapted to TMAs with only slight (if any) modifications of the respective large-section protocols.
4. Notes
1. Often TMA users realize that one critical control tissue has been forgotten only after completion of the TMA block.
2. It is advisable to have a freshly hematoxylin and eosin (H&E)-stained section if the actual block surface is not well reflected on the available stained section.
3. For unequivocal identification of individual samples on TMA slides, it is important to avoid a fully symmetrical TMA structure.
4. Therefore, a special type of paraffin is needed with a melting temperature between 55°C and 58°C (“Peel-A-Way” paraffin, see Subheading 2).
5. However, a location of the tissue cylinder that is too superficial is less problematic than a position that is too deep, because protruding tissue elements can – to some extent – be leveled out after finishing the punching process. The use of a magnifying lens facilitates precise deposition of samples, especially for beginners. With the use of a glass slide, protruding
Dancau et al.
tissue cylinders are then gently pressed deeper into the warmed TMA block.
6. This can be done by precooling the needle with a piece of dry ice before punching and while dispensing the tissue core into the recipient block. Needles may easily bend or break. To prevent needle breakage, coring must be performed slowly with minimal pressure.
7. MIB1, which leads to strong staining in all mitoses, is often falsely negative in suboptimally processed tissues.
References
1. Kononen, J., Bubendorf, L., Kallioniemi, A., Barlund, M., Schraml, P., Leighton, S. et al. (1998) Tissue microarrays for high-throughput molecular profiling of tumor specimens. Nat Med 4:844–7.
2. Bubendorf, L., Kononen, J., Koivisto, P., Schraml, P., Moch, H., Gasser, T. C. et al. (1999) Survey of gene amplifications during prostate cancer progression by high-throughout fluorescence in situ hybridization on tissue microarrays. Cancer Res 59:803–6.
3. Hoos, A. and Cordon-Cardo, C. (2001) Tissue microarray profiling of cancer specimens and cell lines: opportunities and limitations. Lab Invest 81:1331–8.
4. Simon, R., Struckmann, K., Schraml, P., Wagner, U., Forster, T., Moch, H. et al. (2002) Amplification pattern of 12q13-q15 genes (MDM2, CDK4, GLI) in urinary bladder cancer. Oncogene 21:2476–83.
5. Abbott, R. T., Tripp, S., Perkins, S. L., Elenitoba-Johnson, K. S. and Lim, M. S. (2003) Analysis of the PI-3-Kinase-PTEN-AKT pathway in human lymphoma and leukemia using a cell line microarray. Mod Pathol 16:607–12.
6. Fejzo, M. S. and Slamon, D. J. (2001) Frozen tumor tissue microarray technology for analysis of tumor RNA, DNA, and proteins. Am J Pathol 159:1645–50.
7. Bertheau, P., Cazals-Hatem, D., Meignin, V., de Roquancourt, A., Vérola, O., Lesourd, A. et al. (1998) Variability of immunohistochemical reactivity on stored paraffin slides. J Clin Pathol 51:370–4.
8. Jacobs, T. W., Prioleau, J. E., Stillman, I. E. and Schnitt, S. J. (1996) Loss of tumor marker immunostaining intensity on stored paraffin slides of breast cancer. J Natl Cancer Inst 88:1054–9.
Chapter 5
A Decade of Cancer Gene Profiling: From Molecular Portraits to Molecular Function
Henri Sara, Olli Kallioniemi, and Matthias Nees
Summary
Cancer gene profiling has greatly profited from the progress in high-throughput technologies, including microarray-, sequencing-, and bioinformatics-based methods. The flood of data generated during the last decade has given rise to a range of “-omics” fields that have significantly changed our understanding of malignant diseases. However, while the terms “-omics” and “-ome” in principle refer to the completeness of a genetic approach, we are in fact far from a complete understanding of cancer progression. We may understand gene expression patterns better and successfully use gene signatures for outcome prediction and prognosis, but truly promising molecular targets still have to find their way into novel therapeutic concepts. In this chapter, we will show how more comprehensive strategies, integrating multiple layers of genetic information, might in the future provide a more profound functional understanding of cancer.
Key words: Microarray, Expression, CGH, Comparative genomic hybridization, Sequencing
1. Introduction: Arrays and Sequences for the Masses
Cancer is a genetic disease, and mutations are the key to understanding the disease mechanisms and developing novel therapeutic concepts. Somatic mutations and DNA copy number alterations, but also epigenetic changes, represent the basis for cancer progression, and result in altered messenger RNA (mRNA) levels, alternative splicing patterns, or differential microRNA and protein expression. High-throughput (HTS) genomic technologies have taken the field by storm since their inception more than 10 years ago. Array technologies, in particular, have revolutionized molecular cancer biology, clinical diagnosis and prognosis, and have created a multitude of different “-omics” approaches that would
Robert Grützmann and Christian Pilarsky (eds.), Cancer Gene Profiling: Methods and Protocols, Methods in Molecular Biology, vol. 576, DOI 10.1007/978-1-59745-545-9_5, © Humana Press, a part of Springer Science + Business Media, LLC 2010
otherwise not be imaginable. Microarray technology has two major applications: gene expression analysis and genetic variation analysis. Both will be addressed here from the cancer-related point of view. In particular, cancer gene expression profiling has generated thousands of publications in the last decade (Fig. 1, Table 1), not even considering the “invisible” impact on private and corporate research not reflected by peer-reviewed publications. This large body of information has contributed greatly to our understanding of gene expression patterns that differ dramatically between normal and malignant tissues. Recent developments, such as the discovery of microRNAs and the option of globally profiling microRNA expression patterns, have shown that array and other HTS technologies will continue to contribute significantly to our improved understanding of cancer biology. There is also no end in sight for continued technological innovation and further miniaturization, as reflected by the most recent ultrahigh-density expression array platforms that have entered the market, which will be discussed here. In parallel with the success story of gene expression profiling technologies, comparable genetic “mapping” approaches such as comparative genomic hybridization (CGH) and single-nucleotide polymorphism (SNP) arrays have been developed for the genome-wide analysis of DNA copy number alterations. A panel of high-throughput genomic technologies is now broadly available that enables researchers to explore the cancer genome with an unprecedented throughput and accuracy, at greatly reduced cost. Apart from the success of the array technologies, high-throughput DNA resequencing approaches will increasingly play a role in future “-omics” research, complementing or even competing with hybridization-based technologies. 
Although resolution cannot be further increased beyond the single-nucleotide level, the fundamental question will be at which price the “next-generation” sequencing technologies will come and what their throughput might eventually be. Sequencing the “1,000 dollar cancer genome,” however, might soon become a reality (1, 2). The idea of fully characterizing the cancer genome(s) has therefore gained significant momentum. As a consequence, a number of large-scale projects have been recently launched to map the cancer genome in its entirety by resequencing, combined with other HT gene-profiling technologies. The incentives for the formation of an International Cancer Genome Consortium (ICGC) were in principle outlined at an International Cancer Genomics Meeting in Toronto, Canada, in October 2007. The ICGC will aim at the complete and comprehensive description of genomic, transcriptomic, and epigenomic changes in 50 different tumor types, including subtypes. During an initial explorative phase that will precede full-thrust efforts, the main focus will be on ten cancer types.
Fig. 1. Statistics of peer-reviewed research articles, including reviews, indicating the occurrence of the term “microarray” or “microarrays” in the title, abstract, or MeSH (medical subject headings) term. Upper panel: Microarray-related publications for the years 1995–2007, in total (filled circle), cancer-related microarray studies (open square), and for spotted (filled square) or synthetic oligonucleotide arrays (filled triangle). Lower panel: Articles containing the combined keywords/MeSH headings tissue microarrays (open diamond), array–CGH (open square), ChIP-on-chip or chromatin immunoprecipitation on microarrays (filled square), alternative splicing arrays (open triangle), microRNA arrays (filled triangle), or protein and antibody arrays (open circle).
Table 1 Peer-reviewed publications that have made use of Affymetrix’ GeneChip technology, based on a detailed literature search in PubMed (http://www.ncbi.nlm.nih.gov/Entrez), indicating the major fields of biomedical research in which these were applied, for the two most strongly represented genomes (human and mouse)
[Table body not recoverable from the extracted layout. Columns: publication years 1996 and 1998–2007. Rows, given separately for human and for mouse: Oncology; SNP genotyping; Immune/inflammatory system; Drug development/toxicology; Metabolic/degenerative diseases; Mental health; Development; Cardiovascular; Alternative splicing; Microbiology; Reproduction; Transplantation medicine; Infection; Mapping, chromosomal.]
Key to participation in the ICGC will be the comprehensive nature of the proposed studies, and compliance with the commonly agreed-on guidelines and data exchange policies. Considering the immense volume of DNA that needs to be sequenced, and the data that need to be stored and processed, these efforts easily equal the scale of thousands of human genome projects (3). The amount of mRNA gene expression data generated (including exon-level data) will be equally overwhelming. In this chapter, we provide a synopsis of the pros and cons of existing technologies, discuss the power of data integration, and outline the translational opportunities arising from “deep sequencing” and transcriptional profiling of the cancer genome.
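To give a rough feeling for this scale, the arithmetic can be sketched as follows. Only the number of 50 tumor types is taken from the text; the cohort size per tumor type, the use of one matched normal genome per tumor, the sequencing depth, and the rounded genome size are illustrative assumptions and do not represent ICGC specifications.

```python
# Back-of-envelope estimate of the sequencing volume of a large cancer genome effort.
# Only TUMOR_TYPES comes from the text; every other constant is an illustrative assumption.

TUMOR_TYPES = 50             # stated in the text
TUMORS_PER_TYPE = 500        # hypothetical cohort size per tumor type
GENOMES_PER_PATIENT = 2      # hypothetical: tumor plus matched normal tissue
HAPLOID_GENOME_BP = 3.1e9    # approximate size of the human haploid genome
COVERAGE = 30                # hypothetical average sequencing depth


def genomes_to_sequence(tumor_types: int, tumors_per_type: int,
                        genomes_per_patient: int) -> int:
    """Number of individual genomes that would have to be sequenced."""
    return tumor_types * tumors_per_type * genomes_per_patient


def total_bases(n_genomes: int, genome_bp: float, coverage: float) -> float:
    """Raw sequence output (in bases) for n_genomes at the given coverage."""
    return n_genomes * genome_bp * coverage


n_genomes = genomes_to_sequence(TUMOR_TYPES, TUMORS_PER_TYPE, GENOMES_PER_PATIENT)
bases = total_bases(n_genomes, HAPLOID_GENOME_BP, COVERAGE)
print(f"{n_genomes:,} genomes, about {bases:.2e} bases of raw sequence")
```

Under these assumed parameters the count of genomes runs into the tens of thousands, which is in line with the order of magnitude suggested above; changing any of the hypothetical constants changes the result proportionally.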
2. Array-Based Expression Profiling: Spotted Versus Synthetic Arrays
Reviewing the entire body of literature on microarrays is an impossible task (compare Fig. 1, Table 2). For this chapter, we will therefore use statistics based on this large body of scientific literature, with the purpose of giving an overview of at least the most important trends and developments that have occurred during the last 10 years. The concept of microarray technologies has clearly derived from the Southern blotting method dating back to 1975. The use of collections of distinct DNA fragments in “macro” arrays for expression profiling (basically “dot blots”) was in general use throughout the 1980s. These early arrays were made by spotting complementary DNAs (cDNAs) onto filter paper with a pin-spotting device. The then upcoming large-scale expressed sequence tag (EST) projects, generating millions of cDNA clones or “tags,” provided an additional basis by generating a massive amount of sequence information and making clone libraries freely available to researchers. Polymerase chain reaction (PCR) technology (since 1986) also had a very significant impact. However, it required the development of specialized robotics, computerization, and miniaturization during the mid to late 1990s to bring microarray technologies – as we know them – significantly forward. From the beginning, two very different microarray concepts were developed in parallel – and have been “competing” ever since. The use of miniaturized cDNA microarrays for gene expression profiling, based on cDNA clones and PCR-amplified DNA fragments, was first reported in 1995 (4–6). The first application of microarrays in cancer research was published in 1996 (7). A large number of cDNA platforms have been generated since, almost exclusively by the academic research community. In the pioneer days, generating a cDNA array was frequently performed under heroic “home-brewing” circumstances, handling crude spotting robotics and large clone
Table 2 Number of individual microarray samples, as submitted to the GEO database, as of December 15, 2007
Table 2a mRNA gene expression analyses performed on the most prevalent commercial platforms, provided by Affymetrix, Agilent, NimbleGen and Illumina (bead arrays)
Type | Array platform | Samples in GEO | Provider | GEO accession
Oligo array | Affymetrix GeneChip Human Cancer Array HC-G110 | 23 | Affymetrix, Inc. | GPL74
Oligo array | Affymetrix GeneChip Human Array HuGeneFL | 476 | Affymetrix, Inc. | GPL80
Oligo array | Affymetrix GeneChip Human 35K Array Hu35k-A to D | 40 | Affymetrix, Inc. | GPL98
Oligo array | Affymetrix GeneChip Human HG-Focus Target Array | 1,129 | Affymetrix, Inc. | GPL201
Oligo array | Affymetrix GeneChip Human Genome U95A to E | 5,543 | Affymetrix, Inc. | GPL92
Oligo array | Affymetrix GeneChip Human Genome U133A Early Access | 464 | Affymetrix, Inc. | GPL4685
Oligo array | Affymetrix GeneChip Human Genome U133A 2.0 Array | 754 | Affymetrix, Inc. | GPL571
Oligo array | Affymetrix GeneChip Human Genome U133 Plus 2.0 Array | 10,878 | Affymetrix, Inc. | GPL570
Oligo array | Affymetrix GeneChip Human X3P Array | 218 | Affymetrix, Inc. | GPL1352
Oligo array | Affymetrix Human Exon 1.0 ST Array (Transcript level) | 401 | Affymetrix, Inc. | GPL5160
Oligo array | Affymetrix Human Exon 1.0 ST Array (Exon level) | 29 | Affymetrix, Inc. | GPL5188
Oligo array | Affymetrix Human Phase3 v1.0 (transcript mapping) | 2,548 | Affymetrix, Inc. | GPL5234
Oligo array | Agilent-012097 Human 1A Microarray G4110B | 933 | Agilent | GPL887
Oligo array | Agilent-011521 Human 1A Microarray G4110A | 210 | Agilent | GPL885
Oligo array | NimbleGen Human Expression array | 13 | NimbleGen Inc. | GPL5465
Bead array | Illumina Sentrix HumanRef-8 Expression BeadChip | 427 | Illumina Inc. | GPL2700
Bead array | Illumina Sentrix Human-6 Expression BeadChip v1 + 2 | 735 | Illumina Inc. | GPL2507
Bead array | Bead-based microRNA profiling platform version 1–3 | 453 | BROAD Institute | GPL1986
Table 2b Samples contained in GEO based on SNP arrays and CGH arrays, as of December 15, 2007
Type | Array platform | Samples | Provider | Accession
SNP | Affymetrix GeneChip Mapping 10K Array (Xba131 SNP) | 280 | Affymetrix, Inc. | GPL1266
SNP | Affymetrix GeneChip Mapping 10K 2.0 Array (Xba142 SNP) | 7,233 | Affymetrix, Inc. | GPL2641
SNP | Affymetrix GeneChip Human Mapping 50K Hind | 64 | Affymetrix, Inc. | GPL2014
SNP | Affymetrix GeneChip Human Mapping 50K Xba | 64 | Affymetrix, Inc. | GPL2015
SNP | Affymetrix GeneChip Mapping 100K Set Array (50K Hind240 SNP) | 934 | Affymetrix, Inc. | GPL2004
SNP | Affymetrix GeneChip Mapping 100K Set Array (50K Xba240 SNP) | 951 | Affymetrix, Inc. | GPL2005
SNP | Affymetrix GeneChip Mapping 500K Early Access (250K Sty SNP) | 306 | Affymetrix, Inc. | GPL3812
SNP | Affymetrix GeneChip Mapping 500K Set Array (250K Nsp SNP) | 352 | Affymetrix, Inc. | GPL3718
SNP | Affymetrix GeneChip Mapping 500K Set Array (250K Sty2 SNP) | 682 | Affymetrix, Inc. | GPL3720
SNP | Sentrix BeadChip Array HumanHap300 Genotyping BeadChip | 110 | Illumina, Inc. | GPL5711
CGH | Vysis GenoSensor CGH Array 300 | 12 | Vysis, Inc. | GPL3709
CGH | Agilent-012750 Human Genome CGH Microarray 44A G4410A + B | 207 | Agilent Technologies, Inc. | GPL2873
CGH | Agilent-014693 Human Genome CGH Microarray 244A (G4411B) | 92 | Agilent Technologies, Inc. | GPL4544
CGH | NimbleGen Human HG18 WG CGH 389K array | 5 | NimbleGen, Inc. | GPL5941
CGH | NimbleGen Human HG17 ENCODE tiling array | 74 | NimbleGen, Inc. | GPL3514
CGH | CGH array LUMC, 1 Mb clone set | 98 | LUMC | GPL1506
CGH | CGH-CHROM14 2K versions 1 + 2 | 41 | Sanger Institute | GPL3892
CGH | CGH-SANGER 3K versions 1–5 | 116 | Sanger Institute | GPL4003
CGH | CGH-SANGER 4K version 1 | 54 | Sanger Institute | GPL4939
CGH | CGH-SANGER 5K version 2 | 2 | Sanger Institute | GPL3887
CGH | DKFZ Homo sapiens array–CGH 6k BAC array | 90 | DKFZ | GPL5685
CGH | MHP Human Chromosome 1 tile path CGH array version 1 | 12 | University of Cambridge | GPL5055
CGH | MHP Human Chromosome 1 tile path CGH array version 2 | 96 | University of Cambridge | GPL5056
CGH | MPIMG Homo sapiens 44K ArrayCGH | 68 | MPIMG Berlin | GPL5114
collections stacking up in freezers, high-throughput PCR and DNA purification strategies, as well as complex surface chemistry, postprocessing, and optimization of sample labeling and hybridization, all at the same time. Larger genome institutes, such as the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI), not forgetting Stanford University, were able early on to accrue more funding and personnel and managed to semi-industrialize array production in comparably large-scale array core facilities. A survey of the public array data sets submitted to the Gene Expression Omnibus (GEO) array database shows that a total of 350 different cDNA array platforms have been deposited since 2001, the majority of which are spotted arrays. Taken together, these arrays target no fewer than 240 different species, with human, mouse, and rat ranking in the first positions (Fig. 2), followed by Arabidopsis, yeast, and Drosophila. In situ-synthesized oligonucleotide microarrays are produced by generating 25–60mer sequences directly on a planar array surface, using light-directed chemical synthesis of nucleic acids and technologies borrowed from semiconductor manufacturing. Photolithographic synthesis uses a chemically activated silica substrate, and light-sensitive masking agents construct the sequence one nucleotide at a time across the entire array. Affymetrix’ GeneChip® technology was invented in the late 1980s by a team of scientists led by Stephen P. A. Fodor, who cofounded Affymetrix Inc. in 1992. The company originated in 1991 as a small unit within Affymax N.V., where Fodor’s group had, in the late 1980s, already developed methods (and patents) for fabricating the first small synthetic DNA arrays. The company’s first product, an HIV genotyping chip, was introduced in 1994 (8). The company eventually went public in 1996. 
Oligonucleotide microarrays are now primarily associated with the name Affymetrix, which in 2007 was the undisputed market leader.
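The principle of building all probes in parallel, one nucleotide at a time, can be illustrated with a toy simulation. This is a minimal sketch of a layer-by-layer masking scheme, not the manufacturer's actual process; the function name, the feature coordinates, and the example probe sequences are hypothetical. In this simplified scheme each layer needs at most four masks (one per base), so a set of 25-mers would need at most 100 masking steps.

```python
from typing import Dict, Set, Tuple

Feature = Tuple[int, int]   # (row, column) position of a probe on the array
BASES = "ACGT"


def synthesize(probes: Dict[Feature, str]) -> Tuple[Dict[Feature, str], int]:
    """Grow all probes in parallel, one nucleotide layer at a time.

    For each layer and each base, a "mask" exposes exactly those features
    whose next nucleotide is that base; only exposed features are extended.
    Returns the synthesized sequences and the number of masks used.
    """
    grown = {feature: "" for feature in probes}
    masks_used = 0
    longest = max(len(seq) for seq in probes.values())
    for layer in range(longest):
        for base in BASES:
            # Features that are deprotected by light in this step
            mask: Set[Feature] = {
                feature for feature, seq in probes.items()
                if layer < len(seq) and seq[layer] == base
            }
            if not mask:
                continue   # no feature needs this base in this layer
            masks_used += 1
            for feature in mask:
                grown[feature] += base   # couple the base to exposed features
    return grown, masks_used


# Hypothetical three-feature array with 4-mer probes
design = {(0, 0): "ACGT", (0, 1): "AAGT", (1, 0): "TTTT"}
result, n_masks = synthesize(design)
print(result, n_masks)
```

The sketch makes one design consequence visible: the number of masking steps depends on probe length and sequence composition, not on the number of features, which is why the approach scales to very dense arrays.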
Fig. 2. Statistics of microarray samples submitted to the GEO public data repository (http://www.ncbi.nlm.nih.gov/geo/), indicating the number of samples per species (top 12 species among 240 in total represented in GEO), submitted annually since the beginning of the project in 2001.
However, modified synthesis technologies have been introduced more recently. The “maskless array synthesis” protocol from NimbleGen Systems makes use of the more flexible digital micromirror devices (DMD), borrowed from light processors used in optical presenters. DMDs employ an array of miniature aluminum mirrors to pattern between 786,000 and 4.2 million individual pixels. These “virtual masks” replace the physical masks used in Affymetrix technology – the major advantage is an increased flexibility and turnover in array design. An altogether different concept underlies bead arrays, with Illumina Inc. being the market leader in this field. For bead arrays, an optical “imaging” fiber is etched such that a bead can fit into the resulting micron-sized etched wells right on the tip of the fiber. Different oligonucleotide sequences are attached to each bead, and thousands of beads can be self-assembled onto the fiber bundle. A subsequent decoding process is carried out to determine which bead occupies which well. Complementary oligonucleotides present in the
sample bind to the beads, and bound oligonucleotides are measured by using a fluorescent label. Illumina sells these arrays in two formats: Sentrix Array Matrix and Sentrix BeadChips. Affymetrix, NimbleGen, and Illumina arrays represent so-called “single-channel” or “one-color” microarrays and give estimates of the absolute levels of gene expression, in contrast to most cDNA arrays (including spotted oligonucleotide arrays), which require a dual-channel hybridization strategy. Absolute values of gene expression may be compared with other genes within a sample or with the same gene across a large panel of array hybridizations. In contrast to cDNA arrays, data are more readily normalized and compared with arrays from different experiments or even different array generations, a task that is virtually impossible considering the hundreds of different cDNA array platforms. The absolute values of gene expression may be compared between studies conducted months and years apart, or even across researchers from all over the planet using the same (commercial) array platforms. Even between entirely different array concepts, such as bead-based and planar oligonucleotide chips, comparisons are possible and are in fact facilitated by the single-channel principle. Considering the large amount of data generated by the international academic cancer research community, this now turns out to be one of the major advantages. Figure 1 illustrates the number of publications whose title, abstract, or MeSH headings contain the keyword “microarrays”. The number of publications has been increasing exponentially since the year 2000, with more than 6,000 peer-reviewed publications for the year 2007 alone – across the entire field of biomedical and pharmaceutical research. A combined search for the terms “cancer” and “microarrays” results in roughly 2,000 publications each for the last 2–3 years, indicating that approximately one third of all microarray-related publications concerned cancer research. 
The data also suggest a significant saturation effect, with the curve approaching a plateau. This may indicate that the majority of cancers have already been sufficiently profiled using high-density microarrays, although the level of new publications remains high. The publication statistics also indicate that oligonucleotide arrays dominate the market. The number of publications based on cDNA arrays, in contrast, has not increased since 2002, and has actually been declining since 2005. This is also illustrated in Table 1, summarizing the large number of publications based on the Affymetrix GeneChip arrays, for both mouse and human, and across all fields of biomedical research. It is intriguing to see that, in both species, the largest number of publications is related to cancer research/oncology, followed by SNP array genotyping. Despite the plateau effect, many novel array technologies are still rapidly evolving. As illustrated in the lower panel of Fig. 1, there still is an almost exponential
increase in the number of new publications on tissue microarrays, array–CGH, and chromatin immunoprecipitation on chips (so-called ChIP-on-chip technologies), as well as microRNA arrays. Protein and antibody arrays, however, have not been a success story or a commercial breakthrough as of yet, at least according to the low number of publications. Alternative splicing arrays, on the other hand, are only now becoming widely available and the future will show their impact. The immense amount of data available generated an urgent need for standardization, which was addressed by a number of community efforts by 2001 (9), prior to the flood of data. Currently, the MicroArray and Gene Expression (MAGE) group continues to work on the standardization of the representation of gene expression data and relevant annotations, aiming at facilitated exchange of data sets (http://www.mged.org). The Microarray Gene Expression Data (MGED) society has taken on the Minimum Information About a Microarray Experiment (MIAME) checklist, which was initially intended to define the level of information submitted together with a microarray experiment (9). It has since been adopted by many journals as a minimal requirement for the submission of papers incorporating microarray results. MIAME, however, is not a unified format, and is therefore of limited use. MGED, therefore, has recently launched an updated MIAME 2.0 version. In parallel, the MicroArray Quality Control (MAQC) project is conducted by the US Food and Drug Administration (FDA) with the aim of developing standards and quality control metrics that will eventually allow the use of array data in drug discovery, clinical practice, and regulatory decision making. The latest MAQC protocols encompass two stages (MAQC I and II), while MAQC III addresses guidelines for next-generation sequencing. 
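Because MIAME is a checklist rather than a file format, compliance is typically checked item by item. The following minimal sketch shows what such a check could look like; the six elements follow the commonly cited MIAME summary, whereas the dictionary keys, the function name, and the example submission are hypothetical and do not correspond to any repository's actual submission schema.

```python
from typing import Dict, List

# The six MIAME elements as commonly summarized; the keys are hypothetical labels.
MIAME_ELEMENTS: Dict[str, str] = {
    "raw_data": "raw data for each hybridization",
    "processed_data": "final processed (normalized) data",
    "sample_annotation": "essential sample annotation and experimental factors",
    "experimental_design": "experimental design and sample-data relationships",
    "array_annotation": "annotation of the array (e.g., probe or gene identifiers)",
    "protocols": "essential laboratory and data processing protocols",
}


def missing_miame_elements(submission: Dict[str, object]) -> List[str]:
    """Return the checklist items that are absent or empty in a submission."""
    return [key for key in MIAME_ELEMENTS if not submission.get(key)]


# Hypothetical submission that lacks protocol descriptions
example = {
    "raw_data": ["sample1.CEL", "sample2.CEL"],
    "processed_data": "normalized_matrix.txt",
    "sample_annotation": {"sample1": "tumor", "sample2": "normal"},
    "experimental_design": "tumor versus normal, one array per sample",
    "array_annotation": "GPL570",
    "protocols": "",
}
for key in missing_miame_elements(example):
    print(f"missing: {MIAME_ELEMENTS[key]}")
```

A check of this kind only confirms that each item is present; it says nothing about how the information is encoded, which is the limitation of a checklist noted above.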
The wish for standardization and the MIAME checklist were the primary reason for, and have since provided the “food” for, the building of large public data repositories, such as GEO, EBI ArrayExpress, and the Stanford Microarray database (for web addresses, see Table 3). As already indicated by the literature trends shown in Fig. 1, commercial in situ array platforms also dominate the public data submitted to such repositories. Tables 2a and 2b summarize the number of array samples deposited in the largest of these databases, GEO, by the end of 2007. The number of arrays (22,500 arrays) based on any one of the Affymetrix platforms contrasts with only 1,150 arrays based on all other commercial platforms combined, such as Agilent and NimbleGen, or 1,650 from Illumina bead arrays. Noncommercial spotted array platforms also provide a large number of samples, but are distributed over dozens of very different platforms. The largest individual studies, with currently >3000 samples hybridized, such as the Expression Project for Oncology (expO) project
Table 3 List of web addresses for microarray databases, including primary data repositories, reference gene expression databases, and a selection of meta-analysis databases intended to assign functional information to gene expression patterns Primary microarray data repositories
Web URL
ArrayDB, NHGRI Array Database
genome.nhgri.nih.gov/arraydb/
ArrayExpress, EBI Microarray Database
http://www.ebi.ac.uk/arrayexpress/
ArrayTrack, FDA Microarray Database
http://www.fda.gov/nctr/science/ centers/toxicoinformatics/ArrayTrack
BROAD Institute Microarray Database, MIT
http://www.broad.mit.edu/cancer/ datasets.html
caArray Standards-Based Array Database, NCI
caarraydb.nci.nih.gov/caarray
CIBEX Microarray Database
cibex.nig.ac.jp
CleanEx Microarray Database
http://www.cleanex.isb-sib.ch/
GAN Gene Aging Nexus
gan.usc.edu
GEO Gene Expression Omnibus, NCBI/NIH
http://www.ncbi.nlm.nih.gov/geo/
LAD Longhorn Array Database
http:// www.longhornarraydatabase. org/
LMD Lung Microarray Database
lungmicroarray.org
maxD Array Database, Manchester University UK
http://www.bioinf.man.ac.uk/ microarray/maxd
MESA Microarray Database, Burnham Institute
bsrweb.burnham.org/metadot
MUSC Array Database, Medical University of South Carolina
proteogenomics.musc.edu/pss
NKI Microarray Database, the Netherlands Kanker Institut
microarrays.nki.nl/
NOMAD Array Database, UCSF
ucsf-nomad.sourceforge.net/
NYUmad, NYU Microarray Database
http://www.bioinformatics.nyu.edu/ Projects/nyumad
PhenoGen Informatics, U. of Charleston, South Carolina
phenogen.uchsc.edu/PhenoGen
PumaDB, Princeton University Array Database
puma.princeton.edu
SMD Stanford Microarray Database
genome-www5.stanford.edu/
UNC MicroArray Database
genome.unc.edu
YMD Yale Microarray Database
http://www.med.yale.edu/microarray/
Reference Gene Expression Databases
Web URL (continued)
74
Sara, Kallioniemi, and Nees
Table 3 (continued) Primary microarray data repositories
Web URL
caGEDA Cancer Gene Expression Database
http://bioinformatics.upmc.edu/GE2/ GEDA.html
Connectivity Map BROAD Institute, MIT
http://www.broad.mit.edu/cmap
EMAGE Expression Database (mouse)
genex.hgu.mrc.ac.uk/Emage/database/
GeneX Open Source Gene Expression Database
genex.sourceforge.net/
GEPIS Expression Database, UCSF
http://www.cgl.ucsf.edu/Research/genentech/gepis/gepis.html
GXD Mouse Gene Expression Database, Jackson Lab
http://www.informatics.jax.org/mgihome/GXD
HuGE Index Human Gene Expression Index
zlab.bu.edu/HugeSearch
ITTACA Tumor Gene Expression Database
bioinfo.curie.fr/ittaca
PEDB Prostate Expression Database
http://www.pedb.org/
PEPR Public Expression Profiling Resource
pepr.cnmcresearch.org
RAD RNA Abundance Database, UPenn
http://www.cbil.upenn.edu/RAD
RefExA Reference Database for Gene Expression Analysis
157.82.78.238/refexa
SIEGE Lung Gene Expression Database
pulm.bumc.bu.edu/siegeDB
Symatlas, Novartis Institute/GNF
symatlas.gnf.org/SymAtlas/
tmaDB Tissue Microarray Database, Leeds UK
http://www.bioinformatics.leeds.ac.uk/tmadb/
MetaSearch Expression Databases
Web URL
GeneLogic Toxicogenomics Inc.
http://www.genelogic.com
GeneVestigator, ETH Zurich
http://www.genevestigator.ethz.ch
GENOMICA, Eran Segal Lab, Weizmann Inst.
genomica.weizmann.ac.il/
Module Map, Daphne Koller Lab/Stanford
robotics.stanford.edu/~erans/cancer/
Oncomine
http://www.oncomine.org
SPELL Serial Pattern of Expression Levels Locator, Princeton
function.princeton.edu/SPELL
TMM Gene CoExpression Database, Pavlidis’ Lab
microarray.cpmc.columbia.edu/tmm
(http://www.intgen.org/expo.cfm; maintained by the International Genomics Consortium [IGC]), are almost exclusively based on commercial platforms. Although these numbers may not directly correlate with the sales figures, they nevertheless illustrate the huge amount of experimental data freely available to the community. Unfortunately, corporate experimental data
are not usually submitted to such repositories and are therefore missing. Furthermore, data are often released years after the actual experiments, so there is a significant lag phase in database submissions. In any case, this treasury of array data has spawned a number of efforts aiming at the generation of integrative databases that allow the mining of these data in a meta-analysis mode. Oncomine (http://www.oncomine.com), Genesapiens (www.genesapiens.org), and GeneVestigator (www.genevestigator.ethz.ch) are the most popular of these databases and illustrate the concept; others are summarized in Table 3. This trend toward meta-analyses reflects the need for a more uniform, comprehensive, and functional understanding of the data, aiming less at the identification of differentially expressed genes or “markers,” and focusing instead on understanding the pathways and mechanisms that drive cancer progression. Simultaneously with the rise of immense databases and array repositories, bioinformatics has gained an immensely important role, boosted by the dire need to normalize, handle, and interpret the bulk of data. Space limitations make it impossible to even briefly sketch the impact of bioinformatics on the life sciences. As a surrogate, Tables 4 and 5 (generated by clustering PubMed literature data on microarrays) are primarily intended to illustrate the predominance of bioinformatics-related
Table 4 Meta-analysis of the co-occurrence of keywords and MeSH headings together with the terms “microarray*” and “breast cancer.” The resulting list of additional keywords retrieved in this “literature clustering” was then ranked according to the total number of co-occurrences. Search terms pointing primarily to bioinformatics-related topics are in bold
Breast cancer/keywords/MeSH headings | # | Keywords/MeSH headings | #
Mammary | 65 | Image analysis | 17
Estrogen receptor | 58 | Normal tissues | 17
Basal-like subtype | 54 | Sections, tissue | 17
Amplification/copy number changes | 50 | DNA methylation | 15
Prediction of outcome/predictive markers | 60 | COX-2 | 15
Ductal carcinoma | 48 | Endothelial cells | 15
MCF-7 | 46 | Mutations | 14
Ovarian cancer | 44 | Molecular classification/signature | 12
Prognostic markers/signature | 41 | Early-stage breast cancer | 12
BRCA1 | 34 | Differentially expressed | 11
Formalin-fixed, paraffin-embedded | 31 | Databases | 11
Tamoxifen | 29 | EGFR, EGF receptor | 10
Estrogen | 28 | Adjuvant chemotherapy | 10
FISH, fluorescence in situ hybridization | 27 | Proteomic analysis | 10
P53, tumor suppressor protein p53 | 25 | Biomarkers | 10
Metastasis | 24 | False, estimates | 10
ERBB2, Her-2 | 23 | Fine needle aspiration | 10
Classification, class | 21 | Lobular carcinoma | 10
Comparative genomic hybridization | 20 | Prostate and breast cancer | 10
Hypoxia | 19 | Apoptosis | 10
Model | 19 | Subtypes of breast cancer | 10
Tissue microarray | 17 | Locally advanced breast cancer | 10
Year | # publications
1999 | 7
2000 | 19
2001 | 56
2002 | 113
2003 | 155
2004 | 267
2005 | 323
2006 | 387
2007 | 420
questions in microarray publications. In Table 4, MeSH headings and keywords were ranked according to the frequency of co-occurrence in publications focusing on microarrays in breast cancer research (for which the largest number of microarray-related studies are available). It becomes immediately obvious that keywords such as “subtypes,” “outcome prediction,” “prognostic markers,” “molecular signature,” and “tumor classification” feature prominently. Similar trends are further exemplified by the ranking of keywords/MeSH headings most frequently co-occurring with the search terms “cancer,” “microarrays,” and “bioinformatics” (Table 5). According to this survey, bioinformatics is the foundation for the identification of “biomarkers,” but also “functional modules,” “gene interactions,” and “targets,” to mention a few. Again, this list illustrates the need of the research community for tools that functionally annotate gene expression data, or help classify tumors into different subclasses according to expression patterns or clinical data, such as survival. Hundreds of biomarkers and gene sets have been identified that correlate more or less significantly with stage and progression of the disease(s), but only a few of these may also represent promising novel targets for therapeutics. Based on the cancer classification/subclass concept, the idea of diagnostic and prognostic gene signatures has been introduced, a concept that has triggered a new “industry” of diagnostic companies that offer gene expression services for improved patient stratification and personalized therapeutic decisions (e.g., Agendia’s MammaPrint® test, http://www.agendia.com). Individualized medicine, therapeutic decisions, and more accurate patient stratification based on gene expression profiles have already become a reality. 
Agendia’s In Vitro Diagnostic Multivariate Index Assay (IVDMIA) was granted market clearance by the FDA in 2007, which provides the legal basis for offering this service in the United States. The test has been sold in Europe since 2005.

Table 5 Literature clustering and analysis of keywords/MeSH headings that most frequently co-occur in conjunction with the terms “microarray*,” “cancer,” and “bioinformatic*”. This list summarizes many of the aims and procedures where bioinformatics is primarily applied in the microarray field
MeSH: microarrays/cancer/bioinformatics | n = 496
Classification | 83
Clustering, cluster analysis | 44
Physiology | 39
Biomarkers | 26
Statistics and numerical data | 22
Survival | 17
NCI-60 panel | 16
Proportional hazards models | 15
Trends | 15
Standards | 10
Gene interactions | 9
FDR, false discovery rate | 8
Logistic regression | 8
Proteomic technology | 7
Immunology | 7
SNP array | 6
Recursive feature | 6
Functional modules | 5
Chromosomal regions | 5
Gene selection algorithm | 5
Copy number changes | 4
Family-wise error rate | 4
Gene co-expression | 4
Multiclass classification | 4
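The “literature clustering” behind Tables 4 and 5 is described only in outline. A minimal sketch of the counting step, assuming each publication has already been reduced to a set of keywords/MeSH headings, might look as follows. The function name, the toy records, and the exact-match rule (rather than wildcard searches such as “microarray*”) are illustrative assumptions, not the authors' actual procedure.

```python
from collections import Counter

def rank_cooccurring_keywords(records, query_terms):
    """Rank keywords by how often they co-occur with ALL query terms."""
    query = {term.lower() for term in query_terms}
    counts = Counter()
    for keywords in records:
        kw = {k.lower() for k in keywords}
        if query <= kw:                # the record mentions every query term
            counts.update(kw - query)  # count each other keyword once per record
    # most frequent first; ties broken alphabetically for a stable order
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical toy records, one keyword set per publication
records = [
    {"microarray", "breast cancer", "estrogen receptor", "classification"},
    {"microarray", "breast cancer", "classification"},
    {"microarray", "lung cancer", "classification"},
    {"breast cancer", "microarray", "BRCA1"},
]

ranking = rank_cooccurring_keywords(records, ["microarray", "breast cancer"])
for keyword, n in ranking:
    print(keyword, n)
```

A real survey would first retrieve the keyword sets from a literature database and handle synonyms and wildcards; the sketch covers only the ranking by number of co-occurrences.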
3. Array-Based Genetic Mapping Part 1: CGH Arrays
The genomic landscape of tumors encompasses a broad spectrum of genetic events. The scale of copy number alterations ranges from microdeletions/amplifications of a few bases to megabases of DNA and entire chromosomes. DNA copy number variations (CNV) are most rapidly addressed by array-based technologies, primarily array-CGH and SNP arrays. Array-CGH has come a long way since the first description of chromosomal CGH in 1992 (14) and the first array-CGH (15) based on spotted DNA from bacterial artificial chromosomes (BACs). It has become a common tool in cancer genome analyses, reflected by the fact that, by 2007, all of the more frequent tumor types had been covered by at least one array-CGH study in the literature. These platforms allow the rapid and reliable detection of increasingly smaller microdeletions and amplifications. Oligonucleotide-based CGH (offered by companies such as Agilent and NimbleGen, Table 2b) and SNP arrays (Affymetrix, Illumina; see the next section) have more or less replaced BAC-based CGH arrays as the method of choice for larger-scale genomic profiling of cancer. For example, NimbleGen offers a 384K array-CGH platform based on 50mer oligos, similar to Agilent’s CGH
platforms (60mer) that have been recently upgraded from 44,000 to 244,000 elements. This increased density offers greatly improved resolution, down to exon-level detection of focal deletions. CGH arrays now represent highly reliable, standardized technology platforms; they have become much more affordable, and have greatly facilitated access to genomic profiling technologies for many clinical laboratories. Array service providers such as NimbleGen increasingly “do the job,” requiring the researcher simply to provide purified tumor DNA and to handle the data with the help of bioinformatics. Integration or “layering” of expression and CGH data generates additional insights into cancer biology. In 1999, cDNA microarrays were introduced as a plausible platform for CGH, thereby facilitating the integration of both DNA- and RNA-level data (16). Now, bioinformatics also facilitates the layering of CGH array and mRNA expression data between different platforms (17–19). In breast cancer, for example, CGH array data largely reflect the subtype classification schemes defined on the basis of expression profiling alone (10–12). This does not apply to all the subtypes described, however, and partially overlapping alternative tumor classes based on genomic features such as chromosomal instability and gene amplifications have been suggested (20). In another breast cancer study, mRNA gene expression profiling, clinical outcome data, and BAC-based CGH array data were integrated (18). This report confirmed that expression- and CGH-based tumor classifications overlap to a large degree and that patient stratification and prognosis may be significantly improved by using this combined strategy. Furthermore, this study identified 66 genes with recurrent high-level amplifications resulting in gene overexpression (“amplified/over-expressed genes”) that may represent novel cancer targets. Nine of these candidates would be generally considered druggable.
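The “layering” of copy number and expression data described above can be illustrated with a small sketch. It assumes per-gene log2 ratios from CGH and expression arrays for the same tumors, and flags genes that are recurrently amplified and more highly expressed in the amplified tumors. The cutoffs, gene names, and values are hypothetical; the cited studies (17–19) used their own, more elaborate statistics.

```python
from statistics import mean

def amplified_overexpressed(copy_log2, expr_log2,
                            amp_cutoff=1.0, expr_cutoff=1.0, min_samples=2):
    """Flag genes with recurrent high-level gain and concordant overexpression.

    copy_log2 / expr_log2 map gene -> list of log2 ratios, one per tumor,
    with tumors in the same order in both dictionaries.
    """
    hits = []
    for gene, copy_values in copy_log2.items():
        expr_values = expr_log2[gene]
        amplified = [i for i, v in enumerate(copy_values) if v >= amp_cutoff]
        others = [i for i, v in enumerate(copy_values) if v < amp_cutoff]
        if len(amplified) < min_samples or not others:
            continue  # not recurrent, or nothing to compare against
        # mean expression in amplified tumors minus mean in the remaining tumors
        diff = (mean(expr_values[i] for i in amplified)
                - mean(expr_values[i] for i in others))
        if diff >= expr_cutoff:
            hits.append((gene, len(amplified), diff))
    return sorted(hits, key=lambda hit: -hit[2])

# Hypothetical toy data: three genes, five tumors
copy_log2 = {
    "GENE_A": [1.5, 1.2, 0.0, 0.1, -0.1],  # amplified in two tumors
    "GENE_B": [1.4, 1.1, 0.0, 0.0, 0.1],   # amplified, but not over-expressed
    "GENE_C": [0.1, 0.0, 0.2, 1.3, 0.0],   # amplified in one tumor only
}
expr_log2 = {
    "GENE_A": [3.0, 2.6, 0.2, 0.0, 0.1],
    "GENE_B": [0.2, 0.1, 0.0, 0.3, 0.1],
    "GENE_C": [0.0, 0.1, 0.0, 2.5, 0.1],
}

candidates = amplified_overexpressed(copy_log2, expr_log2)
print([gene for gene, n_amplified, diff in candidates])
```

In this toy example only GENE_A passes both filters. Real analyses would add segmentation of the copy number profiles and a significance test in place of fixed cutoffs.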
A second report, also from Joe Gray’s group (19), analyzed a panel of 51 breast cancer cell lines by CGH and expression arrays, together with a set of 145 primary tumors. Protein expression and clinical data were also integrated. The analysis showed that most of the genetic changes found in tumors are by and large represented in the cell lines. This supports the view that the use of cell lines in cancer research, although much debated, is justified and reflects cancer biology to a large degree. Of interest for cancer therapeutics, Herceptin response and resistance in the cell line panel correlated with the expression of a number of protein markers that may be of clinical value for patient selection, although larger panels of cell lines might give more robust insights. Data integration as exemplified above is clearly a powerful strategy with the potential to yield functional insights and to identify cancer drug targets or therapies.
4. Array-Based Genetic Mapping Part 2: SNP Arrays
The currently available SNP array platforms cover between 10,000 (10K) and 500,000 (500K) markers on a single array or bead array. At this high density, and due to the existence of haplotypes in the human genome resulting in the phenomenon of linkage disequilibrium, there is no need to hybridize normal samples to analyze CNVs, unlike with CGH arrays. Haplotypes are segments of chromosomes that have not been “broken up” by recombination, and are separated by the sites of recombination. Haplotypes in particular enable geneticists to search for genes involved in cancer and many other diseases, and facilitate the use of SNP arrays for the detection of CNV, also allowing researchers to identify putative target genes located in precisely mapped minimal amplification/deletion intervals. Approximately 3.1 million SNPs have been mapped by the human HapMap consortium (http://www.hapmap.org), and are readily available for improved SNP array design. Accordingly, both Illumina and Affymetrix have very recently launched larger SNP platforms covering more than one million SNPs (Affymetrix SNP array 6.0 with 1.8M variation markers, and the Illumina Human1M BeadChip). SNP arrays have rapidly replaced older technologies, such as microsatellite markers, and compete with CGH arrays for genetic profiling. Controlled clinical studies including hundreds or thousands of cancers are feasible, providing robust statistics for the detection of recurrent somatic alterations. The mapping resolution is comparable to that provided by CGH arrays. In one of the most extensive cancer genotyping studies to date, performed on 528 lung adenocarcinomas, the NKX2-1 (TITF1) gene was identified as the most likely target gene of a recurrent amplification at 14q13.3 (21). Analogous to findings related to the MITF gene in malignant melanoma (22), NKX2-1 is a typical lineage-specific transcription factor.
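The idea of a “minimal amplification interval” mentioned above can be sketched as a simple interval intersection: the region shared by the amplified segments of all tumors is the most likely location of the target gene. The coordinates and gene names below are hypothetical, and the sketch assumes all segments lie on the same chromosome and have already been called from the array data.

```python
def minimal_common_region(intervals):
    """Intersect amplified segments (start, end) from several tumors.

    Returns the shared (start, end) interval, or None if there is no overlap.
    """
    start = max(s for s, _ in intervals)
    end = min(e for _, e in intervals)
    return (start, end) if start < end else None

def genes_in_region(region, gene_coords):
    """Names of genes lying entirely inside the region."""
    lo, hi = region
    return sorted(gene for gene, (s, e) in gene_coords.items()
                  if s >= lo and e <= hi)

# Hypothetical amplified segments from three tumors (arbitrary units)
amplicons = [(100, 900), (300, 1200), (250, 700)]
# Hypothetical gene positions on the same chromosome
gene_coords = {"GENE_X": (350, 500), "GENE_Y": (650, 800), "GENE_Z": (50, 120)}

region = minimal_common_region(amplicons)
targets = genes_in_region(region, gene_coords)
print(region, targets)
```

Here the shared interval is narrower than any single amplicon and contains only GENE_X. In practice, recurrence statistics across many tumors replace the strict intersection, because not every tumor carries the alteration.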
Both NKX2-1 and MITF represent unique proto-oncogenes activated and over-expressed in a significant proportion of lung adenocarcinomas and melanomas. In the same lung cancer study, a number of additional candidate tumor suppressor genes were allocated to recurrent deletions, such as the tyrosine phosphatase PTPRD at 9p23 and the phosphodiesterase PDE4D at 5q11.2, pointing to these as important functional pathways frequently inactivated in cancers. SNP arrays naturally continue to play an important role in vast linkage analysis projects aiming to identify novel cancer susceptibility genes. In breast cancer, a recent genome-wide association study (23) comprised more than 4,400 cancer cases and 4,300 control samples, followed by an even larger panel of samples from >21,000 cancer cases and 22,500 healthy control donors for subsequent validation. In
this huge cohort, Affymetrix 500K SNP arrays were used to identify a panel of putative cancer-predisposing gene variants. It is interesting to note that, in GEO, SNP array data are highly over-represented compared with CGH. As of December 2007, GEO contained more than 11,000 SNP array data sets, compared with only 745 CGH arrays (Table 2b). Although Affymetrix SNP arrays have been commercially available for several years, BAC-based CGH platforms were until recently a rather exclusive technology produced only at a few large genome centers. This is now changing significantly, with many commercial providers of CGH arrays entering the market. A literature search in NCBI PubMed (not shown) revealed that SNP and CGH arrays are each mentioned in >500 publications. However, while CGH arrays are almost exclusively applied in cancer-related projects, SNP arrays were traditionally and primarily linked to the mapping of metabolic and neurological diseases, and only a small fraction of these publications addressed neoplasia. Nevertheless, SNP arrays are rapidly gaining a strong foothold in the cancer field as well, helped by the dramatic increase in density. An aspect that is generally poorly addressed in mapping studies is that of chromosomal translocations. Balanced and reciprocal chromosomal translocations are extremely frequent in most tumor types, and can result in the generation of fusion genes with novel oncogenic properties. The problem is that balanced translocations do not usually lead to massive loss or gain of DNA at the recombination site, and are therefore not readily detectable by SNP or CGH arrays. Not surprisingly, almost two thirds of the genes in the Sanger Centre Cancer Gene Census (24) represent reciprocal fusion partners. These are primarily found in blood-related and mesenchymal cancers, but hardly any have been described in epithelial cancers, which represent 90% of the entire cancer burden.
The fusion-gene concept has now attracted renewed attention in epithelial cancers, mainly due to the discovery of recurrent fusions of TMPRSS2 with ERG and other ETS family genes in prostate cancer (25, 26). This translocation represents one of the most frequent alterations in cancer as a whole, and it is surprising in retrospect that it was only identified in 2005.
5. Deep Sequencing Technologies
While the resolution of CGH and SNP arrays has dramatically increased in less than 10 years, resequencing approaches have effectively brought the resolution down to the single-nucleotide level. High-density tiling array platforms may in principle achieve the same goal, but at a comparably high cost and effort. Array-based
sequencing, or “sequencing by hybridization,” is an old idea that goes back to the early 1990s (27). NimbleGen Inc., for example, offers its Comparative Genome Sequencing (CGS) technology as a viable alternative for the analysis of bacterial genomes. CGS provides an efficient, high-throughput, and cost-effective method for genome-wide analysis, but it is restricted to genomes in the 3- to 5-Mb range, and is thus not suitable for eukaryotic (and cancer) genomes that are 1,000 times larger. The largest number of oncogenic alterations in cancers is probably attributable to somatic point mutations that either result in gain-of-function proteins, such as activated oncogenes, or inactivate tumor suppressor/caretaker genes. Point mutations represent the technically most demanding end of the spectrum of cancer-relevant changes, because they require massive automated DNA-sequencing technologies. PCR-based, massively parallel sequencing technologies (MPSS) such as Solexa (Illumina Inc.), SOLiD (Applied Biosystems, ABI), and 454 (Roche Inc.) (28, 29) seriously compete with array-based technologies in both price and throughput. Currently, pricing for these sequencing platforms is in the range of $1–6 per base. A single run costs between $8,000 and $10,000, and generates up to 1 Gbp of sequence data. Sequencing with 454 generates longer reads (250 bp compared with 25 bp or 35 bp for SOLiD and Solexa), but only 0.1 Gbp per run in total. The amount of data generated is immense, as is the requirement for data storage capacity. Most large-scale sequencing efforts to date have focused on genes or gene families that are over-represented in cancer. This includes, naturally, panels of known oncogenes/tumor suppressor genes, or the kinases and phosphatases as the most frequently mutated functional protein families.
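The per-run figures quoted above can be turned into a few back-of-the-envelope numbers. The sketch below takes the run cost ($8,000–10,000) and the best-case yield (1 Gbp) at face value and assumes a genome of about 3 Gbp, in line with the statement that eukaryotic genomes are roughly 1,000 times larger than 3- to 5-Mb bacterial genomes. Function names are illustrative.

```python
import math

def cost_per_mbp(run_cost_usd, yield_bp):
    """Cost in USD per megabase (1e6 bp) of raw sequence from one run."""
    return run_cost_usd / (yield_bp / 1_000_000)

def reads_per_run(yield_bp, read_length_bp):
    """Number of complete reads of a given length that make up the run yield."""
    return yield_bp // read_length_bp

def runs_for_coverage(genome_bp, yield_bp, fold=1):
    """Runs needed to reach a given average fold coverage of the genome."""
    return math.ceil(fold * genome_bp / yield_bp)

YIELD_BP = 1_000_000_000   # "up to 1 Gbp" per run, best case from the text
GENOME_BP = 3_000_000_000  # assumption: approximately 3 Gbp genome

low = cost_per_mbp(8_000, YIELD_BP)
high = cost_per_mbp(10_000, YIELD_BP)
reads = reads_per_run(YIELD_BP, 35)            # 35-bp reads
runs = runs_for_coverage(GENOME_BP, YIELD_BP)  # 1x average coverage

print(f"${low:.0f}-{high:.0f} per Mbp, {reads:,} reads per run, "
      f"{runs} runs for 1x coverage")
```

Under these assumptions a single run covers only about a third of a genome once, which helps explain why most early large-scale efforts concentrated on selected genes and gene families.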
Early on, kinase screens performed by the Cancer Genome Project (CGP) of the Sanger Institute (http://www.sanger.ac.uk/CPG) yielded a spectacular, more than encouraging hit: BRAF, mutated in 70% of all melanomas (30). Parallel approaches at the Johns Hopkins Cancer Center (http://www.hopkinskimmelcancercenter.org), covering both the kinome (31) and protein tyrosine phosphatases (32) in breast, colorectal, and gastric cancers, were equally encouraging. The PIK3CA gene was identified as one of the most frequently mutated genes in breast cancers (33) and in a panel of cancer cell lines. The “deep sequencing” of cancer cell lines (34), although frequently criticized, represents a valid alternative to cancer samples. As already outlined in the section on CGH arrays, cell lines not only continue to harbor the same mutations and CNV as primary tumors, they also provide researchers with an opportunity to functionally characterize the impact of somatic alterations for drug development. Cell lines also offer the advantage of limited heterogeneity and less “contamination” with nontumor cells such as tumor stroma,
improving sensitivity when it comes to the identification of somatic point mutations. Larger-scale resequencing studies on the 518 human kinases, conducted at both centers, were rewarded with a large number of somatic mutations (35–37), confirming once again that kinases and phosphatases represent the most frequently mutated gene families in cancer. The largest of these studies to date resulted in more than 1,000 somatic kinase mutations identified in 210 tumors (38). Another recent PCR-based sequencing screen covered 238 known oncogene mutations in 14 frequently mutated oncogenes across >1,000 tumor samples and 17 different tumor types (39). This study revealed not only low-frequency mutations in cancer types previously not associated with many of these oncogenes, but also a previously unknown and widespread “partnering of mutations” for functionally related pairs of oncogenes. It had been previously assumed that a single mutation within a critical pathway (such as the RAS pathway) is sufficient for functional activation or inactivation of that pathway, excluding additional mutations in related genes. The recurrent co-occurrence of such mutations, however, points to a potential complementary function of some of the most recurrent mutations, and confirms that pathways may in fact be targeted by more than one hit. Encouraged by these successes, unbiased large-scale tumor-resequencing approaches were launched, that is, studies not targeting specific gene families in particular. However, an initial attempt covering 1,811 exons of 470 genes identified only three somatic mutations in colorectal cancers (40). It became immediately clear that much larger-scale sequencing exercises were needed. Recent studies have sequenced basically all genes in the RefSeq database, that is, 20,857 transcripts and 18,191 genes, in a set of 11 breast and 11 colon cancers (41, 42). Genes were PCR amplified, one exon at a time, and subsequently sequenced.
This effort has resulted in a set of 280 genes that were mutated in at least one tumor, indicating clearly that cancers do contain a large number of point mutations. The large number of mutations, averaging 90 per sample, came as a surprise, and created the rather difficult task of distinguishing “driver mutations,” which are positively selected for in cancer progression, from “passenger mutations.” Using statistical methods, candidate genes were selected that most likely represent driver genes, to be followed up in additional cancer cohorts. A different gene selection strategy was based on pathways previously implicated in cancers; those were used for validation based on a second “test set” of 96 tumors. From these data, a surprising variety of mutations becomes apparent; this allows a truly comprehensive fine-mapping of somatic alterations in cancers. A number of additional tumor resequencing projects have been launched, for example, The Cancer Genome Atlas (TCGA) (cancergenome.nih.gov) at the NIH, and the tumor sequencing
project (TSP) (http://www.genome.gov) at NHGRI, which will be jointly funded together with the TCGA later on. Both projects will also become principal components of the International Cancer Genome Consortium (ICGC) initiative during 2008.
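The statistical separation of “driver” from “passenger” mutations discussed in this section is not specified in detail here. One generic way to frame it, shown below purely as a sketch, is a binomial test: given an assumed background (passenger) mutation rate per base pair, how surprising is it to see a gene mutated in k of n tumors? The background rate, gene lengths, and counts are hypothetical, and this simple model is not the method used in refs. (41, 42).

```python
from math import comb

def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def driver_candidates(mutated_tumors, gene_length_bp, n_tumors,
                      background_rate, alpha=0.05):
    """Genes mutated in more tumors than the passenger model predicts.

    mutated_tumors: gene -> number of tumors carrying a mutation in the gene
    gene_length_bp: gene -> sequenced coding length in bp
    background_rate: assumed passenger mutation probability per bp per tumor
    """
    n_genes = len(mutated_tumors)
    hits = {}
    for gene, k in mutated_tumors.items():
        # probability that one tumor carries at least one passenger mutation
        p_gene = 1 - (1 - background_rate) ** gene_length_bp[gene]
        p_value = binomial_tail(k, n_tumors, p_gene)
        if p_value * n_genes <= alpha:  # Bonferroni correction across genes
            hits[gene] = p_value
    return hits

# Hypothetical example: 22 tumors, assumed background of 1e-6 per bp
mutated_tumors = {"GENE_A": 5, "GENE_B": 1}
gene_length_bp = {"GENE_A": 2_000, "GENE_B": 30_000}

hits = driver_candidates(mutated_tumors, gene_length_bp,
                         n_tumors=22, background_rate=1e-6)
print(sorted(hits))
```

In this toy example the short, recurrently mutated GENE_A stands out, whereas a single mutation in the long GENE_B is compatible with chance. Published analyses additionally account for mutation type, sequence context, and tumor-specific background rates.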
6. Future Outlook and Perspectives
During the past decade, combined efforts have identified hundreds of genes mutated in cancers, collected in the COSMIC database (http://www.sanger.ac.uk/genetics/CGP/cosmic). The various mutations identified in most of the studies described in the previous section are compiled, in a searchable format, at http://cbio.mskcc.org/cancergenes. Intriguing as the vast list of cancer genes may seem, nobody would assume this list to be complete yet. There is ample evidence that we will continue to identify novel cancer genes, particularly in anticipation of the planned large-scale sequencing projects. Even genes mutated at high frequencies (>5%) in certain malignancies most likely still await discovery, as demonstrated by the recent findings of recurrent BRAF and MITF alterations in melanoma (22, 30) or PIK3CA mutations in breast cancer (33). Up to 70% of all melanomas contain a BRAF mutation, representing one of the highest mutation frequencies ever found, rivaled only by “classic” targets such as the p53 tumor suppressor gene. Such novel cancer genes represent excellent vantage points for future cancer therapeutics, highlighting possible oncogene dependencies and vulnerabilities that may be specifically targeted by “designed” drugs. These strategies are in principle modeled after the poster-child success stories of drugs targeting genes such as BCR/ABL and c-Kit (Gleevec), ERBB2 (Herceptin/trastuzumab), EGFR (Iressa, Tarceva), or VEGF (Avastin). There may not be a lack of novel targets after all. It is primarily mechanistic information that is missing before cancer genomics can direct the drug discovery process. For this purpose, integrative approaches, combining different genome-wide technologies such as mRNA gene expression and DNA copy number alterations, have proven to be extremely powerful and have already resulted in greatly improved mechanistic insights.
Such “overlay” strategies have already helped to identify and validate novel “driver” genes and pathways in cancer biology (18, 19). However, such “holistic” efforts need to become a more routine strategy, and there is no lack of different aspects that might be taken into the big picture. The contribution of microRNA arrays (43) is only now emerging, and so is the role of genome-wide analyses of epigenetic alterations (44–46). Transcriptional profiling technologies also continue to advance. Alternatively spliced mRNA variants are now routinely detectable by exon
arrays (e.g., Affymetrix Exon 1.0 arrays, and other providers such as ExonHit and JIVAN Inc.). The large amount of transcriptomics data already available to the research community needs to be mined in a more comprehensive fashion as well. Large-scale initiatives to mine this information are only now beginning, with search engines such as Oncomine, GENOMICA, and GeneVestigator allowing the identification of cancer-specific functional modules (47). Metabolic and proteomic fingerprints as well as the mathematical analysis and modeling of “-omics” data may complete our comprehensive understanding of the molecular deregulation of cancer cells in vitro and in vivo. Last but not least, high-throughput small interfering RNA (siRNA) and short hairpin RNA (shRNA) technologies are primarily aimed at gaining knowledge about gene function. Functional profiling technologies have a particularly strong potential when combined with mapping of the physical cancer genome or transcriptome. This was recently exemplified by Boehm et al. (48), who integrated mRNA expression and genome mapping with functional shRNA screening data. They identified IKBKE as a recurrent target of amplifications in breast cancer, pointing once again to the functional activation of the NF-κB pathway in tumorigenesis, a finding shared by the largest-scale sequencing efforts to date, but based on an almost entirely different set of genes (42). From the many cancer genome analyses available to date, a core panel of only 15–20 critical cancer pathways is emerging, which includes a much larger number of cancer genes that are mutated or silenced in neoplasias (49).
These findings are in principle confirmed by large-scale transcriptomics studies that identify the same activated pathways using bioinformatics tools such as clustering and functional gene annotation (amigo.geneontology.org; david.abcc.ncifcrf.gov) or gene set enrichment analysis (http://www.broad.mit.edu/gsea), or that use a systematic approach for the discovery of functional connections among diseases, genetic perturbation, and drug action (Connectivity Map; http://www.broad.mit.edu/cmap). A relatively small set of “usual suspects,” genes that form outstanding peaks on the cancer genome map owing to their extraordinary mutation frequency, have been identified in the past. However, most genes are found to be mutated at a very low frequency, which creates the principal problem of sorting the wheat from the chaff. The most powerful strategies for this purpose might be based on integrative “overlay” and functional studies. It has already become apparent that the abundance of somatic mutations most likely affects many genes, but only a small set of functional pathways. The identification of the spectrum of somatic alterations in cancer genomes therefore represents only half of the way toward the goal of improved cancer therapy. The most difficult part, translating these data into knowledge and novel therapies, still lies ahead.
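The observation that many mutated genes converge on a small set of pathways is typically examined with an over-representation test. The sketch below uses a hypergeometric tail probability. The gene universe (18,191 genes) and the size of the mutated gene list (280) are taken from the figures quoted in the previous section, whereas the pathway size and the overlap are hypothetical. This is a generic illustration, not the algorithm implemented by any of the tools named above.

```python
from math import comb

def hypergeom_tail(k, universe, pathway_size, list_size):
    """P(overlap >= k) when list_size genes are drawn at random from the universe.

    universe: total number of genes tested
    pathway_size: number of genes annotated to the pathway
    list_size: number of genes in the list of interest (e.g., mutated genes)
    """
    upper = min(pathway_size, list_size)
    favourable = sum(
        comb(pathway_size, i) * comb(universe - pathway_size, list_size - i)
        for i in range(k, upper + 1)
    )
    return favourable / comb(universe, list_size)

UNIVERSE = 18_191    # genes sequenced (from the text)
MUTATED = 280        # genes mutated in at least one tumor (from the text)
PATHWAY_SIZE = 100   # hypothetical pathway annotation
OVERLAP = 10         # hypothetical number of mutated genes in the pathway

expected = MUTATED * PATHWAY_SIZE / UNIVERSE
p_value = hypergeom_tail(OVERLAP, UNIVERSE, PATHWAY_SIZE, MUTATED)
print(f"expected overlap {expected:.2f}, observed {OVERLAP}, p = {p_value:.2e}")
```

With these numbers, about 1.5 pathway genes would be expected in the list by chance, so an overlap of 10 would be strong evidence for enrichment. When many pathways are tested, the resulting p-values need a multiple-testing correction.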
References 1. Bennett, S.T. et al. (2005) Toward the 1,000 dollars human genome. Pharmacogenomics 6, 373–382 2. Church, G.M. (2006) Genomes for all. Sci. Am. 294, 46–54 3. Collins, F.S. and Barker, A.D. (2007) Mapping the cancer genome. Pinpointing the genes involved in cancer will help chart a new course across the complex landscape of human malignancies. Sci. Am. 296, 50–57 4. Schena, M. et al. (1995) Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 270, 467–470 5. Shalon, D. et al. (1996) A DNA microarray system for analyzing complex DNA samples using two-color fluorescent probe hybridization. Genome Res. 6, 639–645 6. Schena, M. et al. (1996) Parallel human genome analysis: microarray-based expression monitoring of 1000 genes. Proc. Natl. Acad. Sci. USA 93, 10614–10619 7. DeRisi, J. et al. (1996) Use of a cDNA microarray to analyse gene expression patterns in human cancer. Nat. Genet. 14, 457–460 8. Lipshutz, R.J. et al. (1995) Using oligonucleotide probe arrays to access genetic diversity. BioTechniques 19, 442–447 9. Brazma, A. et al. (2001) Minimum information about a microarray experiment (MIAME)toward standards for microarray data. Nat. Genet. 29, 365–371 10. Perou, C.M. et al. (1999) Distinctive gene expression patterns in human mammary epithelial cells and breast cancers. Proc. Natl. Acad. Sci. USA 96, 9212–9217 11. Sorlie, T. et al. (2001) Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proc. Natl. Acad. Sci. USA 98, 10869–10874 12. van ‘t Veer, L.J. et al. (2002) Gene expression profiling predicts clinical outcome of breast cancer. Nature 415, 530–536 13. van de Vijver, M.J. et al. (2002) A geneexpression signature as a predictor of survival in breast cancer. N. Engl. J. Med. 347, 1999–2009 14. Kallioniemi, A. et al. (1992) Comparative genomic hybridization for molecular cytogenetic analysis of solid tumors. Science 258, 818–821 15. Pinkel, D. et al. 
(1998) High resolution analysis of DNA copy number variation using comparative genomic hybridization to microarrays. Nat. Genet. 20, 207–211
16. Pollack, J.R. et al. (1999) Genome-wide analysis of DNA copy-number changes using cDNA microarrays. Nat. Genet. 23, 41–46
17. Bergamaschi, A. et al. (2006) Distinct patterns of DNA copy number alteration are associated with different clinicopathological features and gene-expression subtypes of breast cancer. Genes Chromosomes Cancer 45, 1033–1040
18. Chin, K. et al. (2006) Genomic and transcriptional aberrations linked to breast cancer pathophysiologies. Cancer Cell 10, 529–541
19. Neve, R.M. et al. (2006) A collection of breast cancer cell lines for the study of functionally distinct cancer subtypes. Cancer Cell 10, 515–527
20. Fridlyand, J. et al. (2006) Breast tumor copy number aberration phenotypes and genomic instability. BMC Cancer 6, 96
21. Weir, B.A. et al. (2007) Characterizing the cancer genome in lung adenocarcinoma. Nature 450(7171), 893–898
22. Garraway, L.A. et al. (2005) Integrative genomic analyses identify MITF as a lineage survival oncogene amplified in malignant melanoma. Nature 436, 117–122
23. Easton, D.F. et al. (2007) Genome-wide association study identifies novel breast cancer susceptibility loci. Nature 447, 1087–1093
24. Futreal, P.A. et al. (2004) A census of human cancer genes. Nat. Rev. Cancer 4, 177–183
25. Tomlins, S.A. et al. (2005) Recurrent fusion of TMPRSS2 and ETS transcription factor genes in prostate cancer. Science 310, 644–648
26. Tomlins, S.A. et al. (2007) Distinct classes of chromosomal rearrangements create oncogenic ETS gene fusions in prostate cancer. Nature 448, 595–599
27. Lipshutz, R.J. (1993) Likelihood DNA sequencing by hybridization. J. Biomol. Struct. Dyn. 11, 637–653
28. Margulies, M. et al. (2005) Genome sequencing in microfabricated high-density picolitre reactors. Nature 437, 376–380
29. Emrich, S.J. et al. (2007) Gene discovery and annotation using LCM-454 transcriptome sequencing. Genome Res. 17, 69–73
30. Davies, H. et al. (2002) Mutations of the BRAF gene in human cancer. Nature 417, 949–954
31. Bardelli, A. et al. (2003) Mutational analysis of the tyrosine kinome in colorectal cancers. Science 300, 949
32. Wang, Z. et al. (2004) Mutational analysis of the tyrosine phosphatome in colorectal cancers. Science 304, 1164–1166
33. Samuels, Y. et al. (2004) High frequency of mutations of the PIK3CA gene in human cancers. Science 304, 554
34. Ikediobi, O.N. et al. (2006) Mutation analysis of 24 known cancer genes in the NCI-60 cell line set. Mol. Cancer Ther. 5, 2606–2612
35. Stephens, P. et al. (2005) A screen of the complete protein kinase gene family identifies diverse patterns of somatic mutations in human breast cancer. Nat. Genet. 37, 590–592
36. Futreal, P.A. et al. (2005) Somatic mutations in human cancer: insights from resequencing the protein kinase gene family. Cold Spring Harb. Symp. Quant. Biol. 70, 43–49
37. Davies, H. et al. (2005) Somatic mutations of the protein kinase gene family in human lung cancer. Cancer Res. 65, 7591–7595
38. Greenman, C. et al. (2007) Patterns of somatic mutation in human cancer genomes. Nature 446, 153–158
39. Thomas, R.K. et al. (2007) High-throughput oncogene mutation profiling in human cancer. Nat. Genet. 39, 347–351
40. Wang, T.L. et al. (2002) Prevalence of somatic alterations in the colorectal cancer cell genome. Proc. Natl. Acad. Sci. USA 99, 3076–3080
41. Sjoblom, T. et al. (2006) The consensus coding sequences of human breast and colorectal cancers. Science 314, 268–274
42. Wood, L.D. et al. (2007) The genomic landscapes of human breast and colorectal cancers. Science 318(5853), 1108–1113
43. Calin, G.A. and Croce, C.M. (2006) MicroRNA signatures in human cancers. Nat. Rev. Cancer 6, 857–866
44. Stransky, N. et al. (2006) Regional copy number-independent deregulation of transcription in cancer. Nat. Genet. 38, 1386–1396
45. Barski, A. et al. (2007) High-resolution profiling of histone methylations in the human genome. Cell 129, 823–837
46. Taylor, K.H. et al. (2007) Ultradeep bisulfite sequencing analysis of DNA methylation patterns in multiple gene promoters by 454 sequencing. Cancer Res. 67, 8511–8518
47. Tomlins, S.A. et al. (2007) Integrative molecular concept modeling of prostate cancer progression. Nat. Genet. 39, 41–51
48. Boehm, J.S. et al. (2007) Integrative genomic approaches identify IKBKE as a breast cancer oncogene. Cell 129, 1065–1079
49. Vogelstein, B. and Kinzler, K.W. (2004) Cancer genes and the pathways they control. Nat. Med. 10, 789–799
Chapter 6
Mining Expressed Sequence Tag (EST) Libraries for Cancer-Associated Genes
Armin O. Schmitt
Summary
Originally established at the beginning of the 1990s as a direct route to gene finding, expressed sequence tags (ESTs) still lend themselves as a means to analyze gene expression in almost all human tissues. The questions that can be addressed using public EST libraries range from tissue-specific gene profiling to the comparison between tissues in diseased and healthy states. Thanks to a multitude of web-based bioinformatics resources, mining EST libraries is not restricted to experts in the field of data analysis, but can readily be performed by the medical or life scientist. In this chapter, a couple of case studies are presented that guide scientists to the most useful online resources so that they can conduct their own research.
Key words: Gene expression, Differential gene expression, cDNA library, One-pass sequencing, Expressed sequence tag (EST), dbEST, Cancer genes, Online web tools, Statistical analysis, Bioinformatics
Robert Grützmann and Christian Pilarsky (eds.), Cancer Gene Profiling: Methods and Protocols, Methods in Molecular Biology, vol. 576, DOI 10.1007/978-1-59745-545-9_6, © Humana Press, a part of Springer Science + Business Media, LLC 2010
1. Introduction
The only way to collect an organism’s full gene complement is, of course, to determine its genomic sequence. The technology of full-genome sequencing was developed in the late 1990s and culminated in the sequencing of the human genome in 2001. However, even before that era, scientists knew approximately the range of genes that can be found in a eukaryotic organism. This knowledge was gained primarily thanks to a technology called expressed sequence tags (ESTs) (1). An EST is a short sequence, approximately 400 to 600 bases long, that is obtained by sequencing a messenger RNA (mRNA) just once. This leads to a relatively high error rate of approximately 2–5%, because errors that would be corrected by multiple sequencing passes remain in the sequence. The principal feature of an EST is that it represents a gene despite its short length and its relative inaccuracy. It can be associated uniquely and reliably with exactly one gene because the sequence similarity is still high. An EST can thus be seen as a gene’s signature, as a pars pro toto. Furthermore, no intronic sequence, the noncoding part between the exons, will be copied into mRNA and, hence, into an EST, although introns are generally considered an integral part of genes. With the exception of 3′ ESTs, which often overlap so-called untranslated regions (UTRs), ESTs generally represent coding sequence, i.e., sequence that is used as the construction plan for proteins.
A whole EST library is generated by randomly picking clones from a cDNA library that was produced from a well-defined tissue and by sequencing these clones from the 5′ end to the 3′ end. It is in the nature of the construction of EST libraries that genes whose mRNA is frequent in the tissue under investigation will be represented by many copies in the EST library. Likewise, genes whose mRNA is rare in the tissue will be represented by few copies. We can reasonably assume proportionality between the expression strength of a gene in a certain tissue or cell type and the number of ESTs with which the gene is represented in an EST library. Briefly, in an ideal EST library, the number of copies of an mRNA type is reflected by its number of ESTs in the library. A shortcoming of the EST approach is certainly that genes that are rarely expressed in a tissue will very likely not be present in the EST library at all. It is, therefore, not valid to conclude that a gene is not expressed in a certain tissue from the fact that it is not represented in an EST library.
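The caveat about rare genes can be made concrete with a small calculation. The sketch below assumes an idealized library in which every clone is picked independently and a transcript makes up a fixed fraction of all mRNA molecules; the function names, the library size of 5,000 clones, and the abundance values are illustrative assumptions, not figures from this chapter.

```python
def prob_absent(fraction: float, library_size: int) -> float:
    """Chance that a transcript contributes no EST to a library.

    Assumes every clone is picked independently and that the transcript
    makes up `fraction` of all mRNA molecules in the tissue.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie between 0 and 1")
    if library_size < 0:
        raise ValueError("library_size must not be negative")
    return (1.0 - fraction) ** library_size


def expected_ests(fraction: float, library_size: int) -> float:
    """Expected number of ESTs under the proportionality assumption."""
    return fraction * library_size


library_size = 5000  # hypothetical library size, for illustration only
for fraction in (1e-3, 1e-4, 1e-5):  # hypothetical transcript abundances
    print(
        f"abundance {fraction:g}: "
        f"expected ESTs {expected_ests(fraction, library_size):.2f}, "
        f"P(absent) {prob_absent(fraction, library_size):.3f}"
    )
```

Under these assumptions, a transcript that is expected to yield fewer than one EST is more likely to be missing from the library than to be present, which is why absence from a library is weak evidence for absence of expression.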
In order to reduce the complexity of an EST library and, thus, to facilitate gene discovery, subtracted and normalized EST libraries can be generated (2). These methods deplete the small set of highly expressed genes, the so-called housekeeping genes, which are well studied and not relevant for further analyses. It must, however, be noted that such subtracted or normalized libraries are not suitable for the analysis of gene expression ratios, because the number of ESTs of a gene in such a library is in general not proportional to the copy number of its mRNA in the tissue. The most prominent collection of public EST data is the database dbEST (3, 4), maintained by the National Center for Biotechnology Information (NCBI) in the United States (http://www.ncbi.nlm.nih.gov/dbEST). Currently, dbEST comprises more than 62 million ESTs from several hundred organisms, the highest numbers for individual organisms being eight million for human, almost five million for mouse, and more than one million for eleven other organisms, including economically important domestic species such as cattle, pig, rice, maize, and wheat.
Nowadays, in the era of whole-genome sequencing, the EST approach has lost none of its significance. EST libraries are being produced in cases in which sequencing of a whole genome is difficult because of its large size or its repetitive structure (5). In combination with genomic data, EST libraries are an indispensable resource for predicting genes and for determining gene boundaries and gene structures. For these purposes, ESTs are aligned against the genomic sequence; exons are identified wherever an EST or a part of an EST can be aligned sufficiently well with a stretch of genomic DNA (6). ESTs are furthermore particularly helpful in the detection of alternative splicing (7). Before the advent of cheap, high-density microarrays, EST libraries were the only high-throughput method for differential gene expression studies in, for instance, healthy and cancerous tissue (8).
Analysis of gene expression using EST libraries ab initio involves a complex experimental and analytical pipeline. At the beginning of such a pipeline, EST libraries have to be generated. This includes preparation of the tissue, cloning of mRNAs, and sequencing of the clones, as described, e.g., in ref. (9). Once sequenced, the ESTs have to be processed in many ways to remove experimental artifacts and to control their quality. This includes the removal of vector sequences and low-quality sequences and the masking of repetitive sequences (10). In the next step, ESTs derived from one and the same mRNA can be assembled: for each EST of a library, the sequence comparison program BLAST (11) is used to find all other ESTs, from that or any other library, that are sufficiently similar to it (usually more than 95% sequence similarity). The group of all ESTs that are sufficiently similar to each other is called an EST cluster. A consensus sequence can be determined from such an EST cluster by finding the most frequent nucleotide for each position of the cluster. This consensus sequence is typically longer than the sequence of the individual ESTs obtained from a clone because the ESTs are not derived from exactly the same parts of the clone. An iterative search and assembly strategy was suggested that in many cases allowed one to obtain the full-length sequence of a gene (8). All publicly available ESTs are available as clusters in the Unigene databases (12, 13).
Because each of the analysis steps sketched above demands expertise in molecular biology or bioinformatics, detailed descriptions would go beyond the scope of this chapter. For the biomedical investigator who is interested in generating and analyzing EST libraries of their own, the most important tools for the processing are reviewed in ref. (10). Luckily, a wealth of state-of-the-art, well-documented EST libraries is publicly accessible so that, together with web-based analysis tools, enough material is provided for analyses of various kinds. In the remainder of this chapter, we will therefore focus on analytical methods that can be applied with an ordinary browser on a standard PC.
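The consensus step described in this section (taking the most frequent nucleotide at each position of an EST cluster) can be illustrated with a short sketch. It assumes that the ESTs of a cluster have already been aligned to a common coordinate system and padded with "-" where an EST does not cover a position; the function name and the toy sequences are hypothetical, and real assembly programs are considerably more elaborate.

```python
from collections import Counter


def consensus(aligned_ests, gap="-"):
    """Majority-vote consensus of pre-aligned, equally long EST strings.

    Positions covered by no EST are reported as "N". Ties are resolved in
    favour of the nucleotide that appears first in the input order.
    """
    if not aligned_ests:
        raise ValueError("at least one EST is required")
    width = len(aligned_ests[0])
    if any(len(est) != width for est in aligned_ests):
        raise ValueError("all aligned ESTs must have the same length")

    result = []
    for column in zip(*aligned_ests):
        # Count only real nucleotides; padding does not get a vote.
        counts = Counter(base for base in column if base != gap)
        result.append(counts.most_common(1)[0][0] if counts else "N")
    return "".join(result)


# Toy cluster (hypothetical sequences): three overlapping 6-base ESTs,
# the third carrying one sequencing error (T instead of C at position 6).
cluster = [
    "ACGTAC----",
    "--GTACGG--",
    "----ATGGTA",
]
print(consensus(cluster))
```

In this toy cluster the single deviating base is outvoted by the two ESTs that agree, and the consensus spans ten positions although each EST covers only six, which mirrors the observation that the consensus is typically longer than any individual EST.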
2. Materials
All that is needed to carry out the analyses sketched under Subheading 3 is a PC with a browser such as Firefox and an internet connection.
3. Methods
One of the most important types of analysis possible with publicly available EST libraries and tools is the search for genes that are differentially expressed between cancerous and healthy tissues. The Cancer Genome Anatomy Project (CGAP) serves exactly this purpose (14, 15). Currently, CGAP comprises 276 human EST libraries, 176 of which were derived from cancerous tissues, the rest coming from healthy or uncharacterized tissues (see Note 1). Forty-one tissue types are currently represented, the most frequent being brain, colon, lung, and ovary. Two closely related tools for identifying differentially expressed genes are offered: the xProfiler (http://cgap.nci.nih.gov/Tissues/xProfiler) and the cDNA Digital Gene Expression Displayer (DGED) (http://cgap.nci.nih.gov/Tissues/GXS). The analysis sessions for the two tools are practically identical; only the output differs (see Note 1). In the following, we are interested in finding genes that could play a role in human colon cancer.
3.1. Identification of Differentially Expressed Genes
We would like to obtain very reliable results and therefore include only libraries obtained from microdissected tissues. Furthermore, we would like to calculate expression ratios, i.e., numbers that tell us how many times more (or less) often a gene occurs in the cancerous tissue than in the healthy tissue. We therefore confine our analysis to non-normalized libraries because only this type guarantees proportionality between mRNA abundance and EST number.
1. Point your browser to http://cgap.nci.nih.gov/Tissues/XProfiler (see Note 2).
2. Select the organism Homo sapiens (default).
3. Select the library group “All EST libraries” (default). Unless there are very good reasons to exclude EST libraries from a specific center or project, it is recommended to include all EST libraries to gain statistical power.
4. Select the minimum number of sequences/library. I recommend not excluding any EST library just because it is small. Therefore, it is advisable to change the default (10 for xProfiler and 1,000 for DGED) to 0.
5. Do not alter the default setting in “List libraries by.”
6. In Pools A and B, choose “Colon” in “Tissue Type.” Be sure that the “Include” radio button is activated.
7. In Pools A and B, choose “Microdissected” in “Tissue Preparation.”
8. In Pools A and B, choose “Non-normalized” in “Library Protocol.”
9. In Pool A, choose “Normal” and in Pool B, choose “Cancer” in “Tissue histology.” If you would like to include tissues in a precancerous state, choose “Pre-cancer” in addition to “Cancer.” To do so, hold down the “Control” key on your keyboard and click on “Pre-cancer.”
10. Do not enter anything into “Library Name” for Pools A and B. This field should be used only if you want to compare two specific EST libraries whose exact identifiers you know. This is not the case in a typical exploratory analysis such as the one described here.
11. Next, click the “Submit Query” button and be patient. The server can easily be busy for a minute or more, depending on the time of day.
12. As an intermediate result, the list of libraries fulfilling your requirements is presented. Go carefully through the short description that is provided. The names of the libraries are linked to a more detailed description in case you need more information. At any time, you can return from the detailed description via the back button of your browser. Check that you want to include each of the listed libraries in your analysis. Exclude an unwanted library from the pool by clearing (deselecting) the corresponding box.
13. If you are convinced that the assignment of the libraries to Pools A and B is correct, submit the query again.
14. After a while, xProfiler presents its results as a table containing numbers of genes. Genes are classified as “Unique,” which means that they occur exclusively in colon (regardless of the tissue histology), or as “Non-Unique”; and as “Known,” which means that they were characterized in earlier studies and that their function is at least partially known, or as “Unknown.” The number of genes in each class is given for Pools A and B separately, for the union of A and B, for the intersection of A and B, and for the differential subsets (in A, but not in B; in B, but not in A).
15. Clicking on the highlighted numbers will provide the corresponding lists of genes. Clicking on “Gene info” will offer you a plethora of information related to a gene in which you are interested.
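The subsets reported in step 14 correspond to ordinary set operations on the genes found in each pool. The following sketch shows that correspondence; the function name and the gene labels are placeholders chosen for illustration and are not xProfiler output.

```python
def compare_pools(pool_a, pool_b):
    """Return the gene subsets that a presence/absence comparison reports."""
    a, b = set(pool_a), set(pool_b)
    return {
        "A": a,
        "B": b,
        "A or B": a | b,       # union: genes seen in at least one pool
        "A and B": a & b,      # intersection: genes seen in both pools
        "A minus B": a - b,    # genes seen only in pool A
        "B minus A": b - a,    # genes seen only in pool B
    }


# Placeholder gene labels, not real results.
normal = {"GENE1", "GENE2", "GENE3"}
cancer = {"GENE2", "GENE3", "GENE4", "GENE5"}

for label, genes in compare_pools(normal, cancer).items():
    print(f"{label}: {len(genes)} genes {sorted(genes)}")
```

Note that a purely set-based comparison of this kind records only presence or absence; it carries no information about how many ESTs support each gene.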
A session with xProfiler tells you in which pools a gene is present and from which it is absent. The most informative part of the output is probably the genes hidden behind “A minus B” and “B minus A,” because they seem to be strictly related to cancer in the sense that they are either activated or suppressed in cancer. The output does not tell you how many times a gene is represented in each pool, which could be valuable information. If a gene is present in both pools, you cannot judge in which pool it prevails. To address questions of this type, DGED is the correct tool. As mentioned above, the analysis is analogous to that with xProfiler.
To start DGED, point your browser to http://cgap.nci.nih.gov/Tissues/GXS and proceed until step 11 as described above (see Note 2). As an intermediate result, you now receive, in addition to the list of libraries with their assignment to the pools, three more criteria that you can use to filter your results. First, you have to decide on the expression ratio. The default value of 2 means that genes that are at least twofold overexpressed in either the healthy or the cancer pool will be shown. Second, you have to decide which statistical significance is appropriate for your analysis. The default value of 0.05 means that a distribution of the ESTs of a gene between the two pools that is as uneven as, or more uneven than, the one you observe would arise by chance in 5% of all cases. Last, you can confine your search to a given chromosome. In an initial search, it is advisable to leave the default settings unaltered. A search for genes differentially expressed in colon with the default settings leads to no result (“No tags were found”). Obviously, our criteria were too strict. We can repeat the last step of the analysis by returning to the previous page with the back button of the browser and then setting F and P equal to 1. This shows us all 132 genes.
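The two filters described above, an expression ratio and a P value, can be sketched as follows. This is an illustration only: the chapter does not state which statistical test DGED applies, so Fisher's exact test is used here as an assumed stand-in, and the function names, pool sizes, and EST counts are hypothetical.

```python
from math import comb, isnan, nan


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = {x: comb(row1, x) * comb(row2, col1 - x) / denom
             for x in range(lo, hi + 1)}
    cutoff = probs[a] * (1 + 1e-9)  # tolerate floating-point rounding
    # Sum the probabilities of all tables at least as unlikely as the observed one.
    return min(1.0, sum(p for p in probs.values() if p <= cutoff))


def relative_frequency_ratio(count_a, total_a, count_b, total_b):
    """Relative EST frequency in pool A divided by that in pool B.

    Undefined (NaN) when the gene has no ESTs in pool B.
    """
    if count_b == 0:
        return nan
    return (count_a / total_a) / (count_b / total_b)


def is_differential(count_a, total_a, count_b, total_b,
                    min_ratio=2.0, max_p=0.05):
    """Apply a ratio threshold (either direction) and a P value threshold."""
    ratio = relative_frequency_ratio(count_a, total_a, count_b, total_b)
    p_value = fisher_exact_two_sided(count_a, total_a - count_a,
                                     count_b, total_b - count_b)
    if isnan(ratio) or ratio == 0.0:
        ratio_ok = True  # assumption: absence from one pool passes the ratio filter
    else:
        ratio_ok = ratio >= min_ratio or ratio <= 1.0 / min_ratio
    return ratio_ok and p_value <= max_p


# Hypothetical counts: 30 ESTs among 10,000 in pool A, 10 among 10,000 in pool B.
print(relative_frequency_ratio(30, 10_000, 10, 10_000))
print(fisher_exact_two_sided(30, 9_970, 10, 9_990))
print(is_differential(30, 10_000, 10, 10_000))
```

Setting both `min_ratio` and `max_p` to 1 in this sketch lets every gene with at least one EST pass, which is the analogue of relaxing both criteria to display the full list.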
An inspection of the list that we obtained indicates that all genes are represented by very few ESTs (column “Sequences”). The most significant P value is 0.15 for the gene RPL13, a ribosomal protein. We can loosen our criteria in many different ways. If we decide to include bulk tissue, for example, we have to repeat the analysis and alter the selection in step 7. This time, 47 genes fulfill the criteria of differential expression and statistical significance. The “best” gene is now COX6C, the cytochrome c oxidase subunit VIc, which occurs 12 times in the pool of healthy tissue and is absent from the pool of cancerous tissue. The P value for this partition is smaller than the precision provided in the list; therefore, it is indicated as 0.00. It must, however, be noted that P values can never attain the value of 0 exactly. In the column marked “Seq Odds A:B,” we see the symbol NaN, which stands for “not a number” (see Note 3). This is because we have no representative ESTs for this gene in one of the pools. Because this value is calculated as the ratio of the relative frequencies of an EST type in the two pools, this would mean division by 0, which is, of course, not defined. In such cases, the P value has to serve as the sole criterion to judge the occurrence pattern of a gene. In cases of very clear P values (very close to 0), this is not a serious limitation.
Along similar lines, the two tools presented above, xProfiler and DGED, can be used to compare different tissues and to extract, e.g., genes whose expression differs between colon and prostate. A very useful type of analysis is the search for genes that are specific for a tissue. This can easily be realized by setting up one pool with the tissue under investigation and the other pool with all other tissues, using the “Exclude” radio button that accompanies the “Include” button mentioned under step 6 in Subheading 3.1.
A very similar tool is Digital Differential Display (DDD), which can be started by pointing your browser to the Unigene home page http://www.ncbi.nlm.nih.gov/sites/entrez (see Note 2), and then choosing “DDD” from the left menu bar. The main advantage over xProfiler and DGED is that you can assign names to your pools and that the results are highlighted visually. The disadvantage is that you cannot set thresholds for the expression ratio and the P value and that these two values are not presented in the output (see Note 4). You can, however, easily determine the expression ratio yourself by dividing the two relative frequencies of ESTs in the pools for a given gene. Examples of medically important results obtained with DDD are presented, e.g., in refs. (16, 17).
3.2. Mining EST Libraries for Genes That Are Coexpressed with an Interesting Gene
Another interesting way to exploit EST libraries is to use them to predict gene function for novel genes. The rationale is that a priori unknown genes that behave similarly to well-characterized genes could have a comparable function or could take part in the same pathway. By “similar behavior,” we mean similar expression profiles across a multitude of different tissues. Put simply: the unknown genes are highly expressed in the same tissues, and are barely or not at all expressed in the same tissues, as the well-characterized gene. This idea was termed “guilt by association (GBA)” and was described in ref. (18). An example of how Parkinson’s disease genes could be found by GBA is given in ref. (19). You can carry out GBA analyses using the GBA server provided by Peking University. In our case study, we are interested in finding genes that are co-regulated with the well-known breast cancer gene BRCA1. Unfortunately, the web interface is not very convenient: you have to enter the Unigene ID, but the “GBA Gene Matcher” offered on the right menu bar is not active. However, we can readily find out the Unigene ID for our gene of interest thanks to another project, called GeneCards (20).
1. Point your browser to http://www.genecards.org (see Note 2).
2. Beneath “SEARCH,” enter “brca1” into the field and click “Symbol only” under “Search by.”
3. Under “Options,” activate “Show microcards only” and “Sort microcards alphabetically.”
4. Press the “Go” button.
5. Look for the text string “unigene cluster” using the search function of your browser. The ID of the form “Hs.XYZ,” where XYZ is a number, is the Unigene ID of your gene. It is Hs.194143 for BRCA1.
6. Now the actual GBA analysis can start. Point your browser to http://gba.cbi.pku.edu.cn:8080/gba (see Note 2).
7. Find the “GBA Engine” button on the left side of the browser and click on it.
8. Enter the Unigene ID in the field next to “UniGeneID” and your e-mail address into the field next to “Your Mail Address.”
9. Leave all other default settings as they are and press the “Search” button.
10. The result will be shown on the screen and will be sent to you by e-mail after a while.
11. The result page shows you a list of 30 genes that occur preferentially in the same EST libraries as BRCA1 and that, furthermore, tend to be absent from the EST libraries in which BRCA1 does not occur. This list is ordered by statistical significance, i.e., the top-ranking gene is the one with the smallest P value (see Note 4).
12. Choosing the “Show” button for “more information on co-expressed genes” presents you with a list containing a short description of the gene function and the usual gene name.
13. Clicking on the highlighted Unigene IDs in the result page mentioned in step 11 provides you with very detailed information about the genes, such as, for instance, the LocusLink entry or gene ontology (GO) terms (21) associated with the genes.
14. The “Run GBA” button will start a new GBA search for the given gene from the list (not for BRCA1 again).
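The idea behind the ranking in step 11 can be sketched with a small co-occurrence calculation. The sketch below is not the GBA server's implementation: it assumes that each gene is reduced to the set of libraries in which it occurs, scores the overlap with a one-sided hypergeometric tail probability, and applies the Bonferroni correction described in Note 4. The function names, library identifiers, and candidate labels are hypothetical.

```python
from math import comb


def cooccurrence_p(libs_x, libs_y, n_libraries):
    """P(at least the observed number of shared libraries) by chance.

    Hypergeometric tail: gene Y's libraries are treated as a random draw
    from all `n_libraries`, of which gene X occupies len(libs_x).
    """
    x, y = set(libs_x), set(libs_y)
    shared = len(x & y)
    total = comb(n_libraries, len(y))
    upper = min(len(x), len(y))
    tail = sum(comb(len(x), i) * comb(n_libraries - len(x), len(y) - i)
               for i in range(shared, upper + 1))
    return tail / total


def rank_candidates(query_libs, candidates, n_libraries):
    """Rank candidate genes by P value; also report Bonferroni-adjusted values."""
    n_tests = len(candidates)
    scored = []
    for name, libs in candidates.items():
        p = cooccurrence_p(query_libs, libs, n_libraries)
        scored.append((name, p, min(1.0, p * n_tests)))  # adjusted P capped at 1
    return sorted(scored, key=lambda row: row[1])


# Hypothetical data: 10 libraries numbered 0-9, query gene found in four of them.
query = {0, 1, 2, 3}
candidates = {
    "candidate_1": {0, 1, 2, 3},  # same libraries as the query
    "candidate_2": {4, 5, 6, 7},  # never shares a library with the query
    "candidate_3": {0, 1, 8, 9},  # shares two libraries
}
for name, p, p_adjusted in rank_candidates(query, candidates, 10):
    print(f"{name}: P = {p:.4f}, Bonferroni-adjusted P = {p_adjusted:.4f}")
```

With many libraries and many candidate genes, the adjusted values grow quickly, which is why only very small raw P values remain convincing after correction.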
4. Notes
1. Molecular biological methods evolve at a very fast pace, and so do the techniques to analyze and organize the data derived from their high-throughput application. This also holds true for the content behind, and the appearance of, online web tools such as those described here. So be prepared that you probably will not be able to reproduce exactly the examples given in this chapter. As EST libraries are added or removed, the results will change accordingly. This is, however, entirely natural and does not mean that anything went “wrong” in your analysis.
2. In addition, be prepared that the internet address (URL) that leads you to the online web tools can change due to reorganization of the large molecular biological research centers such as NCBI. If this has happened, there is still a good chance of locating the tool using a general search engine such as Google. Typing in keywords like “xprofiler, est” will probably lead you quickly to the new URL.
3. Unfortunately, terminology used in the field of biomolecular analyses, as in many other fields of science, is sometimes ambiguous. For example, the term “odds ratio” is used in the DGED tool in a different way than in medical statistics, where it is used to quantify risks (see Subheading 3.1).
4. Statistical issues are an integral part of bioinformatics analyses such as those described here. Of particular importance is the P value, which tells you how likely it is to observe a result (such as differential gene expression) by chance, i.e., when in reality there is no differential expression and your finding is merely due to an untypical sample. Data mining typically produces lists of genes (or any other biological entities) rather than single genes. If the analysis includes statistical testing, i.e., the calculation of P values, then statisticians speak of “multiple testing.” In such a case, it has to be considered that the P values are calculated as if only one gene was analyzed. To obtain P values that are more realistic, they have to be corrected. The simplest such correction method is the so-called Bonferroni correction. It says that all P values obtained in multiple testing should be multiplied by the number of tests performed. Therefore, if you obtained a list including 12 genes in an analysis, multiply each P value by 12. Notice that some applications provide the possibility of correcting P values, such as, for instance, the GBA engine (see step 11 in Subheading 3.2).
References
1. M.D. Adams, M.B. Soares, A.R. Kerlavage, C. Fields, and J.C. Venter. Rapid cDNA sequencing (expressed sequence tags) from a directionally cloned human infant brain cDNA library. Nat. Genet., 4:373–380, 1993.
2. M.F. Bonaldo, G. Lennon, and M.B. Soares. Normalization and subtraction: two approaches to facilitate gene discovery. Genome Res., 6:791–806, 1996.
3. M.S. Boguski, T.M. Lowe, and C.M. Tolstoshev. dbEST – database for “expressed sequence tags”. Nat. Genet., 4:332–333, 1993.
4. M.S. Boguski. The turning point in genome research. Trends Biochem. Sci., 20:295–296, 1995.
5. J.L. Bennetzen. Mechanisms and rates of genome expansion and contraction in flowering plants. Genetica, 115:29–36, 2002.
6. Z. Kan, E.C. Rouchka, W.R. Gish, and D.J. States. Gene structure prediction and alternative splicing analysis using genomically aligned ESTs. Genome Res., 11:889–900, 2001.
7. B. Modrek, A. Resch, C. Grasso, and C. Lee. Genome-wide detection of alternative splicing in expressed sequences of human genes. Nucleic Acids Res., 29:2850–2859, 2001.
8. A.O. Schmitt, T. Specht, G. Beckmann, E. Dahl, C.P. Pilarsky, B. Hinzmann, and A. Rosenthal. Exhaustive mining of EST libraries for genes differentially expressed in normal and tumour tissues. Nucleic Acids Res., 27:4251–4260, 1999.
9. M.D. Adams, J.M. Kelley, J.D. Gocayne, M. Dubnick, M.H. Polymeropoulos, H. Xiao, C.R. Merril, A. Wu, B. Olde, and R.F. Moreno. Complementary DNA sequencing: expressed sequence tags and human genome project. Science, 252:1651–1656, 1991.
10. S.H. Nagaraj, R.B. Gasser, and S. Ranganathan. A hitchhiker’s guide to expressed sequence tag (EST) analysis. Brief. Bioinform., 8:6–21, 2007.
11. S.F. Altschul, W. Gish, W. Miller, E.W. Myers, and D.J. Lipman. Basic local alignment search tool. J. Mol. Biol., 215:403–410, 1990.
12. M.S. Boguski, and G.D. Schuler. ESTablishing a human transcript map. Nat. Genet., 10:369–371, 1995.
13. G.D. Schuler, M.S. Boguski, E.A. Stewart, L.D. Stein, G. Gyapay, K. Rice, R.E. White, P. Rodriguez-Tomé, A. Aggarwal, E. Bajorek, S. Bentolila, B.B. Birren, A. Butler, A.B. Castle, N. Chiannilkulchai, A. Chu, C. Clee, S. Cowles, P.J. Day, T. Dibling, N. Drouot, I. Dunham, S. Duprat, C. East, C. Edwards, J.B. Fan, N. Fang, C. Fizames, C. Garrett, L. Green, D. Hadley, M. Harris, P. Harrison, S. Brady, A. Hicks, E. Holloway, L. Hui, S. Hussain, C. Louis-Dit-Sully, J. Ma, A. MacGilvery, C. Mader, A. Maratukulam, T.C. Matise, K.B. McKusick, J. Morissette, A. Mungall, D. Muselet, H.C. Nusbaum, D.C. Page, A. Peck, S. Perkins, M. Piercy, F. Qin, J. Quackenbush, S. Ranby, T. Reif, S. Rozen, C. Sanders, X. She, J. Silva, D.K. Slonim, C. Soderlund, W.L. Sun, P. Tabar, T. Thangarajah, N. Vega-Czarny, D. Vollrath, S. Voyticky, T. Wilmer, X. Wu, M.D. Adams, C. Auffray, N.A. Walter, R. Brandon, A. Dehejia, P.N. Goodfellow, R. Houlgatte, J.R. Hudson, S.E. Ide, K.R. Iorio, W.Y. Lee, N. Seki, T. Nagase, K. Ishikawa, N. Nomura, C. Phillips, M.H. Polymeropoulos, M. Sandusky, K. Schmitt, R. Berry, K. Swanson, R. Torres, J.C. Venter, J.M. Sikela, J.S. Beckmann, J. Weissenbach, R.M. Myers, D.R. Cox, M.R. James, D. Bentley, P. Deloukas, E.S. Lander, and T.J. Hudson. A gene map of the human genome. Science, 274:540–546, 1996.
14. R.L. Strausberg, S.F. Greenhut, L.H. Grouse, C.F. Schaefer, and K.H. Buetow. In silico analysis of cancer through the Cancer Genome Anatomy Project. Trends Cell Biol., 11:66–71, 2001.
15. R.L. Strausberg. The Cancer Genome Anatomy Project: new resources for reading the molecular signatures of cancer. J. Pathol., 195:31–40, 2001.
16. D. Scheurle, M.P. DeYoung, D.M. Binninger, H. Page, M. Jahanzeb, and R. Narayanan. Cancer gene discovery using digital differential display. Cancer Res., 60:4037–4043, 2000.
17. H.L. Yang, E.Y. Cho, K.H. Han, H. Kim, and S.J. Kim. Characterization of a novel mouse brain gene (mbu-1) identified by digital differential display. Gene, 395:144–150, 2007.
18. M.G. Walker, W. Volkmuth, E. Sprinzak, D. Hodgson, and T. Klingler. Prediction of gene function by genome-scale expression analysis: prostate cancer-associated genes. Genome Res., 9:1198–1203, 1999.
19. M.G. Walker, W. Volkmuth, and T.M. Klingler. Pharmaceutical target discovery using Guilt-by-Association: schizophrenia and Parkinson’s disease genes. Proc. Int. Conf. Intell. Syst. Mol. Biol., 282–286, 1999.
20. M. Rebhan, V. Chalifa-Caspi, J. Prilusky, and D. Lancet. GeneCards: integrating information about genes, proteins and diseases. Trends Genet., 13:163, 1997.
21. M. Ashburner, C.A. Ball, J.A. Blake, D. Botstein, H. Butler, J.M. Cherry, A.P. Davis, K. Dolinski, S.S. Dwight, J.T. Eppig, M.A. Harris, D.P. Hill, L. Issel-Tarver, A. Kasarskis, S. Lewis, J.C. Matese, J.E. Richardson, M. Ringwald, G.M. Rubin, and G. Sherlock. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet., 25:25–29, 2000.
Chapter 7
Automated Fluorescent Differential Display for Cancer Gene Profiling
Jonathan D. Meade, Yong-jig Cho, Blake R. Shester, Jamie C. Walden, Zhen Guo, and Peng Liang
Summary
Since its invention in 1992, differential display (DD) has become the most commonly used technique for identifying differentially expressed genes because of its many advantages over competing technologies such as DNA microarray, serial analysis of gene expression (SAGE), and subtractive hybridization. A large number of the publications using DD have been in the field of cancer, specifically on p53 target genes. Despite the great impact of the method on biomedical research, there had been a lack of automation of DD technology to increase its throughput and accuracy for systematic gene expression analysis. Much previous DD work has taken a “shotgun” approach of identifying one gene at a time, with a limited number of polymerase chain reactions (PCRs) set up manually, giving DD a low-tech and low-throughput image. We have optimized the DD process with a platform that incorporates fluorescent digital readout, automated liquid handling, and large-format gels capable of running entire 96-well plates. The resulting streamlined fluorescent DD (FDD) technology offers unprecedented accuracy, sensitivity, and throughput in comprehensive and quantitative analysis of gene expression. These major improvements will allow researchers to find differentially expressed genes of interest, both known and novel, quickly and easily.
Key words: Fluorescent differential display, DD, FDD, Differential gene expression, Automation, Cancer gene profiling, Differential display on automated sequencer
Robert Grützmann and Christian Pilarsky (eds.), Cancer Gene Profiling: Methods and Protocols, Methods in Molecular Biology, vol. 576, DOI 10.1007/978-1-59745-545-9_7, © Humana Press, a part of Springer Science + Business Media, LLC 2010
1. Introduction
The complete sequencing of the 3 billion base pair (bp) human genome was an amazing accomplishment, but the hardest work is still ahead of us. Of the estimated 20,000–25,000 genes embedded in our genome, only a fraction, perhaps 10–15%, are “turned on” (expressed as messenger RNAs [mRNAs] for protein synthesis) at any given time in each of our cells. Thus, interpretation of the genomic instructions in the post-genome era will have to rely, at least in large part, on tools that allow us to determine when and where a gene is turned on or off in a cell as it divides, differentiates, and ages. Such tools are also important for detecting when and where a seemingly precise interpretation of genomic instructions goes awry, which underlies many disease states including cancer. Because so many genes are involved in cancer, the disease is very difficult to study and understand. Making matters more difficult, many oncogenes and tumor-suppressor genes are signaling molecules themselves, each of which functions to control the expression of a subset of downstream genes (1, 2). The analysis of differential gene expression (also known as gene profiling, expression genetics, or functional genomics) has therefore become one of the most widely used strategies for discovering and understanding the molecular circuitry underlying cancer.
Differential display (DD) technology (3) is one of the major tools that has already helped thousands of researchers all over the world interpret gene expression in diverse biological systems ranging from yeast, fungi, plants, insects, worms, fish, reptiles, and amphibians to mammals (4–6). Since the invention of DD in 1992, the number of publications using DD has exploded to more than 3,900, outnumbering the publications using other competitive methodologies such as DNA microarrays (7, 8), serial analysis of gene expression (SAGE) (9), and subtractive hybridization (10) (see Table 1). Hundreds of these DD publications have been in the field of cancer. Many oncogene targets have been identified by DD, including genes that are regulated by RAS (11, 12), v-REL (13), and ERBB (14).
Table 1
Impact of major technologies in differential gene expression analysis

Method                No. of citations  Original publication
Differential display  3,965             Science 1992, 257:967–971
DNA microarrays       3,462             Science 1995, 270:467–470
SAGE                  2,036             Science 1995, 270:484–487
Oligo arrays          905               Science 1996, 274:610–614

Number of citations is the number of times the original publication has been cited by other papers, which reflects the number of times each technique has been used for publications. Search done with ISI Web of Knowledge Citation Search on January 25, 2008 at http://isi15.isiknowledge.com/portal.cgi?DestApp=WOS&Func=Frame
One of the RAS target genes was shown to be a new cytokine, now known as interleukin (IL)-24 (15). The power of DD can be illustrated by using p53, the main tumor suppressor gene, as an example. Increasing numbers of candidate p53 target genes are being identified (16) and, amazingly, approximately half of the better understood p53 target genes were identified by DD (see Table 2) (17–43).
Table 2
List of the major potential p53 target genes identified by different technologies

Gene(s)       Definition/function                         Method^a     Reference(s)
Mdm-2         p53 negative regulator                      Candidate    (17)
p21           Cdk2 inhibitor                              SH, SAGE     (18)
14-3-3 sigma  Growth inhibition                           SAGE         (17)
GADD45        DNA repair                                  SH           (17)
Bax           Apoptosis                                   Candidate    (19)
Cyclin G      Cell cycle regulator                        DD           (20)
IGFBP-3       IGF binding protein, growth inhibition      SH           (21)
PIG3          NADPH-quinone oxidoreductase                SAGE         (22)
KILLER/DR5    Apoptosis                                   SH, SAGE     (23)
Ei24/PIG8     Novel gene, apoptosis                       DD, SAGE     (22, 24)
PAG608        Novel zinc finger protein, apoptosis        DD           (25)
DDA3          Novel gene, growth inhibition               DD           (26)
TP53TG1       Novel gene, DNA-damage                      DD           (27)
TP53TG3       Novel gene, cell cycle checkpoint           DD           (28)
p53R2         Ribonucleotide reductase                    DD           (29)
PERP          Novel gene, pro-apoptotic                   SH           (30)
PIR121        Novel gene, RNA binding                     Array        (31)
Noxa          Novel gene, pro-apoptotic BH3 protein       DD           (32)
Pidd          Novel gene, death-domain protein            DD           (33)
p53AIP1       Novel gene, apoptosis, p53 phosphorylation  DD           (34)
p53DINP1      Novel gene, apoptosis, p53 phosphorylation  DD           (35)
PUMA          Novel gene, pro-apoptotic BH3-protein       SAGE, array  (36, 37)
Pirh2         Ubiquitin ligase, p53 negative regulator    DD           (38)
Pac1          Protein phosphatase, pro-apoptotic          Array        (39)
Fas/APO-1     Cell death receptor                         Candidate    (40)
Apaf-1        Apoptosis                                   Array        (41)
PTEN          Tumor suppressor                            Candidate    (42)
Bid           Apoptosis                                   Array        (43)

^a DD, differential display; Array, DNA microarray; SAGE, serial analysis of gene expression; SH, subtractive hybridization; Candidate, candidate screening
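The statement that roughly half of these genes were found by DD can be checked by tallying the Method column of Table 2. The sketch below is only an illustration: the dictionary is a hand transcription of the table, and the names TABLE2_METHODS and tally_methods are introduced here and do not come from the chapter.

```python
from collections import Counter

# Method column of Table 2, transcribed by hand (one entry per gene).
# "array" for PUMA is normalized to "Array" so that it is counted consistently.
TABLE2_METHODS = {
    "Mdm-2": ["Candidate"], "p21": ["SH", "SAGE"], "14-3-3 sigma": ["SAGE"],
    "GADD45": ["SH"], "Bax": ["Candidate"], "Cyclin G": ["DD"],
    "IGFBP-3": ["SH"], "PIG3": ["SAGE"], "KILLER/DR5": ["SH", "SAGE"],
    "Ei24/PIG8": ["DD", "SAGE"], "PAG608": ["DD"], "DDA3": ["DD"],
    "TP53TG1": ["DD"], "TP53TG3": ["DD"], "p53R2": ["DD"], "PERP": ["SH"],
    "PIR121": ["Array"], "Noxa": ["DD"], "Pidd": ["DD"], "p53AIP1": ["DD"],
    "p53DINP1": ["DD"], "PUMA": ["SAGE", "Array"], "Pirh2": ["DD"],
    "Pac1": ["Array"], "Fas/APO-1": ["Candidate"], "Apaf-1": ["Array"],
    "PTEN": ["Candidate"], "Bid": ["Array"],
}


def tally_methods(table):
    """Count how many genes list each method (a gene may list more than one)."""
    counts = Counter()
    for methods in table.values():
        counts.update(methods)
    return counts


method_counts = tally_methods(TABLE2_METHODS)
dd_fraction = method_counts["DD"] / len(TABLE2_METHODS)

for method, n in method_counts.most_common():
    print(f"{method:10s} {n}")
print(f"DD share of listed genes: {dd_fraction:.0%}")
```

With the table transcribed as above, DD is listed for 12 of the 28 genes, more than any other single method.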
The rapid and successful adoption of differential display is largely attributable to the simplicity of the method. Simplicity ensures a higher probability of success and fewer artifactual differences caused by experimental errors. Essentially, starting from the RNA samples being compared, only two steps, reverse transcription (RT) and polymerase chain reaction (PCR), are needed before the signals generated are analyzed on a gel matrix. No additional steps such as second-strand DNA synthesis, purification of complementary DNA (cDNA), restriction enzyme digestion, adapter primer ligation, probe labeling/normalization, hybridization, or washing are required; each of these steps could introduce and amplify errors or lead to the loss of mRNAs being detected. DD takes advantage of three of the simplest, most powerful, and most commonly used molecular biological methods: RT-PCR, DNA sequencing gel electrophoresis, and cDNA cloning (3–6, 44, 45).
The DD methodology, also referred to as DDRT-PCR or DD-PCR in PCR nomenclature (46, 47), begins with total RNA being harvested from the cells/tissues of interest. A researcher will study at least two samples, but many more can be studied if the experimental design calls for it. These samples will have morphological, genetic, or other experimental differences for which the researcher wishes to study the gene expression patterns, hoping to elucidate the root cause of the particular difference or the specific genes that are affected by the experiment. Samples can be from any eukaryotic organism, including plants, fish, amphibians, reptiles, insects, yeast, fungi, and mammals. DD can be adapted for prokaryotic systems, but is more often used with eukaryotes. The mRNAs within the total RNA population are used as the templates for DD-PCR after first-strand cDNA synthesis by reverse transcription.
The current methodology makes use of three “anchored” oligo-dT primers that target the poly-adenylation (poly-A) site of eukaryotic mRNA and have the form H-T11M, where H is a HindIII restriction site (AAGCTT), T11 is a string of 11 Ts (although the first two Ts come from the HindIII site), and M is G, C, or A (48). They are referred to as “anchor” primers because the non-T base after the string of 11 Ts enables the primer to be anchored to the same spot for each round of amplification, in contrast to standard oligo-dT primers that only contain a string of Ts and will anneal in multiple spots, creating a smear (see Note 1). The HindIII restriction site was added to the anchor primer design to make the primers longer and more efficient in annealing to the targeted poly-A site, as well as to improve downstream applications such as cDNA cloning. With the current anchor primer design, the cDNA population is divided into three subpopulations, each representing one third of the mRNAs potentially expressed in the cell at any given time. Earlier work used anchor primers of the type T11VN, where V can be A, G, or C and N can be any of the four nucleotides, as well as anchors of the type T12MN, where M is a degenerate mixture of A, G, or C and N is any of the four nucleotides (3). Both of those primer designs divide the mRNA population into more subfractions (12 for type T11VN and 4 for T12MN), which unnecessarily increases the number of FDD-PCRs needed for the same level of gene coverage compared with the H-T11M primer design.
The next step in DD is the PCR amplification of the cDNA subpopulations using the anchor primers (H-T11M) in combination with a set of short “arbitrary” primers of random sequence. These arbitrary 13-mers (H-AP primers) also include a HindIII restriction site (AAGCTT), followed by a 7-bp backbone of random base combinations. The HindIII restriction site is included in both the anchor and arbitrary primers for more efficient primer annealing and easier downstream manipulation of the cDNA (48).
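The primer designs described above can be written out explicitly. The following is a minimal sketch that follows the stated composition (HindIII site, a run of 11 Ts that shares two Ts with the site, and one anchoring base; HindIII site plus a 7-base backbone for the arbitrary primers). The function names and the example backbone GATTGCC are illustrative assumptions, not sequences taken from the chapter.

```python
HINDIII_SITE = "AAGCTT"         # HindIII recognition sequence given in the text
ANCHOR_BASES = ("G", "A", "C")  # the single non-T anchoring base M
BACKBONE_LENGTH = 7             # random bases after the HindIII site in a 13-mer


def anchor_primer(m):
    """Build an H-T11M primer; the site ends in TT, so nine more Ts give 11."""
    if m not in ANCHOR_BASES:
        raise ValueError("M must be G, A, or C")
    return HINDIII_SITE + "T" * 9 + m


def arbitrary_primer(backbone):
    """Build an H-AP style 13-mer from a 7-base backbone."""
    if len(backbone) != BACKBONE_LENGTH or set(backbone) - set("ACGT"):
        raise ValueError("backbone must be 7 bases drawn from A, C, G, T")
    return HINDIII_SITE + backbone


anchors = {f"H-T11{m}": anchor_primer(m) for m in ANCHOR_BASES}
possible_backbones = 4 ** BACKBONE_LENGTH     # number of distinct 7-base backbones
example_13mer = arbitrary_primer("GATTGCC")   # hypothetical backbone, for illustration

for name, seq in anchors.items():
    print(name, seq, len(seq))
print("possible 7-base backbones:", possible_backbones)
print("example arbitrary primer:", example_13mer)
```

Each anchor primer comes out as a 16-mer, and four nucleotides at seven positions give 16,384 possible backbones, so any practical set of arbitrary primers is a small sample of that space.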
The primers used in DD represent a random selection from the more than 16,000 (4^7 = 16,384) possible 7-base combinations. Additionally, the length of an arbitrary primer is designed so that, by probability, each will recognize 50–100 mRNAs under a given PCR condition (49). As a result, the mRNA 3′ termini defined by any given pair of anchored primer and arbitrary primer are amplified and displayed by denaturing polyacrylamide gel electrophoresis. A mathematical model of estimated gene coverage using various combinations of anchor and arbitrary primers was developed shortly after the advent of differential display technology (49). This model indicated that approximately 240 primer combinations (3 anchor primers with 80 arbitrary primers) were needed to approach genome-wide screening for eukaryotes (~95% estimated coverage). However, a newer mathematical model (50) predicts that more primer combinations are required for that level of coverage: 480 primer combinations (3 anchor primers with 160 arbitrary primers) would provide ~93% coverage under the new model.
DD was originally optimized with radioactivity using 35S (3). 33P labeling was then developed (48) for better sensitivity and resolution and has been the most commonly used in publications. However, fluorescent differential display (FDD) (see Fig. 1) was the next logical progression. In the development of FDD, it was crucial that the new platform have similar sensitivity to traditional DD with isotopic labeling, as well as other advantages that would make the platform a viable and improved alternative to the established DD methodology. FDD, optimized using fluorochrome-labeled anchor primers (generically called FH-T11M) and higher dNTP concentrations in PCR, was shown to be essentially identical to conventional DD in both sensitivity and reproducibility (44) (see Fig. 2). Improvements such as elimination of radioactivity, digital data acquisition, and increased assay speed were goals that were successfully reached by the establishment of the FDD platform, representing a marked improvement over conventional DD.
Fig. 1. Fluorescent mRNA differential display. Three fluorescently labeled one-base anchored oligo-dT primers with 5′ HindIII sites are used in combination with a series of arbitrary 13-mers (also containing 5′ HindIII sites) to reverse transcribe and amplify the mRNAs from a cell.
After PCR amplification, gel electrophoresis is performed to separate the resulting PCR products by size. Reactions are run side-by-side so that the samples being compared are next to one another for each primer combination. Comparison of the cDNA patterns between or among relevant RNA samples reveals differences in the gene expression profile for each sample (see Fig. 3). Electrophoresis can be performed with denaturing polyacrylamide sequencing gels (3, 51), non-denaturing polyacrylamide gels (46), or agarose gels (52, 53). Sequencing gels are the most commonly used and are recommended here because they offer the best band resolution and allow for easy and efficient recovery of genes. In addition, their ability to accommodate a large number of reactions reduces the number of gels that must be run for FDD analysis. Because the resulting cDNAs are fluorescently labeled, a fluorescent imaging scanner is required for this technology. Here the FMBIO® laser imager series (MiraiBio, Alameda, CA) is recommended for digital acquisition of the cDNA profiles.
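The side-by-side loading rule described above (all samples adjacent within each primer combination) can be sketched as a simple ordering function. This is an assumed helper rather than part of the published protocol; the sample labels and the choice of H-T11G are placeholders modeled loosely on the experiment in Fig. 3 (four samples, one anchor primer, 24 arbitrary primers).

```python
from itertools import product


def lane_order(samples, anchor_primers, arbitrary_primers):
    """Return (anchor, arbitrary, sample) tuples, one per lane.

    Lanes are grouped by primer combination, so the samples being
    compared always sit next to one another on the gel.
    """
    lanes = []
    for anchor, arbitrary in product(anchor_primers, arbitrary_primers):
        for sample in samples:
            lanes.append((anchor, arbitrary, sample))
    return lanes


# Placeholder experiment: four time points, one anchor, 24 arbitrary primers.
samples = ["0 h", "6 h", "9 h", "12 h"]
arbitrary = [f"H-AP{i}" for i in range(1, 25)]
lanes = lane_order(samples, ["H-T11G"], arbitrary)

print(len(lanes), "reactions")
for lane in lanes[:8]:
    print(lane)
```

Four samples against 24 primer pairs give 96 reactions, which is why one 96-well plate and one large-format gel can carry a whole anchor primer's worth of such a comparison.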
Although this is the recommended imager, other fluorescent scanners, such as the Typhoon® (GE Healthcare, Piscataway, NJ) and FLA-5000 (FUJIFILM Medical Systems, Stamford, CT), can also be used for FDD with similar sensitivity. Another option for visualization of PCRs is to run samples on an automated sequencer. Our group has successfully used the Applied Biosystems ABI3130xl, a capillary array-based automated sequencer, for FDD band detection with several different fluorophores. These capillary electrophoresis (CE) machines have a laser at a fixed point and, when a fluorescently labeled product passes the laser, a signal is detected. The results of FDD are seen as a series of spectral peaks for each lane, which can be compared to show differences in a very sensitive and reproducible way (see Fig. 4). The use of CE can dramatically cut down on the time and labor required for large-scale FDD screenings. However, the major drawback and bottleneck for using this technology with FDD is that, at this point, there is no way to retrieve bands from the CE results. One would still have to run a gel and detect bands using an alternate method. The most sophisticated attempt to solve this bottleneck was the development of a prototype computer-controlled CE system for positive band identification and retrieval by fraction collection by the Hitachi Japan group (54). However, to our knowledge, no further progress or commercialization has been made.
After completion of the gene expression profiles by gel electrophoresis, the next step is to begin characterization of the potential differentially expressed genes of interest. Bands are excised from the gel matrix and reamplified with the same primer combination as the original FDD-PCR and under the same reaction conditions. Generally, a PCR-product cloning step is recommended before differential gene confirmation and sequencing, but this is up to the preferences of the researcher. The PCR-TRAP® Cloning System (GenHunter Corporation, Nashville, TN) is recommended because it is designed specifically for cloning differential display bands and employs highly efficient positive-selection cloning. Because more than one distinct cDNA may be contained within an excised band, more than one colony should be screened for the correct insert size before characterization. Furthermore, if the screening results indicate that more than one cDNA is present in the colony population, each of the different cDNAs should be characterized further. Characterization of each potential gene includes sequencing of the cloned cDNAs of interest, with the results indicating whether the cDNA is a known or unknown sequence. As with any differential gene expression technology, one has to be sure that the characterized sequences are actually differentially regulated, i.e., a “real difference,” and not a false positive.
Fig. 2. Comparison of radioactive and fluorescent differential display (FDD). DNA-free RNA from normal (N) and ras oncogene-transformed (T) rat embryo fibroblasts were compared in duplicate by either conventional differential display with 33P-labeled (alpha) dATP or FDD with fluorescein-labeled anchor primer under identical PCR conditions. The autoradiogram (a) and fluorescent images in grayscale (b) were compared in sensitivity and reproducibility as indicated. Reproducible differences are marked by arrows. The anchored primer, H-T11G, was used in combination with arbitrary 13-mer, H-AP29.
Fig. 3. Automated FDD result. Four RNA samples (before, and 6, 9, and 12 h after a drug treatment) were compared with one anchor primer in combination with 24 arbitrary primers (only 21 shown) using automation in liquid handling, 132-lane electrophoresis unit, and digital acquisition of gel image. Grey arrows indicate reproducible differences worthy of pursuit.
Fig. 4. Capillary electrophoresis of FDD reactions. RNA samples without (−) and with (+) p53 activation are compared by FDD and samples are run on ABI3100 capillary electrophoresis instrument. A candidate p53 target gene shows up-regulation in the + p53 sample at approximately 305 bp.
A variety of confirmation techniques, including Northern blot analysis (55), reverse Northern blot analysis (56), quantitative RT-PCR (qRT-PCR) (55), and real-time PCR (55), can be used. Although each has its own distinct advantages and disadvantages, Northern blot analysis is considered the gold standard for gene expression confirmation and, therefore, is recommended. Despite being labor intensive, time consuming, and requiring a significant amount of RNA, the Northern blot is by far the most accepted tool for confirmation. Northern blots have a distinct advantage over other confirmation methodologies in sensitivity, because both high- and low-level mRNA expression can be validated with this standard assay.
The optimized FDD technology is now able to compete with other gene expression tools such as DNA microarray technology because of improved high-throughput capabilities, while maintaining its inherent advantages over microarrays. Interestingly, although several candidate p53 target genes have been isolated by microarrays (see Table 2), few of these genes have actually been confirmed by Northern blot analysis or functionally characterized, making it unclear whether they are real or, in fact, just “noise” (false positives) from the method itself. Because the DD approach to differential gene expression analysis relies on randomly generated primers, no prior knowledge of the mRNA sequences is required, making the gene screening systematic, unbiased, and able to find unknown genes. In addition, DD allows researchers to study more than two samples simultaneously, with only 10–20 µg of total RNA required for “complete coverage.” Disadvantages of microarray technology as compared with FDD include problems with reproducibility and probe sensitivity, nonlinearity in signal detection (57), probe cross-hybridization due to homologous cDNA sequences (58), and data management (59). Depending on the amount of desired gene coverage, FDD methodology enables quicker results when compared with traditional isotopic DD or other DD-related technologies, yet ensures more reliable results when compared with microarray or other competing, non-DD technologies. Combined with robotics and digital data analysis, FDD has been shown to be even more accurate and high throughput (4–6, 44, 45, 60). Elimination of manual reaction setup through the use of a robotic liquid dispenser not only ensures reproducibility by reducing pipetting errors but, in combination with the elimination of conventional DD autoradiography, also decreases the amount of time required for a differential gene expression screening. DD technology allows researchers to quickly and easily find the truly differentially expressed genes in their project so they can spend their time and effort on the downstream functional characterizations, where the true relevance of genes can be identified. These characterizations can be painstakingly long and difficult.
Remember that the p53 tumor suppressor gene was discovered in the 1970s and has been worked on since by tens of thousands of laboratories throughout the world. Although much progress has been made in understanding the function of p53, the exact molecular nature of how p53 acts as this crucial tumor suppressor remains a mystery. Nevertheless, with tools such as DD, we think that the answers are coming, in cancer gene profiling as well as in all other fields of life science.
2. Materials
2.1. Total RNA Isolation
1. Phosphate-buffered saline (PBS).
2. RNA isolation reagent: a phenol–guanidinium monophasic solution such as RNApure® (GenHunter Corporation, Nashville, TN) is recommended.
3. Chloroform.
4. Polytron™ Homogenizer for RNA extraction from tissue (Biospec Products Inc., Bartlesville, OK).
5. Diethyl pyrocarbonate (DEPC)-treated water (GenHunter, Cat. No. R105).
6. Isopropanol.
7. 100% ethanol.
8. 70% ethanol in DEPC-treated dH2O.
9. 1.7-mL microfuge tubes (Denville Scientific, Metuchen, NJ).
2.2. Removal of Genomic DNA from Total RNA
1. MessageClean® DNA Removal Kit (GenHunter, Cat. No. M601) including RNase-free DNase I (10 U/µL), 10× reaction buffer (100 mM Tris–HCl, pH 8.4, 500 mM KCl, 15 mM MgCl2, and 0.01% gelatin), 3 M sodium acetate (pH 5.5), DEPC-treated water, and RNA Loading Mix.
2. Agarose, UltraPure (Invitrogen, Carlsbad, CA).
3. Distilled water (double distilled and autoclaved).
4. Phenol/chloroform (3:1) solution, Tris saturated: 30 mL melted crystalline phenol, 10 mL chloroform, 10 mL 1 M Tris–HCl, pH 7.0.
5. 10× MOPS buffer: 0.2 M MOPS, 0.05 M sodium acetate, 0.01 M ethylenediamine tetraacetic acid (EDTA), pH 6.5.
6. 12.3 M (37%) formaldehyde, pH > 4.0.
2.3. Single-Strand cDNA Synthesis by Reverse Transcription
1. RNAspectra™ Fluorescent Differential Display Kit (GenHunter) including distilled water, 5× RT buffer (125 mM Tris–HCl, pH 8.3, 188 mM KCl, 7.5 mM MgCl2, and 25 mM dithiothreitol [DTT]), dNTP Mix (FDD), oligo-dT anchor primers (H-T11M, 2 µM), and MMLV reverse transcriptase (100 U/µL).
2. 0.2-mL thin-walled PCR tubes, RNase-free (GenHunter).
3. Thermal cycler: GeneAmp PCR System 9600 (Applied Biosystems, Foster City, CA).
2.4. FDD-PCR
1. RNAspectra™ Fluorescent Differential Display Kit (GenHunter) including distilled water, 10× PCR buffer (100 mM Tris–HCl, pH 8.4, 500 mM KCl, 15 mM MgCl2, and 0.01% gelatin), FDD dNTP mix, fluorescent anchor primers (R-H-T11M or F-H-T11M), and arbitrary primers (2 µM H-AP).
2. Taq DNA polymerase (Qiagen, Valencia, CA).
3. 0.2-mL thin-walled PCR tubes, RNase-free (GenHunter) or 96-well PCR plates (Thermo-Fast® 96 Detection Plate, ABgene Inc., Rochester).
4. Liquid-handling robot. GenHunter uses the Biomek 2000 (Beckman Coulter Inc., Fullerton, CA).
2.5. Gel Electrophoresis
1. Gel apparatus with low-fluorescent (borosilicate) glass plates such as Horizontal or Vertical FDD Electrophoresis Systems (GenHunter).
2. 5 M KOH.
3. 50% ethanol (EtOH).
4. Sigmacote® (Sigma, St. Louis, MO) or similar product.
5. 6% denaturing gel solution such as Sequagel 6 Ready-To-Use 6% Sequencing Gel® (National Diagnostics, Atlanta, GA).
6. 10× TBE: 0.89 M Tris–borate, pH 8.3; 20 mM disodium ethylenediamine tetraacetic acid (Na2EDTA).
7. 10% ammonium persulfate (APS).
8. FDD Loading Dye from RNAspectra™ Kit (GenHunter): 99% formamide, 1 mM EDTA, pH 8.0, 0.009% xylene cyanole FF, and 0.009% bromophenol blue.
9. Fluorescent laser scanner. The FMBIO® II or III Series (MiraiBio, Alameda, CA) is recommended.
10. UV-transparent plastic wrap. Standard Glad® Cling Wrap (The Glad Products Company, Oakland, CA) or Saran Wrap works well.
11. FDD locator dye (GenHunter).
12. 5% bleach solution, in dH2O.
2.6. Reamplification of Selected Differentially Expressed Bands
1. Distilled water.
2. 3 M sodium acetate (pH 5.5) from GenHunter MessageClean Kit.
3. 10 mg/mL glycogen (GenHunter, Cat. No. S301).
4. 100% ethanol.
5. 85% ethanol.
6. Unlabeled anchor primers (H-T11G, H-T11A, H-T11C; 2 µM, from GenHunter).
7. Arbitrary primers (H-AP1 to H-AP80, 2 µM, from GenHunter RNAspectra Kit).
8. Taq DNA polymerase (Qiagen).
9. dNTP Mix (FDD) from RNAspectra Kit.
10. 10× PCR buffer (GenHunter): 100 mM Tris–HCl, pH 8.4, 500 mM KCl, 15 mM MgCl2, and 0.01% gelatin.
11. Agarose.
12. 10× agarose DNA loading dye (40% sucrose, 0.1% bromophenol blue, 0.1% xylene cyanole FF, and 2.5 mM EDTA, pH 8.0, in distilled water).
13. 0.2-mL thin-walled, RNase-free PCR tubes (GenHunter).
2.7. Cloning of Reamplified PCR Products
1. PCR-TRAP® Cloning System (GenHunter) including insert-ready PCR-TRAP® cloning vector, 200 U/µL T4 DNA ligase, distilled water, 10× ligase buffer (500 mM Tris–HCl, pH 7.8, 100 mM MgCl2, 100 mM DTT, 10 mM ATP, 500 µg/mL BSA), 2 µM Lgh/Rgh primers, Colony Lysis Buffer (1× TE with 0.1% Tween 20), 10× PCR buffer, 250 µM dNTP, 20 mg/mL tetracycline, and GH competent cells.
2. LB media. Make 1 L of LB with 10 g Bacto-tryptone, 5 g Bacto-yeast extract, and 10 g NaCl, and bring the volume up to 1 L with dH2O.
3. LB-Agar-TET plates. Make 1 L of LB-Agar-TET with 10 g Bacto-tryptone, 5 g Bacto-yeast extract, 9 g NaCl, and 15 g Bacto-agar, and bring the volume up to 1 L with dH2O. Microwave until melted and add 1 mL of 20 mg/mL tetracycline when the liquid cools to approximately 50°C. Pour plates.
4. Bacterial polystyrene petri dishes.
2.8. Sequencing of Cloned PCR Products
1. AidSeq Primer Set C (GenHunter): includes Lseq and Rseq primers.
2.9. Confirmation of Differential Gene Expression by Northern Blot
1. HotPrime® DNA Labeling Kit (GenHunter) including 1 U/µL Klenow DNA polymerase, 10× labeling buffer, 500 µM dNTP (−dATP) or 500 µM dNTP (−dCTP), stop buffer, and distilled water.
2. QIAEX™ II Gel Extraction Kit.
3. Lock-top microfuge tubes (USA Scientific, Ocala, FL).
4. Alpha-[32P] dATP (3,000 Ci/mmol) (PerkinElmer Life Sciences, Boston, MA).
5. Sephadex G50 column (Roche Applied Science, Indianapolis, IN).
6. 10 mg/mL salmon sperm DNA (GenHunter).
7. Nylon membrane: Nytran SuperCharge Nylon Transfer Membrane (Schleicher & Schuell, Keene, NH).
8. Paper towels.
9. UV-transparent plastic wrap.
10. Single emulsion scientific imaging film. Kodak Biomax MS (Kodak-Eastman, Rochester, NY) is recommended.
11. 20× saline–sodium citrate (SSC): 3 M NaCl, 0.3 M trisodium citrate · 2H2O. Adjust pH to 7.0 with 1 M HCl.
12. Formamide prehybridization/hybridization solution (GenHunter).
13. 1× SSC, 0.1% sodium dodecyl sulfate (SDS) (w/v).
14. 0.25× SSC, 0.1% SDS (w/v).
3. Methods
3.1. Total RNA Isolation
Although FDD takes advantage of the poly-adenylation (poly-A+) site of eukaryotic mRNA, total RNA is preferred over poly-A+ RNA (mRNA) for several reasons, including the overall ease of purification, the ability to verify RNA integrity, and the cleaner background signal (see Note 2). Total RNA is therefore recommended for FDD analysis. A 240-primer combination screening with FDD requires approximately 12 µg of “cleaned” total RNA. The term “cleaned” means free of DNA, which is achieved by the DNase I treatment described in Subheading 3.2. Generally, 50–80% of the starting amount of total RNA can be retrieved after cleaning. In addition, it is important to make sure there is plenty of RNA left over for whatever confirmation step is chosen. To ensure there is enough RNA for all steps, it is suggested to isolate approximately 50 µg of total RNA. The amount of total RNA that can be isolated from a sample can vary widely depending on the tissue/cell type, the procedure used, the organism, and proficiency at that particular procedure. However, using a reagent based on the standard phenol/guanidine thiocyanate technique such as RNApure®, one can achieve an average yield of 50 µg of total RNA from 25 mg of tissue or 2.5 × 10^6 cells (see Note 3).
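The RNA budget in this paragraph follows from a single division. The sketch below works it through, taking the RNA amounts to be in micrograms (about 12 for a 240-combination screen, about 50 isolated) and the recovery to be 50–80%; the function name and variable names are assumptions made for this example.

```python
def total_rna_to_clean(cleaned_ug_needed, recovery):
    """Micrograms of total RNA to put into DNase treatment.

    recovery is the fraction of input RNA retrieved after cleaning (0-1].
    """
    if not 0 < recovery <= 1:
        raise ValueError("recovery must be a fraction between 0 and 1")
    return cleaned_ug_needed / recovery


CLEANED_UG_FOR_240_COMBINATIONS = 12.0  # figure quoted in the text
SUGGESTED_ISOLATION_UG = 50.0           # figure quoted in the text

worst_case = total_rna_to_clean(CLEANED_UG_FOR_240_COMBINATIONS, 0.50)
best_case = total_rna_to_clean(CLEANED_UG_FOR_240_COMBINATIONS, 0.80)
left_for_confirmation = SUGGESTED_ISOLATION_UG - worst_case

print(f"input needed at 50% recovery: {worst_case:.0f} ug")
print(f"input needed at 80% recovery: {best_case:.0f} ug")
print(f"left for confirmation (worst case): {left_for_confirmation:.0f} ug")
```

Even at the low end of recovery, roughly half of the suggested isolation amount remains for Northern blots or another confirmation method.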
3.1.1. RNA Extraction from Various Sources
3.1.1.1. Extraction of RNA from Tissue Cultures
1. If using regular “attached” cells, pour off the medium and set the plate on ice. If the cells are in suspension, spin down the cells, remove the medium, and then move on to step 4.
2. Rinse the cells with 10–20 mL of cold PBS.
3. Pour off the PBS and remove the residual PBS with a 1,000-µL pipette (see Note 4).
4. Add 2 mL of RNApure® RNA isolation reagent per 100- to 150-mm plate to lyse the cells. Spread the solution by shaking the plate. This volume is sufficient for one to ten million cells.
5. Let the plate sit on ice for 10 min.
6. Pipette the lysate into two labeled 1.5-mL microfuge tubes.
3.1.1.2. Extraction of RNA from Tissues
1. Add at least 2 mL of RNApure® RNA isolation reagent to the tissue in a 50-mL conical tube on ice. Ideally, the volume ratio of RNA isolation reagent to tissue should be at least 10:1.
2. Homogenize the tissue with a Polytron™ homogenizer until the tissue is dispersed.
3. Let sit on ice for 10 min.
4. Transfer 1-mL aliquots of the lysate into labeled 1.5-mL centrifuge tubes.
3.1.1.3. Extraction of RNA from Blood
1. Spin down the blood products and remove the plasma.
2. Follow the instructions in “Extraction of RNA from Tissues” above.
3.1.2. RNA Purification
1. Add 150 µL of chloroform per milliliter of lysate. Vortex for 10 s. The protocol can be stopped here by placing the lysates at −80°C.
2. Centrifuge the tubes at 4°C at maximum speed for 10 min (see Note 5).
3. Carefully remove the upper phase (see Note 6) into a clean, labeled 1.5-mL centrifuge tube. If RNA is being isolated from tissues, a second extraction is generally recommended to remove any RNases (see Note 7).
4. Add an equal volume of isopropanol. Mix vigorously or vortex for 30 s. Let sit on ice for 10 min.
5. Centrifuge for 10 min at 4°C at maximum speed.
6. Rinse the RNA pellet with 1 mL of cold 70% ethanol (in DEPC-treated water). Centrifuge for 2 min at 4°C at maximum speed.
7. Remove the ethanol. Spin briefly and remove the residual wash solution with a pipette.
8. Resuspend the RNA in DEPC-treated water. The volume used for resuspension will depend on the amount of RNA isolated, but the RNA should be at a concentration greater than 1 µg/µL, so adjust accordingly. Do not use SDS in resuspension if using the RNA for any PCR application.
9. Measure the concentration by taking 1 µL of the RNA (using a P10 pipette) and diluting it to 1 mL with water (a 1:1,000 dilution). Read the absorbance at 260 nm. 1 OD260 = 40 µg/mL of RNA.
10. Move on to the next steps, and store RNA that will not be “cleaned” in aliquots at −80°C until the next use.
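The concentration reading in step 9 can be turned into a stock concentration with a short calculation. This is a sketch under the usual convention that one OD260 unit corresponds to 40 µg/mL of RNA and that the sample was diluted 1:1,000; the function names and the example reading of 0.05 are assumptions for illustration.

```python
def rna_concentration_ug_per_ul(od260, dilution_factor=1000.0, ug_per_ml_per_od=40.0):
    """Stock RNA concentration in ug/uL from an OD260 reading of a diluted sample."""
    if od260 < 0 or dilution_factor <= 0:
        raise ValueError("od260 must be >= 0 and dilution_factor > 0")
    ug_per_ml = od260 * ug_per_ml_per_od * dilution_factor  # undiluted stock, ug/mL
    return ug_per_ml / 1000.0                               # convert ug/mL to ug/uL


def volume_for_mass_ul(mass_ug, conc_ug_per_ul):
    """Microliters of stock that contain the requested mass of RNA."""
    if conc_ug_per_ul <= 0:
        raise ValueError("concentration must be positive")
    return mass_ug / conc_ug_per_ul


stock = rna_concentration_ug_per_ul(0.05)  # hypothetical reading
print(f"stock concentration: {stock:.2f} ug/uL")
print(f"volume holding 50 ug: {volume_for_mass_ul(50, stock):.1f} uL")
```

At a 1:1,000 dilution the conversion collapses to OD260 × 40 in µg/µL, so a stock meets the "greater than 1 µg/µL" target whenever the diluted reading is above 0.025.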
3.2. Removal of Genomic DNA from Total RNA
For FDD gene expression analysis, as for any other RNA-based gene expression technology, contaminating genomic DNA must be removed before single-strand cDNA synthesis by reverse transcription and the subsequent PCRs. If left unchecked, any primers with matching sequence to the contaminating DNA will anneal during the FDD-PCRs, causing amplification of DNA sequences and leading to a higher false-positive rate. The following protocol, which removes the contaminating genomic DNA, is therefore one of the most important procedures for preventing irregularities or artifacts during the FDD-PCRs. It is important to note that one will typically retrieve 50–80% of the total RNA put into the reaction, so the amount to be cleaned must be adjusted to the amount needed for FDD.
3.2.1. DNase I Digestion of Total RNA
1. If necessary, dilute the desired amount of RNA to be digested (maximum of 50 µg) with DEPC-treated water to a volume of 50 µL.
2. In a 1.5-mL centrifuge tube, add the following in order (the total reaction volume is 56.7 µL):

   Total RNA (10–50 µg)   50 µL
   10× reaction buffer    5.7 µL
   10 U/µL DNase I        1.0 µL

3. Mix gently and incubate at 37°C for 30 min (see Note 8).
3.2.2. Extraction and Ethanol Precipitation of DNA-Free RNA
1. Prepare phenol/chloroform solution (see Note 9) by melting crystalline phenol at 65°C.
2. Add 30 mL melted phenol to 10 mL chloroform and mix well.
3. Add 10 mL of 1 M Tris–HCl, pH 7.0, and mix well. Allow the saturation phase to form before using.
4. Add 40 µL of phenol/chloroform solution to each DNase I reaction (see Note 10). Vortex for 30 s.
5. Let sit on ice for 10 min.
6. Centrifuge at maximum speed for 5 min at 4°C.
7. Collect the upper phase (see Note 6) and place it into a clean, labeled, 1.5-mL microfuge tube.
8. Add 5 µL of 3 M sodium acetate and 200 µL of 100% ethanol. Mix well.
9. Let sit for at least 1 h at −80°C. Overnight to a few days at −80°C is also fine.
10. Centrifuge at 4°C for 10 min at maximum speed to pellet the RNA.
11. Carefully remove the supernatant and rinse the RNA pellet with 0.5 mL of 70% ethanol (in DEPC-treated water). Do not disturb the pellet.
12. Centrifuge for 5 min at 4°C at maximum speed and remove the supernatant. Centrifuge again briefly, removing the residual liquid without disturbing the RNA pellet.
13. Resuspend the RNA in 10–20 µL of DEPC-treated water.
3.2.3. RNA Quantification and Integrity Verification
After cleaning, it is crucial to be able to determine both the quantity and quality of the RNA retrieved. The amount can easily be quantified by OD260. The quality/integrity of the RNA is determined most accurately by running the RNA on an “RNA gel” and looking for the appearance of sharp ribosomal RNA bands.
Meade et al.
1. Quantitate the RNA amount by OD260 after 1:1,000 dilution of the DNA-free RNA sample with distilled water. RNA concentration (in µg/µL) = OD260 (1:1,000 dilution) × 40.
2. Prepare an “RNA gel” (denaturing agarose gel with MOPS and formaldehyde) by the following protocol:
   (a) Add the following to a microwave-safe container:
       i. 10× MOPS          10 mL
       ii. Agarose          1–1.5 g
       iii. Distilled water 83 mL
   (b) Microwave for approximately 3 min or until the agarose is melted.
   (c) Let the agarose cool to at least 50°C (barely touchable by hand).
   (d) Add 7 mL of a 12.3 M (37%) formaldehyde solution. Gently mix.
   (e) Pour into a prepared gel casting plate and add a gel comb.
   (f) Running buffer (1 L) is made by diluting 100 mL of 10× MOPS with 900 mL of distilled water to a 1× concentration. Cover the agarose gel with running buffer.
3. Check the integrity of the RNA (see Note 11) by resolving 2–3 µg of both pre-DNase and post-DNase RNA samples on a 7% formaldehyde agarose gel with RNA Loading Mix by the following protocol:
   (a) Add 1–10 µL (2–3 µg) of RNA to 20 µL RNA Loading Mix in a labeled 1.5-mL microfuge tube. Mix well.
   (b) Incubate at 65°C for 10 min.
   (c) Centrifuge the sample briefly to collect condensation.
   (d) Put the samples on ice for 5 min.
   (e) Load the entire amount onto an RNA gel.
   (f) Run at 50–60 V for approximately 45 min or until resolution of the ribosomal subunits is achieved.
3.3. Single-Strand cDNA Synthesis by Reverse Transcription
Generally, two RT reactions are done per sample (i.e., in duplicate) to ensure reproducibility and as a way of reducing any false positives. It is recommended to set up separate RT core mixes for each individual H-T11M in 200-µL RT reactions if 240 primer combinations will be performed. Therefore, if two samples are being studied, set up four 200-µL RT reactions for H-T11G, four 200-µL RT reactions for H-T11A, and four 200-µL RT reactions for H-T11C. If smaller or larger numbers of primer combinations are chosen, adjust accordingly.
1. Dilute 40 µL of each RNA sample to a final concentration of 0.1 µg/µL with DEPC-treated water and mix thoroughly. Place on ice.
2. For an RT core mix with two samples in duplicate for one H-T11M primer (H-T11G will be shown here), add the following:
   Distilled water   376 µL
   5× RT buffer      160 µL
   FDD dNTP mix      64 µL
   H-T11G primer     80 µL
   Total volume      680 µL
   Mix well.
3. Divide the above 680 µL evenly into four tubes labeled with sample name (Example: RTG-1a, RTG-1b, RTG-2a, RTG-2b), aliquoting 170 µL per tube (see Fig. 5 for step-by-step schematic of RT and FDD-PCR setup).
4. Add 20 µL of corresponding total RNA (0.1 µg/µL, freshly diluted, see Note 12) to each tube. For example, add 20 µL of RNA 1 to each of tubes RTG-1a and RTG-1b followed by 20 µL of RNA 2 to each of tubes RTG-2a and RTG-2b. Mix each tube well.
5. Program the thermal cycler to: 65°C for 5 min → 37°C for 60 min → 75°C for 5 min → 4°C soak (see Note 13).
6. Place the tubes on the thermal cycler and begin the program.
7. After the tubes have been at 37°C for 10 min, pause the thermal cycler and add 10 µL of MMLV reverse transcriptase to each tube. Quickly mix well by finger-tipping or pipetting up and down before continuing the incubation program.
8. At the end of the reverse transcription, spin the tubes briefly at maximum speed to collect the condensation. Set the tubes on ice or store at −20°C for later use.
9. Repeat steps 1–8 for the H-T11A and H-T11C primers.
3.4. FDD-PCR
This protocol is designed for 240 primer combinations in duplicate per sample using three fluorescent dye-labeled anchor primers (FH-T11M) and 80 upstream arbitrary primers (H-AP). This would yield approximately 74% coverage of all possible genes. For a complete, genome-wide screening, 480 primer combinations or more must be completed per sample. It is ideal to set up PCRs in 96- or 384-well PCR plates using a robot to ensure reproducibility and increase throughput. Depending on the number of samples and the plate being used, one may be able to combine more or less than 24 primer combinations into one experiment. However, for simplicity, a 24-primer combination experiment with one anchor primer and two RNA samples in duplicate using a 96-well plate will be discussed. Therefore, this protocol will need to be repeated ten times using varying anchor-arbitrary primer combinations.
Fig. 5. RT and FDD reaction setup. This schematic shows individual steps involved and quantities required for standard reverse transcription (RT) and fluorescent differential display (FDD) reaction setups. These numbers are based on comparing two samples in duplicate (or four samples not in duplicate) with FH-T11M anchor primer in combination with 24 H-AP arbitrary primers. These steps would be repeated ten times until all 240 primer combinations (3 anchor primers and 80 arbitrary primers) have been completed.
1. A separate FDD-PCR core mix for each individual FH-T11M primer is made. Here, a core mix for all 80 H-AP primers for FH-T11G primer is shown. This will be called “FDD Core Mix G.”
   Distilled water   4,080 µL
   10× PCR buffer    800 µL
   dNTP Mix (FDD)    640 µL
   FH-T11G primer    800 µL
   Total volume      6,320 µL
   Mix well.
2. Aliquot 1,896 µL of FDD Core Mix G into each of three separate tubes labeled “FDD Core Mix G” (see Fig. 5 for step-by-step schematic of RT and FDD-PCR setup). Aliquot the remaining amount into a fourth tube labeled “FDD Core Mix G-remainder” (approximately 632 µL).
3. To one of the tubes labeled “FDD Core Mix G,” add 24 µL Taq DNA polymerase. Mix well. Freeze the other three tubes aliquoted above at −80°C for later PCRs.
4. Aliquot 480 µL of “FDD Core Mix G/Taq” mixture to four separate tubes labeled corresponding to the RT reactions. For this example, use FDDG-1a, FDDG-1b, FDDG-2a, and FDDG-2b.
5. Add 60 µL of corresponding cDNA from RT to each of the four tubes. For example, 60 µL of RTG-1a tube would go into the tube labeled FDDG-1a. Mix well.
6. Using either a robot or by hand, add 2 µL of H-AP primers 1–24 to corresponding wells of a 96-well plate (see Fig. 5).
7. Using either a robot or by hand, add 18 µL of corresponding FDD Core Mixes to corresponding wells of a 96-well plate (see Fig. 5).
8. The total reaction volume will be 20 µL. Add 25 µL of mineral oil if needed.
9. Program the thermal cycler to:
   94°C for 15 s (see Note 14), 40°C for 2 min, 72°C for 60 s, for 40 cycles → 72°C for 5 min → 4°C soak.
10. Put the 96-well plate on the thermal cycler and begin the program. Once completed, store reactions at −20°C in the dark.
11. Steps 3–10 will then be repeated for H-AP primers 25–48 and 49–72.
12. The same process will then be done for H-AP primers 73–80 as follows (see Note 15):
   (a) Add 8 µL Taq DNA polymerase to the 632 µL of “FDD Core Mix G-remainder.” Mix well.
   (b) Aliquot 160 µL of that mixture to four separate tubes labeled the same as in step 4.
   (c) Add 20 µL of the corresponding cDNA from RT to each of the four tubes, as in step 5. Mix well.
   (d) Using either a robot or by hand, add 2 µL of H-AP primers 73–80 to the corresponding wells of a 96-well plate.
   (e) Using either a robot or by hand, add 18 µL of corresponding FDD Core Mixes to the corresponding wells of a 96-well plate.
   (f) Follow steps 8–10 above.
13. Repeat steps 1–12 for the FH-T11A and FH-T11C primers.
3.5. Gel Electrophoresis
Because performing a large-scale FDD experiment requires many hundreds of PCRs (960 in the experiments above), gel electrophoresis is one of the main areas where throughput can be improved. Using a gel apparatus with many lanes can speed up this process tremendously. One system that has been successfully used is the Horizontal FDD Electrophoresis System with 132 lanes and the “Microtrough System” containing grooved glass plates. This allows one to load at least one entire 96-well plate on one gel. In addition, the Microtrough System allows the researcher to use standard 10-µL pipet tips for sample loading instead of the difficult-to-use flat gel-loading tips required by standard sequencing apparatuses. Hand position during loading is more stable and relaxed with this system.
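The figure of 960 PCRs follows directly from the experimental design described in Subheading 3.4 (3 anchor primers, 80 arbitrary primers, two samples, each in duplicate). A minimal Python sketch of that arithmetic is given here; the function names are hypothetical, and the plate count assumes every well of a 96-well plate is used.

```python
import math


def fdd_pcr_count(anchor_primers: int = 3,
                  arbitrary_primers: int = 80,
                  samples: int = 2,
                  replicates: int = 2) -> int:
    """Total number of FDD-PCRs for a full screen."""
    primer_combinations = anchor_primers * arbitrary_primers
    return primer_combinations * samples * replicates


def plates_needed(reactions: int, wells_per_plate: int = 96) -> int:
    """Number of plates (and gels, at one plate per gel) required."""
    return math.ceil(reactions / wells_per_plate)


total = fdd_pcr_count()
print(f"{total} PCRs on {plates_needed(total)} plates of 96 wells")
```

Changing the number of samples or primer combinations in the call shows how quickly the gel workload grows, which is the motivation for the high-lane-count apparatus described above.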
A multichannel pipettor for gel loading has also been tried. Matrix Technologies (Hudson, NH) manufactures several pipettors with width-expandable channels called “Matrix Equalizers.” The 8-channel Matrix Equalizer 384 with 0.5–12.5 µL volume range works fairly well. These pipettors have tips that move independently and can be spaced anywhere from 4.5 to 14.15 mm apart. For the gel loading, the tips were spaced at 9 mm for liquid uptake from a 96-well plate and then collapsed together to 4.5 mm for gel loading. However, this 4.5 mm distance only allows 87 lanes per gel, not enough to load an entire 96-well plate. A pipettor that could contract to 3 mm for gel loading would be ideal, but so far Matrix has not manufactured this. Therefore, using one of these pipettors has trade-offs: while it decreases the time required for gel loading and the chance of incorrect loading, fewer reactions can be run on the same gel. The other option is to load the PCRs using the Matrix pipettor at the 6 mm distance, loading every other well, but this requires reconfiguration of the reaction setup.
For the experiments done above that have 960 PCRs on ten 96-well plates, it is recommended to run ten separate gels, each with one 96-well plate. One to two gels can generally be run per day, requiring 5–10 days to run all ten gels. For ease of use, the Sequagel 6 Ready-To-Use 6% Sequencing Gel® is recommended for denaturing gel electrophoresis. However, a general protocol is given here for the 6% denaturing polyacrylamide gel that is recommended for resolution of cDNA profiles.
1. Thoroughly clean both sides of the glass plates to be used with warm water and soap, ensuring that there is no previous gel debris or streaks (see Note 16). Be sure to rinse thoroughly afterward because soap residue may cause problems. KOH can be used occasionally for this purpose to strip off hard-to-clean residue.
2. Further clean the glass plates by wiping with a 50% ethanol (EtOH) solution. Make sure the plates are completely dry.
3. Coat the interior surface of one of the plates (usually the notched one) with 500 µL Sigmacote® or similar product using a Kim-Wipe to smoothly spread it over the surface. Let dry for 1 min. This coating step allows the gel to preferentially stick to the non-coated plate when the plates are separated for band cutting after the gel has been run.
4. Use 60 mL of the gel mixture for a 45 × 28 × 0.04-cm gel.
5. Add 0.5 mL of 10% APS solution and mix thoroughly.
6. Pour gel into the sequencing gel cast and let it polymerize for 1–2 h or overnight (see Note 17).
7. After polymerization, load the glass plates into the sequencing apparatus and add 1× TBE buffer to the upper and lower buffer chambers.
8. Flush the urea from the gel wells by using a syringe to inject buffer into each well. Pre-run the sequencing gel in 1× TBE buffer for 30 min.
9. Add 3.5 µL of each FDD-PCR to 2 µL of FDD loading dye. Alternatively, an appropriate ratio of loading dye (8 µL for 20 µL PCRs) can be added directly to the PCR if the reactions will only be used for running gels. Incubate at 80°C for 2 min immediately before loading onto the gel. This step is to denature the DNA samples before gel loading.
10. After heat denaturation, put the samples on ice for 1–2 min.
11. Load the maximum amount of sample (usually 3–4 µL) into wells. It is crucial that all urea be removed from the wells before loading samples (see Note 18). For best results, load four to six lanes and then stop briefly to reflush the wells to remove urea. Load in appropriate groups, usually by primer combination.
12. Electrophorese for 1½ to 3 h at 60 W constant power (voltage not to exceed 2,000 V) until the xylene cyanol dye (the slower moving dye) reaches the bottom of the gel. In a 6% gel, the xylene cyanol will co-migrate with DNA of approximately 106 bp as a reference point. If voltage exceeds 2,000 V, lower the wattage. The gel should be kept in the dark while running to prevent photobleaching of samples (see Note 19) either by using a dark room or covering the gel apparatus with a cardboard box.
13. Turn off the power supply and remove the plates from the gel apparatus. Take off the gel tape and remove the spacers and comb (see Note 20). Clean the outside of the glass plates very well with warm water and 50% ethanol to remove any residue from the gel or tape. Thorough cleaning is required to reduce background signal (see Note 16).
14. Scan the gel on a fluorescence imager with an appropriate filter, following the manufacturer’s instructions based on the particular fluorophore being used.
3.6. Reamplification of Selected Differentially Expressed Bands
Assuming differentially expressed bands of interest are seen, those bands should be excised from the gel. After excision, the cDNAs will be reamplified using the same anchor-arbitrary primer combinations and reaction conditions as the initial FDD-PCRs. The reamplification products can then be cloned and sequenced for further characterization.
1. Separate the glass plates by taking off the notched/smaller glass plate (see Note 21) leaving the gel attached to the unnotched/larger plate.
2. Place a layer of UV-transparent plastic wrap on top of the gel. This prevents contamination of the gel as well as making gel cutting easier.
3. Spot 0.5 µL of FDD Locator Dye at the upper and lower corners of the gel to allow orientation of the picture with the gel. The FDD locator dye, with its combination of fluorescent and visible dyes, can be used to easily align the gel with the printed template for band excision.
4. Rescan the gel with the gel facing up.
5. Print a real-size image on appropriately sized paper (see Note 22) using a quality ink-jet or laser printer. This printed image will be used as the template to excise differentially expressed cDNAs.
6. Choose and label any bands to be cut (see Note 23). A logical band-naming nomenclature should be used such as RN-G1A (RN = researcher name; G = FH-T11G anchor primer; 1 = H-AP1 arbitrary primer; A = top differentially expressed band in lane).
7. Place the printout on the tabletop and lay the glass plate on top of the printout. Orient the plate so that the spots on the printout match up with those on the gel.
8. Excise each band with a razor or other sharp utensil and place the band into a 1.5-mL microfuge tube labeled with the corresponding band name. The razor blade should be cleaned between cuts to prevent cross-contamination, with a 70% ethanol or 5% bleach-soaked Kim-Wipe followed by an H2O-soaked Kim-Wipe.
9. For each band being reamplified, add 100 µL of distilled water to the tube containing the corresponding gel slice.
10. Let soak for 10 min at room temperature.
11. Boil the tightly closed tube (with Parafilm or a lock-top tube) for 15 min to elute the cDNA from the gel slice.
12. Spin for 2 min at maximum speed to collect the condensation and pellet the gel.
13. Transfer supernatant to a fresh 1.5-mL labeled microfuge tube. Discard the tube with the gel slice. Add 10 µL of 3 M sodium acetate, 5 µL of glycogen, and 450 µL of 100% ethanol per tube. Let sit for at least 30 min on dry ice or in a −80°C freezer.
14. Spin for 10 min at 4°C at maximum speed to pellet the DNA. Remove the supernatant and rinse the pellet with 200 µL of ice-cold 85% ethanol. Spin briefly and remove the residual ethanol.
15. Dissolve the pellet in 10 µL of dH2O.
16. Make a reamplification core mix for each of the anchor primers that is large enough to reamplify all FDD bands with that particular anchor primer:
   (a) A standard reamplification reaction will contain:
       Distilled water             23.3 µL
       10× PCR buffer              4.0 µL
       dNTP Mix (FDD)              0.3 µL
       2 µM H-AP primer*           4.0 µL
       2 µM H-T11M (see Note 24)   4.0 µL
       cDNA template*              4.0 µL
       Taq DNA polymerase          0.4 µL
       Total volume                40 µL
   (b) Determine how many bands of each anchor primer will be reamplified. Multiply each of these numbers by 1.1 to give a 10% cushion for any pipetting inaccuracies. The number of bands for H-T11G × 1.1 = g; the number of bands for H-T11A × 1.1 = a; and the number of bands for H-T11C × 1.1 = c.
   (c) Make a reamplification core mix for each H-T11M by multiplying the numbers for a “Standard Reamplification Reaction” by g, a, and c, accordingly.
       *However, for the core mixes, the H-AP primers and cDNA templates will not be added, as these will vary with each band. Make the core mix as follows:
       Distilled water             23.3 µL × g, a, or c
       10× PCR buffer              4.0 µL × g, a, or c
       dNTP Mix (FDD)              0.3 µL × g, a, or c
       2 µM H-T11M                 4.0 µL × g, a, or c
       Taq DNA polymerase          0.4 µL × g, a, or c
       Total volume                32 µL × g, a, or c
   (d) As an example, if there were 20 bands chosen for reamplification from FH-T11G, g would be 22 (20 × 1.1 = 22). A core mix should be made for 22 bands by multiplying the numbers from step c above by 22:
       Distilled water             23.3 µL × 22 = 512.6 µL
       10× PCR buffer              4.0 µL × 22 = 88 µL
       dNTP Mix (FDD)              0.3 µL × 22 = 6.6 µL
       2 µM H-T11M                 4.0 µL × 22 = 88 µL
       Taq DNA polymerase          0.4 µL × 22 = 8.8 µL
       Total volume                32 µL × 22 = 704 µL
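The core mix scaling in step 16 can be reproduced with a short Python sketch. It assumes that the 10% cushion is applied by multiplying the number of bands by 1.1, and it uses the per-reaction volumes from the standard reamplification reaction without the H-AP primer and cDNA template; the function and variable names are hypothetical.

```python
# Per-reaction volumes (µL) of the core mix components; the H-AP primer and
# cDNA template are left out because they differ for every band.
STANDARD_REACTION_UL = {
    "Distilled water": 23.3,
    "10x PCR buffer": 4.0,
    "dNTP Mix (FDD)": 0.3,
    "2 uM H-T11M": 4.0,
    "Taq DNA polymerase": 0.4,
}


def reamp_core_mix(n_bands: int, overage: float = 0.10) -> dict:
    """Scale the core mix for n_bands plus a fractional pipetting cushion."""
    if n_bands < 1:
        raise ValueError("at least one band is required")
    factor = n_bands * (1.0 + overage)
    mix = {name: round(volume * factor, 1)
           for name, volume in STANDARD_REACTION_UL.items()}
    mix["Total volume"] = round(sum(STANDARD_REACTION_UL.values()) * factor, 1)
    return mix


# Example from the text: 20 bands for FH-T11G, so the mix is made for 22
for component, volume in reamp_core_mix(20).items():
    print(f"{component:<20} {volume:>7.1f} µL")
```

Each 32-µL aliquot of this mix is then completed to 40 µL with 4 µL of cDNA template and 4 µL of the matching H-AP primer.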
   (e) Make appropriate amounts of core mixes for both FH-T11A and FH-T11C.
17. After core mixes are made, aliquot 32 µL into 0.2-mL tubes (individually, as strip tubes, or in a 96-well plate) labeled with the proper band name.
18. Add 4 µL of the corresponding cDNA template from step 15 above as well as 4 µL of the corresponding H-AP primer.
19. Place the reamplification reactions on the thermal cycler and run them using the same conditions as for FDD-PCR.
20. Make a 1.5% agarose gel with ethidium bromide by adding 1.5 g of agarose to 100 mL of 1× TAE. When the agarose/1× TAE mix cools to approximately 50°C (barely touchable by hand), add 3 µL of ethidium bromide, swirl to mix, and pour the solution into a plastic agarose-casting tray.
21. Add 30 µL of the reamplification reaction to 5 µL of agarose DNA loading dye in a 0.5-mL microfuge tube. Load the 35 µL volume onto the 1.5% agarose gel. Save the remaining 10 µL of the PCR samples at −20°C for future cloning.
22. Electrophorese at 70 V for approximately 45–60 min.
23. Confirm correct cDNA reamplification by visualizing the gel using a UV transilluminator. The reamplified band should be approximately the same size as the band cut from the original FDD gel.
After successful reamplification, each band must be confirmed to be a “real” difference by Northern blot or other technique. In addition, the band will need to be sequenced to determine whether it is a known or novel sequence. The order in which these are done can vary and is generally up to the preference of the researcher. Direct sequencing of the reamplified PCR products can sometimes be done here (see Note 25), but a cloning step is recommended first. The following steps are presented in the recommended order, but this can be modified based on the situation.
3.7. Cloning of Reamplified PCR Products
Clone differentially expressed cDNAs into the recommended PCR-TRAP® cloning vector (see Note 26), or other suitable cloning vector, following the manufacturer’s protocol.
3.8. Sequencing of Cloned PCR Products
If using the PCR-TRAP® Cloning System, sequencing can be performed utilizing vector-specific primers such as Lseq/Rseq or Lgh/Rgh. If using a cloning vector other than the one recommended, consult the manufacturer’s guidelines for sequencing instructions.
3.9. Confirmation of Differential Gene Expression by Northern Blot
To confirm differential expression of the selected cDNAs, Northern blot analysis (55) is suggested rather than other confirmation techniques such as reverse Northern hybridization (56), quantitative RT-PCR (55), or real-time PCR (55). The Northern blot technique is technically simple and straightforward in approach, requiring no manipulation of the RNA sequences from which differential gene expression has been detected. Additionally, Northern blot analysis is the most accepted confirmation technique for differential gene expression, often being referred to as the gold standard of gene expression confirmation assays. If using the recommended PCR-TRAP® cloning vector, the probe template is produced by a PCR of the cDNA construct within the cloning vector. The required primers are supplied with the cloning system. Additionally, the HotPrime® DNA Labeling Kit, a random prime labeling kit with major improvements over the traditional random priming kit, is suggested. It is specifically designed to efficiently label DNA probes isolated from differential display for Northern blot analysis. This method makes use of random decamers, rather than the traditional hexamers used in random priming, incorporates the anchored oligo-dT primers (H-T11M) into the labeling buffer to ensure full-length antisense cDNA probe labeling, and uses radioactive dATP to take advantage of the AT-rich nature of DD bands. These improvements greatly increase the chance for signal detection on the Northern blot analysis. After performing DD, most of the bands found will be confirmed to show differential gene expression. Those that are confirmed are considered “real” differences as opposed to any “false positives.” If a band chosen from DD does not show differential expression on a Northern blot, it does not mean that it is necessarily a “false positive.” There have been several examples where bands show no noticeable differential expression on a Northern blot, but on review, something else is involved, such as a polymorphism at the primer binding site, a short sequence deletion/insertion (61), a splicing difference, etc. If these types of changes occur at the exact site where a primer anneals (or within the gene sequence produced) during DD, a difference would be revealed, whereas a Northern blot may still show no difference because the probe could still bind to the RNA despite these small sequence differences. The message is that if a band looks convincing on the DD gel, but does not show differential expression by Northern, it could be a false positive, but it could also be something very interesting and worth pursuit.
4. Notes
1. Non-anchored oligo-dT primers have been used for Differential Display (DD), but their disadvantages far outweigh the advantage of needing only one primer for RT and PCR. Without the non-T base at the 3′ end of the primer to “anchor” their position, they can anneal anywhere on the poly-A tail for PCR and will thus create many different size DNA fragments for the same exact cDNA species. This leads to a background smear, which is aesthetically unappealing, but more importantly will create problems downstream through reamplification of the wrong cDNA.
2. Although poly-A+ RNA (mRNA) is what is actually being reverse transcribed in DD, it is rarely used as the RNA input. It can be purified and used for DD, but it provides no significant advantages and therefore total RNA is the preferred RNA source for DD for a number of reasons. First, it is much easier to purify than poly-A+ (mRNA) because simple RNA isolation reagents exist from many commercial sources, including RNApure® (GenHunter). Most of the protocols for purifying mRNA require purification of total RNA first, so it requires additional steps. Second, total RNA allows for easy evaluation of overall RNA integrity by running an “RNA gel” and visualizing the ribosomal RNA bands. If these bands are sharp and without a background smear, it can be assumed that the mRNA is also intact. There are ways to evaluate mRNA integrity, but they require expensive and sophisticated instruments such as the Agilent Bioanalyzer. Finally, the methods used for mRNA purification generally require an oligo-dT binding step so that only the mRNA will be captured. This always leads to some oligo-dT contamination in the RNA sample, which will cause problems for the same reasons listed in Note 1. For all of these reasons, total RNA is the RNA type of choice for DD.
3. The RNApure® reagent from GenHunter is a simple monophasic solution for rapid isolation of intact total RNA that is similar to other phenol/guanidine thiocyanate-based RNA isolation products, but has several major advantages. These include special cell lysis chemicals giving better yield, a yellow color allowing easier visualization during phase separation, and better stability with less corrosiveness. The high-quality RNA isolated can be used for differential display, Northern, and reverse Northern blot analysis, and for other applications.
4. During RNA isolation from cells, it is crucial to completely remove any residual PBS after rinsing. Otherwise, the ratio of RNApure to cells will be altered. Let the plate sit at an angle for 1 min and remove the residual PBS with a 1,000-µL pipette.
5. During RNA purification steps, many of the centrifugation steps are done at 4°C. We put our centrifuge in the refrigerator a few hours before these steps will be done. However, we have noticed that if you leave the centrifuge in the refrigerator continuously, it will not spin as fast. We assume this would be caused by either temperature or moisture. Therefore, if you are using a standard lab centrifuge designed for room temperature use, do not keep the centrifuge in the refrigerator long term.
6. When removing the upper phase, it is crucial that you do not touch the interphase, which may contain proteins including RNases/DNases. It is much better to lose some RNA, but ensure that what RNA you do retrieve will be free of RNase/DNase, than to try to get as much of the upper phase as possible and risk RNase/DNase contamination.
7. Because tissues generally contain higher amounts of RNases than cells, we have noticed that a second phenol extraction step will significantly improve the DD results in terms of reproducibility and overall quality. This second extraction can be done directly after taking the upper phase and using more RNApure® reagent. Just add 1 mL of RNApure® reagent per 100 µL of upper phase and follow the protocol starting at Subheading 3.1.2 again.
8. For the DNase I digestion step at 37°C, we recommend sticking to this 30-min time as closely as possible in case there is any RNase contamination. However, it is also crucial to do the full 30-min incubation to completely digest all DNA.
9. We have found that phenol/CHCl3 (3:1) is superior to phenol/CHCl3 (1:1) or phenol/CHCl3/isoamyl alcohol (25:24:1). However, these other options can be used, but the extraction should be repeated twice to ensure complete removal of proteins. Phenol/CHCl3/isoamyl alcohol is normally used for DNA or plasmid purification. It is recommended that all reagents for RNA work be separated from DNA work to avoid RNase contamination.
10. There are non-phenol/chloroform-based protocols to inactivate or remove DNase, including heat inactivation, chemical inactivation, or column-based purification. However, phenol/chloroform-based purification is the gold standard for protein removal and the only way to ensure that all DNase is removed. The other protocols may inactivate or remove most of the DNase, but for RT-PCR applications, even minute amounts of DNase will cause major problems with your results. Therefore, we only recommend phenol/chloroform-based purification.
11. To check for RNA integrity, look for the clear appearance of the ribosomal RNA bands, with little to no smearing. RNA from different species can look significantly different, but mammalian RNA should have 28S and 18S ribosomal RNA (rRNA) bands in close proximity at the top of the gel and a 5S rRNA band lower. If the RNA appears degraded, this can be caused by many things:
   RNA was degraded before treatment with DNase I. Check the integrity at all stages (before digestion, after digestion, after phenol/CHCl3 extraction, etc.). Make sure that RNA is stored at −80°C at concentrations of at least 1 µg/µL.
   DNase I was contaminated. DNase I from many vendors contains detectable RNase contamination. The DNase I from the MessageClean Kit is guaranteed to be RNase-free.
   RNA was degraded by reagents or equipment. Make sure all solutions and buffers are made with DEPC-treated dH2O and all vessels including tubes, tips, and gel boxes are free of RNase.
   The RNA sample itself is contaminated with RNase. This is a common problem with RNA extracted from large amounts of tissue, which is why at least two extraction steps are recommended for tissues. To confirm RNase contamination, incubate RNA with 1–2 mM MgCl2 in Tris–Cl, pH 8.0, at 37°C for 30 min. This will activate any RNase in the RNA. If contamination is confirmed and enough “uncleaned” RNA remains, do an additional phenol/CHCl3 extraction with the RNA sample following the same procedures in Subheading 3.2.2. If not, start a new RNA extraction, increase the ratio of RNA extraction solution (RNApure®) to tissue, and do an additional phenol/CHCl3 extraction step.
   The RNA sample sometimes appears to be degraded after agarose gel analysis, when the actual problem is the pH of the buffer, too much salt in the RNA, or bad loading dye, which has caused the ribosomal RNAs (28S and 18S) to migrate strangely. We recommend using RNA Loading Mix (GenHunter). Confirm the pH of the MOPS buffer, which should be between 6.5 and 7.0. Also, make sure formaldehyde is added to the gel and the RNA sample is denatured by incubating in RNA Loading Mix at 65°C for 10 min before loading.
12. RNA samples should be freshly diluted with dH2O or DEPC-treated H2O to 0.1 µg/µL directly before RT reaction setup. Do not reuse the diluted RNA after freezing and thawing because the RNA will be degraded and yield poor results.
13. For the reverse transcription reaction, the initial 65°C incubation is intended to denature the RNA secondary structure. The final incubation at 75°C is to inactivate the reverse transcriptase without denaturing the cDNA/mRNA duplexes. Therefore, “hot start” PCR is neither necessary nor helpful for the subsequent PCRs using cDNAs as templates.
14. If not using the recommended thermal cycler, you may need to adjust the denaturation (94°C) time to 30 s.
15. The PCR setup for H-AP primers 73–80 can be done at the same time for all three FH-T11M primers so they can all be put on one 96-well plate.
16. Gel debris and streaks on the glass plates will usually fluoresce and can cause major background problems. Therefore, thorough cleaning is required.
17. If overnight gel polymerization is done, plastic wrap, such as Saran Wrap, should be used to prevent the gel from drying out.
18. During sample gel loading, it is crucial that the urea in the wells be completely flushed right before loading your samples. Because urea is heavier than water, it will fall to the bottom of the well fairly quickly. If a sample is loaded without flushing a well, it will sit on top of the urea, which in turn causes strange migration and poor resolution. For best resolution, flush every four to six wells loaded using a syringe or pipet while trying not to disturb samples that have already been loaded.
19. Fluorescent dyes are light sensitive. We recommend keeping primers and samples in the dark or covered with aluminum foil. While running the gel, the apparatus should also be kept in the dark as much as possible. This can be done by running gels in a dark room or using a cardboard box to cover the entire apparatus.
20. When scanning the gel, it is best to remove the gel tape, spacers, and comb, which will fluoresce and can cause background problems. However, if you think you might run the gel longer for better separation, you should do a quick scan before removing gel tape, spacers, and comb to determine whether the gel has been run long enough.
21. To separate the glass plates, we have found that small plastic wedges, which can be purchased from several gel companies, work well. It is important to do this slowly to make sure that the gel is sticking to only one side.
22. When printing out the real-size image, you will need a large enough sized paper to fit the whole gel. We use 11 × 17 paper on an ink-jet printer, which allows plenty of space for the entire gel. If necessary, you could also print the gel on two to three pieces of paper and tape them together.
23. When selecting bands to cut, if there is a chance that a band is worth pursuing, it is recommended to cut it out. Later, a decision can be made whether or not to reamplify that band. However, if one later decides to pursue a band that was not cut, the gel will have to be run again because gels can only be stored for a few days before drying out. When a large quantity of gels are being run, it usually makes sense to run all the gels first, cutting any interesting bands along the way, and storing those bands in the refrigerator. When all gels have been completed, a decision can be made regarding which bands are worthwhile reamplifying and then they can be reamplified together.
24. For the reamplification reaction, note that the unlabeled (without 5′ fluorophore) H-T11M primers are used. Otherwise, the fluorophore can interfere with future cloning.
25. Direct sequencing can sometimes be done after successful reamplification. If the reamplified product is a single, clean band, direct sequencing with the H-AP primer can work, generally approximately 50% of the time. However, if the reamplified product has multiple bands, a cloning step will have to be done first.
26. The PCR-TRAP® Cloning System is by far the most efficient cloning method for PCR products that we have tested. It utilizes a third-generation cloning vector that features positive selection for DNA inserts. Only recombinant plasmids confer antibiotic resistance. The principle of this unique cloning system is based on the phage Lambda repressor gene, cI, which is cloned on the PCR-TRAP® vector and codes for a repressor protein. The repressor protein binds to the Lambda right operators Or1 to Or3 of the cro gene, thereby turning off the promoter that drives the TetR gene on the plasmid. Therefore, cloning of the PCR products directly, without any post-PCR purification, into the cI gene leads to inactivation of the repressor gene, thus turning on the TetR gene. This allows the Escherichia coli containing recombinant plasmids to grow on Tet plates.
Chapter 8
Manual Microdissection Combined with Antisense RNA–LongSAGE for the Analysis of Limited Cell Numbers
Jutta Lüttges, Stephan A. Hahn, and Anna M. Heidenblut
Summary
Establishing a gene expression profile of defined subtypes of cells within an organ is still challenging because it frequently requires microdissection and subsequent amplification of the limited amount of messenger RNA (mRNA) isolated from the microdissected tissue in order to be able to proceed with comprehensive gene expression analyses via microarray or serial analysis of gene expression (SAGE) technology. Here we describe a manual microdissection strategy for the isolation of high-quality RNA. Furthermore, a strategy for combining linear amplification of RNA with longSAGE is described that allows the use of antisense RNA (aRNA) generated via the well-established linear amplification of RNA procedure together with the conventional SAGE or longSAGE technology.
Key words: Microdissection, RNA amplification, T7 RNA polymerase, Amplified antisense RNA, aRNA, aRNA-longSAGE, Expression profiles
Robert Grützmann and Christian Pilarsky (eds.), Cancer Gene Profiling: Methods and Protocols, Methods in Molecular Biology, vol. 576, DOI 10.1007/978-1-59745-545-9_8, © Humana Press, a part of Springer Science + Business Media, LLC 2010
1. Introduction
To be able to analyze the expression profile of distinct histological cell types within a complex primary tissue, a method to isolate the cells of interest is needed. Microdissection using laser capture or manual techniques has been successfully used to produce such highly enriched cell preparations. Manual microdissection is described herein because in our hands this procedure was easier and faster than using laser capture microdissection. Amplified antisense RNA (aRNA)-long serial analysis of gene expression (longSAGE) is a modification of the conventional SAGE protocol that allows the generation of SAGE libraries from very small sample sizes such as microdissected cells (1). As little as 40 ng of total RNA is sufficient to generate an aRNA-longSAGE library. This is achieved by linear amplification of RNA that is carried out prior to the synthesis of the SAGE library.
Linear amplification of RNA (2) is a method routinely used in gene expression profiling via microarrays. It starts with a complementary DNA (cDNA) synthesis using a modified oligo(dT) primer that adds the T7 RNA polymerase promoter to the 3′ end of the cDNA. In vitro transcription of this cDNA with T7 RNA polymerase yields amplified aRNA. This technique introduces less amplification bias than polymerase chain reaction (PCR)-based cDNA amplification protocols (3). Furthermore, the use of aRNA in differential gene expression analysis leads to the detection of expression differences that are not observed when using nonamplified RNA as starting material (4, 5). The majority of these additional expression differences can be verified by quantitative real-time PCR (4).
The aRNA obtained by linear amplification of RNA cannot be used in combination with the standard SAGE protocol because the latter needs sense RNA for the cDNA synthesis that is the first step of library generation. The aRNA-longSAGE protocol described herein uses a modified cDNA synthesis to adapt the SAGE procedure to the use of antisense RNA. This is done by using a random primer for the cDNA first strand synthesis (see Fig. 1). This so-called “SAGErandom” primer consists of six random nucleotides and the recognition site of the SAGE anchoring enzyme NlaIII. The NlaIII site was included to specifically reverse transcribe only those RNA molecules that are accessible to the SAGE procedure. After first strand synthesis,
the RNA is digested with RNaseH and the resulting cDNA first strand can be hybridized to oligo(dT) beads due to its polyA tail. The oligo(dT) sequence on the beads serves as a primer for the synthesis of the second cDNA strand. After second strand synthesis, all further steps are done according to the MicroSAGE protocol (6) adapted for longSAGE (7). Some minor modifications were introduced to improve the yield of ditags and the length of concatemers (see Fig. 2 for gel photographs of an aRNA-longSAGE library).
Fig. 1. cDNA synthesis in the aRNA-longSAGE protocol.
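The priming logic of the SAGErandom oligonucleotide can be illustrated with a short Python sketch. It assumes the primer layout described above (six random nucleotides followed by CATG, written 5′ to 3′) and treats the aRNA as a DNA-alphabet string. The function names and the toy template are hypothetical and only show which sites such a primer could anneal to.

```python
from itertools import product

NLAIII_SITE = "CATG"  # anchoring-enzyme site; its reverse complement is also CATG
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA-alphabet sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def sagerandom_species() -> int:
    """Count the distinct oligonucleotides in a 5'-NNNNNN-CATG-3' mixture."""
    return sum(1 for _ in product("ACGT", repeat=6))


def priming_sites(template: str):
    """Yield (position, primer) for each CATG in the template that is followed
    by six template bases for the random hexamer to pair with."""
    start = template.find(NLAIII_SITE)
    while start != -1:
        footprint = template[start:start + 10]  # CATG plus six bases on its 3' side
        if len(footprint) == 10:
            yield start, revcomp(footprint)  # primer read 5'-NNNNNN-CATG-3'
        start = template.find(NLAIII_SITE, start + 1)


# Hypothetical antisense strand (T written for U), 5' to 3'
toy_arna = "TTTTTTTTGGACATGCCTAGGATCCATGAA"
sites = list(priming_sites(toy_arna))
print(sagerandom_species(), sites)
```

In this toy case the second CATG lies too close to the 3′ end of the template to offer six bases for the hexamer, so only one site is reported.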
Fig. 2. Representative gel photographs of an aRNA-longSAGE library. (a) 0.5 µl first strand cDNA on a 1% agarose gel; (b) ditag PCR on a 12% polyacrylamide gel (−, negative control; +, positive control); (c) isolated ditags on a 12% polyacrylamide gel (PCR, 5 µl of ditag PCR prior to Hsp92II digestion, four of seven lanes with isolated ditags are shown in the photograph); (d) concatemers (C) on an 8% polyacrylamide gel; (e) insert PCR on a 1.5% agarose gel, the horizontal line shows the product size that corresponds to an empty cloning vector.
Using the longSAGE rather than the conventional SAGE protocol improves the annotation of SAGE tags because longSAGE tags are 21 bp long, whereas conventional SAGE tags are only 14 bp long. However, the modified cDNA synthesis used in the aRNA-longSAGE protocol can be combined with the conventional SAGE protocol as well as with the longSAGE protocol.
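The practical effect of the longer tag can be made concrete with a small sketch. It assumes, as is common in the SAGE literature but not stated above, that the quoted tag lengths include the four bases of the CATG anchoring site and that the tag is taken next to the 3′-most CATG of the transcript. The example sequence and the function names are invented for illustration.

```python
ANCHOR = "CATG"
# Tag lengths in bp as quoted in the text; counting CATG as part of the tag
# is an assumption made for this sketch.
TAG_LENGTH = {"SAGE": 14, "longSAGE": 21}


def extract_tag(sense_cdna: str, protocol: str):
    """Return the tag that starts at the 3'-most CATG, or None if the site is
    missing or too close to the end of the sequence."""
    pos = sense_cdna.rfind(ANCHOR)
    tag = sense_cdna[pos:pos + TAG_LENGTH[protocol]] if pos != -1 else ""
    return tag if len(tag) == TAG_LENGTH[protocol] else None


def tag_space(protocol: str) -> int:
    """Number of possible tags when only the bases after CATG vary."""
    return 4 ** (TAG_LENGTH[protocol] - len(ANCHOR))


# Hypothetical sense-strand transcript with a short poly(A) tail
toy_transcript = "GGCATGTTACCATGACGTTAGCCTAGGATCAAGTCAAAAAAAA"
for name in TAG_LENGTH:
    print(name, extract_tag(toy_transcript, name), tag_space(name))
```

Under these assumptions the longer tag enlarges the space of possible tags from 4^10 to 4^17, which is one way to see why annotation becomes less ambiguous.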
2. Materials
2.1. Microdissection
1. Microtome for cryosectioning with a blade holder for disposable blades.
2. Tissue-Tek OCT (Sakura Finetek Europe) for embedding of frozen tissue samples.
3. Disposable microtome blades.
4. RNase-free and sterile glass slides.
5. Coverslips.
6. Histomount (Pertex, Histolab Products AB, Göteborg, Sweden).
7. Hematoxylin and eosin staining solution.
8. Light microscope (e.g., BH2, Olympus).
9. RNase-free 200-µl tubes.
10. Sterile disposable hypodermic needles (size 0.60 × 25 mm, 23 gauge × 1″, e.g., Braun Sterican).
11. Extraction buffer from PicoPure RNA isolation kit (Arcturus Engineering).
2.2. RNA Isolation
1. PicoPure RNA isolation kit (Arcturus Engineering).
2. Nuclease-free pipette tips.
3. 0.5-ml microcentrifuge tubes.
4. 2-ml lidless tubes.
5. RNase-Free DNase Set (Qiagen Catalog #79254).
2.3. Preparation of Amplified Antisense RNA
1. RNA amplification kit, e.g., MessageAmp from Ambion (Huntingdon, UK).
2.4. cDNA Synthesis
1. DEPC-treated water: Add 2 ml diethylpyrocarbonate (DEPC) to 1 l of water (see Note 1), shake for 30 min, and autoclave.
2. SAGErandom oligonucleotide 5′-NNN NNN CATG-3′.
3. dNTP mix, 10 mM each, store aliquots at −20°C.
4. Dry ice and wet ice.
5. 5× First Strand Buffer, 0.1 M DTT, RNaseOUT, and SuperScript III Reverse Transcriptase (all from Invitrogen, Karlsruhe, Germany).
6. 5× Second Strand Buffer: 94 mM Tris/HCl, pH 6.9, 453 mM KCl, 23 mM MgCl2, 50 mM (NH4)2SO4, 0.75 mM β-NAD.
7. 5 U/µl RNaseH (USB, Cleveland, OH, USA) is diluted with Second Strand Buffer to a final concentration of 2 U/µl.
8. 1.5-ml sterile siliconized microcentrifuge tubes (Ambion).
9. Oligo(dT)25 Beads (Dynal Biotech, Hamburg, Germany) and a magnetic stand.
10. Overhead shaker.
11. Binding buffer and washing buffer B from Dynal (Dynabeads® mRNA Purification Kit).
12. 11.8 U/µl Escherichia coli DNA polymerase I (USB), 10 U/µl E. coli DNA ligase (USB), and 3 U/µl T4 DNA Polymerase (NEB, Frankfurt a.M., Germany).
13. Thermomixer and thermal cycler for incubation steps.
14. 0.5 M EDTA.
15. Buffer BW: 1 M NaCl, 5 mM Tris-HCl, pH 7.5, 0.5 mM EDTA; and BW/BSA: Buffer BW containing 0.1 mg/ml BSA (NEB).
16. 10× Buffer K from Promega (Ingelheim, Germany).
2.5. Cleavage of cDNA with the Anchoring Enzyme (Hsp92II)
1. 100× BSA (10 mg/ml, NEB).
2. 10 U/µl Hsp92II (see Note 2) and 10× Buffer K (both from Promega).
3. 5× ligase buffer from Invitrogen.
2.6. Ligating Linkers to Bound cDNA
1. Linker1A: 5′-TTT GGA TTT GCT GGT GCA GTA CAA CTA GGC TTA ATA TCC GAC ATG-3′
Linker1B: 5′-TCG GAT ATT AAG CCT AGT TGT ACT GCA CCA GCA AAT CC(Amino C7)-3′
Linker2A: 5′-TTT CTG CTC GAA TTC AAG CTT CTA ACG ATG TAC GTC CGA CAT G-3′
Linker2B: 5′-TCG GAC GTA CAT CGT TAG AAG CTT GAA TTC GAG CAG(Amino C7)-3′
Linker oligonucleotides are dissolved in loTE to a final concentration of 350 ng/µl and stored at −20°C.
2. loTE: 3.0 mM Tris-HCl, pH 7.5, 0.3 mM EDTA.
3. 10× polynucleotide kinase buffer, 10 mM ATP, and T4 polynucleotide kinase (10 U/µl, all from NEB).
4. 5 U/µl HC T4 ligase and 5× ligase buffer (both from Invitrogen).
5. 10× Buffer 4 from NEB.
2.7. Release of cDNA Tags Using the Tagging Enzyme MmeI
1. 32 mM S-adenosylmethionine (NEB).
2. 2 U/µl MmeI and 10× Buffer 4 from NEB.
3. PC8: Roti®-phenol/chloroform/isoamyl alcohol, pH 7.5–8.0, from Roth (Karlsruhe, Germany), store at 4°C.
4. 10 M ammonium acetate solution, glycogen (store at −20°C), and ethanol.
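The linker oligonucleotides listed in Subheading 2.6 can be checked computationally before they are ordered. The sketch below copies the four sequences from that list (without the 3′ Amino C7 modification) and tests whether each B strand is the reverse complement of the core of its A strand, which 3′ overhang the annealed pair would carry, and whether an MmeI recognition sequence is present. The motif TCCRAC for MmeI is taken from general enzyme references, not from this chapter, and the helper names are hypothetical.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")
MMEI_SITE = re.compile("TCC[AG]AC")  # assumed MmeI recognition motif (TCCRAC)


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def clean(seq: str) -> str:
    """Strip the triplet spacing used in the materials list."""
    return seq.replace(" ", "").upper()


# Sequences as printed in Subheading 2.6, written 5' to 3'
LINKERS = {
    "Linker1": (
        clean("TTT GGA TTT GCT GGT GCA GTA CAA CTA GGC TTA ATA TCC GAC ATG"),
        clean("TCG GAT ATT AAG CCT AGT TGT ACT GCA CCA GCA AAT CC"),
    ),
    "Linker2": (
        clean("TTT CTG CTC GAA TTC AAG CTT CTA ACG ATG TAC GTC CGA CAT G"),
        clean("TCG GAC GTA CAT CGT TAG AAG CTT GAA TTC GAG CAG"),
    ),
}


def check_linker(strand_a: str, strand_b: str) -> dict:
    """Report pairing, the 3' overhang left on strand A, and MmeI site presence."""
    core = revcomp(strand_b)          # region of strand A covered by strand B
    start = strand_a.find(core)
    overhang = strand_a[start + len(core):] if start != -1 else None
    return {
        "b_pairs_with_a": start != -1,
        "three_prime_overhang": overhang,
        "has_mmei_site": bool(MMEI_SITE.search(strand_a)),
    }


results = {name: check_linker(a, b) for name, (a, b) in LINKERS.items()}
for name, result in results.items():
    print(name, result)
```

For both pairs the check reports a CATG 3′ overhang, which is the end expected to match cDNA cleaved by the anchoring enzyme.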
2.8. Ligating Tags to form Ditags
5 U/µl HC T4 ligase and 5× ligase buffer (both from Invitrogen).
2.9. PCR Amplification of Ditags
1. 10× BV-Mg Buffer: 670 mM Tris–HCl, pH 8.8, 167 mM (NH4)2SO4, 67 mM MgCl2, 100 mM β-mercaptoethanol.
2. DMSO, store at −20°C.
3. dNTP mix, 10 mM each, store aliquots at −20°C.
4. Oligonucleotides “Primer 1”: 5′-GTG CTC GTG GGA TTT GCT GGT GCA GTA CA-3′ and “Primer 2”: 5′-GAG CTC GTG CTG CTC GAA TTC AAG CTT CT-3′. PCR primers are resuspended in loTE to a final concentration of 350 ng/µl and stored at −20°C.
5. Taq polymerase.
6. 40% acrylamide/bis solution (19:1, this is a neurotoxin when unpolymerized; handle with care) and N,N,N′,N′-tetramethylethylenediamine (TEMED).
7. Ammonium persulfate: prepare 10% solution in water and store at 4°C.
8. Molecular weight marker for gel electrophoresis: 25-bp ladder from Invitrogen, diluted to a final concentration of 50 ng/µl.
9. Running buffer TAE: 40 mM Tris base, 20 mM acetic acid, 2 mM EDTA.
10. Loading buffer: 20.0% (w/v) Ficoll® 70, 1.6% (v/v) glycerol, 0.01% (w/v) lauryl sarcosine, 0.001% (w/v) xylene cyanol, 0.001% (w/v) bromophenol blue in TAE.
11. Staining solution: dilute 5 µl SYBR Green 1 concentrate (BioWhittaker, Rockland, ME, USA; SYBR Green 1 is toxic, handle with care) with 50 ml TAE. Prepare staining solution fresh as required; SYBR Green 1 is not stable in aqueous solution.
12. 15-ml polypropylene tubes and 5-ml glass pipette for PC8 extraction of amplified ditags.
13. PC8: Roti®-phenol/chloroform/isoamyl alcohol, pH 7.5–8.0, from Roth, store at 4°C.
14. 15-ml centrifugation tubes.
15. 10 M ammonium acetate solution, glycogen (store at −20°C), and ethanol.
2.10. Isolation of Ditags
1. 100× BSA, 10 mg/ml.
2. 10 U/µl Hsp92II and 10× Buffer K (both from Promega).
3. PC8: Roti®-phenol/chloroform/isoamyl alcohol, pH 7.5–8.0, from Roth, store at 4°C.
4. 10 M ammonium acetate solution, 20 mg/ml glycogen (store at −20°C), and ethanol.
5. TE solution (Invitrogen).
6. 40% acrylamide/bis solution (19:1, this is a neurotoxin when unpolymerized; handle with care) and N,N,N′,N′-tetramethylethylenediamine (TEMED).
7. Ammonium persulfate: prepare 10% solution in water and store at 4°C.
8. Glycerol.
9. Molecular weight marker for gel electrophoresis: 25-bp ladder from Invitrogen, diluted to a final concentration of 50 ng/µl.
10. Running buffer TAE: 40 mM Tris base, 20 mM acetic acid, 2 mM EDTA.
11. Staining solution: dilute 5 µl SYBR Green 1 concentrate (BioWhittaker; SYBR Green 1 is toxic, handle with care) with 50 ml TAE. Prepare staining solution fresh as required; SYBR Green 1 is not stable in aqueous solution.
12. Electroelution device (Elutrap system from Schleicher & Schüll, Dassel, Germany).
13. Chloroform.
14. 3 M sodium acetate solution, pH 5.2, glycogen (store at −20°C), and ethanol.
2.11. Concatenation of Ditags
1. 5 U/µl HC T4 ligase and 5× ligase buffer (both from Invitrogen).
2. PC8: Roti®-phenol/chloroform/isoamyl alcohol, pH 7.5–8.0, from Roth, store at 4°C.
3. 10 M ammonium acetate solution, glycogen (store at −20°C), and ethanol.
4. 40% acrylamide/bis solution (19:1, this is a neurotoxin when unpolymerized; handle with care) and N,N,N′,N′-tetramethylethylenediamine (TEMED).
5. Ammonium persulfate: prepare 10% solution in water and store at 4°C.
6. Loading buffer: 20.0% (w/v) Ficoll® 70, 1.6% (v/v) glycerol, 0.01% (w/v) lauryl sarcosine, 0.001% (w/v) xylene cyanol, 0.001% (w/v) bromophenol blue in TAE.
7. Molecular weight marker: Smart Ladder Short Fragments from Eurogentec, Seraing, Belgium.
8. Running buffer TAE: 40 mM Tris base, 20 mM acetic acid, 2 mM EDTA.
9. Staining solution: dilute 5 µl SYBR Green 1 concentrate (BioWhittaker; SYBR Green 1 is toxic, handle with care) with 50 ml TAE. Prepare staining solution fresh as required; SYBR Green 1 is not stable in aqueous solution.
10. Electroelution device (Elutrap system from Schleicher & Schüll).
11. Chloroform.
12. 3 M sodium acetate solution, pH 5.2.
2.12. Cloning Concatemers
1. pZERO-1 supercoiled (1 µg/µl; part of the Zero Background Cloning kit from Invitrogen).
2. 10× Buffer 2 (NEB) and 5 U/µl SphI (NEB).
3. loTE (see Subheading 2.6, item 2) and TE (Invitrogen).
4. PC8: Roti®-phenol/chloroform/isoamyl alcohol, pH 7.5–8.0, from Roth, store at 4°C.
5. 10 M ammonium acetate solution, glycogen (store at −20°C), and ethanol.
6. 5× ligase buffer and 1 U/µl T4 DNA ligase (both from Invitrogen).
2.13. Transformation of Bacteria
1. Electrocompetent E. coli TOP10 bacteria (part of the Zero Background Cloning kit from Invitrogen).
2. Electroporation device.
3. LB broth (Sigma, Taufkirchen, Germany).
4. 100 mg/ml Zeocin (Invitrogen, store at −20°C, this antibiotic is light sensitive).
5. X-Gal (5-bromo-4-chloro-3-indoxyl-β-D-galactopyranoside from Roth, store at −20°C).
6. LB-Zeocin-X-Gal-Agar: 50 µg/ml Zeocin, 80 µg/ml X-Gal, 1.5% (w/v) agar in LB broth.
2.14. Insert-PCR
1. 5× RDA buffer: 335 mM Tris-HCl, pH 8.8, 80 mM (NH4)2SO4, 50 mM β-mercaptoethanol, 0.5 mg/ml BSA.
2. 50 mM magnesium chloride solution.
3. Oligonucleotides Insert_for: 5′-CTG GTT AAC CTT ACT GGC TGA GTT AGC TCA CTC ATT AGG CAC-3′ and Insert_rev: 5′-TGT AAA ACG ACG GCC AGT TAC GAC TCA CTA TAG GGC GAA TTG-3′.
4. 10 mM dNTP-Mix (Promega, 10 mM each).
5. Taq polymerase.
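Several entries in the materials lists imply a simple dilution, for example the RNaseH stock that is diluted from 5 to 2 units per unit volume, or the 5× buffers that are used at 1×. A minimal helper based on C1 × V1 = C2 × V2 is sketched below. The final volumes in the example calls are arbitrary illustrations and are not values prescribed by the protocol.

```python
def dilution_volumes(stock_conc: float, final_conc: float, final_volume: float):
    """Return (stock volume, diluent volume) from C1 * V1 = C2 * V2.

    Concentrations must share one unit and volumes another; the function
    does not convert units.
    """
    if not 0 < final_conc <= stock_conc:
        raise ValueError("final concentration must be positive and not exceed the stock")
    stock_volume = final_conc * final_volume / stock_conc
    return stock_volume, final_volume - stock_volume


# RNaseH: 5 units per volume down to 2, for a hypothetical final volume of 10
rnaseh = dilution_volumes(5, 2, 10)
# A 5x buffer brought to 1x in a hypothetical final volume of 50
buffer_1x = dilution_volumes(5, 1, 50)
print(rnaseh, buffer_1x)
```

Keeping the calculation in one function makes it easy to scale a reaction up or down without recomputing each component by hand.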
3. Methods
3.1. Standard Operating Procedure for Specimen Isolation and Storage
1. Immediately after resection, the specimen has to be placed on crushed ice.
2. Report the time of ischemia (time of resection until processing in pathology).
3. Immediately perform a gross pathology inspection of the tissue.
4. Dissect the tissue of choice into cubes of approximately 0.5 cm edge length (~0.5 × 0.5 × 0.5 cm).
5. Wrap tissue in tinfoil.
6. Snap freeze wrapped tissue pieces in liquid nitrogen.
7. Transfer frozen tissues for long-term storage into a −80°C freezer.
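Reporting the ischemia time in step 2 amounts to subtracting two timestamps. The following sketch shows one way to do this with the Python standard library. The timestamps, the date format, and the function name are invented for illustration and do not reflect any particular laboratory information system.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M"  # assumed notation for both timestamps


def ischemia_minutes(resected_at: str, processed_at: str) -> float:
    """Return the minutes between resection and processing in pathology."""
    start = datetime.strptime(resected_at, TIME_FORMAT)
    end = datetime.strptime(processed_at, TIME_FORMAT)
    if end < start:
        raise ValueError("processing time precedes resection time")
    return (end - start).total_seconds() / 60


# Hypothetical specimen: resected at 10:15, processed at 10:42
minutes = ischemia_minutes("2009-03-02 10:15", "2009-03-02 10:42")
print(f"Ischemia time: {minutes:.0f} min")
```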
3.2. Microdissection
1. Make sure that all areas of the microtome that come into contact with the tissue are RNase-free by treating the microtome with 100% ethanol. It is highly recommended to use disposable blades.
2. Mount your frozen tissue block using Tissue-Tek OCT.
3. For each specimen, prepare a 5-µm frozen tissue section on a standard glass slide. This tissue section serves as reference for identifying the areas of interest for microdissection and is stored for documentation.
4. Stain each slide with standard hematoxylin and eosin (H&E) using cooled dyes and seal it with Histomount (Pertex) and a coverslip.
5. Using a light microscope, identify tissue areas containing the cells of interest on the stained reference section. This helps to identify the required cells during subsequent microdissection.
6. Generate one to several 10-µm tissue sections from the remaining tissue block via serial sectioning using sterile and RNase-free frosted cooled glass slides and store the slides at −20°C until subsequent processing steps.
7. Briefly fix tissue sections in RNase-free ethanol (Merck, Darmstadt, Germany).
8. Stain each section with standard H&E staining chemistry using dyes cooled to 4°C (do not seal sections with coverslips) and store sections at −20°C until microdissection.
9. Manually dissect tissue under a microscope (e.g., BH2, Olympus) using sterile disposable hypodermic needles (size 0.60 × 25 mm, 23 gauge × 1″, e.g., Braun Sterican). Collect the cells in a 200-µl RNase-free reaction tube containing 50 µl extraction buffer (PicoPure RNA isolation kit, Arcturus Engineering).
10. Incubate the tube containing cells for 30 min at 42°C in an incubation oven.
11. Proceed with RNA isolation protocol or freeze the cell extract at −80°C.
3.3. RNA Isolation
This protocol follows, with some minor modifications, the protocol “C” by Arcturus for “Use with CapSure Macro LCM Caps”:
1. Pipette 250 µl Conditioning Buffer (CB) onto the purification column filter membrane.
2. Incubate the RNA purification column with conditioning buffer for 5 min at room temperature.
3. Centrifuge the purification column in the provided collection tube at 16,000 × g for 1 min.
4. Pipette 50 µl of 70% ethanol (EtOH) into the cell extract from Subheading 3.2. Mix well by pipetting up and down. DO NOT CENTRIFUGE.
5. Pipette the cell extract and EtOH mixture into the preconditioned purification column. The cell extract and EtOH will have a combined volume of approximately 100 µl.
6. To bind RNA to the column, centrifuge for 2 min at 100 × g.
7. Immediately follow with a centrifugation at 16,000 × g for 30 s to remove the flow-through.
8. Pipette 100 µl Wash Buffer 1 (W1) into the purification column and centrifuge for 1 min at 8,000 × g.
9. Add 5 µl of DNase I (Qiagen) to 35 µl of RDD buffer (Qiagen), mix well, and add the mixture to the column.
10. Incubate for 15 min at room temperature.
11. Add 40 µl of Wash Buffer 1 (W1) to the column and centrifuge for 15 s at 8,000 × g.
12. Pipette 100 µl Wash Buffer 2 (W2) into the purification column and centrifuge for 1 min at 8,000 × g.
13. Pipette another 100 µl Wash Buffer 2 (W2) into the purification column and centrifuge for 2 min at 16,000 × g.
14. Remove the flow-through and centrifuge again at 16,000 × g for 1 min.
15. Transfer the purification column to a new 0.5-ml microcentrifuge tube provided in the kit.
16. Pipette 12.5 µl nuclease-free water (Qiagen) directly onto the membrane of the purification column (gently touch the tip of the pipette to the surface of the membrane while dispensing the water to ensure maximum absorption of water into the membrane).
17. Incubate the purification column for 1 min at room temperature.
18. Centrifuge the column for 1 min at 1,000 × g to distribute the water in the column.
19. Centrifuge for 1 min at 16,000 × g to elute RNA.
20. The isolated RNA is now ready for use in downstream applications or may be stored at −80°C until use.
21. Optional: The quality of the RNA can be analyzed on an RNA PicoChip on a BioAnalyzer platform (Agilent, Böblingen, Germany).

3.4. Preparation of Amplified Antisense RNA
Several companies sell kits for RNA amplification via T7 promoter-driven in vitro transcription of cDNA. All protocols start with cDNA synthesis using a modified oligo(dT) primer containing the T7 RNA polymerase promoter. After purification of the cDNA, the in vitro transcription is carried out, followed by purification of the amplified antisense RNA (aRNA). The incubation time for the in vitro transcription varies between protocols. Longer incubation times give a higher yield of aRNA but might also lead to degradation of part of the aRNA. For the aRNA-longSAGE protocol, the in vitro transcription is carried out for 18 h. Shorter incubation times are possible, especially if more than 40 ng of total RNA is available for the amplification procedure. The yield of aRNA can be estimated using an RNA PicoChip on a BioAnalyzer platform (Agilent, Böblingen, Germany). A minimum of 1.2 µg of aRNA should be used for the generation of an aRNA-longSAGE library (see Note 3).

3.5. cDNA Synthesis
1. Add DEPC-treated water to the aRNA to a final volume of 10 µl, then add 2 µl of SAGErandom oligonucleotide and 1 µl of 10 mM dNTP mix and incubate for 5 min at 65°C in a thermal cycler. After the incubation, place the sample on dry ice, thaw on wet ice, and add 4 µl First Strand Buffer, 1 µl of 0.1 M DTT, 1 µl RNaseOUT, and 1 µl SuperScript™ III Reverse Transcriptase. Incubate in a thermal cycler for 5 min at 37°C, 1 h at 50°C, and 15 min at 70°C.
2. Add 1 µl of RNase H (2 U/µl, diluted in Second Strand Buffer) and incubate for 20 min at 37°C in a thermal cycler.
3. Add 0.5 µl DEPC-treated water, mix well, and remove 0.5 µl of the sample for loading on a 1% agarose gel.
4. Add 79 µl of DEPC-treated water.
5. Wash 200 µl Oligo(dT)25 Beads with 100 µl of binding buffer and resuspend in 100 µl of binding buffer.
6. Mix the sample with resuspended beads in a siliconized microcentrifuge tube (see Note 4). Put the sample in an overhead shaker and rotate for 15 min at room temperature.
7. Wash the sample twice with 200 µl Washing Buffer B and four times with 200 µl Second Strand Buffer.
8. Resuspend beads in 112.25 µl of ice-cold DEPC-treated water and add the following components on ice: 32 µl of 5× Second Strand Buffer, 6 µl of 0.1 M DTT, 3 µl dNTP mix (10 mM each), 4.5 µl E. coli DNA Polymerase I (11.8 U/µl), 1.5 µl E. coli DNA ligase (10 U/µl), and 0.75 µl E. coli RNase H (2 U/µl, diluted in Second Strand Buffer).
9. Incubate in a thermomixer for 2.5 h at 16°C. To keep beads in suspension, mix the sample every 15 min on a slow speed vortex (use a setting of 5).
10. Add 4 µl T4 DNA polymerase (3 U/µl) and incubate for 5 min at 16°C.
11. Add 4 µl of 0.5 M EDTA and 750 µl of 1× BW and incubate for 20 min at 75°C.
12. Wash beads once with 750 µl of 1× BW, four times with 750 µl of 1× BW/1× BSA, and twice with 200 µl of 1× Promega Buffer K that contains 0.1 mg/ml BSA.
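The second-strand reaction of step 8 is easy to mispipette because it combines seven components. The following minimal Python sketch, reading all listed volumes as microliters, adds up the components and checks the final buffer strength. The helper names (`total_volume`, `final_concentration`) are illustrative and not part of any kit or published software.

```python
# Second-strand synthesis mix from step 8; volumes in microliters.
SECOND_STRAND_MIX_UL = {
    "DEPC-treated water (bead resuspension)": 112.25,
    "5x Second Strand Buffer": 32.0,
    "0.1 M DTT": 6.0,
    "dNTP mix (10 mM each)": 3.0,
    "E. coli DNA polymerase I": 4.5,
    "E. coli DNA ligase": 1.5,
    "E. coli RNase H": 0.75,
}


def total_volume(mix):
    """Sum of all component volumes (microliters)."""
    return sum(mix.values())


def final_concentration(stock_concentration, stock_volume, reaction_volume):
    """Concentration after dilution, in the same unit as the stock."""
    return stock_concentration * stock_volume / reaction_volume


reaction_ul = total_volume(SECOND_STRAND_MIX_UL)
buffer_fold = final_concentration(5.0, SECOND_STRAND_MIX_UL["5x Second Strand Buffer"], reaction_ul)
dtt_mM = final_concentration(100.0, SECOND_STRAND_MIX_UL["0.1 M DTT"], reaction_ul)
dntp_mM = final_concentration(10.0, SECOND_STRAND_MIX_UL["dNTP mix (10 mM each)"], reaction_ul)

print(f"reaction volume: {reaction_ul} µl")
print(f"buffer: {buffer_fold}x, DTT: {dtt_mM} mM, each dNTP: {dntp_mM} mM")
```

Under these assumptions the components add up to 160 µl and the 5× buffer ends up at 1×, which supports reading the volumes as microliters.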
3.6. Cleavage of cDNA with the Anchoring Enzyme (Hsp92II)
1. Resuspend beads in 200 µl reaction mix containing 1× Buffer K (Promega), 0.1 mg/ml BSA, and 50 U Hsp92II (see Note 2) and incubate in a thermomixer for 1 h at 37°C. To keep beads in suspension, mix the sample every 15 min on a slow speed vortex (use a setting of 5).
2. Wash beads once with 750 µl of 1× BW, four times with 750 µl of 1× BW/1× BSA, and twice with 200 µl of 1× ligase buffer (Invitrogen).
3. Resuspend beads in 200 µl of 1× ligase buffer (Invitrogen).

3.7. Ligating Linkers to Bound cDNA
Prior to the first use, the linker oligonucleotides are phosphorylated and hybridized to obtain linkers “1” and “2.” Phosphorylated and hybridized linkers can be stored at −20°C in aliquots for single use.
1. Linker oligonucleotides “1B” and “2B” are phosphorylated in two separate tubes by adding 6 µl loTE, 2 µl of 10× Polynucleotide Kinase Buffer (NEB), 2 µl of 10 mM ATP (NEB), and 1 µl T4 Polynucleotide Kinase (10 U/µl, NEB) to 9 µl linker oligonucleotide (350 ng/µl). The tubes are incubated in a thermal cycler for 30 min at 37°C and then for 10 min at 65°C.
2. To hybridize linkers, mix the phosphorylated “Linker B” molecules with 9 µl of the appropriate “Linker A” oligonucleotide (350 ng/µl), i.e., mix phosphorylated Linker 1B with 9 µl Linker 1A and phosphorylated Linker 2B with 9 µl Linker 2A. Incubate both tubes for 2 min at 95°C, 10 min at 65°C, 10 min at 37°C, and 20 min at 22°C in a thermal cycler. Add 271 µl loTE to each tube, aliquot, and store linkers at −20°C (final concentration of linkers is 20 ng/µl).
3. To ligate linkers to the immobilized cDNA, divide the sample (200 µl beads in 1× ligase buffer) equally between two new tubes.
4. Remove the supernatant and resuspend in 9 µl reaction mix containing 5 µl loTE, 2 µl of 5× ligase buffer, and 2 µl of kinased and annealed Linker 1 or 2, respectively.
5. Incubate the sample for 2 min at 50°C, then for 10 min at room temperature.
6. Add 1 µl HC T4 ligase (5 U/µl, Invitrogen) to each tube and vortex carefully.
7. Incubate at 16°C for 1¾ h in a thermomixer. To keep beads in suspension, mix the sample every 15 min on a slow speed vortex (use a setting of 5).
8. Wash beads once with 500 µl of 1× BW/1× BSA.
9. Unite ligation reactions 1 and 2 in a new tube.
10. Wash beads three times with 500 µl of 1× BW/1× BSA, once with 200 µl of 1× BW, and once with 200 µl of 1× Buffer 4 (NEB).
11. Resuspend beads in 200 µl of 1× Buffer 4 (NEB) and store overnight at 4°C.
12. Wash beads twice with 200 µl of 1× Buffer 4 that was prewarmed to 37°C.
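The stated linker stock concentration can be cross-checked from the volumes in steps 1 and 2. The sketch below is a rough Python calculation, reading volumes as microliters and oligonucleotide concentrations as ng/µl. The function name and its parameters are illustrative assumptions.

```python
def linker_stock(oligo_ng_per_ul=350.0, oligo_ul=9.0, kinase_reagents_ul=11.0, lote_ul=271.0):
    """Return (concentration in ng/µl, final volume in µl) of one annealed linker stock.

    kinase_reagents_ul covers 6 µl loTE + 2 µl 10x kinase buffer + 2 µl ATP + 1 µl kinase.
    Each linker consists of one "B" and one "A" oligonucleotide, 9 µl of each.
    """
    total_ng = 2 * oligo_ng_per_ul * oligo_ul
    final_ul = kinase_reagents_ul + 2 * oligo_ul + lote_ul
    return total_ng / final_ul, final_ul


concentration, volume = linker_stock()
print(f"{concentration:.1f} ng/µl in {volume:.0f} µl")
```

With these numbers the stock comes to about 21 ng/µl in 300 µl, in line with the nominal 20 ng/µl given in step 2.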
3.8. Release of cDNA Tags Using the Tagging Enzyme MmeI
1. Prepare a 1 mM S-adenosylmethionine (SAM) solution by diluting the 32 mM SAM solution that comes with the MmeI enzyme.
2. Resuspend beads in 200 µl prewarmed (37°C) reaction mix containing 1× Buffer 4 (NEB), 0.05 mM SAM, and 8 U of MmeI.
3. Incubate at 37°C for 1 h in a thermomixer. To keep beads in suspension, mix the sample every 15 min on a slow speed vortex (use a setting of 5).
4. Centrifuge at 16,110 × g for 2 min in a microcentrifuge.
5. Transfer the supernatant to a new microcentrifuge tube (there is no longer a need to use siliconized tubes).
6. Resuspend beads in 40 µl loTE.
7. Centrifuge at 16,110 × g for 2 min.
8. Remove the supernatant and unite it with the supernatant of the first centrifugation step (total volume: 240 µl).
9. Do a PC8 extraction: Add an equal volume of PC8 to the sample, mix well on a vortex, centrifuge at 16,110 × g for 2 min, and transfer the upper (aqueous) phase to a fresh microcentrifuge tube.
10. Remove 40 µl of the sample for use as a “no ligase” control during ditag ligation and PCR amplification of ditags. Dilute this negative control with 160 µl loTE.
11. Precipitate the sample and negative control by adding 100 µl of 10 M ammonium acetate, 3 µl glycogen, and 1 ml 100% ethanol, and centrifuging for 30 min at 4°C at 16,110 × g.
12. Wash each pellet three times with 500 µl of 70% ethanol.
13. Resuspend the sample in 1.5 µl loTE and 2.5 µl water; resuspend the negative control in 1.5 µl loTE and 3.3 µl water. Incubate both tubes at room temperature for 5 min.
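The two SAM dilutions in steps 1 and 2 follow the usual C1·V1 = C2·V2 relation. A small Python sketch of that arithmetic is given below. The 32-µl batch size for the 1 mM working solution is an arbitrary assumption chosen for illustration, and the function name is hypothetical.

```python
def stock_volume_needed(stock_conc, final_conc, final_volume):
    """Volume of stock required so that final_volume has final_conc (C1*V1 = C2*V2)."""
    if final_conc > stock_conc:
        raise ValueError("final concentration cannot exceed stock concentration")
    return final_conc * final_volume / stock_conc


# Step 1: 1 mM working solution from the 32 mM stock (32 µl batch assumed).
working_batch_ul = 32.0
sam_32mM_ul = stock_volume_needed(32.0, 1.0, working_batch_ul)
diluent_ul = working_batch_ul - sam_32mM_ul

# Step 2: 0.05 mM SAM in the 200 µl MmeI reaction, taken from the 1 mM solution.
sam_1mM_ul = stock_volume_needed(1.0, 0.05, 200.0)

print(f"1 mM SAM: {sam_32mM_ul:.1f} µl of 32 mM stock + {diluent_ul:.1f} µl diluent")
print(f"MmeI reaction: {sam_1mM_ul:.1f} µl of 1 mM SAM per 200 µl")
```

Preparing the intermediate 1 mM solution keeps the pipetted volume for the reaction at a practical 10 µl rather than a fraction of a microliter of the 32 mM stock.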
3.9. Ligating Tags to form Ditags
1. Add 1.2 µl of 5× ligase buffer to the sample and negative control, then add 0.8 µl HC T4 ligase (5 U/µl, Invitrogen) to the sample but not to the negative control.
2. Incubate for 2.5 h at 16°C in a thermal cycler.
3. Add 15 µl loTE to the sample and to the negative control.
3.10. PCR Amplification of Ditags
To optimize PCR conditions, a test PCR is run using different dilutions of the ditags (1:50/1:100/1:200/1:400 in loTE) at 26, 28, and 31 PCR cycles. A 1:50 dilution of the minus ligase control run at 31 cycles serves as a negative control. Prepare the PCRs under a laminar-flow hood to avoid contamination of the sample.
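The test PCR layout can be written down before pipetting. The sketch below enumerates the reactions in Python, assuming that every ditag dilution is run at every cycle number (a full grid, which the text implies but does not state outright) plus the single minus-ligase control. The dictionary keys are illustrative.

```python
from itertools import product

DILUTION_FACTORS = [50, 100, 200, 400]  # 1:50, 1:100, 1:200, 1:400 in loTE
CYCLE_NUMBERS = [26, 28, 31]

# One reaction per dilution and cycle number (assumed full grid).
test_reactions = [
    {"template": "ditags", "dilution": dilution, "cycles": cycles}
    for dilution, cycles in product(DILUTION_FACTORS, CYCLE_NUMBERS)
]
# Negative control: minus-ligase sample, 1:50, run at the highest cycle number.
test_reactions.append({"template": "minus-ligase control", "dilution": 50, "cycles": 31})

for reaction in test_reactions:
    print(f"{reaction['template']:>20}  1:{reaction['dilution']:<4} {reaction['cycles']} cycles")
print(f"{len(test_reactions)} reactions in total")
```

Under this assumption the test PCR comprises 12 ditag reactions and one control, so 13 wells are needed.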
1. For each PCR, mix 1 µl of template (diluted ditags), 4 µl of 10× BV-Mg Buffer, 3 µl DMSO, 5 µl dNTP mix (10 mM each), 1 µl of each PCR primer (350 ng/µl), and 25 µl of water.
2. Add a drop of mineral oil to each well and incubate in a thermal cycler for 3 min at 95°C, then hold the temperature at 78°C and add 10 µl of polymerase mix containing 3 µl of Taq polymerase in 1× BV-Mg Buffer to each well.
3. Run PCR for 26, 28, and 31 cycles in parallel, each cycle consisting of 30 s at 95°C, 30 s at 55°C, and 30 s at 70°C. After the last PCR cycle, incubate for 5 min at 70°C.
4. Load 5 µl of each PCR on a 20 × 20-cm polyacrylamide gel (12%, 19:1 acrylamide/bis). Run the gel in TAE buffer at 180 V until the bromphenol blue band of the marker has traveled a distance of approximately 8 cm (see Note 5). Stain the gel in SYBR Green I solution for 15 min and visualize the bands under UV light.
5. Use the PCR conditions that were optimal in the test PCR for large-scale PCR. Large-scale PCR consists of 96 PCRs that are run in parallel and then pooled in a 15-ml polypropylene tube (see Note 6).
6. Centrifuge for 1 min at 2,630 × g and remove the mineral oil.
7. Extract with an equal volume of PC8, then centrifuge for 10 min at 2,200 × g.
8. Transfer 2.1 ml of the sample (upper phase) into each of two centrifuge tubes, and add 700 µl of 10 M ammonium acetate, 18 µl of glycogen, and 6 ml of ethanol to each tube. Mix well and centrifuge for 30 min at 4°C and 12,000 × g.
9. Wash the pellet twice with 5 ml of 70% ethanol, remove the supernatant, and air-dry the pellet.
10. Resuspend each pellet in 45 µl loTE.
11. Incubate for 5–10 min at 37°C to aid solubilization.

3.11. Isolation of Ditags
1. Mix the complete sample (approximately 90 µl) with 68 µl water, 20 µl of 10× Buffer K, 2 µl BSA (10 mg/ml), and 20 µl Hsp92II (10 U/µl) and incubate for 1 h at 37°C in a heating block.
2. Do a PC8 extraction, then add 66.7 µl of 10 M ammonium acetate, 3 µl of glycogen, and 1 ml of ethanol.
3. Precipitate the ditags overnight at −70°C.
4. Centrifuge for 30 min at 4°C and 16,110 × g, wash the pellet twice with 500 µl of 70% ethanol, dry the pellet for 10 min at 16°C, and resuspend the pellet in 90 µl of TE.
5. Add 5 µl of glycerol (see Note 7).
6. Load the complete sample on a 20 × 20-cm polyacrylamide gel (12%, 19:1 acrylamide/bis). Run the gel at 4°C and 180 V until the bromphenol blue band of the marker has traveled a distance of approximately 8 cm (see Note 5). Stain the gel in SYBR Green I solution for 15 min and visualize the bands under UV light.
7. Cut out the 34-bp ditag band.
8. For electroelution of the ditags, prepare the electroelution device in such a way that the elution chamber is 2 U-inserts wide and the trap is 1 U-insert wide. Put the gel slices into the elution chamber and electroelute at 4°C and 150 V for 2 h, then reverse the polarity and apply 200 V for 20 s. Transfer the eluted ditags (1 ml sample volume) from the trap to two microcentrifuge tubes (see Note 8).
9. Do a PC8 extraction.
10. Extract the aqueous phase with an equal volume of chloroform.
11. Precipitate ditags by adding 50 µl of 3 M sodium acetate, 2 µl glycogen, and 1,250 µl ethanol to each tube. Incubate at −70°C overnight.
12. Centrifuge at 4°C and 16,110 × g for 30 min, wash the pellets twice with 500 µl of 70% ethanol, air-dry the pellets on ice, and resuspend both pellets in a total of 7 µl loTE.

3.12. Concatenation of Ditags
1. Add 2 µl of 5× ligase buffer and 1 µl T4 ligase HC (5 U/µl; Invitrogen).
2. Incubate in a thermal cycler at 16°C for 30 min.
3. Add 190 µl loTE and extract with 200 µl PC8.
4. Add 100 µl of 10 M ammonium acetate solution, 3 µl of glycogen, and 700 µl of ethanol, keep on ice for 10 min, then centrifuge for 15 min at 16,110 × g. Wash the pellet twice with 500 µl of 70% ethanol and resuspend in 10 µl loTE.
5. Add 5 µl of loading buffer, incubate for 10 min at 65°C, chill the sample on ice, and load the sample on one lane of an 8% polyacrylamide gel (acrylamide/bis 19:1, see Note 9).
6. Electrophorese for 3 h at 130 V. Stain the gel in SYBR Green I solution for 15 min.
7. Visualize the bands under UV light and excise concatemers >300 bp from the gel (see Note 10). Do not excise the large concatemers at the upper edge of the well (leave a margin of 1 mm gel at the upper edge of the well).
8. For electroelution of the concatemers, prepare the electroelution device in such a way that the elution chamber is 2 U-inserts wide and the trap is 1 U-insert wide. Put the gel slices into the elution chamber and electroelute for 60 min at room temperature, then reverse the polarity and apply 200 V for 20 s. Transfer the eluted concatemers (1 ml sample volume) from the trap to two microcentrifuge tubes.
9. Do a PC8 extraction.
10. Extract the aqueous phase with an equal volume of chloroform.
11. Precipitate the concatemers by adding 50 µl of 3 M sodium acetate, 2 µl glycogen, and 1,250 µl ethanol to each tube. Incubate at −20°C for 1 h or overnight.
12. Centrifuge at 4°C and 16,110 × g for 15 min, wash the pellets twice with 500 µl of 70% ethanol, air-dry the pellets, and resuspend both pellets in a total of 15 µl of water.

3.13. Cloning of Concatemers
1. Mix 1 µl pZERO-1 supercoiled cloning vector (1 µg/µl, Invitrogen) with 2 µl of 10× Buffer 2 (NEB), 16 µl water, and 1 µl SphI (5 U/µl, NEB).
2. Incubate for 15 min at 37°C in a waterbath (see Note 11).
3. Add 180 µl loTE and do a PC8 extraction with 200 µl PC8.
4. Precipitate the linearized vector by adding 66.7 µl of 10 M ammonium acetate, 3 µl glycogen, and 1 ml ethanol, and centrifuging for 10 min at 16,110 × g.
5. Wash the pellet three times with 500 µl of 70% ethanol.
6. Resuspend the air-dried pellet in 40 µl TE (final concentration of the linearized vector is 25 ng/µl).
7. Mix 1 µl of the linearized pZERO-1 with 6 µl concatemers, 2 µl of 5× ligase buffer, and 1 µl T4 DNA ligase (1 U/µl, both Invitrogen). Incubate for 1 h at 16°C and another hour at room temperature.
8. Add 190 µl loTE and do a PC8 extraction with 200 µl PC8.
9. Precipitate the sample by adding 66.7 µl of 10 M ammonium acetate, 3 µl glycogen, and 1 ml ethanol, and centrifuging for 20 min at 16,110 × g and 4°C.
10. Wash the pellet four times with 500 µl of 70% ethanol.
11. Resuspend the air-dried pellet in 8 µl loTE.
3.14. Transformation of Bacteria
1. Use 0.8 µl cloned concatemers to electroporate an aliquot (40 µl) of electrocompetent E. coli Top Ten (Voltage: 1,800 V, see Note 12).
2. Resuspend electroporated bacteria in 1 ml of LB medium.
3. Incubate for 1 h at 37°C and 220 rpm.
4. Plate 300 µl bacteria suspension on each of three 14.5-cm LB-Zeocin-X-Gal plates.

3.15. Insert-PCR
1. For each PCR, mix 2 µl of 5× RDA buffer, 1.2 µl of 50 mM MgCl2, 0.3 µl of each primer, 0.3 µl of 10 mM dNTP mix, and 5.9 µl of water.
2. Pipette 10 µl of the PCR mix into the wells of a 96-well plate and add a drop of mineral oil to each well.
3. Use a sterile toothpick to gently touch a white bacterial colony (see Note 13) and then dip it into the PCR mix.
4. Incubate in a thermal cycler for 2 min at 95°C, then hold the temperature at 78°C and add 5 µl of polymerase mix containing 1 µl of Taq polymerase in 1× RDA buffer to each well. Run five cycles consisting of 30 s at 95°C, 30 s at 60°C, and 45 s at 72°C, then run an additional 30 cycles consisting of 30 s at 95°C and 60 s at 70°C.
5. Run 5 µl of each PCR on a 1.5% agarose gel to check the insert sizes of the SAGE library. Empty vectors will give a 330-bp PCR product.
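For a full 96-well plate, the per-well premix of step 1 is usually prepared as a master mix. The Python sketch below scales the listed volumes (read as microliters). The 10% pipetting reserve and the labels "primer 1" and "primer 2" are assumptions made for illustration, not values from the protocol.

```python
# Per-well insert-PCR premix from step 1; volumes in microliters.
PER_WELL_UL = {
    "5x RDA buffer": 2.0,
    "50 mM MgCl2": 1.2,
    "primer 1": 0.3,
    "primer 2": 0.3,
    "10 mM dNTP mix": 0.3,
    "water": 5.9,
}


def master_mix(per_well, wells, overage=0.10):
    """Scale per-well volumes to a plate; overage is an assumed pipetting reserve."""
    factor = wells * (1.0 + overage)
    return {name: round(volume * factor, 2) for name, volume in per_well.items()}


plate_mix = master_mix(PER_WELL_UL, wells=96)
for name, volume in plate_mix.items():
    print(f"{name:>15}: {volume:8.2f} µl")
print(f"{'total':>15}: {sum(plate_mix.values()):8.2f} µl")
```

The per-well components add up to the 10 µl dispensed in step 2, so the scaled mix can be aliquoted directly.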
4. Notes
1. Unless mentioned otherwise, water means water with a resistivity of at least 18 MΩ·cm.
2. Hsp92II is an isoschizomer of NlaIII that can be stored at −20°C. Due to different unit definitions for Hsp92II and NlaIII, the volume of Hsp92II that is needed for digestion steps is much higher than the volume of NlaIII.
3. If more than 1.2 µg of aRNA is available, use up to 2.5 µg of aRNA for the generation of an aRNA-longSAGE library. More starting material tends to generate larger insert sizes in our hands.
4. Use siliconized tubes when dealing with magnetic beads to prevent the beads from adsorbing to the surface of the tube. Wash the beads by resuspending on a slow speed vortex (use a setting of 5) instead of pipetting the beads up and down in order to minimize loss of beads by adsorption to pipette tips.
5. Keeping the traveling distance constant gives more comparable electrophoresis conditions between libraries than keeping the traveling time constant. Eight centimeters of traveling distance on a 20 × 20-cm gel gives a good separation of ditags from linkers.
6. Make sure not to use polystyrene tubes because polystyrene reacts with PC8.
7. Glycerol is added instead of loading buffer to avoid contamination of the ditags. Adding glycerol is essential to increase the density of the sample. Without glycerol, the sample will be lost by diffusing into the running buffer.
8. Electroelution gives a higher yield of recovered sample than gel elution by diffusion as is done in the standard SAGE protocol.
9. This is a different gel than used in the standard SAGE protocol. Using an acrylamide/bis proportion of 19:1 instead of 37.5:1 gives a better separation of undesired small concatemers from the concatemers that are cut out from the gel.
10. For no obvious reason, the concatenation step may not work on the first try for each library. Because this protocol uses only a small fraction of the synthesized ditags as template for large-scale PCR, it is possible to try again with a new large-scale PCR.
11. A fully linearized vector is important for the success of the cloning step. Check on an agarose gel whether the vector is fully linearized. If the digestion with SphI did not yield fully linearized vector, the linearization should be repeated with a longer incubation time or with more than 5 U of SphI.
12. Use bacteria from the Zero Background Cloning Kit (Invitrogen). Prepare competent bacteria according to the instructions given in the kit. In our hands, this bacterial strain is better than the E. coli DH10B recommended in the original SAGE protocol.
13. Blue-white screening helps to choose colonies with large inserts. Even though there are white colonies with short inserts as well as blue colonies with long inserts, the average insert size is longer for white colonies than for blue ones.
References
1. Heidenblut, A. M., Luttges, J., Buchholz, M., Heinitz, C., Emmersen, J., Nielsen, K. L., Schreiter, P., Souquet, M., Nowacki, S., Herbrand, U., Kloppel, G., Schmiegel, W., Gress, T. and Hahn, S. A. (2004) aRNA-longSAGE: a new approach to generate SAGE libraries from microdissected cells. Nucleic Acids Res, 32, E131.
2. Van Gelder, R. N., von Zastrow, M. E., Yool, A., Dement, W. C., Barchas, J. D. and Eberwine, J. H. (1990) Amplified RNA synthesized from limited quantities of heterogeneous cDNA. Proc Natl Acad Sci USA, 87, 1663–1667.
3. Puskas, L. G., Zvara, A., Hackler, L., Jr. and Van Hummelen, P. (2002) RNA amplification results in reproducible microarray data with slight ratio bias. Biotechniques, 32, 1330–1334, 1336, 1338, 1340.
4. Polacek, D. C., Passerini, A. G., Shi, C., Francesco, N. M., Manduchi, E., Grant, G. R., Powell, S., Bischof, H., Winkler, H., Stoeckert, C. J., Jr. and Davies, P. F. (2003) Fidelity and enhanced sensitivity of differential transcription profiles following linear amplification of nanogram amounts of endothelial mRNA. Physiol Genomics, 13, 147–156.
5. Feldman, A. L., Costouros, N. G., Wang, E., Qian, M., Marincola, F. M., Alexander, H. R. and Libutti, S. K. (2002) Advantages of mRNA amplification for microarray analysis. Biotechniques, 33, 906–912, 914.
6. St Croix, B., Rago, C., Velculescu, V., Traverso, G., Romans, K. E., Montgomery, E., Lal, A., Riggins, G. J., Lengauer, C., Vogelstein, B. and Kinzler, K. W. (2000) Genes expressed in human tumor endothelium. Science, 289, 1197–1202.
7. Saha, S., Sparks, A. B., Rago, C., Akmaev, V., Wang, C. J., Vogelstein, B., Kinzler, K. W. and Velculescu, V. E. (2002) Using the transcriptome to annotate the genome. Nat Biotechnol, 20, 508–512.
Chapter 9
Quantitative DNA Methylation Profiling on a High-Density Oligonucleotide Microarray
Anne Fassbender, Jörn Lewin, Thomas König, Tamas Rujan, Cecile Pelet, Ralf Lesche, Jürgen Distler, and Matthias Schuster

Summary
Recently, the analysis and functional elucidation of CpG island methylation has become a focus area of genomic research. Deviations from the normal parental imprinting pattern have been shown to cause developmental defects associated with serious symptoms. Aberrant DNA methylation of tumor suppressor and other functional genes, especially when found in 5′ untranslated regions and early exons, has been associated with tumorigenesis. In the context of applying DNA methylation analysis for the molecular characterization of cancer and other diseases, standardized protocols enabling parallel genome-wide methylation profiling of numerous samples are required. DNA methylation profiling is described using a CpG island microarray representing more than 50,000 CpG-rich DNA fragments. Fragments were selected to represent the vast majority of known 5′-untranslated regions as well as the first exons of thousands of genes. Measurement probes designed to represent these fragments were displayed on an Affymetrix custom array. A modified procedure for differential methylation hybridization (DMH) is described for methylation enrichment. Application of a novel signal normalization concept enables accurate and reproducible measurements using a single fluorescence channel. The use of defined calibrator material allows quantification of DNA methylation patterns by DMH in a massively parallel fashion.
Key words: DNA methylation, DMH, Microarray, Normalization, Calibration
Robert Grützmann and Christian Pilarsky (eds.), Cancer Gene Profiling: Methods and Protocols, Methods in Molecular Biology, vol. 576, DOI 10.1007/978-1-59745-545-9_9, © Humana Press, a part of Springer Science + Business Media, LLC 2010

1. Introduction
The elucidation of the complex interplay between human phenotypes on the one hand and DNA methylation and other genomic factors on the other requires accurate as well as highly parallel assays with the ability to unravel genomic information layer by layer (1–3).
Oligonucleotide and polynucleotide arrays in general (4), and high-density oligonucleotide microarrays in particular, provide an excellent technical basis for such genomic assays (5, 6). Array-based DNA methylation assays have already been established using a variety of techniques for the detection of differential methylation, such as methylation-sensitive restriction enzymes (7–10), methyl-binding proteins (11, 12), or methylation-specific antibody arrays (13). Based on these methods, genome-wide methylation profiling was shown to be generally feasible using either microarrays synthesized in situ (14) or microarrays carrying immobilized polymerase chain reaction (PCR) products generated from BAC clones (15) or CpG island libraries (16). Here, we present a quantitative, genome-wide DNA methylation-profiling assay based on the concept of differential methylation hybridization (DMH) (7). The assay is constructed using a high-density Affymetrix probe set array designed to cover 51,317 CpG-rich fragments, most of them in the promoters or transcribed regions of annotated genes. Figure 1 depicts the principle of the DMH technique, which was introduced in 1999 (7) and successfully applied to genome-wide DNA methylation marker discovery (7, 8). According to this procedure, high molecular weight genomic DNA is digested using MseI. After ligation of a universal adapter, the fragment mixture is digested using the methylation-sensitive enzyme BstUI (7) or a mixture of BstUI and HpaII (8). In the following step, undigested fragments are PCR amplified with a universal primer. The resulting amplificate is enzymatically fragmented, labeled, and subsequently hybridized to microarrays.

Fig. 1. Principle of differential methylation hybridization (DMH). Step 1. DNA fragmentation using methylation-unspecific restriction enzymes results in millions of fragments that are either very short (S) and are removed in a purification step, too long (L) for a later PCR step, or of adequate size for DMH (M−, M+, C). Step 2. Adapter linker ligation to all fragments. Step 3. Digestion using methylation-sensitive restriction enzymes and PCR amplification: Fragments containing unmethylated restriction sites (M−) are cleaved and therefore not represented in the amplificate; fragments containing only methylated sites (M+) remain uncleaved and are amplified, as are fragments containing no restriction sites at all (C), which serve as controls for normalizing the methylation signal contained in (M+) after fragmentation, labeling, and array hybridization.

The prerequisite for analysis of a region of interest by DMH is the existence of at least one BstUI or HpaII recognition site within the fragment. When using the originally described restriction enzyme combination (8), 73% of amplifiable fragments do not contribute methylation information, because they do not contain methylation-sensitive restriction sites. However, these fragments do affect the hybridization of methylation-variable fragments via cross hybridization. To address these limitations, both the fragmentation and the methylation-sensitive restrictions were evaluated in silico and optimized to increase the methylation information content and to reduce the genomic complexity of the amplificate. Excessive digestion of non-CpG island sequences is achieved in the fragmentation digest by applying additional four-base cutters recognizing AT-rich sequences. The resulting short fragments are removed in purification steps, whereas a large portion of GC-rich sequences is subject to methylation-sensitive digestion by one or several of the four restriction enzymes used to digest unmethylated, but not methylated, DNA strands.

Array-to-array variability of fluorescence values requires a powerful normalization strategy for methylation-variable signals. The presented DMH application utilizes two techniques for normalization. First, probe sets targeting methylation-invariable fragments, i.e., amplifiable fragments devoid of methylation-sensitive restriction sites, are used to normalize against experimental variability affecting absolute fluorescence levels. Second, methylation calibrators allow generation of truly quantitative data. Methylation marker development can then be streamlined using this quantitative method by combining data generated in different studies.
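The in silico evaluation mentioned above can be illustrated with a toy digest. The following Python sketch cuts a sequence at the recognition sites of the three fragmentation enzymes and labels each fragment by whether it contains a site for one of the four methylation-sensitive enzymes. It is not the authors' software: the recognition sequences are the standard published ones, whereas the size limits `min_len` and `max_len`, the function names, and the example sequence are hypothetical, and sites overlapping a cut are ignored for simplicity.

```python
# Recognition sequences and cut offsets (top strand) of the fragmentation enzymes.
FRAGMENTATION_SITES = {"MseI": ("TTAA", 1), "Csp6I": ("GTAC", 1), "BfaI": ("CTAG", 1)}
# Recognition sequences of the methylation-sensitive enzymes.
METHYLATION_SENSITIVE_SITES = {
    "BstUI": "CGCG",
    "HpaII": "CCGG",
    "HinP1I": "GCGC",
    "HpyCH4IV": "ACGT",
}


def cut_positions(sequence):
    """Sorted top-strand cut positions of all fragmentation enzymes."""
    positions = set()
    for site, offset in FRAGMENTATION_SITES.values():
        start = sequence.find(site)
        while start != -1:
            positions.add(start + offset)
            start = sequence.find(site, start + 1)
    return sorted(positions)


def digest(sequence):
    """Split the sequence at every fragmentation cut position."""
    bounds = [0] + cut_positions(sequence) + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def classify(fragment, min_len, max_len):
    """Label a fragment; min_len and max_len are hypothetical size limits."""
    if len(fragment) < min_len:
        return "too short"
    if len(fragment) > max_len:
        return "too long"
    if any(site in fragment for site in METHYLATION_SENSITIVE_SITES.values()):
        return "methylation-variable"
    return "control"


toy_sequence = "AAACCGGAAATTAAGGGGGGGGCTAGCCACGTCC"  # invented example, not genomic
toy_fragments = digest(toy_sequence)
toy_labels = [classify(f, min_len=5, max_len=100) for f in toy_fragments]
for fragment, label in zip(toy_fragments, toy_labels):
    print(f"{fragment:<14} {label}")
```

Fragments labeled "control" correspond to the methylation-invariable fragments used for normalization, and "methylation-variable" fragments are those that can report methylation.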
2. Materials
2.1. Sample DNA Extraction
1. 40 U Proteinase K (Qiagen).
2. QiaAmp DNA Mini Kit (Qiagen).
3. RNase A (Qiagen).
2.2. DNA Methylation Calibrators
1. 0% methylated DNA: GenomiPhi® DNA Amplification Kit.
2. 100% methylated DNA: SssI methyltransferase (NEB); S-adenosylmethionine (SAM) (NEB).
2.3. Adapter
1. H24: 5′-AGGCAACTGTGCTATCCGAGGGAT-3′, 200 µM in water.
2. H12: 5′-TAATCCCTCGGA-3′, 200 µM in water.
2.4. DNA Fragmentation
1. Unmethylated Lambda DNA (NEB).
2. DNA from human PBL (Promega).
3. QiaQuick PCR Purification Kit (Qiagen).
2.5. Adapter Ligation
1. T4 DNA ligase (NEB).
2. 1 mM ATP: 0.275 g ATP disodium salt (Sigma Aldrich) dissolved in water (Fluka) and adjusted to a final volume of 50 ml. The pH is adjusted to 7.5 with NaOH. Aliquots of 50 µl are stored at −20°C and not reused.
3. MinElute PCR Purification Kit (Qiagen).

2.6. Methylation-Sensitive Restriction
1. BstUI, HpaII, HinP1I, and HpyCH4IV (all NEB).
2.7. PCR
1. DeepVent® (exo-) DNA Polymerase (NEB).
2. dNTPs, 100 mM each (Fermentas), mixed and diluted 1:10 with water.
3. Microcon YM-30 (Millipore).
2.8. Fragmentation and Labeling
1. Gene Chip® Mapping 10K Xba Assay Kit (Affymetrix).
2.9. Hybridization, Washing, Staining, and Scanning
1. Affymetrix custom array (see Note 18).
2. EB buffer from QiaQuick PCR Purification Kit (Qiagen).
2. 12× MES: 70.4 g 2-(N-morpholino)ethanesulfonic acid monohydrate and 193.3 g 2-(N-morpholino)ethanesulfonic acid sodium salt dissolved in 1,000 ml water (Fluka).
3. DMSO (Sigma Aldrich), EDTA (Gibco), 50× Denhardt’s Solution (Eppendorf), 10% Tween 20 (Pierce), tetramethylammonium chloride (TMA-Cl, Sigma Aldrich).
4. Herring sperm DNA (10 mg/ml; Promega), human Cot-1 (Roche Diagnostics).
5. Control Oligo B2, 3 nM (Affymetrix).
6. ImmunoPure streptavidin (Perbio Science), R-phycoerythrin streptavidin (Invitrogen), anti-streptavidin antibody (AXXORA).
7. Wash Buffer A: 300 ml of 20× SSPE (Roche Diagnostics), 1 ml of 10% Tween 20, and 699 ml water are mixed and filtered through a 0.2-µm sterile filter. Wash Buffer B: To 30 ml of 20× SSPE, add 1 ml of 10% Tween 20 and 969 ml water. Filter the solution through a 0.2-µm sterile filter.
3. Methods
3.1. Sample DNA Extraction
1. Approximately 20 mg of fresh-frozen tissue is lysed by incubation with tissue lysis buffer (TLB, Qiagen) in combination with 40 U Proteinase K overnight (16 h) (see Notes 1 and 2).
2. The genomic DNA is isolated using the QiaAmp DNA Mini Kit (Qiagen) according to the manufacturer’s protocol.
3. The DNA is eluted with 60 µl of prewarmed (50°C) Elution Buffer (EB, Qiagen) (see Notes 3 and 4).
4. Finally, the concentration is quantified by UV (see Note 5).
3.2. DNA Methylation Calibrators
1. 0% Methylation: Universally unmethylated DNA is prepared by multiple displacement amplification (MDA) using the GenomiPhi® DNA Amplification Kit according to the manufacturer’s instructions on 10 ng of isolated human genomic DNA from peripheral blood lymphocytes (Promega).
2. 100% Methylation: Isolated human genomic DNA from peripheral blood lymphocytes (Promega) is methylated using SssI methyltransferase (NEB) in the presence of S-adenosylmethionine (SAM) according to the manufacturer’s instructions. Ten micrograms of DNA is incubated with 40 U SssI methylase and 1.24 µl SAM in a final volume of 100 µl for 16 h at 37°C in a thermomixer.
3.3. Preparation of Adapters
1. Equal amounts of the two primers H24 and H12 are mixed.
2. The mixture is incubated at 95°C for 5 min and then slowly cooled down.
3. Aliquots are stored at −20°C and reused.
3.4. DNA Fragmentation
1. 500 ng to 1 µg (see Note 6) of human genomic DNA is treated with 5 U each of MseI (NEB), Csp6I (Fermentas), and BfaI (NEB) in 30 µl of 1× Y+/Tango buffer (Fermentas) at 37°C for 16 h (see Note 7 and Table 1).
2. Enzyme activity controls: Unmethylated lambda DNA (NEB) is treated with 5 U of MseI (NEB) and Csp6I (Fermentas), respectively, and DNA from human PBL (Promega) is treated with BfaI in the same 1× Y+/Tango buffer (Fermentas) at 37°C for 16 h (see Note 8).
3. Negative controls: 1 µg DNA from human PBL (Promega) is used as a “no ligase” negative control and water as a “no DNA” negative control throughout the complete procedure.
4. Enzymes are heat inactivated for 20 min at 65°C.
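The effect of the fragmentation digest on fragment sizes can be explored in silico before any DNA is cut. The following is a minimal sketch only, not the authors' analysis pipeline: it assumes the standard recognition sequences of the enzymes (MseI T^TAA, Csp6I G^TAC, BfaI C^TAG; HpaII CCGG, HinP1I GCGC, HpyCH4IV ACGT, BstUI CGCG), scans one strand of a linear sequence, and uses a made-up toy sequence. The function names are hypothetical.

```python
import re

# Recognition sequence and cut offset (bases from the start of the site to the
# cut) for the three fragmentation enzymes. Assumed standard specificities.
FRAGMENTATION_ENZYMES = {
    "MseI": ("TTAA", 1),   # T^TAA
    "Csp6I": ("GTAC", 1),  # G^TAC
    "BfaI": ("CTAG", 1),   # C^TAG
}

# Recognition sequences of the methylation-sensitive enzymes (all palindromic,
# so scanning one strand is sufficient).
METHYLATION_SENSITIVE_SITES = {
    "HpaII": "CCGG",
    "HinP1I": "GCGC",
    "HpyCH4IV": "ACGT",
    "BstUI": "CGCG",
}


def find_sites(sequence, site):
    """Return start positions of all (possibly overlapping) occurrences of site."""
    return [m.start() for m in re.finditer(f"(?={site})", sequence)]


def digest(sequence, enzyme_names):
    """Cut a linear sequence with the named enzymes and return the fragments."""
    seq = sequence.upper()
    cuts = set()
    for name in enzyme_names:
        site, offset = FRAGMENTATION_ENZYMES[name]
        cuts.update(pos + offset for pos in find_sites(seq, site))
    bounds = [0] + sorted(cuts) + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


def count_sensitive_sites(fragment):
    """Count intact methylation-sensitive sites inside one fragment."""
    return sum(
        len(find_sites(fragment, site))
        for site in METHYLATION_SENSITIVE_SITES.values()
    )


# Hypothetical toy sequence, far shorter than any real genomic fragment.
toy = "AAACCGGAAATTAAGGGGGTACCCGCGCCCTAGTTTT"
for label, enzymes in [
    ("MseI only", ["MseI"]),
    ("MseI + Csp6I + BfaI", ["MseI", "Csp6I", "BfaI"]),
]:
    fragments = digest(toy, enzymes)
    print(label, [(len(f), count_sensitive_sites(f)) for f in fragments])
```

On a real genome the same counts, combined with the fragment size window retained by purification, would give the kind of "detectable" versus "informative" fractions summarized in Table 1.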
Table 1 Characteristics of original versus optimized DMH restriction protocols

                              DMH protocol (Huang et al.)                  Optimized DMH protocol
                              Percent of all   Percentage of               Percent of all   Percentage of
                              fragments        informative fragments (b)   fragments        informative fragments (b)
Detectable fragments          44               16                          31               29
Removed fragments (a)         56               7                           69               5
Fragmentation digest          MseI                                         MseI, Csp6I, BfaI
Methylation-specific digest   BstUI, HpaII                                 HpaII, HinP1I, HpyCH4IV, BstUI
Fragments 15% of the array surface. The dynamic range of each array is assessed by analyzing the separation of the signal distributions of all background and all normalization probes. If Fisher’s linear discriminant is smaller than 1, the array is excluded from the analysis.
3. Methylation signal normalization: The median intensity per array is calculated over all methylation-invariable probes, i.e., all probes targeting fragments containing no methylation-sensitive restriction sites (see Fig. 2a and Note 19). This value is subtracted from all log2-transformed signals of methylation-variable probes.
Fig. 2. DMH data aggregation. Top: Calculation of normalized corrected signal Sn for each methylation-variable probe by interpolation of signal S between the median of the control signals C1 (log2) and the median of the noise control signals C0 (log2). Bottom: Calculation of fragment methylation scores M (1) from normalized signals Sn using normalized calibration signals Sn0 and Sn100 and (2) by median averaging over all calibrated probe signals representing the same fragment.
3.11. Methylation Calibration
1. Unmethylated and universally methylated DNA are processed as 0 and 100% calibrator samples. Raw data processing and normalization are performed as described above.
2. Using the normalized probe signals of the calibrators, a two-point calibration is performed for each normalized methylation-variable probe of each sample of interest to extrapolate absolute methylation levels for each methylation-variable probe (see Fig. 2b and Note 21).
3. Median averaging is performed over all calibrated probe data per probe set to obtain a quantitative methylation value for each methylation-variable fragment (see Note 22).
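The normalization and calibration steps above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' software: signals are assumed to be log2-transformed already, the two-point calibration is assumed to be the linear form M = (Sn − Sn0) / (Sn100 − Sn0), probes whose calibrators coincide are simply skipped, and all names (`normalize`, `methylation_scores`, the toy values) are hypothetical.

```python
from statistics import median


def normalize(log2_signals, invariable_idx):
    """Subtract the per-array median of the methylation-invariable probes."""
    reference = median(log2_signals[i] for i in invariable_idx)
    return [s - reference for s in log2_signals]


def methylation_scores(sample, cal0, cal100, probe_sets):
    """Two-point calibration per probe, then median per fragment.

    sample, cal0, cal100: normalized log2 signals in the same probe order.
    probe_sets: dict mapping a fragment name to its probe indices.
    """
    scores = {}
    for fragment, idx in probe_sets.items():
        per_probe = [
            (sample[i] - cal0[i]) / (cal100[i] - cal0[i])
            for i in idx
            if cal100[i] != cal0[i]  # assumption: drop uninformative probes
        ]
        if per_probe:
            scores[fragment] = median(per_probe)
    return scores


# Toy arrays: probes 0-2 target one methylation-variable fragment,
# probes 3-5 are methylation-invariable normalization probes.
invariable = [3, 4, 5]
sample = normalize([9.0, 10.0, 11.0, 8.0, 8.0, 8.0], invariable)
cal0 = normalize([7.0, 7.0, 8.0, 7.0, 7.0, 7.0], invariable)
cal100 = normalize([12.0, 14.0, 13.0, 10.0, 10.0, 10.0], invariable)

print(methylation_scores(sample, cal0, cal100, {"fragment_1": [0, 1, 2]}))
```

Taking the median over the probe set, as in step 3, keeps a single poorly behaved probe from dominating the fragment score.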
4. Notes
1. The use of cell lines is possible. In this case, pelleted cells are resuspended in 200 µl PBS and lysed using the manufacturer’s cell lysis buffer (AL), 40 U Proteinase K, and 20 U RNase A for 4 h at 37°C.
2. The use of carrier material should be avoided and separation from RNA is important.
3. Water is used, because buffers can inhibit enzyme reactions.
4. Elution of DNA in small volumes is necessary to obtain DNA of sufficient concentration for the next step (>40 ng/µl).
5. Care should be taken to obtain template DNA of high integrity. To check quality and concentration of DNA, gel electrophoresis is recommended.
6. If possible, 1 µg of DNA should be used to ensure assay robustness. If the DNA amount is limited, the protocol can be used with at least 200 ng.
7. The fragmentation enzyme mixture is optimized to fulfill the following criteria: (1) cleave AT-rich sequences into fragments small enough to be excluded in the purification step and thereby reduce complexity, (2) create a large number of amplifiable CpG-rich fragments with high coverage of methylation-variable regions, and (3) ensure that a large proportion of these fragments contain methylation-sensitive restriction sites. The fragment distributions resulting from restriction by a variety of individual four-base cutters and mixtures thereof are analyzed in silico. The product resulting from restriction using a mixture of MseI, Csp6I, and BfaI is characterized in
Fig. 3. Fragmentation characteristics of modified protocol versus original DMH protocols used by Huang et al. (7, 8). The optimized protocol generates short AT-rich fragments, which are removed in the subsequent purification. (a) Molecular size standard; (b) PBL DNA treated with Csp6I, BfaI, and MseI; (c) PBL DNA treated with MseI; (d) undigested PBL DNA.
Table 1 in comparison with MseI only, which was used in previous DMH methods (7, 8). As shown in Fig. 3, addition of Csp6I and BfaI to MseI provides a higher proportion of very short fragments that are removed during the purification steps. Because less than 5% of these fragments contain methylation-sensitive restriction sites, their removal results in a reduction of genomic complexity without significant loss of methylation information.
8. Genomic DNA is used to test BfaI, because lambda DNA does not contain sufficient cutting sites to show any difference from undigested DNA on an agarose gel.
9. Because DNA is digested into small fragments and eluted in water, prolonged storage should be avoided to prevent further degradation. Therefore, the respective process steps (DNA fragmentation, ligation, and methylation-sensitive restriction) are done on consecutive days.
10. If the enzymes perform well, the controls show specific band patterns when lambda DNA is used to test MseI and Csp6I. The pattern can be obtained from REBASE (http://rebase.neb.com/rebase/). Enzymes are considered to perform well if the band with the highest molecular weight is at approximately 2,000 bp. The digest of PBL (Promega) DNA with BfaI results in a smear with a size of approximately 100–2,000 bp. The digest needs to be repeated if this criterion is not fulfilled. The sample DNA should show a smear like PBL DNA. This smear can be very faint, especially if less than 1 µg starting material is used.
11. Enrichment of methylated fragments is achieved by methylation-sensitive restriction. In addition to BstUI and HpaII, as used
in the DMH protocol introduced by the Huang group in 2001 (8), HinP1I and HpyCH4IV are applied to increase the stringency of methylation enrichment. Thereby, the number of informative fragments is doubled and their portion within all amplifiable fragments is increased from 16% to 29% (Table 1). Considering that one unmethylated restriction site per fragment causes cleavage and thereby prevents amplification, shorter fragment length in combination with increased fragment number will provide methylation information that is less prone to be affected by sporadic events like composite methylation in the direct neighborhood of comethylated sequences carrying the target information. Altogether, the methylation-sensitive amplifiable fragments generated by the selected enzyme combination cover 99.7% of all CpG islands and 99.1% of all TSS.
12. 60°C is the recommended incubation temperature for BstUI, but the enzyme works at 37°C with decreased efficiency.
13. The product of the methylation-sensitive restriction shows degradation over time, i.e., the PCR yield decreases. Therefore, it is recommended that PCR is done within 2 weeks after methylation-sensitive restriction.
14. Twenty PCR cycles are performed to avoid amplification bias.
15. The yield of one 100-µl PCR is not sufficient to obtain 20 µg of PCR product in a maximum volume of 45 µl, which is necessary for the Fragmentation and Labeling procedure of the GeneChip® Mapping 10K Xba Assay kit (Affymetrix). Therefore, six PCRs are performed, purified, and concentrated.
16. If many samples are processed in parallel, additional 100-µl PCRs need to be performed.
17. Extended preparation times for PCR mastermix should be avoided.
18. The Affymetrix custom array contains probe sets designed to match 51,317 fragments of the non-methylation-specific restriction library. The majority of the detected fragments contain one to ten methylation-sensitive restriction sites and cover 5–20 CpG sites.
Whereas 60% of fragments are positioned within single annotated genes, 38% are outside the context of known genes. Approximately 2% can be associated with more than one gene. The fragments represented on the array overlap with 14,017 (out of 30,391) unique TSS and 10,522 (out of 22,395) unique CpG islands. The representation of CpG islands is strongly biased toward promoter regions and exons I of annotated genes and against repetitive elements.
19. In addition to the methylation-variable fragments, the Affymetrix custom array contains 1,000 fragments devoid of methylation-sensitive restriction sites that are used for signal normalization. The hybridization background is represented by 4,821 nongenomic oligonucleotides. For detection of signal saturation effects, 1,034 probes specific for repeat fragments have been included (Table 2).
20. The use of any other Affymetrix-type microarray containing oligonucleotide probes is possible if probes are selected according to the following criteria: (1) At least six probes per probe set are recommended. (2) Probe sets covering
Table 2 CpG island microarray – oligonucleotide content

Probe type       Single probes   Probe sets
Methylation      491,491         51,317
Normalization    9,834           1,000
Background       4,821           n.a.
Repeats          1,034           n.a.

n.a. not applicable
Fig. 4. Ability of individual methylation-variable probe sets to differentiate 0, 50, and 100% methylation using DNA methylation mixtures prepared in vitro. The DMH fragment data are normalized by subtraction of the mean and division by the standard deviation, and ranked by t statistics.
fragments from 100 to 600 bp should be selected. (3) A minimum of one single methylation-sensitive restriction site is sufficient. (4) Fragments without any methylation-sensitive restriction sites are necessary for normalization.
21. To characterize the analytical performance of each methylation-variable probe set, unmethylated and methylated DNA as well as mixtures of both representing 50% methylation were processed as described. In Fig. 4 all methylation-variable probe sets are ordered according to their ability to differentiate between 0 and 100% methylation based on t statistics. Our data illustrate that 95% of probe sets are functional, i.e., are able to clearly differentiate different methylation states.
22. Example data: The described assay was used to analyze 24 colon cancer samples together with 11 samples from normal colon. All samples were from commercial sources and were obtained under appropriate consent. In Fig. 5, the volcano plot of t statistics versus methylation difference is shown. Markers combining large methylation differences between normal and CRC that separate both groups with high significance are indicated in red. Forty-one fragments that show hypermethylation and nine fragments that show hypomethylation in colon cancer were identified (see Table 3). Many of
Fig. 5. Volcano plot for differential DNA methylation analysis between 24 CRC tissue samples and 11 normal controls. (Differences between methylation scores M are shown. The most significant discriminators with large methylation differences are shown as crosses).
Table 3 Marker selection from differential DNA methylation analysis between 24 CRC tissue samples and 11 normal controls

No.   t Statistics   Methylation difference   HUGO genes (a) / HUGO TSS (b)   PubMed reference ID (c)
1     Hyper-methylated
2     6.4            −0.46                    NPY / NPY
3     6.7            −0.41                    HOXA3
4     7.1            −0.43                    HOXA5 / HOXA5
5     7.6            −0.46                    AQP1 / AQP1
6     6.3            −0.38                    EFCBP1 / EFCBP1
7     5.3            −0.47
8     5.8            −0.39
9     6.3            −0.33
10    4.8            −0.45                    STK24 / STK24
11    6.5            −0.38                    HOXB13 / HOXB13
12    5.4            −0.41                    C20orf117
13    4.8            −0.40                    ZHX3
14    4.5            −0.55                    C20orf161
15    5.1            −0.41                    PARD6B
16    4.7            −0.41                    ZNF217
17    5.6            −0.41                    DOK5
18    5.7            −0.41
19    4.9            −0.40                    RAE1
20    6.4            −0.49                    GNAS
21    5.9            −0.34                    CYP1B1 / CYP1B1                 17145863, 15172987
22    5.2            −0.48                    CCNA1 / CCNA1                   16524460, 16807314, 16449996
23    4.6            −0.43                    ALG5
24    4.9            −0.51                    PCDH8 / PCDH8
25    5.1            −0.38                    PCDH17 / PCDH17
26    5.1            −0.37                    EDNRB
27    4.6            −0.47
28    4.8            −0.42
29    4.8            −0.44                    C20orf31
30    5.0            −0.40                    GTPBP5
31    Hypo-methylated
32    −4.78          0.44                     ACSL6, ACSL1
33    −5.04          0.38                     KCNH1, KCNH5

Gene, TSS, and PubMed cells that appear detached from their rows in the source, in source order: 15352125; 12819009; 12032849 / ADCY8 GPR7 / GPR7 / 17437806 / DBC1 / 16846474, 15746151 / 16278676, 17145863, 16912168 / C20orf161 / DOK5 TFAP2C / 14996719 / 16001328 / 15026333, 14688019, 12499435 / HCK / 17344919 / TPX2

(a) Fragment overlaps with gene
(b) Fragment is within 2,500 bp of TSS
(c) Examples for reports of hypermethylation in the respective gene
Fig. 6. Comparison of quantitative MSP and DMH measurements for EDNRB (left) and CYP1B1 (right) for 24 colorectal cancer tissues and 11 normal colon control tissues. (Methylation scores M are given for MSP and DMH).
these genes, e.g., GPR7, DBC1, HOXB13, TFAP2C, GNAS, CYP1B1, CCNA1, EDNRB, and HCK, found to be hypermethylated in the colon tumor samples have been previously reported to be associated with methylation and cancer. Two genes, EDNRB and CYP1B1, were selected to be analyzed by quantitative methylation-specific PCR (MSP) as an independent method. Correlations of 76% and 85% between DMH and MSP methylation scores are observed (see Fig. 6).
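The marker selection described in Note 22 combines a t statistic with a methylation difference per fragment. The sketch below shows one way such a ranking could be computed; it is not the authors' implementation. The text does not say which t-test variant was used, so Welch's unequal-variance form is an assumption, as are the thresholds, the tumour-minus-normal sign convention (which need not match Table 3), and all names and toy values.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic (unequal variances); assumes non-zero variance."""
    standard_error = sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    return (mean(group_a) - mean(group_b)) / standard_error


def select_markers(tumour, normal, min_abs_diff=0.3, min_abs_t=4.0):
    """Keep fragments with both a large difference and a large |t|.

    tumour, normal: dicts mapping fragment name to methylation scores M.
    The two thresholds are hypothetical illustration values.
    """
    selected = []
    for fragment in tumour:
        diff = mean(tumour[fragment]) - mean(normal[fragment])
        t = welch_t(tumour[fragment], normal[fragment])
        if abs(diff) >= min_abs_diff and abs(t) >= min_abs_t:
            selected.append((fragment, round(t, 2), round(diff, 2)))
    # Strongest discriminators first.
    return sorted(selected, key=lambda row: -abs(row[1]))


# Toy methylation scores for two hypothetical fragments.
tumour_scores = {"frag_A": [0.8, 0.9, 1.0], "frag_B": [0.5, 0.5, 0.6]}
normal_scores = {"frag_A": [0.1, 0.2, 0.3], "frag_B": [0.5, 0.6, 0.5]}
print(select_markers(tumour_scores, normal_scores))
```

Plotting t against the difference for every fragment, rather than only the selected ones, would give a volcano plot of the kind shown in Fig. 5.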
Acknowledgment
We thank the German Ministry of Education and Research (BMBF) for financial support for part of this study by Förderprojekt “NGFN2: Systematisch-Methodische Platform Epigenetik” (01GR0492).
References
1. Laird, P.W. (2003) The power and the promise of DNA methylation markers. Nat. Rev. Cancer 3, 253–266.
2. Lalande, M. (1996) Parental imprinting and human disease. Annu. Rev. Genet. 30, 173–195.
3. Fan, J.-B., Chee, M.S. and Gunderson, K.L. (2006) Highly parallel genomic assays. Nat. Rev. Genet. 7, 632–644.
4. Southern, E.M., Maskos, U. and Elder, J.K. (1992) Analyzing and comparing nucleic acid sequences by hybridisation to arrays of oligonucleotides: evaluation using experimental models. Genomics 13, 1008–1017.
5. Lockhart, D.J. et al. (1996) Expression monitoring by hybridization to high-density oligonucleotide arrays. Nat. Biotechnol. 14, 1675–1680.
6. Syvanen, A.C. (2005) Towards genome-wide SNP genotyping. Nat. Genet. 37, S5–S10.
7. Huang, T.H., Perry, M.R. and Laux, D.E. (1999) Methylation profiling of CpG islands in human breast cancer cells. Hum. Mol. Genet. 8, 459–470.
8. Yan, P.S., Chen, C-M., Shi, H., Rahmatpanah, F., Wei, S.H., Caldwell, C.W. and Huang, T.H. (2001) Dissecting complex epigenetic alterations in breast cancer using CpG island microarrays. Cancer Res. 61, 8375–8380.
9. Hatada, I. et al. (2002) A microarray-based method for detecting methylated loci. J. Hum. Genet. 47, 448–451.
10. Schumacher, A. et al. (2006) Microarray-based DNA methylation profiling: technology and applications. Nucleic Acids Res. 34, 528–542.
11. Rauch, T., Li, H., Wu, X. and Pfeifer, G.P. (2006) MIRA-assisted microarray analysis, a new technology for the determination of DNA methylation patterns, identifies frequent methylation of homeodomain-containing genes in lung cancer cells. Cancer Res. 66, 7939–7947.
12. Gebhard, C., Schwarzfischer, L., Pham, T.-H., Schilling, E., Klug, M., Andreesen, R. and Rehli, M. (2006) Genome-wide profiling of CpG methylation identifies novel targets of aberrant hypermethylation in myeloid leukemia. Cancer Res. 66, 6118–6128.
13. Weber, M., Davies, J.J., Wittig, D., Oakeley, E.J., Haase, M., Lam, W.L. and Schübeler, D. (2005) Chromosome-wide and promoter-specific analyses identify sites of differential DNA methylation in normal and transformed human cells. Nat. Genet. 37, 853–862.
14. Fodor, S.P. et al. (1991) Light-directed, spatially addressable parallel chemical synthesis. Science 251(4995), 767–773.
15. Ishkanian, A.S. et al. (2004) A tiling resolution microarray with complete coverage of the human genome. Nat. Genet. 36, 299–303.
16. Heisler, L.E. et al. (2005) CpG island microarray probe sequences derived from a physical library are representative of CpG islands annotated on the human genome. Nucleic Acids Res. 33, 2952–2961.
Chapter 10
Single-Nucleotide Polymorphism (SNP) Analysis to Associate Cancer Risk
Julie Earl and William Greenhalf
Summary
Identification of hereditary factors that predispose to cancer allows targeted cancer screening and better quantification of environmental risk factors. The ability to identify which single nucleotide polymorphisms (SNPs) are associated with cancer or segregate with disease in families allows high-risk loci to be identified. In this chapter, two platforms for analysing SNPs are discussed, the Affymetrix and Illumina systems. Application of both platforms requires the same principles of good laboratory practice but there are important differences in materials and methods, which will be discussed.
Key words: Familial cancer, Arrays, Association, Linkage
Robert Grützmann and Christian Pilarsky (eds.), Cancer Gene Profiling: Methods and Protocols, Methods in Molecular Biology, vol. 576, DOI 10.1007/978-1-59745-545-9_10, © Humana Press, a part of Springer Science + Business Media, LLC 2010
1. Introduction
Linkage and association studies have been used to quantify cancer risk in the past with some success; for example, the locus of the Rb gene was identified in families with retinoblastoma (1), the APC locus was identified in familial adenomatous polyposis (2), various loci (each associated with mismatch repair genes) were identified in hereditary non-polyposis colon cancer (3–5), and the STK11 locus was identified in Peutz-Jeghers syndrome (6, 7). All of these were high-risk autosomal dominant conditions; such inheritance is relatively rare in cancer, and the majority of genetic predisposition results from complex interactions of multiple genes with each other and with environmental exposure (8, 9). Such weaker associations have also been investigated using microsatellites (10) but with very little success, largely because microsatellites change over the generations and so are not applicable to identifying ancient common founders (11). In any case, weaker associations will usually involve polymorphisms that are relatively common in the general population and only become significant risk factors when in combination. Weaker associations are also not amenable to analysis by linkage studies, because non-affected carriers and those not carrying high-risk alleles cannot be distinguished. This makes genome-wide scans for association impossible, but less empirical studies based on one or two targeted loci are feasible. To target the loci for association studies, identification of allelic loss in tumour samples can be used. The rationale is that high-risk alleles will be enriched during tumourigenesis by loss of the low-risk partner, thus causing a loss of heterozygosity in the microsatellite locus. For low penetrance alleles, the selective pressure for loss of heterozygosity will be weak and so faith in this rationale requires optimism verging on irrationality.
Single base changes occurring as the result of transition or transversion mutations are much more stable than microsatellites and so linkage of such changes to a disease-related allele will be maintained over far greater numbers of generations; these changes may even be functionally related to the disease allele. Unfortunately, identification of such base changes is more complex than identifying size differences in microsatellite regions. Possible techniques include DNA sequencing (12), allele-specific polymerase chain reaction (PCR) (13, 14), and single-strand conformational polymorphism (SSCP) analysis (15), but despite improvements in technology (16, 17), at the time of writing, all of these remain suitable only for analysing small numbers of loci in relatively small numbers of individuals, making their use impractical for whole-genome scanning.
The development of microarray technology in 1995 (18) revolutionised the area of cancer genomics, initially because it could be applied to expression analysis, revealing sequences that were up- or down-regulated during cancer development (19–22). Subsequently this technology was adapted, allowing competitive or comparative binding of cancer and non-cancer genomic DNA, revealing regions that were amplified or lost in malignant cells (23, 24) (comparative genomic hybridisation). As the technology improved, the accuracy with which hybridisation to different sequences could be distinguished reached a level at which single-nucleotide polymorphisms (SNPs) could be identified and levels compared. This offered the first practical method for genome-wide linkage (25) and association analysis (26–28) using SNPs. Not only are association studies using SNPs much more powerful, they are also more amenable to high throughput and, for whole-genome studies, require less starting material than microsatellite approaches. The lower requirement for input DNA and availability of high-density arrays means that SNP arrays have also become the technology of choice for empirical studies of
loss of heterozygosity (LOH) in tumour samples (29). There are approximately 5–10 million SNPs in the human genome, occurring every 400–1,000 bp (30). It has been estimated that approximately 500,000 SNPs are required to genotype an individual of European ancestry (31). SNP genotyping involves a DNA amplification step and hybridisation of labelled DNA to an array containing immobilised oligonucleotide representations of SNPs.
Since its earliest incarnations, array technology has offered alternative approaches, some of which have prospered and developed into routine laboratory workhorses, while others have flourished only to become unpopular or redundant. Both competitive and comparative hybridisations remain widely used for different applications. Competitive hybridisation requires nucleic acid from the different sources (e.g. cancer and normal tissue) to be differentially labelled (18). This allows a higher level of internal control and is the method of choice in many forms of expression analysis. Comparative hybridisation was the original technique (32) and involves separate hybridisations; this requires a very high level of reproducibility and relies on inclusion of more controls. Despite this, comparative approaches are the most popular in genomic SNP analysis; this owes a lot to the technical excellence of the Affymetrix platform, which has always relied on single labelling. Affymetrix has now been joined by another high-profile competitor, Illumina, and the field is now divided regarding which platform is the most effective and which is most appropriate for different applications. Both systems have similar requirements for sample preparation and good laboratory practice.
1.1. General Requirements
1.1.1. Minimum Information About a Microarray Experiment
Minimum information about a microarray experiment (MIAME) (33) establishes standards that allow raw data from array experiments from different groups to be compared. Most of these standards were designed with expression profiling in mind, but the general principles (if not the specific details) are equally valid for association or linkage analysis using SNP arrays. MIAME specifies that the content of the microarray data contributed complies with the following elements:
1. The raw data for each hybridisation must be accessible (e.g. CEL file for Affymetrix arrays).
2. The final processed (normalised) data for the set of hybridisations in the study should be accessible and transparent (e.g. the matrix used to draw the conclusions from the study should have a standard format).
3. The essential sample annotation including experimental factors and their values should be available (e.g. was blood DNA used or were paraffin-embedded samples required).
4. The experimental design including sample data relationships should be given (e.g. which raw data file relates to which sample, which repeat hybridisations are quality controls and which are biological replicates).
5. Sufficient annotation of the array should be provided (e.g. genomic coordinates, probe oligonucleotide sequences, or reference commercial array catalog number).
6. The essential laboratory and data processing protocols should be transparent (e.g. the normalisation method used to obtain the final processed data should be included in reports).
1.1.2. Laboratory Environment
A room should be designated as a DNA-free pre-PCR clean area for storage of reagents required for DNA digestion, ligation, and PCR. Ideally, a separate room will be allocated for the set-up of pre-PCR stages that is separate from the room where reagents are stored. However, if this is not possible, then an area within a DNA-free room should be allocated for pre-PCR stage set-up. Reagents and equipment (e.g. laboratory coats, pipette tips and racks) used in the DNA-free room should not be taken into “DNA contaminated areas” that will have airborne DNA contamination, particularly laboratories where PCR amplification is performed. Work surfaces and equipment designated for a DNA-free area can be decontaminated using DNAZap (Ambion, AM9890), which completely degrades contaminating DNA and RNA to a level below PCR sensitivity, or Microsol (Anachem, MIC-201), a disinfectant that also acts on nucleic acids. It is good practice to decontaminate all work surfaces in this manner whether working in a DNA-free or DNA-contaminated area. Post-PCR stages can be performed in the main laboratory; these include PCR (not reaction set-up), PCR clean-up, fragmentation, labelling, hybridisation, washing, and staining. Reagents for these steps should be stored in this laboratory and not taken into the DNA-free area.
1.1.3. Sample Types
Successful application of SNP arrays for association or linkage studies requires good quality DNA, which means that, where possible, freshly obtained blood should be used. However, this is not always possible or convenient. Archived samples are often only available as formalin-fixed, paraffin-embedded samples; these can be applied to either platform, although the requirement for whole-genome amplification means that the Illumina GoldenGate system is preferable. DNA quality is very variable depending on the type of tissue, the age of the sample, and how the tissue was fixed. Therefore, PCR-based quality control is essential before using the DNA on the array. Even so, a compromise in quality will almost certainly be necessary; this may be acceptable on higher density arrays, but will be more problematic with arrays that have only one SNP in a region being genotyped. To increase DNA quantity when using the Affymetrix platform, multiple PCRs can be used preceding fragmentation; this will also have the benefit of avoiding jackpot effects (single-allele amplification occurring because the second allele is missed in the first round of amplification). A similar approach could equally be applied to the Illumina system, but because this would include use of the linkage panels multiple times, it might prove prohibitively expensive.
1.1.4. Equipment
Both platforms require PCR amplification; it is a common misconception that given a good PCR machine, successful amplification will depend purely on the reagents and the protocol. In reality, subtle differences in machines and even tubes will make significant differences to effectiveness. Affymetrix recommends the 2720 thermal cycler or GeneAmp PCR system 9700 (Applied Biosystems) and MJ Tetrad PTC-225 (BioRad). The Affymetrix protocol is optimised thoroughly and has rigorous guidelines regarding the source of reagents, equipment, and plasticware. Illumina does not specify in such detail the reagents and plasticware to be used, mainly because all of the necessary reagents are supplied with the genotyping assay kit. In accordance with MIAME principles, the PCR machine used should be given in any report and groups should ensure that the machine used is commonly available to other investigators.
1.1.5. Sample Preparation
DNA quality is critical regardless of the platform used. The most commonly used method for DNA quantification is absorbance at 260 nm, but this is sensitive to single-stranded DNA (ssDNA), RNA, protein, and reagent contamination from DNA preparation methods. For pure double-stranded DNA (dsDNA), an absorbance value of 1 at 260 nm corresponds to a DNA concentration of 50 µg/ml. DNA can be run through a low-strength agarose gel (1%) to ensure that it is intact. This method of quality control is recommended by Affymetrix, who specify that the molecular weight of genomic DNA should be >20 kb. Illumina specifies that DNA fragment sizes should be at least 2 kb, although the GoldenGate assay can be used with fragments as small as 200 bp and is able to tolerate degraded samples better than the Infinium assay; both assays are provided by Illumina. Furthermore, DNA should be free of impurities, with A260/280 ratios between 1.7 and 1.9. Inhibitors in genomic DNA preparations can be removed by ethanol precipitation using the protocol in Subheading 3.1.1. Illumina advises against using UV spectrometry, even the highly respected NanoDrop technology, to quantify DNA, because contaminating ssDNA, oligonucleotides, RNA, and proteins can interfere with the readings and thus give inaccurate results. Therefore, Illumina recommends that DNA be quantified using the Quant-iT™ PicoGreen® assay, a fluorescence-based nucleic acid stain for quantification of dsDNA. The assay has been optimised so that fluorescence originating from contaminating RNA or ssDNA is minimal. The system is capable of detecting dsDNA at a concentration of 25 pg/ml. The protocol for DNA quantification using the Quant-iT™ PicoGreen® assay is described in Subheading 3.1.2.
DNA extraction methods that yield ssDNA are not appropriate, because dsDNA is required for restriction digestion. DNA isolation methods recommended by Affymetrix include the QIAamp® DNA Blood Maxi kit (QIAGEN). An alternative is SDS/Proteinase K digestion with phenol chloroform extraction, followed by ultracentrifugation/concentration with Microcon® or Centricon® filters (Millipore).
It is vital to avoid contamination with DNA from other sources because this will result in a high marker detection rate but the call rate will fall because there will be mixed alleles present. The most likely way for cross-contamination to occur would be via contact of the preparation with airborne PCR-amplified DNA within the laboratory. Several safeguards are employed to avoid cross-contamination of DNAs, such as the allocation of a DNA-free room for storage and preparation of pre-PCR reagents, as described above. It is essential to use dedicated equipment, lab coats, etc. for DNA-free and main laboratory areas, with restricted movement between these areas. Because DNA quality is the critical component of any SNP-based genotyping assay, both Illumina and Affymetrix recommend that a small number of DNA samples be tested initially before performing a large-scale genotyping assay.
1.1.6. Whole-Genome Amplification of DNA in Genotyping Assays
Whole-genome amplified (WGA) DNA prepared using multiple displacement amplification (MDA)-based methods (REPLI-g®, Qiagen, and Φ29 polymerase) or by amplification using random primers (OmniPlex® assay, Rubicon Genomics) has been used. These methods yielded concordance rates of >98.8% and genotype call rates of >99.8% using the Illumina GoldenGate platform (34, 35). WGA with reasonable quantities of good quality DNA is therefore effective, but, in practice, WGA will usually be considered when DNA is of poor quality and minimal quantity, in which case the genotyping data obtained will be severely compromised (36). Therefore, Illumina recommends that the starting DNA be intact and at a minimum concentration of 50 ng/µl, quantified using the PicoGreen® assay. Some degree of DNA degradation can be tolerated but it is recommended that a minimum of 100–200 ng of partially degraded DNA should be used. A separate WGA step (in addition to amplification integral to the assay) is not recommended for use with Illumina’s Infinium platform nor with the Affymetrix genotyping assays.
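The numeric acceptance criteria mentioned in the sample preparation and whole-genome amplification subsections can be collected into a small pre-run check. The following is a sketch only: it uses the standard conversion of 50 µg/ml (that is, 50 ng/µl) per A260 unit for pure dsDNA, the A260/280 window of 1.7–1.9, and the 50 ng/µl minimum quoted for Illumina. The function names and example readings are hypothetical, and where the vendor asks for PicoGreen quantification the concentration should come from that assay rather than from absorbance.

```python
def dsdna_conc_ng_per_ul(a260, dilution_factor=1.0):
    """Estimate dsDNA concentration from A260 (1 A260 unit = 50 ng/ul dsDNA)."""
    return a260 * 50.0 * dilution_factor


def check_sample(a260, a280, dilution_factor=1.0, min_conc_ng_per_ul=50.0):
    """Return concentration, purity ratio, and pass/fail flags for one sample."""
    concentration = dsdna_conc_ng_per_ul(a260, dilution_factor)
    ratio = a260 / a280
    return {
        "conc_ng_per_ul": concentration,
        "a260_a280": round(ratio, 2),
        "purity_ok": 1.7 <= ratio <= 1.9,
        "conc_ok": concentration >= min_conc_ng_per_ul,
    }


# Hypothetical readings: a 1:10 dilution of a clean sample, and a dilute,
# protein-contaminated sample measured undiluted.
print(check_sample(a260=0.5, a280=0.27, dilution_factor=10))
print(check_sample(a260=0.1, a280=0.08))
```

A sample failing the purity check would be a candidate for the ethanol precipitation clean-up referred to in Subheading 3.1.1.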
Single-Nucleotide Polymorphism (SNP) Analysis
2. Materials
2.1. General Materials
2.1.1. Clean-Up of Genomic DNA
1. Absolute ethanol.
2. 7 M sodium acetate.
3. Glycogen.
4. Reduced TE (10 mM Tris–HCl, pH 8.0, 0.1 mM EDTA, pH 8.0).
2.1.2. DNA Quantification Using the Quant-iT™ PicoGreen® Assay
1. Calf thymus DNA (Sigma, D4654) or bacteriophage lambda DNA (Sigma, D3654).
2. Quant-iT™ PicoGreen® dsDNA assay kit (Invitrogen, P7589).
3. Tris–EDTA buffer (TE): 10 mM Tris–HCl, 1 mM EDTA, pH 7.5.
4. Spectrofluorometer or fluorescence microplate reader.
2.2. Affymetrix Materials
1. GeneChip human mapping 500K assay kit (Affymetrix).
2. MJ Tetrad PTC-225 (BioRad).
3. Reduced TE buffer (10 mM Tris–HCl, pH 8.0, 0.1 mM EDTA, pH 8.0).
4. Molecular biology-grade water.
5. Appropriate restriction enzyme and reaction buffer (NspI or StyI).
6. Bovine serum albumin (10 mg/ml).
7. T4 DNA ligase (400 U/µl).
8. TITANIUM Taq DNA polymerase (50×) (Clontech, 639209).
9. GC melt (5 M) (Clontech, 639238) (see Note 1).
10. dNTPs (2.5 mM each).
11. 0.5 M EDTA, pH 8.0.
12. 2× gel-loading buffer.
13. PCR clean-up plate (Clontech, 636974).
14. 4% TBE agarose gel (Cambrex, 54929) (see Note 2).
15. 2-Morpholinoethanesulfonic acid (MES) (12×, 1.22 M).
16. Dimethyl sulfoxide (DMSO).
17. Denhardt’s solution (50×).
18. Herring sperm DNA (10 mg/ml).
19. Human Cot-1 DNA, 1 mg/ml (Invitrogen, 15279-011).
20. Tween-20 (10%).
21. Tetramethylammonium chloride (TMACL) (5 M).
Earl and Greenhalf
22. PCR strip tubes (BioRad, TBS-0201).
23. PCR tube strip caps (BioRad, TCS-0801).
24. All-purpose HiLo DNA marker (Bionexus, BN2050).
2.3. Infinium Array, Illumina
1. Infinium II whole-genome genotyping kit, HumanHap 240S, 300, 550, or 650Y.
2. Sentrix universal 96-array matrix for 96 samples (Illumina, FA-12-202) or Sentrix universal 16-beadchips 384-plex set of 6 (Illumina, GT-95-212).
3. Stand-alone BeadArray reader (Illumina, SC-16-300/301).
4. TE buffer: 10 mM Tris–HCl, pH 7.5; 1 mM EDTA.
5. 0.1 N (0.1 M) NaOH (i.e. 4 g/l).
6. 96-well storage plate (ABgene, AB-0859).
7. Cap-Mat (ABgene, AB-0566).
8. Microplate shaker (VWR, 444-7016).
9. Isopropanol.
10. 100% formamide.
11. 95% formamide:10 mM EDTA.
12. Heat-sealed foil cover (ABgene, AB-0559).
13. Aluminium block (Illumina, 21119).
14. Te-Flow rack (Tecan, 760–800).
2.4. GoldenGate Array, Illumina
1. Single-use DNA activation kit with enough reagents for six 96-well plates, i.e. 576 samples (Illumina, GT-95-201).
2. GoldenGate assay kit for 96/576 samples (Illumina, GT-95-203 [96] and GT-95-204 [576]).
3. Sentrix universal 96-array matrix for 96 samples (Illumina, FA-12-202) or Sentrix universal 16-beadchips 384-plex set of 6 (Illumina, GT-95-212).
4. 2-Propanol (isopropanol).
5. Thermal cycler.
6. 0.1 N (0.1 M) NaOH (i.e. 4 g/l).
3. Methods
3.1. General Methods
3.1.1. Clean-Up of Genomic DNA
1. Add 2.5 volumes of absolute ethanol (stored at −20°C), 0.5 volumes of 7 M sodium acetate, and 10 µg of glycogen (a co-precipitant that helps to ensure the pellet is not lost) per 1 µg of genomic DNA.
2. Vortex, incubate at −20°C for 1 h, and centrifuge at 12,000 × g for 20 min at room temperature.
3. Wash the pellet with 0.5 ml of 80% ethanol and centrifuge at 12,000 × g for 5 min; repeat this step once.
4. Air-dry the pellet and resuspend it in reduced TE buffer.
3.1.2. DNA Quantification Using the Quant-iT™ PicoGreen® Assay
1. Prepare standards (1 ml or more) using either bacteriophage lambda DNA or calf thymus DNA, by dilution in TE buffer to concentrations of 1 µg/ml, and 100, 10, and 1 ng/ml.
2. Add 1 ml of Quant-iT™ PicoGreen® reagent to 1 ml of each diluted standard and to a TE blank; incubate at room temperature for 2–5 min, either in the dark or in a foil-wrapped container to avoid exposure to light.
3. Measure the fluorescence intensity of each standard at 520 nm with excitation at 485 nm using either a spectrofluorometer or a fluorescence microplate reader.
4. Subtract the 520-nm reading of the blank (TE alone) from the readings of the DNA dilution standards and plot a curve of nucleic acid concentration against fluorescence intensity at 520 nm.
5. Dilute genomic DNA in TE buffer to a final volume of 1 ml, add 1 ml of Quant-iT™ PicoGreen® reagent, and incubate at room temperature for 2–5 min protected from light (as previously).
6. Measure the fluorescence intensity of the sample and of TE buffer alone at 520 nm. Subtract the value obtained for TE buffer and determine the DNA concentration from the standard curve.
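The standard-curve calculation in steps 4 and 6 can be sketched in a few lines of Python. This is a minimal illustration only: it assumes the blank-subtracted fluorescence is linear in DNA concentration over the range of the standards, and the function names and all fluorescence readings below are hypothetical values chosen for the example, not measurements from this protocol.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def dna_concentration(reading, blank, slope, intercept, dilution_factor=1.0):
    """Read a blank-subtracted fluorescence value off the standard curve."""
    net = reading - blank
    return (net - intercept) / slope * dilution_factor


# Hypothetical readings at 520 nm (arbitrary fluorescence units).
standards_ng_per_ml = [1.0, 10.0, 100.0, 1000.0]
standard_readings = [12.0, 30.0, 210.0, 2010.0]
blank_reading = 10.0  # TE alone

net_standards = [r - blank_reading for r in standard_readings]
slope, intercept = fit_line(standards_ng_per_ml, net_standards)

# A sample diluted 1:100 in TE before the assay (assumed dilution).
sample_ng_per_ml = dna_concentration(110.0, blank_reading, slope, intercept,
                                     dilution_factor=100)
print(round(sample_ng_per_ml, 1))
```

Because standards and samples are both mixed 1:1 with reagent, the concentration read from the curve refers to the diluted sample before the reagent was added; multiplying by the dilution factor recovers the stock concentration.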
3.2. Affymetrix SNP Arrays
The GeneChip® Human Mapping Array Sets produced by Affymetrix are high-density arrays that represent thousands of SNPs. The density of Affymetrix arrays has increased from 10,000 SNPs per chip (10K chips) to 100K, 500K, and the recently introduced Genome-Wide Human SNP Array 6.0, which has 906,600 SNPs and a further 40,000 non-polymorphic probes. Originally, the 500K array set comprised two arrays (approximately 262,000 SNPs on the NspI array and 238,000 on the StyI array). The 500K set was subsequently amalgamated onto a single chip marketed as the Genome-Wide Human SNP Array 5.0. The restriction enzymes were chosen to maximise the number of PCR-amplifiable (i.e. 200–1,100 bp) fragments containing informative SNPs. The procedure essentially involves digestion of genomic DNA to fragment sizes of 200–1,100 bp and the ligation of adaptors to the digested products. The adaptors act as PCR primer sites to allow amplification and enrichment of these fragments. The PCR products are purified and fragmented using DNase I to less than
Fig. 1. SNP genotyping using the Affymetrix 500K protocol. (a) PCR of digestion/ligation product. (b) Fragmentation of PCR product run through a 4% agarose gel (37).
200 bp and a biotin label is added before hybridisation on the array (Fig. 1). There are several quality control steps throughout the procedure that allow the assessment of the efficiency of each stage and subsequent optimisation.
Stage 1: Genomic DNA preparation
This stage is performed in the designated DNA-free room or pre-PCR room (see Note 3). The minimum amount of starting DNA required for the Affymetrix chips is 250 ng in a volume of 5 µl in reduced TE.
Stage 2: Restriction digestion of genomic DNA
This stage should be performed in the pre-PCR room or designated DNA-free room (see Note 3). Genomic DNA is digested with the adaptor-specific restriction enzyme, either NspI or StyI. Ideally, a master mix would be prepared in a DNA-free room and genomic DNA added in the pre-PCR room (see Note 3).
1. For one reaction: The following can be mixed in a single tube: 11.6 µl of molecular biology-grade water, 2 µl of the appropriate 10× digestion buffer, 0.2 µl BSA (10 mg/ml), 1 µl enzyme (10 U/µl), and, finally, 5 µl genomic DNA (50 ng/µl).
2. Alternatively: It would be more typical to carry out the process on multiple samples, in which case a master mix should be prepared with 5% excess to allow for pipetting errors. For example, for eight samples, mix 97 µl of molecular biology-grade water, 17 µl of 10× digestion buffer, 1.7 µl BSA, and 8.5 µl enzyme. Add 14.75 µl of this master mix to 5 µl of genomic DNA.
3. Briefly centrifuge and incubate in a pre-heated PCR machine with a heated lid at 37°C for 2 h. Incubate at 65°C for 20 min and hold at 4°C. Samples should be stored at −20°C if not proceeding to the next stage immediately.
Stage 3: Ligation
Digested DNA is ligated to the appropriate adaptor in the pre-PCR room or designated DNA-free room (see Note 3). Reactions are prepared on ice as follows:
1. Prepare master mixes ensuring a 5% excess. The following volumes are given for one reaction, with the recommended volume followed by the required volume in parentheses: mix 0.8 µl (0.75 µl) adaptor, 2.7 µl (2.5 µl) T4 DNA ligase buffer, and 2 µl T4 DNA ligase (400 U/µl) (see Note 4).
2. Add 5.25 µl to the digestion reaction, briefly centrifuge, and incubate in a pre-cooled PCR machine at 16°C for 3 h, then heat to 70°C for 20 min and hold at 4°C until proceeding to the next step. As in the previous stage, samples should be stored at −20°C if not proceeding to the next stage immediately. Samples should be centrifuged briefly before proceeding.
Stage 4: PCR
The ligated product is used as a template for PCR amplification using primers that bind within the adaptor region. PCRs are performed in triplicate to achieve sufficient DNA quantity for the subsequent stages.
1. Add 75 µl of molecular biology-grade water to the ligated product from stage 3 to make a total volume of 100 µl.
2. Prepare the master mix on ice in the DNA-free room with a 5% excess. The following volumes are given for one sample (three reactions), with the recommended volume followed by the required volume in parentheses: 125 µl (118.5 µl) molecular biology-grade water, 31.5 µl (30 µl) TITANIUM Taq PCR buffer, 63 µl (60 µl) GC melt (5 M), 44 µl (42 µl) dNTPs (2.5 mM each), 14 µl (13.5 µl) PCR primer 002 (100 µM), and 6 µl TITANIUM Taq DNA polymerase (50×) (see Note 4).
3. Add 90 µl of PCR master mix to 10 µl of diluted ligation product in a dome-capped PCR tube and briefly centrifuge (see Note 5).
4. PCR is performed in the main laboratory and cycling proceeds as follows: 94°C for 3 min followed by 30 cycles of 94°C for 30 s, 60°C for 45 s, and 68°C for 15 s. The reaction is completed with an additional elongation step at 68°C for 7 min and held at 4°C, or stored at −20°C if not proceeding to the next stage immediately.
5. Briefly centrifuge the samples, add 3 µl of PCR product to 3 µl of 2× gel-loading buffer, and run through a 2% agarose gel at 100 V for 1 h to confirm that the products are in the correct size range of 200–1,100 bp. A typical PCR result is shown in Fig. 1a (37).
Stage 5: PCR purification and quantification
PCR products are pooled, purified, and concentrated into a volume of 45 µl.
1. Add 8 µl of 0.1 M EDTA, pH 8.0, to each PCR prior to purification.
2. Pool the PCR products into one well of the PCR clean-up plate and apply a vacuum until the well is dry (it will appear glossy); then add 50 µl of molecular biology-grade water and allow the membrane in the well to dry; repeat this wash twice. Allow the membrane to dry completely at the end of the last wash and then add 45 µl of RB buffer (supplied with the clean-up plate).
3. Secure the plate to a flat-top vortex, set it at the lowest speed, and leave the plate for 10 min to allow DNA immobilised on the membrane to be resuspended in the RB buffer (see Note 6).
4. Carefully transfer the RB buffer into a clean 0.2-ml microcentrifuge tube, take 2 µl of purified PCR product, and add it to 198 µl of molecular biology-grade water. Read the absorbance at 260 and 280 nm using a spectrophotometer. Calculate the concentration of DNA assuming that one absorbance unit at 260 nm equals 50 µg/ml DNA, and multiply this value by 100 to allow for the dilution factor. If there is an insufficient quantity of DNA after purification to proceed to the fragmentation step, additional PCRs can be performed on the digestion/ligation product to increase DNA quantity. There should be sufficient ligated DNA template to perform at least nine PCRs. Samples should be stored at −20°C if not proceeding to the next stage immediately.
Stage 6: Fragmentation
This stage should be performed in the main laboratory. It relies on a fragmentation reagent that is supplied as either 3 or 2 U/µl; a note must be made regarding which version is used. The protocol below is based on the 3 U/µl kit.
1. Transfer 90 µg of purified DNA into a sterile 0.2-ml PCR tube and make up to 45 µl using RB buffer; add 5 µl of 10× fragmentation buffer to each DNA sample.
2. Prepare the fragmentation mix so that it is at 0.05 U/µl as follows: for five reactions (the minimum number of reactions using the 3 U/µl reagent), mix 26.5 µl molecular biology-grade water, 3 µl of 10× fragmentation buffer, and 0.5 µl fragmentation reagent (3 U/µl) (see Note 7).
3. Add 5 µl of fragmentation mix to each 90 µg of purified DNA, briefly centrifuge, and place in a PCR machine preheated to 37°C for 35 min, followed by incubation at 95°C for 15 min to inactivate the enzyme.
4. Add 4 µl of fragmentation reaction to 4 µl of 2× gel-loading buffer and run through a 4% agarose gel alongside the all-purpose HiLo DNA marker at 100 V for 1 h to confirm that fragment sizes are less than 200 bp. A typical fragmentation reaction result is shown in Fig. 1b (37).
Stage 7: Labelling
This stage is performed in the main laboratory.
1. For one reaction, the following volumes of reagents are added to 50.5 µl (see Note 8) of the fragmentation reaction in a new domed-cap 0.2-ml PCR tube: 14 µl of 5× TdT buffer, 2 µl GeneChip® DNA labelling reagent (30 mM), and 3.5 µl TdT (30 U/µl). If multiple samples are being processed, a labelling master mix can be prepared. For example, for eight samples, the following would be mixed together (recommended volumes are followed by required volumes in parentheses): 117.6 µl (112 µl) of 5× TdT buffer, 16.8 µl (16 µl) GeneChip® DNA labelling reagent (30 mM), and 29.4 µl (28 µl) TdT (30 U/µl); combine 19.5 µl of labelling master mix with 50.5 µl of the fragmentation reaction.
2. Briefly centrifuge and incubate at 37°C for 4 h followed by a denaturation step at 95°C for 15 min. Samples should be stored at −20°C if not proceeding to the next stage immediately.
Stage 8: Hybridisation
Prior to hybridisation on the array, labelled DNA must be suspended in a hybridisation cocktail as follows. For one reaction, use: 12 µl MES (12×, 1.22 M), 13 µl DMSO (100%), 13 µl Denhardt’s solution (50×), 3 µl EDTA (0.5 M), 3 µl herring sperm DNA (10 mg/ml), 2 µl OCR 0100 (supplied in the Affymetrix kit), 3 µl Human Cot-1 DNA (1 mg/ml), 1 µl Tween-20 (3%), and 140 µl TMACL (5 M). Add 190 µl of this cocktail to the 70 µl of each labelled DNA sample. Mix well and heat to 99°C in a heated block for 10 min to denature. Cool on ice for 10 s. Briefly centrifuge samples to collect condensate and incubate at 49°C for 1 min. Inject 200 µl of denatured hybridisation cocktail into the array and hybridise for 16–18 h at 60 rpm. Washing, staining, and scanning of the array are automated processes operated using the GeneChip® operating software (GCOS), which produces files for data analysis.
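The master-mix arithmetic used throughout Stages 2–7 (per-reaction volume × number of samples × 1.05 for the 5% excess) can be sketched as follows. The function name and the dictionary keys are illustrative assumptions; the per-reaction volumes are those given for the labelling reaction in Stage 7.

```python
def master_mix(per_reaction_ul, n_samples, excess=0.05):
    """Scale per-reaction volumes (µl) to n_samples plus a fractional excess."""
    factor = n_samples * (1 + excess)
    return {reagent: round(volume * factor, 2)
            for reagent, volume in per_reaction_ul.items()}


# Per-reaction labelling volumes from Stage 7 (µl); key names are illustrative.
labelling_per_reaction = {
    "tdt_buffer_5x": 14.0,
    "dna_labelling_reagent_30mM": 2.0,
    "tdt_30U_per_ul": 3.5,
}

labelling_mix = master_mix(labelling_per_reaction, n_samples=8)
# Volume of master mix to dispense into each fragmentation reaction.
per_sample_ul = sum(labelling_per_reaction.values())

for reagent, volume in labelling_mix.items():
    print(f"{reagent}: {volume} µl")
print(f"dispense per sample: {per_sample_ul} µl")
```

For eight samples this reproduces the recommended volumes quoted in the text (117.6, 16.8, and 29.4 µl) and the 19.5 µl dispensed per sample; the same function applies to the digestion, ligation, and PCR mixes, where the published volumes have been rounded for pipetting.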
Stage 9: Data analysis
Data will be in the form of CHP, CEL, or DAT files and can be imported into and analysed using the GTYPE software. Genotype calls are made using either the Dynamic Model (DM) or the Bayesian Robust Linear Model with Mahalanobis distance (BRLMM). The dynamic model is used to call SNP genotypes on single samples. The BRLMM algorithm is a clustering method that requires multiple samples (a minimum of 50) and calls both homozygous and heterozygous genotypes more accurately than the dynamic model.
3.2.1. Quality Control of SNP Calling
The Modified Partitioning Around Medoids (MPAM) calling algorithm is used to check for sample contamination using a subset of SNPs on the StyI and NspI arrays. Mapping detection rate (MDR) and mapping call rate (MCR) are quality control (QC) measures that indicate whether a sample is contaminated. MCR is calculated as the number of SNPs called with the MPAM algorithm divided by the total number of SNPs checked for QC purposes. All chips have control sequences showing either a perfect match or a 1-bp mismatch with a reference sequence. The SNP detection filter compares the background-subtracted intensity of a perfect match (PM) probe with the background-subtracted intensity of a mismatch probe (MP). MDR is defined as the number of SNPs passing the MPAM discrimination filter divided by the total number of SNPs checked with the MPAM algorithm for QC purposes. In a pure sample, the possible calls for a given SNP are AA, AB, and BB, corresponding to A:B allele ratios of 100:0, 50:50, and 0:100, respectively. A departure from these ratios indicates sample contamination and causes the call rate to fall, although the detection rate is not affected. The expected values are >0.93 for MCR and >0.99 for MDR. If a sample is contaminated, a high MDR will still be achieved but the MCR will decrease because the expected allele ratios are compromised. Therefore, an MCR below 0.93 together with an MDR above 0.99 indicates sample contamination; this inference does not apply if the MDR is below 0.99.
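The decision rule described above can be written down as a short Python sketch. The function names, the returned labels, and the handling of values exactly on a threshold are assumptions made for illustration; only the ratio definitions and the 0.93 and 0.99 cut-offs come from the text.

```python
def mapping_call_rate(n_called, n_checked):
    """MCR: SNPs called by MPAM / SNPs checked for QC purposes."""
    return n_called / n_checked


def mapping_detection_rate(n_passing_filter, n_checked):
    """MDR: SNPs passing the discrimination filter / SNPs checked."""
    return n_passing_filter / n_checked


def assess_sample(mcr, mdr, mcr_min=0.93, mdr_min=0.99):
    """Apply the contamination rule: high MDR combined with low MCR."""
    if mdr <= mdr_min:
        # The contamination inference is only valid when detection is high.
        return "low detection rate: contamination rule not applicable"
    if mcr < mcr_min:
        return "possible contamination"
    return "pass"


# Hypothetical QC counts for three samples (called, passing filter, checked).
print(assess_sample(mapping_call_rate(960, 1000), mapping_detection_rate(995, 1000)))
print(assess_sample(mapping_call_rate(880, 1000), mapping_detection_rate(996, 1000)))
print(assess_sample(mapping_call_rate(880, 1000), mapping_detection_rate(950, 1000)))
```

A sample that falls into the last category needs troubleshooting of the assay itself (see Table 1) rather than a contamination check.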
3.2.2. Quality Control of GeneChip® Human Mapping 500K Assay
Prior to running actual samples, it is essential to prepare and run the control DNA supplied in the kit on an array and assess the quality of genotype calls. The control DNA supplied is of a guaranteed quantity and quality and, provided DNA labelling and hybridisation procedures are optimal, it represents the maximum efficiency of genotyping in a particular operator’s hands. There are several quality control steps throughout the protocol. Troubleshooting procedures are outlined in Table 1.
3.3. Illumina Arrays
Illumina offers two types of genotyping assay: GoldenGate and Infinium. Both use beadchip technology but require different methods of DNA enrichment. Illumina arrays have the advantage that the protocol, from template preparation through sample hybridisation to generation of genotyping data, takes only 3–4 days, as opposed to 4–5 days with Affymetrix (Table 2). Illumina offers custom-designed arrays, for which the user submits a list of gene accession numbers and Illumina generates a list of SNPs within a defined region of these genes; at the time of writing, this is done using genome build 36. Furthermore, Illumina offers a cancer SNP panel that contains more than 1,400 SNPs from more than 400 genes reported to be involved in cancer.
Table 1
Troubleshooting the Affymetrix 500K SNP genotyping protocol
Problem: PCR products are not in the correct size range
Likely cause:
– Starting template DNA may be fragmented; run 1–2 µl on an agarose gel
– Quantification may be inaccurate; calibrate the UV spectrophotometer
– Use the specified PCR machine, tubes, and reagents for the reaction
Problem: Insufficient quantity of PCR product after PCR purification
Likely cause:
– UV spectrophotometer may not be calibrated
– Increase the number of PCR replicates to more than three
– Enzyme inhibitors in template sample; purify by ethanol precipitation
Problem: Fragments are too small after fragmentation reaction
Likely cause:
– Too much DNase used
MDR 50% of tumoral cells in blood or marrow samples (119). Mantle cell lymphoma (MCL) is also characterized by a set of genomic aberrations that target genes involved in the pathogenesis of the disease. Examples include the genomic amplification of chromosomes 8q24 affecting c-MYC, 10p13 involving BMI1 oncogene and 11q13 targeting CCND1/cyclin D1, and the losses of 8p21.3 including TRAIL-R1/R2 genes, 9p21 (INK4A/ARF), 11q23 (ATM), and 17p13.1 (P53) (83, 120, 121). The pattern of these alterations has been correlated with tumor phenotypes and, thus, blastoid variants of MCL usually display inactivation of P16/INK4A and P53 genes, whereas indolent forms of MCL, usually having mutated IgVH genes, frequently show deletion of chromosome 8p (83, 120, 121). Although most patients with MCL show poor clinical outcome with current immunochemotherapy regimens, the long-term survivors can be identified by a characteristic genomic profile defined by the absence of deletions of P53, P16/ARF, and chromosome 9q21-q22, and by the presence of the deletion of chromosome 1p21-p22 (83, 122). In both B-CLL and MCL, the development of disease-specific CGH microchips may be of value in the clinic, because they should allow testing of the genomic profiles as prognostic and predictive factors of response to novel therapies. In a recent report, high-density SNP–CGH arrays were used to analyze genome-wide changes of copy number and allele status in B-CLL samples from patients who were sensitive or resistant to MDM2 inhibitors. These studies conclusively demonstrate that P53 status is the major determinant of response to MDM2 inhibitors in B-CLL (123). In a study of 107 FCL diagnostic biopsies with an array–CGH platform containing more than 26,819 BAC clones covering >95% of the human genome, 68 regional alterations were identified in >10% of cases. 
Importantly, 11 of these areas were independent predictors of overall survival using a multivariate analysis that included the International Prognostic Index (IPI) score. Further, two of the 11 regions (deletions of 1p36 and 6q21-q24) were also predictors of transformation risk (Cheung et al., in press). These genetic data may be useful to identify FCL high-risk patients as candidates for risk-adapted therapies. The acquisition of UPD is a common event in cancer. Genome-wide SNP analysis has revealed large-scale cryptic regions of UPD in many hematologic tumors. In AML, these alterations are
Integrative Oncogenomic Analysis of Microarray Data in Hematologic Malignancies
nonrandom and contain homozygous mutations in genes known to be mutational targets in leukemia (WT1, FLT3, CEBPA, and RUNX1) (124, 125). A high proportion of patients with myeloproliferative disorders, including polycythemia vera, essential thrombocythemia, and chronic idiopathic myelofibrosis, carry a dominant gain-of-function mutation of JAK2 (126–128). SNP–CGH analysis showed that the UPD of chromosome 9p typical of these entities is the molecular mechanism underlying homozygous mutation of JAK2 (129). These data imply that mutation of one allele precedes mitotic recombination, which acts as a “second hit” that removes the remaining wild-type allele and replaces it with a copy of the mutated allele. A further example is the identification of UPD surrounding the NF1 gene locus in cases of juvenile myelomonocytic leukemia associated with neurofibromatosis (130). In lymphoma, the mutation status of genes within areas of UPD is less established, although biallelic mutations of P53 and P16/ARF have been reported in cases with UPD of 17p and 9p, respectively. SNP–CGH arrays showed that in mantle cell lymphoma and in FCL most areas of UPD coincided with known regions of chromosome deletion (104, 105, 131). However, UPD was also observed in chromosome 6p in 20–30% of initial biopsies from patients with FCL, an area not usually targeted by DNA copy number changes. To date, the gene or genes involved in this area have not been identified (104, 105). A different application of high-density SNP–CGH arrays has been the genome-wide linkage search of 206 families with B-CLL. These studies identified potential susceptibility loci on chromosomes 2q21.2, 6p22.1, and 18q21.1. Notably, none of the regions coincided with areas of common chromosomal abnormalities frequently observed in B-CLL (132).
These findings strengthen the argument for an inherited predisposition to B-CLL that might explain familial aggregation, and they support similar microarray studies in other familial cancers for which the causative genes are unknown.
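The way SNP–CGH data reveal UPD, as loss of heterozygosity without a change in copy number, can be illustrated with a deliberately simplified Python sketch. It assumes ordered per-SNP genotype calls and copy-number estimates along one chromosome and simply reports long runs of homozygous calls at a copy number close to 2. The function name, the run-length threshold, and the tolerance are hypothetical; published analyses rely on matched normal DNA or statistical models rather than a fixed run length.

```python
def copy_neutral_loh_runs(calls, copy_numbers, min_run=5, cn_tolerance=0.3):
    """Return (start, end) index pairs of runs of homozygous SNP calls
    whose estimated copy number stays close to 2 (candidate UPD)."""
    if len(calls) != len(copy_numbers):
        raise ValueError("calls and copy_numbers must have the same length")
    runs = []
    start = None
    for i, (call, cn) in enumerate(zip(calls, copy_numbers)):
        candidate = call in ("AA", "BB") and abs(cn - 2.0) <= cn_tolerance
        if candidate:
            if start is None:
                start = i  # open a new run
        else:
            if start is not None and i - start >= min_run:
                runs.append((start, i - 1))
            start = None
    # Close a run that extends to the last SNP.
    if start is not None and len(calls) - start >= min_run:
        runs.append((start, len(calls) - 1))
    return runs


# Hypothetical calls along a chromosome arm, all at a copy number of about 2.
calls = ["AB", "AA", "BB", "AA", "AA", "BB", "AB", "AA", "AA"]
copy_numbers = [2.0, 2.1, 1.9, 2.0, 2.0, 2.1, 2.0, 2.0, 1.9]
print(copy_neutral_loh_runs(calls, copy_numbers))
```

A homozygous run with a copy number near 1 is not reported by this sketch, because it reflects a deletion rather than UPD.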
4. Integrative Oncogenomics as a Tool to Discover Novel Cancer Genes
Initial comparative genomic studies evaluated the degree to which DNA copy number alterations contribute to variations in the transcriptional program of tumors (133). Using cDNA microarrays, Pollack and colleagues found that 62% of highly amplified genes in breast tumors showed moderately or highly elevated expression. However, the influence of low-level DNA copy number changes was much more limited, and only 12% of all the variation in gene expression among the breast tumors was directly attributable to underlying genomic dosage (134). Again using breast cancer as a
Martínez-Climent et al.
model disease, Hyman and colleagues reported that both high- and low-level copy number changes had a substantial impact on gene expression, with 44% of the highly amplified genes showing overexpression and 10.5% of the highly overexpressed genes being amplified (135). A third study focused on the transformation of FCL to DLBCL, which is observed in more than one third of patients with FCL and is generally characterized by an aggressive clinical course and refractoriness to treatment. Parallel array–CGH and gene expression analyses revealed that FCL transformation was accompanied by a variable spectrum of recurrent genomic imbalances and gene expression changes. Among the approximately 600 genes that showed deregulated expression in the transformation phase, up to one third showed correlation with DNA copy number variation (136). Overall, these reports concluded that a fraction of transcriptomic modifications are a consequence of genomic changes in tumors. Since these studies, more sophisticated bioinformatics methods have been developed for determining whether altered patterns of gene expression correlate with chromosomal abnormalities. One such tool is Chromosomal Aberration Region Miner (ChARM), a robust and accurate expectation-maximization-based method for identification of segmental aneuploidies from gene expression and array–CGH microarray data, sensitive enough to detect statistically significant and biologically relevant subtle changes in mixed populations of cells (137). Likewise, DIGMAP is a powerful computational tool enabling the coupled analysis of microarray data with genome location (138). More complex tools include the VAMP (Visualization and Analysis of array–CGH, transcriptome, and other Molecular Profiles) software, developed as a graphical user interface for visualization of CGH arrays, transcriptome arrays, SNP–CGH arrays, LOH results, and chromatin immunoprecipitation arrays.
The interface makes it possible to look for recurrent regions of alteration, to compare them with transcriptome data or clinical information, and to perform clustering (139). ARACNE is a different algorithm designed to scale up to the complexity of cellular regulatory networks present in microarray profiles, based on an information-theoretic approach that eliminates indirect interactions inferred by coexpression methods. For instance, the authors demonstrated and validated a complex interactive network among the transcriptional targets of the c-MYC oncogene in B-cell lymphomas (140). One of the major advances of integrative oncogenomic approaches has been the identification of novel cancer genes. In one landmark report, Garraway and colleagues identified microphthalmia-associated transcription factor (MITF) as the target gene of a melanoma amplification by integrating SNP–CGH array maps with gene expression signatures derived from the NCI60 cell lines. Further investigation demonstrated that MITF represents a
“lineage survival” oncogene required for both melanoma development and metastatic spread (141). Yu and colleagues integrated multiple diverse genomic data sets from prostate cancer models (in vitro cell line, in vivo tumor profiling, and genome-wide location data) to search for key target genes of the Polycomb family protein EZH2, and identified ADRB2, a critical mediator of beta-adrenergic signaling, as one such target (142). A number of additional papers have applied similar genetic screens to mouse models of cancer to discover new oncogenes. In a screen for gene copy number changes in mouse mammary tumors, a 350-kb amplicon from a region syntenic to a locus amplified in human cancers at chromosome 11q22 was detected. This amplicon contained only one gene, YAP, which encodes the mammalian ortholog of Drosophila Yorkie (Yki) and proved to be a regulator of cellular proliferation and apoptosis in epithelial cells (143). In a mouse model of hepatocarcinoma, genome-wide analyses of tumors revealed a similar amplification at mouse chromosome 9qA1, syntenic to human chromosome 11q22. Gene expression analyses delineated cIAP1 and YAP as candidate oncogenes that cooperated to promote tumorigenesis (144). A different study characterized metastatic variants in an induced mouse model of melanoma, identifying an acquired focal chromosomal amplification that corresponded to a much larger amplification in chromosome 6p25 in human metastatic melanomas. Further investigation demonstrated that NEDD9, the only gene within the minimal common region that exhibited amplification-associated overexpression, was a bona fide melanoma metastasis oncogene (145). Through the analysis of human and mouse models of B-cell lymphoma, Chang and colleagues demonstrated that c-MYC regulates a much broader set of miRNAs than previously anticipated. Notably, MYC overexpression promoted a widespread repression of miRNA expression, primarily through direct binding to miRNA promoters (146).
An important advantage of the simultaneous study of human and mouse tumors is that putative candidate genes can be functionally validated in vivo. The identification of tumor suppressor genes in cancer by classic genetic methods has been difficult and slow. In one report, integration of genomic and gene expression microarray data was applied to localize suppressor genes. Within 20 homozygous deletion areas detected in 48 human B-cell lymphoma cell lines, a number of novel candidate genes were pinpointed (100). Notably, some of these genes were shown to be inactivated in lymphoma biopsies by various genetic and epigenetic mechanisms that varied substantially among the different lymphoma subgroups. Thus, the P53-inducible PIG7/LITAF was silenced by homozygous deletion in primary mediastinal B-cell lymphoma and by promoter hypermethylation in germinal center lymphoma, whereas the proapoptotic BIM gene showed homozygous deletion in mantle cell lymphoma and promoter hypermethylation in Burkitt lymphoma (100).
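The core step shared by the integrative studies discussed in this section, testing whether a gene's expression follows its DNA copy number across tumors, can be sketched in Python as a per-gene Pearson correlation. This is only an illustrative simplification, not the method of any of the cited reports; the gene names, the data values, and the correlation threshold are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient, or None if either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def dosage_sensitive_genes(copy_number, expression, min_r=0.8):
    """Genes whose expression correlates with copy number across samples."""
    hits = {}
    for gene, cn_values in copy_number.items():
        if gene not in expression:
            continue
        r = pearson(cn_values, expression[gene])
        if r is not None and r >= min_r:
            hits[gene] = r
    return hits


# Hypothetical log2 ratios for four tumors (same sample order in both tables).
copy_number = {"GENE_A": [-1.0, 0.0, 0.0, 1.0], "GENE_B": [-1.0, 0.0, 0.0, 1.0]}
expression = {"GENE_A": [-2.0, 0.0, 0.0, 2.0], "GENE_B": [1.0, -1.0, 1.0, -1.0]}

print(dosage_sensitive_genes(copy_number, expression))
```

In this toy data set only GENE_A is reported, because its expression rises and falls with copy number; GENE_B varies independently of dosage and would have to be explained by another mechanism, such as the promoter hypermethylation described above.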
A different study evaluated the candidate target genes in chromosome 8p21.3 deletions delineated through high-resolution array–CGH of B-cell lymphomas. In previous reports, the presence of deletions of 8p in mantle cell lymphoma was associated with blood dissemination (83, 147). By comparing gene expression profiles of tumors with and without 8p deletion, only two genes within the 8p21.3 deletion, those encoding for the TRAIL receptors R1 and R2, showed significant downregulation in deleted tumors (148). However, a recent report discovered that deletion of BIN3, another gene included within the 8p21.3 commonly deleted region, generated B-cell lymphoma in aging mice (149). Loss of BIN3, which is a BAR adapter protein, did not affect normal cell proliferation but rather increased the motility of transformed cells. It is tempting to speculate that the loss of BIN3 may enhance B-cell lymphocyte migration, leading to a disseminated disease in patients with mantle cell lymphoma. A similar integrative microarray analysis revealed downregulation of the gene encoding P53-binding protein 1 (53BP1) in DLBCL with heterozygous deletion of chromosome 15q15, this deletion being more common in the BCR-DLBCL group (150). Although a reduced gene and protein dosage (haploinsufficiency) caused by the single-copy loss is suggested as the tumoral pathogenetic mechanism in these reports, further investigations are needed to validate this attractive hypothesis. A different strategy combined nonsense-mediated RNA decay microarrays and array–CGH for the genome-wide identification of genes with biallelic inactivation involving nonsense mutations and loss of the wild-type allele. This approach enabled the authors to identify previously unknown inactivating mutations in the receptor tyrosine kinase gene EPHB2, which were shown to be functionally important in the progression and metastasis of prostate cancer (151). 
Zardo and colleagues used an alternative approach that integrated array–CGH and restriction landmark genomic scanning for global analysis of aberrant methylation of CpG islands in a series of human glioblastomas (152). Results showed that most aberrant methylation events are focal and independent of genomic deletions, but a small subset of genes was affected by convergent methylation and deletion, including genes that exhibit tumor-suppressor activity such as SOCS1 and COE3. In a different study, Stransky and colleagues used a combination of transcriptome correlation map analysis and array–CGH to evaluate, on a large scale, epigenetic suppression of gene expression across whole genomic regions. The authors demonstrated such regional copy number-independent deregulation of transcription by long-range epigenetic silencing in a series of bladder carcinomas (153). In another study, the authors determined the expression profile of microRNAs in T24 cells, revealing that 17 out of 313 miRNAs were upregulated after DNA demethylation and histone deacetylase inhibition treatment. One of these, miR-127, was shown to repress the BCL6 oncogene, suggesting a role in the pathogenesis of this disease (154).
Multiple myeloma is one of the tumors where integrative oncogenomic approaches have been most successfully applied. Shaughnessy and colleagues performed microarray analysis on myeloma cells from 532 patients. Seventy genes, 30% of them mapping to chromosome 1, were linked to reduced length of survival. Importantly, most upregulated genes mapped to chromosome 1q (frequently amplified in myeloma), and downregulated genes mapped to chromosome 1p (frequently deleted in myeloma). These data suggest that altered transcriptional regulation of chromosome 1 genes contributes to multiple myeloma pathogenesis and can be used to identify high-risk disease (60). In a different study, high-resolution array–CGH data and expression profiles were determined in a collection of myeloma cell lines and patient biopsies. Unsupervised classification defined distinct genomic subtypes. Genomic and expression data integration generated a refined list of myeloma gene candidates, thereby providing a molecular framework for dissection of disease pathogenesis (155). More recently, two different groups investigated possible genetic lesions responsible for the constitutive NF-kB activation observed in multiple myeloma by integrating array–CGH and gene expression profiling data. Keats and colleagues found mutations in ten genes causing the inactivation of TRAF2, TRAF3, CYLD, and cIAP1/cIAP2 and activation of NFKB1, NFKB2, CD40, LTBR, TACI, and NIK that result primarily in constitutive activation of the noncanonical NF-kB pathway, with the single most common abnormality being inactivation of TRAF3 (156). Annunziata and colleagues compared the genetic profiles of multiple myeloma cell lines that were resistant or sensitive to an inhibitor of IkappaB kinase beta (IKKbeta) targeting the NF-kB pathway. Sensitive cell lines with NF-kB activation showed frequent genetic or epigenetic alteration of NIK, TRAF3, CYLD, cIAP1/cIAP2, CD40, NFKB1, or NFKB2 genes (157).
These two complementary reports uncovered frequent genetic lesions of genes in the NF-kB pathway, suggesting that NF-kB inhibitors hold promise for the treatment of this disease.
5. Future Investigations: Integrative Computational Analysis of Novel High-Throughput Genetic Technologies in Cancer Biology
A myriad of new high-throughput technologies are being used in cancer research, including exon arrays to analyze alternative splicing, tiling arrays for high-resolution investigation of DNA and histone methylation patterns, on-chip chromatin immunoprecipitation to discover DNA–protein interactions, and protein microarrays to measure global protein expression portraits. Consequently, future comparative oncogenomic and proteomic assays will attempt to visualize these complex molecular interactions in the context of highly connected and regulated cellular networks. As we witness these remarkable advances, our ultimate challenge is to use this comprehensive biological knowledge to accelerate the transition from current empirical therapies to tailored medicine.
6. Materials
6.1. Total RNA Preparation for Microarray Analysis
This protocol is suitable for total RNA sample preparation for microarray analysis from cell lines or fresh frozen tissues. RNA obtained this way is very clean and salt free (see Notes 1 and 2).
1. TRIzol® Reagent, Invitrogen Life Technologies.
2. RNeasy® Mini Kit, QIAGEN.
3. Absolute ethanol (store at room temperature).
4. 80% ethanol (store at room temperature).
5. IKA® T-10 Basic Homogenizer (for fresh frozen tissue).
6. Nanodrop ND-1000 Spectrophotometer.
7. 2100 Bioanalyzer and RNA 6000 Nano LabChip® kit, Agilent.
6.2. DNA Preparation for Microarray Analysis
This protocol is based on the procedure established by QIAGEN using their DNeasy® Blood & Tissue kit.
1. DNeasy® Blood & Tissue kit, QIAGEN.
2. Absolute ethanol.
3. Reduced EDTA TE buffer (10 mM Tris–HCl, 0.1 mM EDTA, pH 8.0).
4. Nanodrop ND-1000 Spectrophotometer.
6.3. Oligonucleotide Gene Expression Microarrays
1. One-cycle target labeling and control reagents, Affymetrix.
2. Absolute ethanol.
3. 80% ethanol.
4. GeneChip Hybridization, Wash, and Stain kit, Affymetrix.
5. GeneChip Eukaryotic Hybridization Control Kit, Affymetrix, P/N 900454 (30 reactions) or P/N 900457 (150 reactions), contains Control cRNA and Control Oligo B2.
6. Nanodrop ND-1000 spectrophotometer.
7. 2100 Bioanalyzer and RNA 6000 Nano LabChip® kit, Agilent.
8. Hybridization Oven 640, Affymetrix.
9. Heatblock.
10. Fluidics Station 450, Affymetrix.
11. GeneChip® Scanner 3000, Affymetrix.
6.4. CGH to BAC Microarrays
1. 2.5× random primers (BioPrime DNA labeling system, Invitrogen). Store at −20°C.
2. Genomic DNA.
3. Klenow fragment (40 U/µL, BioPrime DNA labeling system, Invitrogen). Store at −20°C.
4. Cy3- and Cy5-labeled dCTP (1 mM, Amersham Pharmacia Biotech, Inc.).
5. 0.5 M EDTA, pH 8.0.
6. 1 M Tris-HCl, pH 7.6.
7. 10× dNTP mixture in sterile water: 3.7 mM dATP, dTTP, and dGTP (Invitrogen), 1.8 mM dCTP (Invitrogen), 10 mM Tris-HCl, pH 7.6, and 1 mM EDTA.
8. Sephadex G-50 spin column (Amersham Pharmacia Biotech, Inc.).
9. Human Cot-1 DNA (1 mg/mL, Invitrogen).
10. 20% sodium dodecyl sulfate (SDS) in sterile H2O (heat at 68°C to dissolve).
11. 100% ethanol. Store at −20°C.
12. 3.0 M sodium acetate, pH 5.2.
13. Dextran sulfate sodium salt (500,000 MW).
14. Formamide (re-distilled, ultra pure, Invitrogen). Store at −20°C.
15. 20× SSC (3.0 M NaCl, 0.3 M sodium citrate, pH 7.0).
16. Master mix: dissolve 1 g dextran sulfate in 5 mL of formamide, 1 mL of 20× SSC, and 1 mL dH2O. Adjust to pH 7.0 with approximately two drops of HCl.
17. PN buffer: 0.1 M sodium phosphate, 0.1% Nonidet P40, pH 8.0.
18. UV Stratalinker 2400 (Stratagene) capable of producing 130,000 × 100 µJ UV.
19. Rocking table (~1 rpm) inside a 37°C incubator.
20. Rubber cement (Ross, American Glue Corporation).
21. Silicone gasket (Press-to-seal, 2-mm thick, #62-6508-24, PGC Scientific).
22. 100% glycerol.
23. 10× phosphate-buffered saline (PBS).
24. Stereomicroscope.
25. Binder clips, medium size.
26. 1M Pixel CCD Imager (custom made; Dan Pinkel, UCSF) or the 2-color scanner arrayWoRx e Biochip Reader (Applied Precision, Issaquah, WA, USA), a white-light CCD-based system.
6.5. High-Resolution SNP–CGH Microarrays
1. Reduced EDTA TE buffer (10 mM Tris–HCl, 0.1 mM EDTA, pH 8.0), TEKnova.
2. Genomic DNA, 250 ng per array; working stock, 50 ng/µL.
3. StyI (10,000 U/mL), New England Biolabs (NEB).
4. NspI (10,000 U/mL), New England Biolabs (NEB).
5. AccuGENE® Water, molecular biology grade, Cambrex.
6. T4 DNA Ligase, New England Biolabs (NEB).
7. Adaptor Nsp (50 µM), Affymetrix.
8. Adaptor Sty (50 µM), Affymetrix.
9. G-C Melt (5 M), Clontech.
10. dNTP (2.5 mM), Takara or Fisher Scientific.
11. PCR Primer 002 (100 µM), Affymetrix.
12. Clontech TITANIUM® Taq Polymerase (50×), Clontech.
13. All purpose Hi-Lo DNA Marker, Bionexus, Inc.
14. DNA amplification clean-up kit, to be used with Affymetrix DNA products (one plate). The kit contains RB buffer.
15. Fragmentation reagent (DNase I), Affymetrix.
16. 10× Fragmentation Buffer, Affymetrix.
17. 4% TBE Gel, BMA Reliant precast (4% NuSieve 3:1 Plus Agarose), Cambrex.
18. GeneChip® DNA Labeling Reagent (30 mM), Affymetrix.
19. Terminal deoxynucleotidyl transferase (30 U/µL), Affymetrix.
20. 5× Terminal deoxynucleotidyl transferase buffer, Affymetrix.
21. 5 M tetramethyl ammonium chloride (TMACL), Sigma.
22. MES Hydrate Sigma Ultra, Sigma.
23. MES Sodium salt, Sigma.
24. Denhardt’s Solution, Sigma.
25. Herring sperm DNA (HSDNA), Promega.
26. Human Cot-1 DNA®, Invitrogen.
27. Oligo control reagent, 0100 (OCR, 0100), Affymetrix.
28. GeneChip 250K array (one per sample).
29. GeneAmp® PCR System 9700 Thermocycler, Applied Biosystems.
30. GeneChip Hybridization Oven 640.
31. Manifold-QIAvac multiwell unit, QIAGEN, P/N 9014579.
32. Biomek® Seal and Sample aluminum foil lids, Beckman.
33. Jitterbug® 115 VAC, Boekel Scientific.
34. QIAGEN® Vacuum regulator, QIAGEN.
7. Methods
7.1. Total RNA Preparation for Microarray Analysis
1a. For fresh frozen tissue samples: The amount of tissue required depends on the kind of tissue and ranges from 10 to 100 mg to obtain 10–300 µg of total RNA. Be careful not to let the tissue thaw before homogenization. Homogenize tissue directly in TRIzol® reagent using an electric homogenizer with a small-gauge generator (5 mm). The recommended volume of TRIzol® is 1 mL for each 50–100 mg of tissue. Homogenize each sample tube at least three times for at least 1 min each time. Keep the samples on ice between rounds of homogenization because overheating can cause RNA degradation.
1b. For cell lines: Pellet cells by centrifugation and completely remove the culture medium. Do not wash cells at this time; proceed directly to lyse the cells by pipetting with the appropriate amount of TRIzol® reagent (recommended by the manufacturer: 1 mL per 5–10 × 10⁶ cells).
2. Let the samples stand for 5 min at room temperature.
3. Pass the sample twice through a 25-gauge needle to reduce its viscosity.
4. Add 200 µL of chloroform per milliliter of TRIzol® used and shake the sample vigorously by hand for 15 s. Incubate for 1 min and shake again for 15 s.
5. Centrifuge the sample at 12,000 × g for 15 min at 2–8°C.
6. After centrifugation, the mixture separates into two phases. The colorless upper phase is the aqueous phase containing the RNA. The lower, pink phase (phenol–chloroform) contains DNA and proteins. Take 200 µL from the top layer and add it to 700 µL of QIAGEN RLT buffer in a new RNase-free tube. (Do not add 2-mercaptoethanol to the RLT buffer because it may increase background in the array.)
7. Add 500 µL of absolute ethanol to the sample (200 µL + 700 µL RLT). Mix well by vortexing.
8. Apply the mixture to a QIAGEN Mini or MicroElute spin column and spin for 15 s at 8,000 × g. Discard the flow-through and repeat the procedure until all the sample has been loaded onto the column.
9. Replace the collection tube with a new one and wash the column by adding 500 µL of RPE buffer. Centrifuge for 15 s at 8,000 × g and discard the flow-through.
10. Add 700 µL of 80% ethanol and spin at 8,000 × g for 15 s. Repeat this step to efficiently remove all guanidine salts.
11. Transfer the column to a new collection tube and spin for 5 min at top speed with the tube caps off to ensure removal of ethanol.
12. To elute RNA, transfer the column to a new 1.5-mL RNase-free microfuge tube. Elute with 20 or 14 µL of RNase-free water for the Mini or MicroElute spin column, respectively.
7.2. Quality Control of RNA
To qualify RNA for microarray applications, it is important to measure its concentration, 260/280 ratio, 260/230 ratio, and integrity. We use a Nanodrop spectrophotometer to verify that the concentration is at least 250 ng/µL, the 260/280 ratio is between 1.9 and 2.1, and the 260/230 ratio is greater than 1.5 (a lower value indicates the presence of salts that could inhibit labeling reactions). RNA integrity can be assessed by examining ribosomal RNA (rRNA) on a gel. Affymetrix recommends the capillary electrophoresis Bioanalyzer 2100 system from Agilent, whose software calculates the RNA integrity number (RIN). In our experience, the RIN should be greater than 8.0 to guarantee that the sample will work properly on the array.
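The acceptance criteria above can be applied to a batch of samples with a short script. The following is a minimal sketch, assuming the concentration is recorded in ng/µL; the class name `RnaSample`, the function `rna_qc_failures`, and the example values are hypothetical and not part of the protocol.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RnaSample:
    name: str
    conc_ng_per_ul: float  # Nanodrop concentration (assumed unit: ng/µL)
    a260_280: float        # 260/280 absorbance ratio
    a260_230: float        # 260/230 absorbance ratio
    rin: float             # Bioanalyzer RNA integrity number


def rna_qc_failures(sample: RnaSample) -> List[str]:
    """Return the criteria a sample fails; an empty list means it passes."""
    failures = []
    if sample.conc_ng_per_ul < 250:
        failures.append("concentration below 250 ng/µL")
    if not 1.9 <= sample.a260_280 <= 2.1:
        failures.append("260/280 ratio outside 1.9-2.1")
    if sample.a260_230 <= 1.5:
        failures.append("260/230 ratio not greater than 1.5")
    if sample.rin <= 8.0:
        failures.append("RIN not greater than 8.0")
    return failures


if __name__ == "__main__":
    # Illustrative values only, not measured data.
    sample = RnaSample("tumor_01", 310.0, 2.02, 1.9, 9.1)
    print(sample.name, rna_qc_failures(sample) or "passes QC")
```

Returning the list of failed criteria, rather than a single pass/fail flag, makes it easier to decide whether a sample needs re-purification (low 260/230) or must be discarded (low RIN).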
7.3. DNA Preparation for Microarray Analysis
For tissue samples:
1a. The amount of tissue needed is variable, but 25 mg of tissue (up to 10 mg for spleen) may be suitable for this application. Cut the tissue into small pieces and place it in a 1.5-mL microcentrifuge tube. Add 180 µL Buffer ATL.
2a. Add 20 µL proteinase K (600 mAU/mL). Mix thoroughly by vortexing, and incubate at 55°C until the tissue is completely lysed (it can be lysed overnight). During incubation, occasional vortexing is recommended to disperse the sample.
3a. Add 200 µL Buffer AL to the sample, and mix thoroughly by vortexing. Then add 200 µL ethanol (96–100%), and mix again by vortexing. It is essential that the sample, Buffer AL, and ethanol are mixed immediately and thoroughly to yield a homogeneous solution.
For cell lines:
1b. Start from approximately 5 × 10⁶ cells; pellet them and wash twice with 1× PBS. Resuspend the pellet in 200 µL of 1× PBS.
2b. Add 20 µL proteinase K (600 mAU/mL) and 200 µL of Buffer AL, mix thoroughly by vortexing, and place at 70°C for 10 min.
3b. Then add 200 µL ethanol (96–100%), and again mix thoroughly by vortexing.
4. Pipet the sample (including any precipitate) into the DNeasy® Mini spin column placed in a 2-mL collection tube. Centrifuge at 6,000 × g for 1 min. Discard the flow-through and collection tube.
5. Place the DNeasy® Mini spin column in a new 2-mL collection tube, add 500 µL Buffer AW1, and centrifuge for 1 min at 6,000 × g. Discard the flow-through and collection tube.
6. Place the DNeasy® Mini spin column in a new 2-mL collection tube, add 500 µL Buffer AW2, and centrifuge for 3 min at 20,000 × g to dry the DNeasy® membrane. Discard the flow-through and collection tube.
7. Place the DNeasy® Mini spin column in a clean 1.5- or 2-mL microcentrifuge tube, and pipet 200 µL Buffer AE directly onto the DNeasy® membrane. Incubate at room temperature for 1 min, and then centrifuge for 1 min at 6,000 × g to elute. If SNP arrays from Affymetrix are to be performed, elute the sample with a buffer of low EDTA concentration (10 mM Tris–HCl, 0.1 mM EDTA, pH 8.0) because EDTA adversely affects the subsequent reactions.
7.4. Quality Control of DNA
The principal parameters to control DNA quality are concentration (for the 500K SNP array from Affymetrix, it should be at least 50 ng/µL), a 260/280 ratio of approximately 1.9 for pure DNA, and a 260/230 ratio greater than 1.5 in salt-free samples. To determine DNA integrity, we perform gel electrophoresis on a 1–2% agarose 1× TBE gel. High-quality genomic DNA will give a band of 10–20 kb on the gel.
8. Oligonucleotide Gene Expression Microarrays
8.1. Introduction
We use the One-Cycle Eukaryotic Target Labeling Assay from Affymetrix. It is possible to start with total RNA (1–15 µg) or mRNA (0.2–2 µg). We usually begin with 2 µg of total RNA. It is essential to start with the same amount of RNA for all samples to be compared. This RNA is first reverse transcribed using a T7-Oligo(dT) Promoter Primer. The second-strand synthesis reaction is mediated by RNase H. The double-stranded cDNA obtained is then purified and used as a template in the subsequent in vitro transcription (IVT) reaction. The IVT reaction is performed in the presence of T7 RNA polymerase and a biotinylated nucleotide analog/ribonucleotide mix for complementary RNA (cRNA) amplification and biotin labeling. These biotinylated cRNA targets are then cleaned up, fragmented, and hybridized to GeneChip expression arrays (see Note 3).
8.2. Preparation of Poly-A RNA Controls for One-Cycle cDNA Synthesis (Spike-in Controls)
8.2.1. First-Strand cDNA Synthesis
The amount of Poly-A RNA Controls added is kept proportional to the amount of sample RNA and therefore depends on the initial amount of sample. For 2 µg of RNA, 2 µL of a 1:50,000 dilution of Poly-A RNA Controls is used.
1. Mix RNA sample, diluted poly-A RNA controls, and T7-Oligo(dT) Primer. Incubate the reaction for 10 min at 70°C. Then cool the sample at 4°C for at least 2 min.
2. In a separate tube, assemble the First-Strand Master Mix (per sample): 4.0 µL of 5× 1st Strand Reaction Mix; 2.0 µL of 0.1 M DTT; 1 µL of 10 mM dNTP.
3. Transfer 7 µL of First-Strand Master Mix to each RNA/T7-Oligo(dT) Primer mix for a final volume of 19 µL. Mix by flicking the tube a few times. Immediately place the tubes at 42°C and incubate for 2 min.
4. Add 1 µL of SuperScript II to each RNA sample for a final volume of 20 µL.
5. Incubate for 1 h at 42°C; then cool the sample for at least 2 min at 4°C.
8.2.2. Second-Strand cDNA Synthesis
1. Prepare Second-Strand Master Mix (per sample): 91 µL RNase-free water; 30 µL of 5× 2nd Strand Reaction Mix; 3 µL of 10 mM dNTP; 1 µL E. coli DNA ligase; 4 µL E. coli DNA Polymerase I; 1 µL RNase H.
2. Add 130 µL of Second-Strand Master Mix to each first-strand synthesis sample from First-Strand cDNA Synthesis for a total volume of 150 µL. Incubate for 2 h at 16°C.
3. Add 2 µL of T4 DNA Polymerase to each sample and incubate for an additional 5 min at 16°C.
4. Add 10 µL of 0.5 M EDTA and proceed to Section 8.2.3. Do not leave the reactions at 4°C for long periods of time.
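When several samples are processed together, the per-sample master mix recipes of Sections 8.2.1 and 8.2.2 have to be scaled up. The following is a minimal sketch of that arithmetic, taking the per-sample volumes in µL. The function name `scale_master_mix` and the default 5% pipetting excess are assumptions for illustration (this section does not state an excess for the cDNA synthesis mixes).

```python
from typing import Dict

# Per-sample volumes in µL, as listed in Sections 8.2.1 and 8.2.2.
FIRST_STRAND_UL: Dict[str, float] = {
    "5x 1st Strand Reaction Mix": 4.0,
    "0.1 M DTT": 2.0,
    "10 mM dNTP": 1.0,
}
SECOND_STRAND_UL: Dict[str, float] = {
    "RNase-free water": 91.0,
    "5x 2nd Strand Reaction Mix": 30.0,
    "10 mM dNTP": 3.0,
    "E. coli DNA ligase": 1.0,
    "E. coli DNA polymerase I": 4.0,
    "RNase H": 1.0,
}


def scale_master_mix(per_sample_ul: Dict[str, float],
                     n_samples: int,
                     excess: float = 0.05) -> Dict[str, float]:
    """Scale a per-sample recipe to n_samples plus a fractional excess.

    The 5% default excess is an assumption, not a value from the protocol.
    """
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    factor = n_samples * (1.0 + excess)
    return {name: round(vol * factor, 2) for name, vol in per_sample_ul.items()}


if __name__ == "__main__":
    # Sanity check: per-sample totals should match the 7 µL and 130 µL
    # transferred in the protocol steps.
    print(sum(FIRST_STRAND_UL.values()), sum(SECOND_STRAND_UL.values()))
    for name, vol in scale_master_mix(SECOND_STRAND_UL, n_samples=8).items():
        print(f"{name}: {vol} µL")
```

Checking that the per-sample volumes sum to the aliquot stated in the protocol is a cheap guard against transcription errors in the recipe table.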
8.2.3. Cleanup of Double-Stranded cDNA
1. Add 600 µL of cDNA Binding Buffer to the double-stranded cDNA synthesis preparation and mix by vortexing for 3 s. The color of the mixture should be yellow. If not, add 10 µL of 3 M sodium acetate, pH 5.0, and mix.
2. Apply 500 µL of the sample to the cDNA Cleanup Spin Column sitting in a 2-mL collection tube, and centrifuge for 1 min at ≥8,000 × g. Discard the flow-through. Reload the spin column with the remaining mixture and centrifuge as above. Discard the flow-through and collection tube.
3. Transfer the spin column into a new 2-mL collection tube. Wash the spin column with 750 µL of the cDNA Wash Buffer. Centrifuge for 1 min at ≥8,000 × g. Discard the flow-through.
4. Open the cap of the spin column and centrifuge for 5 min at maximum speed to completely eliminate ethanol. Discard the flow-through and collection tube.
5. Transfer the spin column into a 1.5-mL collection tube, and pipet 14 µL of cDNA Elution Buffer directly onto the spin column membrane. Incubate for 1 min at room temperature and centrifuge for 1 min at maximum speed (≤25,000 × g) to elute.
8.3. Synthesis of Biotin-Labeled cRNA
1. Transfer the needed amount of template cDNA (if 2 µg of RNA was used as starting material, use 12 µL of purified cDNA) to RNase-free microfuge tubes and add the following reaction components in the order indicated: 8 µL RNase-free water; 4 µL of 10× IVT Labeling Buffer; 12 µL IVT Labeling NTP Mix; and 4 µL IVT Labeling Enzyme Mix. It is important not to assemble the reaction on ice, because spermidine in the 10× IVT Labeling Buffer can lead to precipitation of the template cDNA.
2. Incubate at 37°C for 16 h in a thermal cycler.
8.3.1. Cleanup and Quantification of Biotin-Labeled cRNA
1. Add 60 µL of RNase-free water to the IVT reaction and mix by vortexing for 3 s.
2. Add 350 µL IVT cRNA Binding Buffer to the sample and mix by vortexing for 3 s.
3. Add 250 µL ethanol (96–100%) to the lysate, and mix well by pipetting. Do not centrifuge at this step.
4. Apply the sample (700 µL) to the IVT cRNA Cleanup Spin Column sitting in a 2-mL collection tube. Centrifuge for 15 s at ≥8,000 × g. Discard the flow-through and collection tube.
5. Transfer the spin column into a new 2-mL collection tube. Pipet 500 µL IVT cRNA Wash Buffer onto the spin column. Centrifuge for 15 s at ≥8,000 × g to wash. Discard the flow-through.
6. Pipet 500 µL of 80% (v/v) ethanol onto the spin column and centrifuge for 15 s at ≥8,000 × g. Discard the flow-through.
7. Centrifuge for 5 min with caps off at maximum speed to allow complete drying of the membrane. Discard the flow-through and collection tube.
8. Transfer the spin column into a new 1.5-mL collection tube, and pipet 21 µL of RNase-free water directly onto the spin column membrane. Centrifuge for 1 min at maximum speed (≤25,000 × g) to elute.
For subsequent quantification of the purified cRNA, we dilute the eluate 1:4 or 1:5 in RNase-free water. We use Nanodrop to determine the concentration of the cRNA obtained and Bioanalyzer to study the sizes of the labeled products (which should have an average size of 1,580 nucleotides). If using total RNA as starting material, it is necessary to calculate an adjusted cRNA yield to reflect carryover of unlabeled total RNA. Using an estimate of 100% carryover, use the formula below to determine the adjusted cRNA yield:
adjusted cRNA yield = RNAm − (total RNAi)(y)
where RNAm = amount of cRNA measured after IVT (µg); total RNAi = starting amount of total RNA (µg); and y = fraction of cDNA reaction used in IVT.
8.3.2. Fragmenting the cRNA for Target Preparation
1. Fragmentation of cRNA is a critical step of the protocol. When using a 49-format array, we fragment 20 µg of cRNA (in a volume ranging from 1 to 21 µL). The final volume of the fragmentation reaction is 40 µL, of which 8 µL is 5× Fragmentation Buffer.
2. Incubate the reaction at 94°C for 35 min. Put on ice after the incubation. Save an aliquot for analysis on the Bioanalyzer. This standard fragmentation procedure should produce a distribution of RNA fragment sizes from approximately 35 to 200 bases. The undiluted, fragmented cRNA sample is ready for hybridization. If you are not going to proceed with hybridization immediately, store the sample at −20°C (or −70°C for longer-term storage).
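The adjusted cRNA yield formula of Section 8.3.1 and the fragmentation volumes above lend themselves to a small calculation. The sketch below is illustrative only: the function names are hypothetical, and filling the reaction to 40 µL with RNase-free water is an assumption, since the text states only the final volume and the buffer volume.

```python
from typing import Dict


def adjusted_crna_yield_ug(measured_crna_ug: float,
                           starting_total_rna_ug: float,
                           fraction_cdna_used: float) -> float:
    """adjusted cRNA yield = RNAm - (total RNAi)(y), all amounts in µg."""
    return measured_crna_ug - starting_total_rna_ug * fraction_cdna_used


def fragmentation_volumes_ul(crna_conc_ug_per_ul: float,
                             crna_ug: float = 20.0,
                             final_ul: float = 40.0,
                             buffer_ul: float = 8.0) -> Dict[str, float]:
    """Volumes for a 40 µL fragmentation reaction containing 20 µg cRNA.

    Assumption: the remaining volume is made up with RNase-free water.
    """
    crna_ul = crna_ug / crna_conc_ug_per_ul
    if not 1.0 <= crna_ul <= 21.0:
        raise ValueError("cRNA volume falls outside the 1-21 µL range")
    return {
        "cRNA": crna_ul,
        "5x Fragmentation Buffer": buffer_ul,
        "RNase-free water": final_ul - buffer_ul - crna_ul,
    }


if __name__ == "__main__":
    # Illustrative numbers only: 60 µg measured, 2 µg starting RNA, y = 1.
    print(adjusted_crna_yield_ug(60.0, 2.0, 1.0))
    print(fragmentation_volumes_ul(crna_conc_ug_per_ul=2.0))
```

Raising an error when the cRNA volume leaves the 1–21 µL window flags samples that are too dilute to fit the reaction before any reagent is pipetted.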
8.3.3. Hybridization
1. Mix the following for each target, scaling up volumes for hybridization to multiple probe arrays.
– 15 µg fragmented cRNA (final concentration 0.05 µg/µL)
– 5 µL control oligonucleotide B2, 3 nM (final concentration 50 pM)
– 15 µL of 20× Eukaryotic Hybridization Controls (bioB, bioC, bioD, cre) (final concentration 1.5 pM)
– 150 µL of 2× hybridization buffer (final concentration 1×)
– 30 µL DMSO (final concentration 10%)
– Nuclease-free water, up to 300 µL
2. Equilibrate the probe array to room temperature immediately before use.
3. Heat the hybridization cocktail to 99°C for 5 min in a heat block.
4. Meanwhile, wet the array by filling it through one of the septa with an appropriate volume of 1× prehybridization buffer using a micropipettor and appropriate tips. Incubate the probe array at 45°C for 10 min with rotation.
5. Transfer the hybridization cocktail that has been heated at 99°C (step 3) to a 45°C heat block for 5 min.
6. Spin the hybridization cocktail(s) at maximum speed in a microcentrifuge for 5 min to remove any insoluble material from the hybridization mixture.
7. Remove the buffer solution from the probe array cartridge and fill with 200 µL (for the 49-format array) of the clarified hybridization cocktail, avoiding any insoluble matter at the bottom of the tube.
8. Place the probe array into the hybridization oven, set to 45°C. To avoid stress to the motor, load the probe arrays in a balanced configuration around the axis. Rotate at 60 rpm. Hybridize for 16 h.
8.3.4. Staining, Washing, and Scanning
Staining and washing are performed using the Fluidics Station 450 (Affymetrix). At this point, the most important issue is to select the correct script for your chip. For example, the HG-U133 Plus 2.0 array uses protocol FS450_0001. The script contains the directions to stain and wash the microarray: the number of cycles of washing or staining, the temperature, and the buffer. For HG-U133 Plus 2.0, place Stain Cocktail 1 in sample holder 1, Stain Cocktail 2 in sample holder 2, and Array Holding Buffer in sample holder 3. In the final step, the probe array is filled with Array Holding Buffer; arrays can be stored for up to 3 h at 4°C in the dark before scanning. The scanner used is the GeneChip® Scanner 3000. A complete image of the scanned array is stored as a .DAT file (scanned image, full information), and GCOS software generates the .CEL file, which represents the first summarization step because the image is summarized as one median intensity per probe cell.
9. CGH to BAC Microarrays
9.1. Introduction
The arrays for CGH consist of a linker-adapter PCR representation of BAC clones printed on a substrate. Each clone contains at least one sequence tagged site (STS) and is mapped to the human genome sequence. Clones containing unique sequences near telomeres and clones containing genes known to be significant in cancer and medical genetics are included. Hybridization to these arrays allows detection of single-copy gains and losses compared with diploid cells even in the presence of normal cell contamination (see Notes 4–6).
9.2. Random-Primed Labeling of Genomic DNA for Array–CGH Analysis
A typical random-primed labeling procedure is described. The random-primed labeling is carried out in a 25-µL reaction volume containing 600 ng genomic DNA, 1× random primers, 40 U Klenow DNA polymerase, Cy3- or Cy5-labeled dCTP, and 1× dNTP mixture.
1. Mix 600 ng genomic DNA with 10 µL of 2.5× random primer solution and bring the volume up to 21 µL with sterile H2O.
2. Denature the DNA by heating the mixture at 99°C in a PCR machine for 10 min. Briefly centrifuge and place on ice.
3. Add 2.5 µL of the 10× dNTP mixture, 1 µL of 1 mM Cy3- or Cy5-labeled dCTP, and 0.6 µL Klenow DNA polymerase. Incubate at 37°C for 12–20 h.
4. Remove unincorporated nucleotides from the DNA. Place a Sephadex G-50 column in a 1.5-mL tube and pre-spin the column at 760 × g for 1 min. Discard the flow-through. Tap the end of the tube on a paper towel to remove the remaining liquid from the neck of the tube. Place the column in a clean 1.5-mL tube, apply the sample onto the column, and spin at 760 × g for 2 min.
9.3. Hybridization of Fluorescently Labeled Genomic DNA for Array–CGH Analysis
1. Preparation of the array for the hybridization:
(a) Expose a printed array to 260,000 µJ (2,600 × 100 µJ) of UV using a Stratalinker. Place the slide in the Stratalinker with the array facing up. Overcrosslinking the slide might result in a decrease in fluorescent hybridization signal.
(b) Fill a 10-mL syringe with rubber cement and fit a 200-µL pipet tip onto the syringe outlet. You may have to cut 1–2 mm off the wide end of the pipet tip for it to fit well. Apply a rubber cement ring around each array on the slide, using a stereomicroscope to observe the area of the array. Air-dry and apply a second thick layer of rubber cement on top of the first layer. Air-dry the rubber cement.
2. Preparation of samples for hybridization:
(a) Combine 25 µL labeled test genomic DNA, 25 µL labeled reference genomic DNA, and 40–50 µg human Cot-1 DNA. Precipitate the DNA sample mixture by adding 2.5 volumes of ice-cold 100% ethanol and 0.1 volume of 3 M sodium acetate, pH 5.2. Vortex the solution briefly and collect the precipitate by centrifugation at 14,000 × g for 45 min at 4°C.
(b) Carefully aspirate and discard the supernatant. Wipe the excess liquid from the tube and air-dry the pellet for approximately 5–10 min. Dissolve the pellet in 7 µL dH2O, 14 µL 20% SDS, and 49 µL master mix. Incubate for 1 h at room temperature to completely resuspend.
3. Denature the DNA sample at 73°C for 13 min and then incubate at 37°C for 1–2 h to allow the Cot-1 DNA to anneal to repetitive sequences.
4. Place the array on a heat block set at 37°C for 5 min to warm the array.
5. Apply the sample (step 3) onto the array. Keep the sample at 37°C until just before application to the array to reduce nonspecific binding of the probe to the array surface. Place a silicone gasket around the edge of the slide and lay a clean glass slide on top, aligning the edges with the gasket. Clamp the assembly together using binder clips. Incubate the array for 48–68 h at 37°C on a rocking table (~1 rpm).
6. Disassemble the array assembly and rinse the hybridization solution from the slide under a stream of PN buffer. It is preferable to leave the rubber cement on the array at this time, because it will not affect the rinsing steps that follow.
7. Wash the slides once in 50% formamide, 2× SSC, pH 7.0, for 15 min at 45°C, followed by a 15-min wash in PN buffer at room temperature. The washes can conveniently be done in slide staining jars (Coplin jars) placed in water baths.
8. At the bench, carefully remove the rubber cement with forceps while keeping the array moist with PN buffer.
9. Mount the slide in a DAPI solution to stain the array spots (90% glycerol, 10% PBS, 1 µM DAPI).
9.4. Microarray Image Capture with CCD Imager and Microarray Image Quantification
To capture the microarray image with the CCD Imager and quantify it, we use the software “UCSF SPOT,” available at www.janlab.org/downloads.html. This software provides numerical values, expressed as log2 ratios, for the signal of the sample to be analyzed relative to the control sample. The numerical data are processed and saved in an Excel table. Using the software “SPROC,” the data from the spot files are normalized, generating the final log2 ratio data file with the standard deviation (the median of each set of three replicate spots). At the same time, the program arranges the BACs by genomic position and chromosome location (http//genome. vse.ucsc.edu).
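The per-clone summary described above (a log2 ratio of test to reference signal, taken as the median of three replicate spots together with their standard deviation) can be sketched as follows. This is not the UCSF SPOT or SPROC implementation; the function name `clone_log2_ratio` and the input layout are assumptions, and normalization across the array is deliberately left out.

```python
import math
from statistics import median, stdev
from typing import Sequence, Tuple


def clone_log2_ratio(test: Sequence[float],
                     reference: Sequence[float]) -> Tuple[float, float]:
    """Median and standard deviation of log2(test/reference) over replicate spots.

    `test` and `reference` hold background-corrected intensities for the
    replicate spots of one BAC clone (three spots in the text above).
    """
    if len(test) != len(reference) or len(test) < 2:
        raise ValueError("need matching replicate lists with at least two spots")
    if min(test) <= 0 or min(reference) <= 0:
        raise ValueError("intensities must be positive")
    log_ratios = [math.log2(t / r) for t, r in zip(test, reference)]
    return median(log_ratios), stdev(log_ratios)


if __name__ == "__main__":
    # Illustrative intensities only: a clone with roughly half the test signal.
    med, sd = clone_log2_ratio([480.0, 510.0, 495.0], [1000.0, 990.0, 1010.0])
    print(f"median log2 ratio = {med:.2f}, SD = {sd:.2f}")
```

On this scale, equal test and reference signal gives 0, a doubling gives 1, and a halving gives −1, which is why gains and losses appear symmetric around zero.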
10. High-Resolution SNP–CGH Microarrays
10.1. Introduction
The purpose of the Affymetrix GeneChip Mapping 500K Assay is to genotype more than 500,000 SNPs in samples of genomic DNA. The Mapping 500K Set comprises two arrays and two assay kits. The protocol starts with 250 ng of genomic DNA per array and generates SNP genotype calls for approximately 250,000 SNPs for each array of the two-array set. The assay uses a strategy that reduces the complexity of human genomic DNA up to tenfold by first digesting the genomic DNA with the NspI or StyI restriction enzyme and then ligating adaptor sequences onto the DNA fragments. The complexity is further reduced by a PCR procedure optimized for fragments of a specified size range. After these steps, the PCR products are fragmented, end-labeled, and hybridized to a GeneChip array (see Note 7). To minimize contamination of the samples, the use of two separate rooms to perform the assay is recommended: one is the pre-PCR clean room (an area for the DNA template that is free of PCR products), and the other is the PCR staging room or main laboratory, where the rest of the steps are performed.
10.1.1. Step 1. Genomic DNA Preparation
1. Thoroughly mix the genomic DNA by vortexing at high speed for 3 s.
2. Determine the concentration of each genomic DNA sample.
3. Based on OD measurements, dilute each sample to 50 ng/µL using reduced EDTA TE buffer.
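The dilution in step 3 follows from C1 × V1 = C2 × V2. A minimal sketch of the calculation is given below, taking the target as 50 ng/µL; the function name `dilution_volumes_ul` and the 20 µL default working-stock volume are assumptions chosen for illustration, not values from the protocol.

```python
from typing import Tuple


def dilution_volumes_ul(stock_ng_per_ul: float,
                        target_ng_per_ul: float = 50.0,
                        final_ul: float = 20.0) -> Tuple[float, float]:
    """Return (DNA volume, reduced EDTA TE volume) in µL for the working stock.

    Uses C1 * V1 = C2 * V2. The 20 µL final volume is an assumed default.
    """
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock is more dilute than the target concentration")
    dna_ul = target_ng_per_ul * final_ul / stock_ng_per_ul
    return dna_ul, final_ul - dna_ul


if __name__ == "__main__":
    # Illustrative stock concentration only.
    dna_ul, te_ul = dilution_volumes_ul(stock_ng_per_ul=200.0)
    print(f"{dna_ul:.1f} µL DNA + {te_ul:.1f} µL reduced EDTA TE buffer")
```

A sample below the target concentration cannot be diluted to it, so the function raises an error instead of returning a negative buffer volume.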
10.1.2. Step 2. Restriction Enzyme Digestion
Before proceeding:
– Program the thermal cycler in advance. Switch on the thermal cycler 10 min before the reactions are ready so that the lid is heated.
– Reference genomic DNA 103 is supplied in both the GeneChip® Mapping 250K Nsp and Sty Assay kits. This DNA can be used as a positive control.
1. Depending on the restriction enzyme used, prepare the following Digestion Master Mix ON ICE (for multiple samples, make a 5% excess). For the NspI digestion: 9.75 µL H2O; 2 µL of 10× NE Buffer 2; 2 µL of 10× BSA (1 mg/mL); and 1 µL NspI (10 U/µL). For the StyI digestion: 9.75 µL H2O; 2 µL of 10× NE Buffer 3; 2 µL of 10× BSA (1 mg/mL); and 1 µL StyI (10 U/µL). Note: The BSA is supplied as 100× (10 mg/mL) and needs to be diluted 1:10 with molecular biology-grade water before use.
2. Add 5 µL of diluted genomic DNA to each tube. The total amount of genomic DNA is 250 ng for each restriction enzyme.
3. Aliquot 14.75 µL of the Digestion Master Mix to each tube containing DNA. Mix gently and spin at 400 × g.
4. Place the tubes in the thermal cycler and run the 500K Digest program: 37°C, 120 min; 65°C, 20 min; hold at 4°C. Store the sample at −20°C if not proceeding to the next step.
10.1.3. Step 3. Ligation
Before proceeding:
– Program the thermal cycler in advance. Switch on the thermal cycler 10 min before the reactions are ready so that the lid is heated.
– Ligase buffer contains ATP and should be thawed and held at 4°C. Avoid multiple freeze-thaw cycles, according to the vendor's instructions.
1. Depending on the restriction enzyme used, prepare the following Ligation Master Mix ON ICE in the pre-PCR area (for multiple samples, prepare a 5% excess). For NspI: 0.75 µL of 50 µM Adaptor Nsp; 2.5 µL of 10× T4 DNA Ligase Buffer; and 2 µL T4 DNA Ligase (400 U/µL). For StyI: 0.75 µL of 50 µM Adaptor Sty; 2.5 µL of 10× T4 DNA Ligase Buffer; and 2 µL T4 DNA Ligase (400 U/µL). Total volume: 5.25 µL.
2. Aliquot 5.25 µL of the Ligation Master Mix into each digested DNA sample (19.75 µL) to bring the total volume to 25 µL. Mix gently and spin at 400 × g for 1 min at 4°C.
3. Place the tubes into a thermal cycler and run the 500K Ligate program: 16°C, 180 min; 70°C, 20 min; hold at 4°C.
Store samples at −20°C if not proceeding to the next step within 60 min.
4. Dilute each DNA ligation reaction by adding 75 µL of molecular biology-grade water to the 25 µL reaction (1:4 dilution).
10.1.4. Step 4: PCR
Before proceeding:
– Program the thermal cycler in advance. Switch on the thermal cycler 10 min before the reactions are ready so that the lid is heated.
1. Prepare the following PCR Master Mix ON ICE (three PCR reactions per sample) in the pre-PCR clean room for NspI or StyI ligation reactions, and vortex at medium speed for 2 s (for multiple samples, make a 5% excess). For one PCR: 39.5 µL H2O; 10 µL of 10× Clontech TITANIUM® Taq PCR Buffer; 20 µL of 5 M G-C Melt; 14 µL of 2.5 mM dNTPs; 4.5 µL of 100 µM PCR Primer 002; and 2 µL of 50× Clontech TITANIUM® Taq Polymerase. Note: 90 µg of PCR product is needed for fragmentation.
2. Transfer 10 µL of each diluted ligated DNA to the corresponding three PCR tubes.
3. Add 90 µL of PCR Master Mix to obtain a total volume of 100 µL.
4. Mix gently and spin the samples at 400 × g for 1 min.
5. Place the tubes in the thermal cycler in the main laboratory and run the 500K PCR program (optimized for the GeneAmp® PCR System 9700 Thermocycler): 94°C, 3 min; 30× (94°C, 30 s; 60°C, 45 s; 68°C, 15 s); 68°C, 7 min; hold at 4°C.
6. Run 3 µL of each PCR product mixed with 3 µL of 2× Gel Loading Dye on a 2% TBE gel at 120 V for 1 h. PCR products can be stored at −20°C if not proceeding to the next step within 60 min.
10.1.5. Step 5: PCR Purification and Elution with Clontech Clean-Up Plate
1. Connect a vacuum manifold to a suitable vacuum source able to maintain approximately 600 mbar.
2. Place a Clean-Up Plate on top of the manifold. Cover wells that are not needed with a PCR plate cover. We recommend covering the plate with the aluminum cover, and removing the portion of the cover corresponding to the probe wells.
3. Add 8 µL of 0.1 M EDTA (diluted from the 0.5 M EDTA in water) to each PCR reaction. Seal the plate with the plate cover, vortex at medium speed for 2 s, and spin at 400 × g for 1 min.
4. Consolidate the three PCR reactions for each sample into one well of the Clean-Up Plate.
5. Apply a vacuum and maintain at 600 mbar until the wells are completely dry.
6. Wash the PCR products by adding 50 µL of molecular biology-grade water and dry the wells completely (~20 min). Repeat this step two additional times for a total of three water washes.
7. Switch off the vacuum source and release the vacuum.
8. Carefully remove the Clean-Up Plate from the vacuum manifold and immediately:
(a) Blot the plate on a stack of clean absorbent papers to remove any liquid that might remain on the bottom of the plate.
(b) Dry the bottom of each well with an absorbent wipe.
9. Add 45 µL of RB buffer to each well. Cover the plate with PCR plate cover film and seal tightly. Moderately shake the Clean-Up Plate on a plate shaker for 10 min at room temperature.
10. Recover the purified PCR product to clean tubes by pipetting the eluate out of each well and transferring it to the corresponding tube.
10.1.6. Step 6: Quantification of Purified PCR Products
1. Add 2 µL of the purified PCR product to 198 µL of molecular biology-grade water and mix well.
2. Read the absorbance at 260 nm. Ensure that the reading is in the quantitative range of the instrument (generally 0.2–0.8 OD).
3. Apply the convention that one absorbance unit at 260 nm equals 50 µg/mL for double-stranded PCR products.
4. For fragmentation:
(a) Transfer 90 µg of each of the purified DNA samples to the corresponding wells of a new plate.
(b) Bring the total volume of each well up to 45 µL by adding the appropriate volume of RB buffer.
(c) Cover the plate with PCR plate cover film and seal tightly.
(d) Vortex at medium speed for 2 s, and spin down at 400 × g for 1 min.
10.1.7. Step 7: Fragmentation
Before proceeding:
– Preheat the thermal cycler to 37°C before setting up the fragmentation reaction.
– Prepare the fragmentation dilution immediately prior to use.
– Perform all the dilution and mixing steps on ice.
1. Preheat the thermal cycler to 37°C.
2. Add 5 µL of 10× Fragmentation Buffer to each sample (45 µL) in the corresponding tube ON ICE, giving a total volume of 50 µL.
3. Examine the label of the GeneChip Fragmentation Reagent tube for the units per microliter definition, and calculate the dilution:
Y = microliters of stock Fragmentation Reagent.
X = units of stock Fragmentation Reagent per microliter (see the label on the tube).
0.05 U/µL = final concentration of diluted Fragmentation Reagent.
120 µL = final volume of diluted Fragmentation Reagent (enough for 20 reactions).
Y = (0.05 U/µL × 120 µL) / (X U/µL).
4. Dilute the stock of Fragmentation Reagent to 0.05 U/µL as follows:
(a) Place the water, Fragmentation Buffer, and Fragmentation Reagent on ice.
(b) Combine the reagents ON ICE in the order described in the example listed below.
(c) Vortex at medium speed for 2 s.
An example of dilution is: 105 µL H2O; 12 µL of 10× Fragmentation Buffer; and 3 µL Fragmentation Reagent, giving a total volume of 120 µL.
5. Divide the diluted Fragmentation Reagent into the tubes required.
6. Add 5 µL of diluted Fragmentation Reagent (0.05 U/µL) to each sample tube containing the fragmentation mix on ice. Pipet up and down several times to mix. The total volume for each sample is 55 µL (45 µL of sample, 5 µL of 10× Fragmentation Buffer, and 5 µL of diluted Fragmentation Reagent).
7. Mix the tubes gently and spin briefly at 400 × g at 4°C.
8. Place the samples in a preheated thermocycler as quickly as possible, and run the 500K Fragment program: 37°C, 35 min; 95°C, 15 min; hold at 4°C.
9. Spin the samples briefly to collect the liquid at the bottom of the tubes.
10. Dilute 4 µL of fragmented PCR product with 4 µL of gel loading dye and run on a 4% TBE gel. Proceed immediately to the labeling step if the result matches the example below.
10.1.8. Step 8: Labeling
Before proceeding:
– Program the thermal cycler in advance. Switch on the thermal cycler 10 min before the reactions are ready so that the lid is heated.
1. Prepare the Labeling Master Mix ON ICE and vortex at medium speed for 2 s (for multiple samples, make a 5% excess): 14 µL of 5× TdT Buffer; 2 µL of 30 mM GeneChip® DNA Labeling Reagent; and 3.5 µL TdT (30 U/µL).
2. Aliquot 19.5 µL of Labeling Master Mix into the tubes containing 50.5 µL of fragmented DNA, giving a total volume of 70 µL.
3. Mix the reaction gently and spin at 400 × g for 1 min at 4°C.
4. Run the 500K Label program: 37°C, 4 h; 95°C, 15 min; hold at 4°C.
5. Spin briefly at 400 × g to collect the reaction at the bottom of the tube. Samples can be stored at −20°C if not proceeding to the next step.
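Each master mix in Steps 2–8 is scaled to the number of reactions plus a 5% excess. As an illustration only, the scaling can be sketched as follows, using the per-reaction Labeling Master Mix volumes listed above; the function name and the choice of eight samples are hypothetical.

```python
# Per-reaction volumes (uL) of the Labeling Master Mix, as listed in Step 8.
LABELING_MIX_UL = {
    "5x TdT Buffer": 14.0,
    "30 mM GeneChip DNA Labeling Reagent": 2.0,
    "TdT (30 U/uL)": 3.5,
}


def scale_master_mix(per_reaction_ul, n_reactions, excess=0.05):
    """Scale each component to n_reactions plus a fractional excess (5% by default)."""
    factor = n_reactions * (1.0 + excess)
    return {name: round(vol * factor, 2) for name, vol in per_reaction_ul.items()}


# Hypothetical run of eight samples.
mix = scale_master_mix(LABELING_MIX_UL, n_reactions=8)
for name, vol in mix.items():
    print(f"{name}: {vol} uL")
print(f"total: {round(sum(mix.values()), 2)} uL")
```

The same function applies to the digestion, ligation, and PCR mixes by swapping in their per-reaction volumes; for the PCR mix, the number of reactions is three times the number of samples.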
10.1.9. Step 9: Target Hybridization
Before proceeding:
– It is important to allow the arrays to equilibrate to room temperature completely. Unwrap the array and leave it on the bench top for 15 min.
– DMSO is light sensitive. It should be stored in a dark glass bottle.
– Preparation of the 12× MES Stock: 70.4 g MES Hydrate; 193.3 g MES Sodium salt; 800 mL molecular biology-grade water.
Mix and adjust the volume to 1,000 mL. The pH should be between 6.5 and 6.7. Filter through a 0.2-µm filter. Do not autoclave. Store between 2 and 8°C, and shield from light.
1. Prepare the Hybridization Cocktail Master Mix in the order described. For multiple samples, prepare a 5% excess: 12 µL of 12× MES; 13 µL DMSO; 13 µL of 50× Denhardt's Solution; 3 µL of 0.5 M EDTA; 3 µL HSDNA (10 mg/mL); 2 µL OCR 0100; 3 µL human Cot-1 DNA (1 mg/mL); 1 µL of 3% Tween-20; and 140 µL of 5 M TMACL. Mix well.
2. Transfer each of the labeled samples to a 1.5-mL Eppendorf tube. Aliquot 190 µL of the Hybridization Cocktail Master Mix into the 70 µL of labeled DNA sample, giving a final volume of 260 µL.
3. Heat the 260 µL of hybridization mix and labeled DNA at 99°C in a heat block for exactly 10 min to denature.
4. Cool on crushed ice for 10 s.
5. Spin briefly at 400 × g in a microfuge to collect any condensate.
6. Place the tubes at 49°C for 1 min.
7. Inject 200 µL of the denatured hybridization cocktail into the array.
8. Hybridize at 49°C for 16–18 h at 60 rpm in the oven. The remaining hybridization mix can be stored at −20°C for future use.
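Two calculations from earlier in the protocol lend themselves to a short worked sketch: converting the A260 reading of Step 6 into the eluate volume that contains 90 µg of PCR product, and the Fragmentation Reagent dilution of Step 7. The function names are hypothetical, and the A260 reading of 0.5 and the stock concentration of 2 U/µL are example values (the latter is the concentration implied by the 3 µL dilution example in Step 7, not a value to be assumed for a given lot).

```python
A260_UNIT_UG_PER_ML = 50.0  # convention for double-stranded DNA (Step 6)


def purified_pcr_conc_ug_per_ul(a260, dilution_factor=100.0):
    """Concentration of the undiluted eluate.

    2 uL of eluate in 198 uL of water is a 100-fold dilution.
    """
    return a260 * A260_UNIT_UG_PER_ML * dilution_factor / 1000.0


def fragmentation_input_volumes(conc_ug_per_ul, mass_ug=90.0, final_volume_ul=45.0):
    """Return (eluate_ul, rb_buffer_ul) delivering mass_ug in final_volume_ul."""
    eluate_ul = mass_ug / conc_ug_per_ul
    if eluate_ul > final_volume_ul:
        raise ValueError("eluate too dilute to fit the required mass in the final volume")
    return eluate_ul, final_volume_ul - eluate_ul


def stock_fragmentation_reagent_ul(stock_u_per_ul, final_u_per_ul=0.05,
                                   final_volume_ul=120.0):
    """Y = final concentration x final volume / stock concentration X (Step 7)."""
    return final_u_per_ul * final_volume_ul / stock_u_per_ul


conc = purified_pcr_conc_ug_per_ul(a260=0.5)        # example reading
eluate_ul, rb_ul = fragmentation_input_volumes(conc)
print(f"{conc:.2f} ug/uL: {eluate_ul:.1f} uL eluate + {rb_ul:.1f} uL RB buffer")
print(f"stock reagent: {stock_fragmentation_reagent_ul(2.0):.1f} uL in 120 uL")
```

Under these assumptions, an eluate below 2 µg/µL (an A260 reading below 0.4 at the 100-fold dilution) cannot supply 90 µg in 45 µL, and the sketch raises an error rather than returning a negative buffer volume.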
11. Notes
1. The source of RNA is a major determinant of the success of each individual microarray experiment. In this procedure, between 3 and 50 µg of high-quality RNA (usually corresponding to a 15- to 100-mm³ tumor biopsy) is needed. Ideally, tumor biopsies frozen in liquid nitrogen immediately after surgical resection (and kept at −80°C or colder to prevent RNA degradation) should be used (158, 159). This requirement limits the study of large series of patient samples, most of which are not stored in adequate conditions, especially in retrospective analyses or in series of rare tumors that are collected from different institutions. In addition, this requirement makes it problematic to study early tumors or biopsies obtained through minimally invasive methods such as fine-needle aspiration (159–161). An alternative method to preserve biological specimens involves suspending the tissue in a preservative such as RNAlater
(Ambion, Austin, TX, USA), followed by snap freezing of the tissue the next day. This method obviates the immediate need for liquid nitrogen, and preserves the integrity of the RNA to be used in microarray experiments (160).
2. Although there are novel methods to extract high-quality RNA from small amounts of tumor (even from a single cell) and from formalin-fixed tissues, the utility of these RNAs should be extensively evaluated and carefully validated in gene expression microarrays (160, 162).
3. Tumors are composed of different cell types, including malignant cells, stromal and inflammatory cells, and blood vessels. The proportion of these cell populations varies between and within tumors. Because this heterogeneity can complicate the interpretation of microarray results, a careful selection of the tumors to be included in the study is an important step. In addition, a detailed histopathological analysis of each tumor sample is mandatory. In cases with a low percentage of tumor cells, microdissection of the tumor cells in biopsies, or cell sorting by flow cytometry in blood, marrow aspirates, effusions, or disaggregated lymph nodes, may be a good choice (163, 164). However, expression in the nonmalignant surrounding cells may also be informative, and in some situations the analysis of both isolated tumor cells and whole tumors may be a good choice (55). One additional issue is the inclusion of normal cell populations in the microarray study to allow the comparison of the genetic profiles of tumors with their putative cells of origin and with the normal surrounding cells (3, 165).
4. In array-based CGH using BAC clones, several factors influence the success of the analysis. First, the general heterogeneity of the spotted BACs, which differ in their proportion of repetitive sequences and in gene content, produces variable hybridization signal intensities. Second, as in gene expression microarrays, there is the presence of “contaminating” nontumoral surrounding cells in the sample.
These normal cells have two DNA copies genome-wide (with the exception of the X and Y chromosomes in male patients) and, in contrast to gene expression profiling, their analysis does not provide any biological information to the study. Thus, array–CGH should be limited to cases with more than 50% tumor cells, because lower proportions may yield a normal genomic profile corresponding to the normal cells (76). Third, production varies among the arrays printed at each laboratory, including the few commercially available BAC microarrays. Fourth, these arrays can be used to analyze paraffin-embedded tissues, but success largely depends on the quality and integrity of the DNA isolated from the fixed cells.
Despite these difficulties, whole-genome BAC arrays of approximately 1 Mb resolution (including 3,000–4,000 probes) have been successfully applied to search for genomic changes in many cancer types, allowing an accurate description and mapping of areas of genomic amplification and deletion (14, 120, 136, 166–168). These alterations can be easily confirmed and visualized in the tumor cells by complementary fluorescence in situ hybridization (FISH) using the same BACs as probes (169).
5. Array–CGH devices can also be applied to scan tumor genomes in other species, predominantly laboratory mice (112, 170), dogs (171), and Drosophila (172).
6. More recently, tiling-resolution human DNA microarrays with more than 32,000 overlapping BAC clones covering the entire human genome have been developed, allowing the identification of minute DNA alterations in tumors that were not previously detected (109, 173). However, the large number of BAC clones, which cannot be individually verified, and the inclusion of only one BAC per array (instead of the three to five BACs spotted on the 1-Mb BAC arrays) have limited the application of these initial arrays.
7. One advantage of SNP–CGH arrays is that only the test (tumor) DNA is hybridized on the chip, without the need for normal DNA as a control. Results for a particular sample are generally “normalized” with respect to available data obtained from a pool of normal DNAs; however, to increase sensitivity and avoid false-positive results, the analysis of tumor and normal DNAs from each individual on two different arrays is usually recommended (174–176). Important limitations of this technology include the poor-quality results obtained from DNAs extracted from paraffin-embedded tissues and the limited ability to detect areas of UPD in biopsies with more than 50% nontumoral cells.
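The 50% thresholds in Notes 4 and 7 can be made concrete with a simple linear mixing model. This model is an assumption of the sketch, not a method given in this chapter: the measured copy number is taken as the average of tumor and normal copy numbers weighted by the tumor cell fraction, and the function name is hypothetical.

```python
import math


def expected_log2_ratio(tumor_fraction, tumor_copies, normal_copies=2):
    """Expected log2 ratio when tumor and normal cells are mixed linearly.

    Assumes normal cells carry normal_copies (2 on autosomes) and that the
    signal is proportional to the mean copy number across all cells.
    """
    mean_copies = (tumor_fraction * tumor_copies
                   + (1.0 - tumor_fraction) * normal_copies)
    if mean_copies <= 0:
        return float("-inf")  # homozygous deletion in a pure tumor sample
    return math.log2(mean_copies / normal_copies)


for purity in (1.0, 0.7, 0.5, 0.3):
    loss = expected_log2_ratio(purity, tumor_copies=1)
    gain = expected_log2_ratio(purity, tumor_copies=3)
    print(f"tumor fraction {purity:.1f}: loss {loss:+.3f}, gain {gain:+.3f}")
```

Under this assumption the expected signal of a single-copy loss or gain shrinks toward zero as the tumor fraction falls, which is consistent with the observation that samples with few tumor cells may yield an apparently normal profile.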
12. Integrative Oncogenomics: Correlation of Genomic Aberrations and Gene Expression Data
We describe, step by step, our recommended sequence of algorithms and statistical tests to integrate expression data with copy number data.
1. Derive gene expression levels and raw copy number data
The data from the expression and copy number .cel files must be preprocessed to remove noise and to make the arrays comparable with one another.
For gene expression data, RMA, GCRMA (177, 178), dChip, or other methods can be used. The authors recommend RMA, because it has become the de facto standard for obtaining the expression levels of a gene. To derive the raw copy number, there are several methods, such as CNAT, CNAG (179), dChip, and Aroma.affymetrix (180). The differences between them are marginal. The most accurate seems to be Aroma.affymetrix, although its user has to be comfortable with the R programming language. CNAT, CNAG, and dChip provide convenient user interfaces that Aroma does not. Other packages are available for R (e.g., SNPChip). The main disadvantage of these packages, which Aroma.affymetrix does not share, is that all of the information in the .cel files must be stored in memory, limiting the number of arrays that can be analyzed to a few tens, depending on the type of array. Affymetrix chips have associated information files called chip definition files (cdf). These files specify how the individual probes are grouped into sets of probes. We recommend using the cdf provided by the Brainarray Website instead of the Affymetrix default files (181). This website updates the information in these files frequently, improving the results of the analysis. In addition, these definition files have the advantage that each set of probes corresponds to a single gene; with the Affymetrix default files, a gene can be represented by several sets of probes, which makes it difficult to know which one is correct.
2. Segmentation of the raw copy number data
Copy number alterations occur in segments of the genome: a whole chromosome, an arm of a chromosome, or a part of it. This fact can be used to extract the parts (segments) of the genome that have the same copy number. The procedure to obtain these parts from the raw copy number data is called segmentation. There are various algorithms to perform the segmentation.
The three most widely used are circular binary segmentation (CBS) (176), hidden Markov models (HMM), and CGHSeq (182). CNAT, CNAG, and dChip provide HMM segmentation, whereas Aroma uses CBS segmentation. CGHSeq must be run on the Matlab platform. A major drawback of HMM is that the number of states, and the corresponding copy numbers, have to be established beforehand. If a tumor sample is contaminated with normal tissue, the copy number will no longer be an integer, and HMM may fail to discern copy number alterations. CBS and
CGHSeq do not have this problem; they provide an estimate of the copy number for each of the segments.
3. Assign copy number values to the genes
After the described computations, each SNP has a copy number assigned. These data have to be combined to assign each gene its corresponding copy number. The copy number of each gene is the mean of the copy numbers of the genomic segments in which the gene is located. Special attention has to be paid to genes within which the copy number changes, because aberrant splicing forms can occur.
4. Remove effects in expression data that are not related to the position in the genome
Segmentation can also be applied to expression data to locate segments of the genome whose genes are overexpressed or underexpressed. For expression data, CBS or CGHSeq is preferable because there is no obvious means to establish the states beforehand (as needed by HMM). Another possibility is to apply a filter (a moving average across genomic position) to the normalized expression data. The weights of this filter can follow a Gaussian distribution (Gaussian filter).
5. Detection of cytobands whose genes have their copy number or their expression significantly modified
The authors suggest performing a hypergeometric test to detect which cytobands have genes that show a significant variation in copy number (increase or decrease). The hypergeometric test has four parameters: N (the total number of genes), n (the total number of genes in the cytoband), K (the total number of genes with increased copy number), and k (the number of genes within the cytoband with increased copy number). The test provides a p value that indicates whether the number of genes with increased copy number in a particular cytoband is larger than expected by chance, i.e., statistically significant. The test can be performed for all of the cytobands in the genome (~300) and for all of the samples within the study.
The same procedure can be applied to gene expression to detect cytobands whose genes show a significant variation in their expression.
6. Global analysis of copy number and expression changes within a study
A simple way to describe which loci in the genome vary within a study is to show the percentage of samples with a change in copy number (increase or decrease) and the percentage with a coherent change in gene expression, e.g., the percentage of samples that show both an increase in copy number and upregulation (Fig. 2).
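The hypergeometric test of step 5 can be sketched with the standard library alone. The p value below is the upper tail P(X ≥ k), using the parameters N, n, K, and k as defined in the text; the function name and the example counts are hypothetical and are not taken from the study.

```python
from math import comb


def hypergeom_upper_tail(N, n, K, k):
    """P(X >= k) for a cytoband of n genes drawn from N genes, K of them altered.

    N: total genes; n: genes in the cytoband;
    K: genes with increased copy number genome-wide;
    k: genes with increased copy number within the cytoband.
    """
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= n):
        raise ValueError("inconsistent hypergeometric parameters")
    upper = min(n, K)
    # math.comb returns 0 when more items are chosen than are available,
    # so impossible terms drop out of the sum.
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / comb(N, n)


# Hypothetical sample: 20,000 genes, 1,500 gained; a cytoband holds 60 genes,
# 25 of which are gained.
p = hypergeom_upper_tail(N=20000, n=60, K=1500, k=25)
print(f"p = {p:.3g}")
```

Because the test is repeated over roughly 300 cytobands per sample, a multiple-testing correction is a common addition, although the text does not specify one. The same function applies to decreases in copy number or to expression changes by redefining K and k.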
Fig. 2. Results of the study of 29 lymphoma cell lines. Two chromosomes are shown (chromosomes 17 and 18), each with two graphics. The upper plot shows the percentage of samples with copy number increased and decreased. The lower plot shows the percentage of samples with copy number and expression increased (or decreased). It can be seen, for example, that 17q21.31 shows several genes that have increased both their copy number and their expression. 18q21.31 also shows an increase in copy number and expression. The gene BCL2 is located in this region.
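The percentages plotted in Fig. 2 can be computed per gene from discrete calls across samples. The sketch below assumes, for illustration only, that copy number and expression have each been summarized as −1 (decreased), 0 (unchanged), or +1 (increased) per sample; how these calls are derived, the function name, and the example data are assumptions, not part of the described method.

```python
def summarize_gene(cn_calls, expr_calls):
    """Percentage of samples with copy number changes and coherent expression changes.

    cn_calls, expr_calls: equal-length sequences of -1, 0, or +1, one entry per sample.
    """
    n_samples = len(cn_calls)
    if n_samples == 0 or n_samples != len(expr_calls):
        raise ValueError("need one copy number call and one expression call per sample")

    def pct(count):
        return 100.0 * count / n_samples

    pairs = list(zip(cn_calls, expr_calls))
    return {
        "gain": pct(sum(1 for cn, _ in pairs if cn > 0)),
        "loss": pct(sum(1 for cn, _ in pairs if cn < 0)),
        "gain_and_up": pct(sum(1 for cn, ex in pairs if cn > 0 and ex > 0)),
        "loss_and_down": pct(sum(1 for cn, ex in pairs if cn < 0 and ex < 0)),
    }


# Hypothetical gene measured in five samples.
print(summarize_gene(cn_calls=[1, 1, 0, -1, 1], expr_calls=[1, 0, 0, -1, 1]))
```

Repeating the summary for every gene, ordered by genomic position, gives the two tracks per chromosome described in the figure caption: the first two values for the upper plot and the last two for the lower plot.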
References 1. Chung CH, Bernard PS, Perou CM. (2002) Molecular portraits and the family tree of cancer. Nature genetics. 32(Suppl), 533–540. 2. DeRisi J, Penland L, Brown PO, Bittner ML, Meltzer PS, Ray M, et al. (1996) Use of a cDNA microarray to analyse gene expression patterns in human cancer. Nature genetics. 14, 457–460. 3. Brentani RR, Carraro DM, Verjovski-Almeida S, Reis EM, Neves EJ, de Souza SJ, et al. (2005) Gene expression arrays in cancer research: methods and applications. Critical Reviews in oncology/hematology. 54, 95–105. 4. Staudt LM, Dave S. (2005) The biology of human lymphoid malignancies revealed by gene expression profiling. Advances in immunology. 87, 163–208. 5. Hoheisel JD. (2006) Microarray technology: beyond transcript profiling and genotype analysis. Nature reviews. Genetics. 7, 200–210. 6. Alizadeh AA, Eisen MB, Davis RE, Ma C, Lossos IS, Rosenwald A, et al. (2000) Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature. 403, 503–511. 7. Tinker AV, Boussioutas A, Bowtell DD. (2006) The challenges of gene expression microarrays for the study of human cancer. Cancer cell. 9, 333–339. 8. Sotiriou C, Piccart MJ. (2007) Taking geneexpression profiling to the clinic: when will molecular signatures become relevant to patient care? Nature reviews. Cancer. 7, 545–553. 9. Unger MA, Rishi M, Clemmer VB, Hartman JL, Keiper EA, Greshock JD, et al. (2001) Characterization of adjacent breast tumors using oligonucleotide microarrays. Breast cancer research. 3, 336–341. 10. Kallioniemi A, Kallioniemi OP, Sudar D, Rutovitz D, Gray JW, Waldman F, et al. (1992) Comparative genomic hybridization for molecular cytogenetic analysis of solid tumors. Science (New York, NY). 258, 818–821. 11. Pinkel D, Segraves R, Sudar D, Clark S, Poole I, Kowbel D, et al. (1998) High resolution analysis of DNA copy number variation using comparative genomic hybridization to microarrays. Nature genetics. 20, 207–211. 12. 
Solinas-Toldo S, Lampel S, Stilgenbauer S, Nickolenko J, Benner A, Dohner H, et al. (1997) Matrix-based comparative genomic hybridization: biochips to screen for genomic imbalances. Genes, chromosomes & cancer. 20, 399–407. 13. Albertson DG, Ylstra B, Segraves R, Collins C, Dairkee SH, Kowbel D, et al. (2000) Quantitative mapping of amplicon structure by array
CGH identifies CYP24 as a candidate oncogene. Nature genetics. 25, 144–146. 14. Snijders AM, Nowak N, Segraves R, Blackwood S, Brown N, Conroy J, et al. (2001) Assembly of microarrays for genome-wide measurement of DNA copy number. Nature genetics. 29, 263–264. 15. Pollack JR, Perou CM, Alizadeh AA, Eisen MB, Pergamenschikov A, Williams CF, et al. (1999) Genome-wide analysis of DNA copynumber changes using cDNA microarrays. Nature genetics. 23, 41–46. 16. Barrett MT, Scheffer A, Ben-Dor A, Sampas N, Lipson D, Kincaid R, et al. (2004) Comparative genomic hybridization using oligonucleotide microarrays and total genomic DNA. Proceedings of the National Academy of Sciences of the United States of America. 101, 17765–17770. 17. Lindblad-Toh K, Tanenbaum DM, Daly MJ, Winchester E, Lui WO, Villapakkam A, et al. (2000) Loss-of-heterozygosity analysis of smallcell lung carcinomas using single-nucleotide polymorphism arrays. Nature biotechnology. 18, 1001–1005. 18. Zender L, Lowe SW. (2008) Integrative oncogenomic approaches for accelerated cancer-gene discovery. Current opinion in oncology. 20, 72–76. 19. Lowenberg B, Downing JR, Burnett A. (1999) Acute myeloid leukemia. The New England journal medicine. 341, 1051–1062. 20. Rowley JD. (2001) Chromosome translocations: dangerous liaisons revisited. Nature reviews. Cancer. 1, 245–250. 21. Bullinger L, Dohner K, Bair E, Frohling S, Schlenk RF, Tibshirani R, et al. (2004) Use of gene-expression profiling to identify prognostic subclasses in adult acute myeloid leukemia. The New England journal of medicine. 350, 1605–1616. 22. Valk PJ, Verhaak RG, Beijen MA, Erpelinck CA, Barjesteh van Waalwijk van Doorn-Khosrovani S, Boer JM, et al. (2004) Prognostically useful geneexpression profiles in acute myeloid leukemia. The New England journal of medicine. 350, 1617–1628. 23. Qian Z, Fernald AA, Godley LA, Larson RA, Le Beau MM. 
(2002) Expression profiling of CD34+ hematopoietic stem/ progenitor cells reveals distinct subtypes of therapy-related acute myeloid leukemia. Proceedings of the National Academy of Sciences of the United States of America. 99, 14925–14930. 24. Yeoh EJ, Ross ME, Shurtleff SA, Williams WK, Patel D, Mahfouz R, et al. (2002) Classification,
subtype discovery, and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling. Cancer cell. 1, 133–143. 25. Ferrando AA, Neuberg DS, Staunton J, Loh ML, Huard C, Raimondi SC, et al. (2002) Gene expression signatures define novel oncogenic pathways in T cell acute lymphoblastic leukemia. Cancer cell. 1, 75–87. 26. Kari L, Loboda A, Nebozhyn M, Rook AH, Vonderheid EC, Nichols C, et al. (2003) Classification and prediction of survival in patients with the leukemic phase of cutaneous T cell lymphoma. The Journal of experimental medicine. 197, 1477–1488. 27. Thiede C, Steudel C, Mohr B, Schaich M, Schakel U, Platzbecker U, et al. (2002) Analysis of FLT3-activating mutations in 979 patients with acute myelogenous leukemia: association with FAB subtypes and identification of subgroups with poor prognosis. Blood. 99, 4326–4335. 28. Armstrong SA, Staunton JE, Silverman LB, Pieters R, den Boer ML, Minden MD, et al. (2002) MLL translocations specify a distinct gene expression profile that distinguishes a unique leukemia. Nature genetics. 30, 41–47. 29. Armstrong SA, Kung AL, Mabon ME, Silverman LB, Stam RW, Den Boer ML, et al. (2003) Inhibition of FLT3 in MLL. Validation of a therapeutic target identified by gene expression based classification. Cancer cell. 3, 173–183. 30. Rosenwald A, Wright G, Chan WC, Connors JM, Campo E, Fisher RI, et al. (2002) The use of molecular profiling to predict survival after chemotherapy for diffuse large-B-cell lymphoma. The New England journal of medicine. 346, 1937–1947. 31. Hans CP, Weisenburger DD, Greiner TC, Gascoyne RD, Delabie J, Ott G, et al. (2004) Confirmation of the molecular classification of diffuse large B-cell lymphoma by immunohistochemistry using a tissue microarray. Blood. 103, 275–282. 32. Lossos IS, Czerwinski DK, Alizadeh AA, Wechser MA, Tibshirani R, Botstein D, et al. (2004) Prediction of survival in diffuse largeB-cell lymphoma based on the expression of six genes. 
The New England journal of medicine. 350, 1828–1837. 33. Lossos IS, Morgensztern D. (2006) Prognostic biomarkers in diffuse large B-cell lymphoma. Journal of clinical oncology. 24, 995–1007. 34. Lam LT, Davis RE, Pierce J, Hepperle M, Xu Y, Hottelet M, et al. (2005) Small molecule inhibitors of IkappaB kinase are selectively toxic for subgroups of diffuse large B-cell lymphoma defined by gene expression profiling. Clinical cancer research. 11, 28–40.
35. Lam LT, Wright G, Davis RE, Lenz G, Farinha P, Dang L, et al. (2008) Cooperative signaling through the signal transducer and activator of transcription 3 and nuclear factor-{kappa} B pathways in subtypes of diffuse large B-cell lymphoma. Blood. 111, 3701–3713. 36. Shipp MA, Ross KN, Tamayo P, Weng AP, Kutok JL, Aguiar RC, et al. (2002) Diffuse large B-cell lymphoma outcome prediction by geneexpression profiling and supervised machine learning. Nature medicine. 8, 68–74. 37. Monti S, Savage KJ, Kutok JL, Feuerhake F, Kurtin P, Mihm M, et al. (2005) Molecular profiling of diffuse large B-cell lymphoma identifies robust subtypes including one characterized by host inflammatory response. Blood. 105, 1851–1861. 38. Chen L, Monti S, Juszczynski P, Daley J, Chen W, Witzig TE, et al. (2008) SYK-dependent tonic B-cell receptor signaling is a rational treatment target in diffuse large B-cell lymphoma. Blood. 111(4), 2230–2237. 39. Su TT, Guo B, Kawakami Y, Sommer K, Chae K, Humphries LA, et al. (2002) PKC-beta controls I kappa B kinase lipid raft recruitment and activation in response to BCR signaling. Nature immunology. 3, 780–786. 40. Smith PG, Wang F, Wilkinson KN, Savage KJ, Klein U, Neuberg DS, et al. (2005) The phosphodiesterase PDE4B limits cAMP-associated PI3K/AKT-dependent apoptosis in diffuse large B-cell lymphoma. Blood. 105, 308– 316. 41. Robertson MJ, Kahl BS, Vose JM, de Vos S, Laughlin M, Flynn PJ, et al. (2007) Phase II study of enzastaurin, a protein kinase C beta inhibitor, in patients with relapsed or refractory diffuse large B-cell lymphoma. Journal of clinical oncology. 25, 1741–1746. 42. Shipp MA. (2007) Molecular signatures define new rational treatment targets in large B-cell lymphomas. Hematology/the Education Program of the American Society of Hematology. 2007, 265–269. 43. Polo JM, Dell’Oso T, Ranuncolo SM, Cerchietti L, Beck D, Da Silva GF, et al. 
(2004) Specific peptide interference reveals BCL6 transcriptional and oncogenic mechanisms in B-cell lymphoma cells. Nature medicine. 10, 1329– 1335. 44. Polo JM, Juszczynski P, Monti S, Cerchietti L, Ye K, Greally JM, et al. (2007) Transcriptional signature with differential expression of BCL6 target genes accurately identifies BCL6-dependent diffuse large B cell lymphomas. Proceedings of the National Academy of Sciences of the United States of America. 104, 3207–3212.
Chapter 14
Cancer Gene Profiling in Pancreatic Cancer
Felip Vilardell and Christine A. Iacobuzio-Donahue

Summary
High levels of RNases present in the normal pancreas and the abundance of desmoplastic stroma of most pancreatic cancers have traditionally caused difficulty in the extraction of high-quality RNA and gene expression profiling from pancreatic tissues. However, a variety of innovative strategies have made it possible to successfully perform a molecular analysis of pancreatic cancer, and the expression profiles that have been generated have provided tremendous insight into the nature of this aggressive disease. Here, we describe some of these techniques.

Key words: Pancreas, Ductal adenocarcinoma, RNA extraction, Gene profiling
Robert Grützmann and Christian Pilarsky (eds.), Cancer Gene Profiling: Methods and Protocols, Methods in Molecular Biology, vol. 576 DOI 10.1007/978-1-59745-545-9_14, © Humana Press, a part of Springer Science + Business Media, LLC 2010

1. Introduction
As recently as 10 years ago, it was assumed that global expression profiling would be impossible in pancreatic tissues because of the high levels of RNases and other enzymes in the pancreas and the low neoplastic cellularity of most pancreatic cancers. However, these hurdles have been overcome by a variety of approaches, and the resulting expression profiles have provided a wealth of information regarding this aggressive disease (1–5, 7, 8, 10). A review of these studies indicates that different investigators have used varying and innovative strategies to discover those genes or pathways most characteristic of pancreatic cancers. The sample types that have proven most useful for gene expression profiling in the pancreas include human-derived cultured cell lines (normal and malignant) and surgically resected tissue specimens from the pancreas. Both sample types have inherent
advantages and disadvantages for a gene expression profiling study, related to sample processing and interpretation of data, but a good understanding of these details should allow the researcher to successfully perform expression profiling experiments using pancreatic cancer tissues and obtain meaningful results.

Pancreatic cell lines or cell line xenografts are very useful because they are pure populations of epithelial cells. One can therefore obtain an undiluted view of gene expression patterns (9). Additionally, neoplastic cell lines are particularly useful for evaluating the response of the neoplastic cells to various treatment strategies and for delineating signaling cascades or cellular functions that may be altered by various experimental conditions. However, although cell lines are clearly useful, one must also appreciate their limitations. Cell lines are grown in artificial conditions that can result in the dysregulation of gene expression, particularly the downregulation of genes related to the interactions of epithelial cells (both normal and neoplastic) with their surrounding extracellular matrix components. Although this feature of cell lines may not affect some directed gene expression studies, it is nonetheless important to be aware of this limitation when interpreting gene expression data based solely on the analysis of cell lines (6).

Surgically resected tissue specimens, because they represent the neoplasm in its “native” state, are also essential for gene expression studies. However, two concerns exist regarding the use of surgically resected pancreatic tissue samples: the predominance of nonneoplastic stromal cells within the tumor tissue specimens, and the extent of messenger RNA (mRNA) degradation in pancreatic tissues.
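To make the first of these concerns concrete, the sketch below shows how low neoplastic cellularity can dilute a tumor-specific signal in a bulk tissue sample. It assumes a simple linear mixing model on linear-scale (not log-transformed) expression values, which is an illustrative simplification and not a method taken from this chapter. The function name, the expression levels, and the cellularity fractions are all hypothetical.

```python
def bulk_expression(neoplastic_fraction, neoplastic_level, stromal_level):
    """Expected bulk signal for one gene under an ASSUMED linear mixing model.

    neoplastic_fraction: share of cells in the sample that are neoplastic (0..1)
    neoplastic_level:    linear-scale expression in pure neoplastic cells
    stromal_level:       linear-scale expression in the nonneoplastic stroma
    """
    if not 0.0 <= neoplastic_fraction <= 1.0:
        raise ValueError("neoplastic_fraction must lie between 0 and 1")
    stromal_fraction = 1.0 - neoplastic_fraction
    return neoplastic_fraction * neoplastic_level + stromal_fraction * stromal_level


# Hypothetical gene: 10-fold higher in neoplastic cells than in stroma.
NEOPLASTIC_LEVEL = 1000.0
STROMAL_LEVEL = 100.0

for fraction in (1.0, 0.5, 0.1):  # pure cell line, mixed sample, stroma-rich tumor
    signal = bulk_expression(fraction, NEOPLASTIC_LEVEL, STROMAL_LEVEL)
    apparent_fold = signal / STROMAL_LEVEL
    print(f"cellularity {fraction:.0%}: bulk signal {signal:.0f}, "
          f"apparent fold change {apparent_fold:.1f}")
```

Under these assumed numbers, a gene that is 10-fold overexpressed in pure neoplastic cells appears less than 2-fold overexpressed when only one cell in ten is neoplastic, which is why purification of the epithelial component or careful choice of comparison samples matters.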
Typically, resected pancreatic cancers are composed of a minor population of infiltrating neoplastic epithelial cells surrounded by a predominance of dense fibrous (or desmoplastic) nonneoplastic stroma (Fig. 1). This stroma contains proliferating fibroblasts, small endothelial-lined vessels, inflammatory cells, and trapped residual atrophic parenchymal components of the organ invaded. A consistently low ratio of the infiltrating neoplastic epithelial cells to this abundant nonneoplastic desmoplastic response is rather unique to duct adenocarcinomas of the pancreas, in contrast to infiltrating carcinomas arising in other organ or tissue types. Microdissection or other methods of purification of the epithelial component have been used successfully to overcome this perceived obstacle (2, 4). Alternatively, some investigators have successfully used the approach of coanalyzing resected samples of chronic pancreatitis together with resected pancreatic cancers, or by coanalyzing cell lines with resected cancers, as a means to determine those genes solely overexpressed within the neoplastic tissues (3, 5). All approaches are valid, and knowledge of the gene(s) that serve as markers of a particular cell type within the expression profiling data generated
Cancer Gene Profiling in Pancreatic Cancer
Fig. 1. Histopathology of representative pancreatic samples. (a) Pancreatic ductal adenocarcinoma. The neoplastic glands show marked cytologic and nuclear atypia and are associated with extensive desmoplasia. (b) Healthy pancreas. The healthy pancreas is predominantly composed of pyramidal and basophilic acinar epithelial cells, with scattered islets of Langerhans and normal pancreatic ducts lined by a low cuboidal epithelium. (c) Advanced chronic pancreatitis. The pancreas shows a chronic lymphocytic infiltrate in association with atrophy of pancreatic acini and lobular fibrosis. (d) Pancreatic cancer cell line. Unlike the pancreatic cancer tissue shown in (a), cell lines represent a pure population of neoplastic epithelial cells, seen as cells with a high nuclear/cytoplasm ratio and marked atypia. No contaminating normal cells or fibrosis are present.
can help in interpretation. Table 1 indicates those genes that are reproducible markers of the different normal and neoplastic cell populations analyzed in expression profiling. The second common perception regarding the use of pancreatic tissues is that they contain a large amount of endogenous RNases, which can potentially interfere with mRNA extraction methods preceding gene expression studies. RNases are a major secretory product of normal pancreatic acinar cells. However, there is commonly a significant loss of acinar cells within infiltrating pancreatic cancers due to atrophy or destruction of the gland by the neoplasm, which facilitates the study of mRNA expression patterns within these otherwise stroma-rich tissues. With careful technique, adequate amounts of mRNA can therefore be extracted from quickly frozen surgically resected samples (5).
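The coanalysis strategy described above (resected cancers analyzed together with chronic pancreatitis samples or with cell lines) amounts to simple set logic on gene lists. The following is a minimal sketch of that idea only; the function name and the toy gene lists are hypothetical illustrations, not data or software from the cited studies.

```python
def cancer_specific_genes(up_in_cancer, up_in_pancreatitis, up_in_cell_lines=None):
    """Keep genes elevated in resected cancer but not in chronic pancreatitis.

    If a cell-line list is supplied, additionally require that the gene is
    also elevated in pure neoplastic cell lines (hypothetical helper).
    """
    candidates = set(up_in_cancer) - set(up_in_pancreatitis)
    if up_in_cell_lines is not None:
        candidates &= set(up_in_cell_lines)
    return sorted(candidates)


# Toy lists for illustration only (not measured data).
cancer_tissue = ["Mesothelin", "Claudin 4", "Hsp47", "MMP2"]
pancreatitis = ["Hsp47", "MMP2"]          # stromal genes shared with fibrosis
cell_lines = ["Mesothelin", "Claudin 4"]  # pure epithelial population

print(cancer_specific_genes(cancer_tissue, pancreatitis, cell_lines))
```

Genes removed by the pancreatitis filter are not necessarily uninteresting; they are simply more likely to reflect the stromal or inflammatory response than the neoplastic epithelium.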
Vilardell and Iacobuzio-Donahue
Table 1 Markers of normal and neoplastic cell populations in expression profiling data

Name | Normal cellular function | Predominant cell population represented
Muc4 | Apomucin, epithelial protection | Neoplastic duct epithelium
Claudin 4 | Component of epithelial tight junctions | Neoplastic duct epithelium
Fascin | Cytoskeletal protein, cellular motility | Neoplastic duct epithelium
Mesothelin | GPI-anchored protein | Neoplastic duct epithelium
PSCA | GPI-anchored protein | Neoplastic duct epithelium
Trypsin | Serine protease | Normal acinar epithelium
Chymotrypsin | Serine protease | Normal acinar epithelium
CA19–9 | Tetrasaccharide carbohydrate, role in cell–cell recognition | Neoplastic duct epithelium
DUPAN-2 | Tetrasaccharide carbohydrate, role in cell–cell recognition | Normal and neoplastic duct epithelium
Hsp47 | Collagen-specific chaperone | Desmoplastic stroma
Apolipoprotein C-1 | Secreted lipid carrier | Desmoplastic stroma
Apolipoprotein D | Secreted lipid carrier | Desmoplastic stroma
Thrombospondin-1 | Extracellular matrix (ECM) component | Neoplastic duct epithelium, desmoplastic stroma
MMP2 | Matrix remodeling | Desmoplastic stroma
MMP11 | Matrix remodeling | Desmoplastic stroma
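One way the markers in Table 1 might be used when inspecting a bulk-tissue profile is to summarize marker expression per cell population. The sketch below is an illustration under stated assumptions: the function name, the use of a simple mean, and the example values are hypothetical, and the carbohydrate antigens (CA19–9, DUPAN-2) and the dual-compartment marker Thrombospondin-1 are left out because they do not map to a single transcript or population.

```python
# Marker names taken from Table 1; grouping by predominant cell population.
MARKERS = {
    "Neoplastic duct epithelium": ["Muc4", "Claudin 4", "Fascin", "Mesothelin", "PSCA"],
    "Normal acinar epithelium": ["Trypsin", "Chymotrypsin"],
    "Desmoplastic stroma": ["Hsp47", "Apolipoprotein C-1", "Apolipoprotein D", "MMP2", "MMP11"],
}


def compartment_scores(expression):
    """Mean expression of the available Table 1 markers for each population.

    `expression` maps gene name -> expression value on any consistent scale.
    Populations with no measured marker get None.
    """
    scores = {}
    for population, genes in MARKERS.items():
        values = [expression[g] for g in genes if g in expression]
        scores[population] = sum(values) / len(values) if values else None
    return scores


# Hypothetical values for illustration only.
profile = {"Muc4": 2.0, "Mesothelin": 4.0, "Trypsin": 1.0, "MMP2": 6.0}
for population, score in compartment_scores(profile).items():
    print(population, score)
```

A high stromal score relative to the epithelial score would suggest that a bulk sample is dominated by desmoplastic stroma and that microdissection or coanalysis may be needed.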
2. Materials (Before beginning see Note 1)
2.1. For RNA Extraction from Bulk Tissues and Determination of RNA Integrity
1. Frozen specimens of pancreatic adenocarcinoma.
2. RNase-free iron plate.
3. Suitably sized vessels for tissue homogenization (e.g., 1.5-ml tubes).
4. Containers with dry ice and wet ice.
5. Sterile surgical blades.
6. Scientific precision balance.
7. Polytron in a cold room.
8. Microcentrifuge in a cold room.
9. QIAGEN RNA extraction kit (depending on the starting amount of tissue, the suitable kit may range from Mini Kit to Maxi Kit) or PicoPure RNA Isolation Kit (Catalog #KIT0204, suitable for small amounts of tissue).
10. Diethylpyrocarbonate (DEPC)-treated distilled water.
11. Agarose-LE (Catalog #AM9040).
12. 10× Gel Prep/Running Buffer (AMBION Catalog #016R057898A).
13. Power supply.
14. Electrophoresis chamber, gel casting tray, and sample combs.
15. Ethidium bromide, a fluorescent dye used for staining nucleic acids (mutagenic).
16. Glyoxal Load Dye (AM #8551), which contains bromophenol blue and also ethidium bromide.
17. Transilluminator (ultraviolet light box) or a more modern molecular imager such as a Gel Doc XR System (BioRad Laboratories), which will be used to visualize ethidium bromide-stained nucleic acids in gels.
2.2. For Sectioning of Frozen Tissue Samples (See Note 2)
1. Frozen samples of normal pancreas and pancreatic adenocarcinoma, preferably snap frozen.
2. Sterile surgical blades and forceps.
3. Container with dry ice.
4. Cryostat with disposable microtome blades.
5. Tissue-Tek® OCT compound (VWR Catalog #25608-930).
6. 100% ethanol for cleaning the knife holder and antiroll plate in the cryostat.
7. Hematoxylin and eosin for routine histologic staining.
8. Polystyrene distyrene 80, dibutyl phthalate plasticizer, and xylene (DPX) mounting media.
9. Glass slides.
2.3. For RNA Extraction from Pancreatic Microdissected Tissue
1. Microscope with laser capture microdissection (LCM) system or with P.A.L.M. laser pressure catapulting system (Zeiss/P.A.L.M. Laser Technologies).
2. If the LCM system is going to be used, the HistoGene™ LCM Frozen Section Staining Kit is recommended for preparing and staining tissues, preserving intact nucleic acids from the captured cell populations.
3. Slides covered with membrane of polyethylene naphthalate (PEN-membrane slides, by P.A.L.M. Laser Technologies) if the P.A.L.M. system is going to be used. These PEN-membrane slides can be acquired DNase and RNase free. Otherwise, RNase can be inactivated by heating at 180°C for 4 h.
4. Tightly sealed containers for freezing and rethawing the slides, such as a 50-ml Falcon tube or a microslide box.
5. A small desiccator or slide box containing desiccant (anhydrous calcium sulfate and cobaltous chloride) to transport the mounted slides from one location to another (VWR Scientific Products Catalog #22890-900).
6. Ice-cold 70% ethanol.
7. Ice-cold RNase-free water.
8. Histological staining such as hematoxylin and eosin, Methyl Green, Cresyl Violet, or Nuclear Fast Red.
9. PALM AdhesiveCaps or usual RNase-free plastic tubes of 0.5-ml size for collecting catapulted samples.
10. Lysis buffer, e.g., QIAGEN RLT buffer, and RNeasy® Micro Kit (QIAGEN, #74004), for RNA extraction from P.A.L.M. microdissected tissue.
11. PicoPure RNA Isolation Kit (Arcturus, Catalog #KIT0202), or RNAqueous®-Micro Kit Catalog #AM1931 for RNA extraction from LCM microdissected cells.
2.4. For RNA Extraction from Cultured Cells
1. Cultured cells, 80–90% confluent in 75-cm² tissue culture flasks.
2. Phosphate-buffered saline (PBS): 8.0 mM Na2HPO4·2H2O (1.44 g/l), 1.5 mM KH2PO4 (0.30 g/l), 2.7 mM KCl (0.20 g/l), 137 mM NaCl (8.00 g/l), adjust pH to 7.8 with K2HPO4. Store at room temperature and cool before using.
3. RNeasy® Mini, Midi, or Maxi RNA isolation kits (QIAGEN).
4. RNase-free DNase set (QIAGEN).
5. 100% ethanol.
6. DEPC-treated water.
7. Cycloheximide (optional).
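As a quick cross-check when preparing PBS, the gram-per-litre amounts in the recipe above can be converted to millimolar concentrations. The sketch below assumes standard molar masses for the listed salts (these values are supplied here, not taken from the chapter), and the function name is hypothetical.

```python
# Molar masses in g/mol (assumed standard values, not from the chapter).
MOLAR_MASS = {
    "Na2HPO4·2H2O": 177.99,
    "KH2PO4": 136.09,
    "KCl": 74.55,
    "NaCl": 58.44,
}

# Gram-per-litre amounts as listed in the PBS recipe.
RECIPE_G_PER_L = {
    "Na2HPO4·2H2O": 1.44,
    "KH2PO4": 0.30,
    "KCl": 0.20,
    "NaCl": 8.00,
}


def millimolar(grams_per_litre, molar_mass):
    """Convert a mass concentration (g/l) to millimolar (mmol/l)."""
    return grams_per_litre / molar_mass * 1000.0


for salt, grams in RECIPE_G_PER_L.items():
    print(f"{salt}: {millimolar(grams, MOLAR_MASS[salt]):.1f} mM")
```

With these molar masses, 8.00 g/l NaCl corresponds to roughly 137 mM, whereas 0.30 g/l KH2PO4 works out to roughly 2.2 mM, so it may be worth comparing the stated concentrations against the laboratory's reference formulation before weighing out the salts.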
3. Methods
3.1. Checking RNA Integrity
Checking the integrity of the RNA of bulk tissue samples is recommended before beginning any other more sophisticated procedure such as laser microdissection. First, identification of samples with poor RNA quality at this stage will avoid wasting valuable time and reagents for microdissection. Second, the finding of poor RNA quality after laser microdissection of bulk samples that were first deemed satisfactory may indicate RNA degradation during the microdissection procedure. Extractions for checking RNA integrity may easily be performed by means of a QIAGEN RNeasy® Mini Kit from snap-frozen bulk tissue as follows:
1. Use RNase-free pincers and a sterile scalpel blade for cutting the specimen on an RNase-free iron plate (e.g., 15 × 10 cm) placed in a container with dry ice.
2. Place the container with dry ice near a balance. To weigh the portion of tissue, subtract the tare weight of an empty 1.5-ml tube from the weight of the tube containing the tissue. Do not use more than 30 mg. All of the remaining steps have to be performed on wet ice.
3. Place the fragment of tissue into a clean 1.5-ml tube and add 594 µl of lysis buffer RLT plus 6 µl of β-mercaptoethanol.
4. Keeping the tube with the sample and the lysis buffer in a beaker with wet ice, proceed to disrupt the tissue by means of a Polytron placed in a cold room. Homogenize the sample three times at full speed for 30 s each, waiting 15 s between each homogenization. To avoid heating of the sample, an alternative is to perform more homogenizations of shorter duration. Allow the homogenized solution to stand for 5 min.
5. Proceed with the rest of the RNeasy® Mini Handbook protocol.
6. Check the quality of the eluted RNA by gel electrophoresis. This may be performed using an Agilent BioAnalyzer, or by routine agarose gel electrophoresis as described below.
(a) Before beginning, spray or wipe the surfaces of the glassware and electrophoresis equipment to be used with RNaseZap to remove RNases. Rinse twice with DEPC-treated water.
(b) Place the gel tray containing combs in the electrophoresis chamber. The combs must be placed closest to the cathode (negative/black) lead.
(c) From the stock of 10× Gel Prep/Running Buffer, prepare a 1× dilution, e.g., taking 100 ml of 10× Running Buffer and adding up to 1,000 ml of DEPC-treated water.
(d) Prepare a 1% agarose gel, melting 1 g of agarose in 100 ml of 1× Running Buffer for every 100 ml of gel needed. Heat in a microwave oven until the agarose is in complete solution.
(e) Add 9 µl of an ethidium bromide solution at 10 mg/ml to the melted agarose. Ethidium bromide is known to be mutagenic and should be handled carefully. Let the solution cool to approximately 60°C.
(f) Up to 30 µg of RNA can be loaded in each well. Usually we load 20 µl of prepared sample in each well, comprising 10 µl of sample RNA and an equal volume of Glyoxal Load Dye. For larger sample volumes, less Glyoxal Load Dye can be used, but never use less than one half volume. Incubate the samples in a heating block at 50°C for 30 min (60 min if less than one volume of Load Dye was used).
(g) After incubation, briefly spin the samples. If they will not be loaded into the gel immediately, place them on ice. The samples can also be stored at −20°C at this stage for several days.
(h) Pour the gel into the casting tray to approximately 6 mm in thickness and pop the bubbles with an RNase-free cover slide or pipet tip. Let the gel solidify.
(i) Fill the electrophoresis chamber with 1× Running Buffer until the gel is covered by approximately 1 cm of buffer. Remove the combs.
(j) Load the samples into the gel by means of RNase-free pipet tips, placing the tip inside the top of the well. Be careful not to trap air at the end of the tip when picking up each sample.
(k) Run the gel at 5 V/cm of distance between electrodes. RNA and bromophenol blue will migrate toward the anode (positive/red) electrode. Free ethidium will migrate in the opposite direction of the RNA, running off the top of the gel. Run the gel until the bromophenol blue front migrates almost to the bottom of the gel.
(l) The gel can be viewed and photographed under UV light on a transilluminator or in a Gel Doc. Place plastic wrap beneath the gel to avoid contamination with RNases from the transilluminator surface.
(m) The 28S and 18S ribosomal RNA (rRNA) bands should be clearly visible in the intact RNA sample, and the 28S rRNA band should be approximately twice as intense as the 18S rRNA band. This 2:1 ratio (28S:18S) is a good indication of RNA quality. Partially degraded RNA will have a smeared appearance, will lack the sharp rRNA bands, or will not exhibit a 2:1 ratio. Completely degraded RNA will appear as a very low molecular weight smear (Fig. 2) (see Note 3).
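If band intensities are quantified from the gel image, the 28S:18S criterion in step (m) can be expressed as a small check. This is a minimal sketch: the function names and the tolerance of 0.5 around the 2:1 target are illustrative assumptions, not validated cutoffs, and visual inspection for smearing remains necessary.

```python
def rrna_ratio(intensity_28s, intensity_18s):
    """Return the 28S:18S band intensity ratio."""
    if intensity_18s <= 0:
        raise ValueError("18S intensity must be positive")
    return intensity_28s / intensity_18s


def looks_intact(intensity_28s, intensity_18s, target=2.0, tolerance=0.5):
    """True if the ratio lies within `tolerance` of the 2:1 target.

    `tolerance` is an assumed, adjustable value for illustration only.
    """
    return abs(rrna_ratio(intensity_28s, intensity_18s) - target) <= tolerance


# Hypothetical densitometry readings (arbitrary units).
print(looks_intact(2000, 1000))  # ratio 2.0
print(looks_intact(900, 1000))   # ratio 0.9
```

A sample that passes this numeric check can still be unsuitable if the lane shows a low molecular weight smear, so the ratio is best treated as one indicator among several.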
Fig. 2. Analysis of RNA quality. Extracted RNA was run on an Agilent 2100 Bioanalyzer (Agilent, Palo Alto, CA). From left to right, the gel images for the RNA ladder, six samples of RNA extracted from pancreatic cancer tissues, and one sample of RNA extracted from a cell line are shown. RNA from tumor samples 1–4 and the cell line show a 2:1 ratio abundance of the 28S to the 18S ribosomal RNA and are acceptable for use in expression profiling. By contrast, tumor samples 5 and 6 show significant degradation.
3.2. Sample Selection
3.2.1. From Fresh Tissue
1. Embed the specimen in embedding media, usually a viscous compound called OCT (Tissue-Tek®), to allow histologic sectioning (see Note 6). Place an empty, labeled cryomold on dry ice for 1 min. It should remain on dry ice during the entire embedding procedure.
2. Cover the bottom of the cryomold with embedding medium, approximately 1–2 mm deep. Place the tissue for freezing against the bottom of the cryomold in the medium. To facilitate cutting, the tissue should be relatively small (1 cm in maximum dimension) and the desired cutting surface should face the bottom.
3. Fill the cryomold containing the base of embedding medium and tissue with more embedding medium. Cover the dry ice container and allow the embedding medium to harden (it will turn from translucent to white when frozen). The blocks of frozen tissue can be stored at −80°C until needed.
4. Within the cryochamber of the cryostat, remove the block of tissue encased within embedding media from the cryomold and attach it to the specimen holder disk of the cryostat with additional embedding medium.
5. Allow the block to equilibrate to the cryochamber temperature (−20°C) for at least 15 min.
3.2.2. From Previously Snap-Frozen Tissue
Tissue frozen in liquid nitrogen usually yields higher quality RNA than tissue frozen in dry ice.
1. Remove the tube containing the sample from the −80°C freezer and place in dry ice for transferring to the cutting room.
2. Cool a cryomold in dry ice, add OCT embedding media until approximately two thirds full, and leave the OCT to become viscous but not hard. Take the frozen specimen with clean forceps and dip in the OCT, pressing the tissue down against the bottom of the cryomold.
3. Cover the specimen completely by adding more OCT, and freeze the tissue completely by keeping the cryomold on dry ice for 5 min. Sectioning the tissue can now be performed, or the frozen OCT tissue block can be stored at −80°C until ready for use.
3.3. RNA Extraction from Microdissected Tissue
Tissue microdissection performed by either a laser capture microdissection (LCM) system or a P.A.L.M. laser pressure catapulting system (Zeiss/P.A.L.M. Laser Technologies) is explained in detail in its specific chapter (also see Notes 4, 5 and 7).
3.4. RNA Extraction from Cultured Cells
1. Remove the medium from cultured cells that are 80–90% confluent in a 75-cm² flask. Wash the cells carefully with sterile PBS at 4°C (see Note 8).
2. Add 5 ml PBS. Put the flask on ice and collect the cells by scraping with a sterile cell scraper. Transfer the cells to a new 12-ml tube (white cap). Add 5 ml PBS to the original flask and continue scraping. Collect all cells.
3. Centrifuge for 10 min at 1,500 rpm (150 g) at room temperature.
4. Carefully remove the supernatant by inverting the tube. Optional: wash the pellet first with cold sterile PBS.
5. Put the tube on wet ice.
6. Proceed with the suitable RNeasy® Mini, Midi, or Maxi Kit (QIAGEN) according to the instructions for RNA isolation from cell cultures.
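The pairing of rotor speed and relative centrifugal force in step 3 holds only for one rotor radius, so it can be useful to convert between the two for the centrifuge actually used. The sketch below uses the standard relation RCF = 1.118 × 10⁻⁵ × r × rpm² (r in cm); the 6-cm radius is an assumed example value, not a figure from the protocol.

```python
import math

RCF_CONSTANT = 1.118e-5  # standard factor for radius in cm and speed in rpm


def rcf_from_rpm(rpm, radius_cm):
    """Relative centrifugal force (x g) for a given speed and rotor radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_from_rcf(rcf, radius_cm):
    """Rotor speed (rpm) needed to reach a given RCF at this radius."""
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


# Assumed rotor radius of 6 cm, for illustration only.
print(round(rcf_from_rpm(1500, 6.0), 1))
print(round(rpm_from_rcf(150, 10.0)))
```

For a rotor with a larger radius, the same 150 g is reached at a lower speed, so the g value rather than the rpm value is the one to carry over between instruments.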
4. Notes
1. A special area for working with RNA is critical. The surface of the workspace should be cleaned of RNases with special cleaning solutions, e.g., RNaseZap (AMBION, #9780), or RNase AWAY® (Molecular BioProducts). Gloves should be frequently changed and a mask should be worn. All plastic tubes and filtered tips should be sterile. All of the glass reservoirs and tools should be treated with diethylpyrocarbonate (DEPC) and baked at 180°C several hours before using. In addition, all solutions should be prepared in advance to contain 0.1% DEPC and be kept on ice until use.
2. A major reason for preparing frozen sections is not for RNA extraction but to confirm the histology (healthy, pancreatitis, cancer) when using bulk tissues.
3. Extracted RNA should be resuspended in RNA elution buffer or ethanol, but not in DEPC-treated water.
4. Before microdissecting, make sure to confirm the sample histology. For this purpose, proceed with cutting a section at 5 µm onto a standard glass slide and quickly perform hematoxylin and eosin (H&E) staining.
(a) Dip the previously fixed slides five to six times in RNase-free deionized water.
(b) Stain for 1 min in Mayer's hematoxylin solution.
(c) Rinse for 1 min in DEPC-treated water.
(d) Stain for 10 s in Eosin.
(e) Quickly dehydrate in 70% ethanol, then 96% ethanol, and then 100% ethanol, for 15 s each. Make sure to continuously dip the slide during dehydration.
(f) Mount the H&E-stained section with DPX and check that the histology of the tissue corresponds to the expected histology. If not, reject the current OCT block and choose another one. Repeat the process with the new OCT-embedded specimen.
(g) If the new sample shows the expected histology, go ahead with the procedure, making 8-µm-thick sections of the same mounted specimen. Mount the sections at the center of a room-temperature LCM microslide or a precooled P.A.L.M. PEN-membrane slide.
5. H&E staining is also frequently used for laser microdissection.
If hematoxylin staining is chosen, the duration of staining with hematoxylin should be minimized, and the eosin can be avoided entirely if a good visualization of the cytoplasm is not required during the microdissection. Other stains, such as Cresyl Violet, Methyl Green, or Nuclear Fast Red can be regarded as alternatives. For tissues particularly rich in RNases such as normal pancreatic tissue, a short staining procedure such as the Cresyl Violet method is recommended as follows:
(a) Dissolve solid cresyl violet acetate at a concentration of 1% (w/v) in 100% EtOH at room temperature by stirring for several hours or even overnight. Filter the solution before use.
(b) Dip the previously fixed slides for 1 min in the 1% cresyl violet acetate solution.
(c) Remove the excess stain on an absorbent surface.
(d) Dip into 70% ethanol.
(e) Dip into 100% ethanol.
(f) Air-dry on a Kimwipe for 1–2 min.
Methyl Green is a good staining procedure because it is very fast and therefore helpful to preserve RNA quality.
(a) Dip the previously fixed slides five to six times in RNase-free deionized water.
(b) Dip the slides in Methyl Green solution (DAKO, #S1962) for 1 min.
(c) Rinse for 30 s in a jar with DEPC-treated water.
(d) Rinse for 30 s in a second jar with DEPC-treated water.
(e) Air-dry on a Kimwipe for 1–2 min.
The lack of dehydration steps in the Methyl Green staining method may cause a poor morphologic appearance of the tissue sections. This may be improved by first mounting the slide on the P.A.L.M. microscope, then adding 5 µl of 70% or 100% ethanol onto the area of interest with an RNase-free pipet tip. When using 100% ethanol, some destaining of the tissue may occur, but the tissue dries much more quickly than when using 70% ethanol.
6. Some helpful features to distinguish a well-differentiated ductal adenocarcinoma from trapped and reactive glands in chronic pancreatitis with atrophy and fibrosis are provided in Table 2.
7. Extract RNA from microdissected samples according to the PicoPure RNA Isolation Kit (Arcturus, Catalog #KIT0202) procedures for LCM-obtained samples or by means of RNeasy® Micro Kit (QIAGEN, #74004) for P.A.L.M. microdissected samples. Both procedures involve passing cell extracts through affinity spin columns, which bind and immobilize the RNA. The bound RNA is then washed and total RNA can be eluted to a final volume of 10 µl. If the RNeasy® Micro Kit is used, adding 50% ethanol to the cleared lysate containing buffer RLT instead of 70% ethanol can increase the RNA yield.
Table 2 Helpful features to distinguish ductal adenocarcinoma from chronic pancreatitis

Feature | Carcinoma | Pancreatitis
Glandular architecture | Haphazard | Lobular
Variation in nuclear size | Variable (4:1 or more) | Uniform
Nucleoli | Huge irregular nucleoli | Single, regular
Luminal necrosis | May be present | Absent
Incomplete glands | May be present | Absent
Perineural invasion | May be present | Absent
Vascular invasion | May be present | Absent
Glands immediately adjacent to muscular artery | May be present | Absent
8. Optional use of cycloheximide in RNA extractions from cell lines. Cycloheximide is a protein translation inhibitor and may have a positive effect on the level of mutant transcripts present. If desired, in a fume hood, prepare a 100 mg/ml solution of cycloheximide. Incubate the cultured cells in the presence of 100 µg/ml of cycloheximide for 4–8 h prior to harvesting.
References
1. Buchholz, M., M. Braun, A. Heidenblut, H. A. Kestler, G. Kloppel, W. Schmiegel, S. A. Hahn, J. Luttges, and T. M. Gress. (2005). Transcriptome analysis of microdissected pancreatic intraepithelial neoplastic lesions. Oncogene 24, 6626–36.
2. Crnogorac-Jurcevic, T., E. Efthimiou, T. Nielsen, J. Loader, B. Terris, G. Stamp, A. Baron, A. Scarpa, and N. R. Lemoine. (2002). Expression profiling of microdissected pancreatic adenocarcinomas. Oncogene 21, 4587–94.
3. Friess, H., J. Ding, J. Kleeff, L. Fenkell, J. A. Rosinski, A. Guweidhi, J. F. Reidhaar-Olson, M. Korc, J. Hammer, and M. W. Buchler. (2003). Microarray-based identification of differentially expressed growth- and metastasis-associated genes in pancreatic cancer. Cell Mol Life Sci 60, 1180–99.
4. Grutzmann, R., C. Pilarsky, O. Ammerpohl, J. Luttges, A. Bohme, B. Sipos, M. Foerder, I. Alldinger, B. Jahnke, H. K. Schackert, H. Kalthoff, B. Kremer, G. Kloppel, and H. D. Saeger. (2004). Gene expression profiling of microdissected pancreatic ductal carcinomas using high-density DNA microarrays. Neoplasia 6, 611–22.
5. Iacobuzio-Donahue, C. A., R. Ashfaq, A. Maitra, N. V. Adsay, G. L. Shen-Ong, K. Berg, M. A. Hollingsworth, J. L. Cameron, C. J. Yeo, S. E. Kern, M. Goggins, and R. H. Hruban. (2003). Highly expressed genes in pancreatic ductal adenocarcinomas: a comprehensive characterization and comparison of the transcription profiles obtained from three major technologies. Cancer Res 63, 8614–22.
6. Iacobuzio-Donahue, C. A., B. Ryu, R. H. Hruban, and S. E. Kern. (2002). Exploring the host desmoplastic response to pancreatic carcinoma: gene expression of stromal and neoplastic cells at the site of primary invasion. Am J Pathol 160, 91–9.
7. Kim, H. N., D. W. Choi, K. T. Lee, J. K. Lee, J. S. Heo, S. H. Choi, S. W. Paik, J. C. Rhee, and A. W. Lowe. (2007). Gene expression profiling in lymph node-positive and lymph node-negative pancreatic cancer. Pancreas 34, 325–34.
8. Logsdon, C. D., D. M. Simeone, C. Binkley, T. Arumugam, J. K. Greenson, T. J. Giordano, D. E. Misek, R. Kuick, and S. Hanash. (2003). Molecular profiling of pancreatic adenocarcinoma and chronic pancreatitis identifies multiple genes differentially regulated in pancreatic cancer. Cancer Res 63, 2649–57.
9. Ryu, B., J. Jones, N. J. Blades, G. Parmigiani, M. A. Hollingsworth, R. H. Hruban, and S. E. Kern. (2002). Relationships and differentially expressed genes among pancreatic cancers examined by large-scale serial analysis of gene expression. Cancer Res 62, 819–26.
10. Ryu, B., J. Jones, M. A. Hollingsworth, R. H. Hruban, and S. E. Kern. (2001). Invasion-specific genes in malignancy: serial analysis of gene expression comparisons of primary and passaged cancers. Cancer Res 61, 1833–8.
Chapter 15
Cancer Gene Profiling in Prostate Cancer
Adam Foye and Phillip G. Febbo
Summary
Gene profiling and expression analysis using microarrays have made a significant impact on our biological understanding of prostate cancer. The procedures for generating high-quality expression data from prostate cancer cell lines and tumors are not trivial. However, during the past 9 years, methods by which to process samples for gene profiling have been developed. In this chapter, techniques to process prostate cancer specimens either en bloc (macrodissection) or using laser capture microdissection are presented in detail along with extensive technical notes. Although we focus on prostate cancer and discuss the specific methods utilized in our lab, the processes discussed are generalizable to other tumors and amenable to the substitution of alternative instruments and/or commercially available kits.
Key words: Prostate cancer, Expression analysis, Laser capture microdissection, En bloc analysis
Robert Grützmann and Christian Pilarsky (eds.), Cancer Gene Profiling: Methods and Protocols, Methods in Molecular Biology, vol. 576, DOI 10.1007/978-1-59745-545-9_15, © Humana Press, a part of Springer Science + Business Media, LLC 2010
1. Introduction
Prostate cancer remains the most common nondermatological cancer in men in the United States and the second leading cause of cancer death in men (1). Understanding the molecular pathogenesis of prostate cancer has great potential to improve disease control and cure rates for men diagnosed with this disease. Expression analysis has been applied to prostate cancer cell lines, xenografts, and human tumors; it has improved our biological understanding of prostate cancer and is contributing to the improved clinical management of patients. Expression analysis has resulted in perhaps the most profound discovery in prostate cancer biology this decade: the identification of chromosomal translocations involving the ETS transcription factors in prostate cancer. Using a novel approach to gene
expression called cancer outlier profile analysis (COPA), a group of investigators identified aberrant expression of ERG (21q22.3) and ETV1 (7p21.2) in a subset of prostate cancers compared with other solid tumors. These genes were subsequently found to be genetically translocated downstream of TMPRSS2, an androgen-regulated protease (2). Multiple groups have now confirmed the frequent presence of these translocations in early prostate cancer and this discovery has had profound implications on the direction of prostate cancer research. A second biological discovery with a major impact on our understanding of prostate cancer resulting from microarray analysis was the finding that increased androgen receptor (AR) RNA expression is the single most consistent RNA change associated with castration-refractory growth and that increased AR expression was both sufficient and necessary for castration-resistant growth of the xenografts (3). The finding of continued AR signaling despite castrate levels of testosterone was subsequently demonstrated in human samples of castration-resistant prostate cancer (4). Interestingly, even with increased expression of the AR gene and androgen-metabolizing enzymes, the transcriptional activity of AR still decreases in castration-resistant prostate cancer (5, 6). This suggests that while the levels of AR activity likely decrease in hormone-refractory prostate cancer, the energy allotted to maintain some AR signaling increases, underscoring the importance of maintaining at least some AR activity. These observations have reinvigorated investigations on how to further inhibit the AR in advanced prostate cancer so as to improve the duration and quality of life for men with advanced prostate cancer. Finally, multiple groups using microarrays have demonstrated significant differential gene expression between malignant and benign prostate specimens (7–12).
Of the genes found to be differentially expressed by most groups, AMACR has been further studied and validated in multiple independent studies using immunohistochemistry (9, 13) and now serves as a clinically deployed biomarker to help differentiate between tumor and benign prostate tissues. Although these examples represent only a fraction of the body of work where microarray analysis of prostate cancer has provided biological or clinical insight, they demonstrate the central role global transcriptional analysis of prostate cancer cell lines, xenografts, and tumors has played and portend the continued importance of expression analysis as an investigational method with which to interrogate prostate cancer. In this chapter, we provide the materials and methods required for expression analysis of prostate cancer. In addition, the methods are annotated with procedural insight and helpful hints to facilitate the adoption and deployment of this technique. This chapter is focused on preparing samples for oligonucleotide arrays such as those available from Affymetrix, but the methods are readily adapted if spotted dual hybridization arrays (e.g., Agilent) or bead-based assays (e.g., Illumina) are preferred.
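The cancer outlier profile analysis (COPA) mentioned in the Introduction is not described in detail in this chapter. As commonly described, it centers each gene on its median, scales by the median absolute deviation (MAD), and ranks genes by an upper percentile of the transformed values, so that genes strongly overexpressed in only a subset of samples stand out. The sketch below follows that general outline as an assumption; the function names, the choice of the 90th percentile, and the toy values are illustrative only.

```python
from statistics import median


def copa_transform(values):
    """Median-center one gene's values and scale by the MAD."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; gene cannot be scaled")
    return [(v - med) / mad for v in values]


def percentile(values, q):
    """Percentile with linear interpolation, q between 0 and 100."""
    data = sorted(values)
    pos = (len(data) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(data) - 1)
    return data[lower] + (data[upper] - data[lower]) * (pos - lower)


def copa_score(values, q=90):
    """Upper-percentile score of the transformed values for one gene."""
    return percentile(copa_transform(values), q)


# Toy expression values across ten samples (illustration only).
outlier_gene = [1, 1, 1, 1, 2, 2, 2, 2, 10, 12]  # high in two samples
uniform_gene = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]   # no outlier subset
print(copa_score(outlier_gene) > copa_score(uniform_gene))
```

Because the median and MAD are insensitive to a minority of extreme samples, the transformation preserves outlier subsets that a mean and standard deviation scaling would tend to compress.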
2. Materials
2.1. Fresh-Frozen Tissue Block Preparation, Storage, and Microdissection
1. Cryostat (Leica 1800, Leica CE knife holder for disposable low-profile blades) with heat extractor.
2. Sakura® Accu-Edge® low-profile disposable microtome blades.
3. Tissue pallets for cryostat, minimum of 5.
4. Sakura® Tissue-Tek® Optimal Cutting Temperature (OCT) compound, embedding medium.
5. Sakura® Tissue-Tek® Mega-Cassette® plastic tissue storage cartridges.
6. Liquid nitrogen-based storage system.
7. Small syringe needle (for tissue manipulation, ~18 gauge).
8. Straight-edged razor blades.
2.2. Slide Staining and Dehydration
1. 70% ethanol (95% ethanol and nuclease-free water).
2. 95% ethanol.
3. 100% ethanol stored with molecular sieves (Sigma-Aldrich, cat. no. 69839-250G).
4. Gill #2 formulation hematoxylin.
5. Alcoholic eosin-Y.
6. Xylene.
7. Nuclease-free water.
8. 11 Coplin staining jars, 55 mL.
9. Compressed air or nitrogen gas line.
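Preparing 70% ethanol from 95% ethanol and nuclease-free water is a C1V1 = C2V2 dilution. The sketch below computes the volumes for one 55-mL Coplin jar; the function name is hypothetical, and the calculation ignores the slight volume contraction that occurs when ethanol and water are mixed.

```python
def dilution_volumes(stock_percent, target_percent, final_volume_ml):
    """Return (stock_ml, water_ml) needed to dilute stock to target (v/v)."""
    if not 0 < target_percent <= stock_percent:
        raise ValueError("target must be positive and not exceed the stock")
    stock_ml = target_percent / stock_percent * final_volume_ml
    return stock_ml, final_volume_ml - stock_ml


# 70% ethanol from 95% ethanol for one 55-mL Coplin jar.
stock, water = dilution_volumes(95, 70, 55)
print(f"{stock:.1f} mL of 95% ethanol + {water:.1f} mL nuclease-free water")
```

The same helper applies to other working concentrations made from 95% ethanol by changing the target percentage and final volume.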
2.3. Laser Capture Microdissection
1. Arcturus (Molecular Devices) Veritas LCM Instrument.
2. CapSure® Macro LCM Caps or HS LCM Caps (Molecular Devices, cat. no. LCM0212 or LCM0214).
3. GeneAmp thin-walled reaction tubes, 0.5 mL.
2.4. Frozen Tissue Macrodissection
1. Thermo Savant FastPrep® FP120 Homogenizer.
2. MP Biomedicals Lysing Matrix A.
3. Straight-edged razor blades.
4. Cryostat (Leica 1800, Sakura® Tissue-Tek® disposable microtome blades) with heat extractor.
2.5. RNA Isolation
1. Stratagene® Absolutely RNA® Nanoprep Kit.
2. 70% ethanol (see Subheading 2.2).
3. Ambion® RNase Zap®.
4. Heating blocks preset to 37°C and 60°C.
5. Ambion® mirVana™ miRNA Isolation Kit.
6. Heating block preset to 95°C.
7. 100% ethanol (see Subheading 2.2).
2.6. RNA Quality Verification
1. Agilent 2100 Bioanalyzer.
2. Agilent RNA 6000 Pico Kit.
3. Molecular biology-grade water, nuclease free (Cellgro, cat. no. 46-000-Cl).
4. Turner Biosystems TBS-380 mini-fluorometer.
5. Molecular Probes Inc. RiboGreen® RNA Quantification Kit.
6. Wheaton 200-µL round glass cuvettes, 5 × 31 mm.
2.7. RNA Amplification and Biotin Labeling
1. NuGEN Ovation™ Biotin RNA Amplification and Labeling System.
2. Zymo Research DNA Clean & Concentrator™-25 purification kit.
3. Qiagen DyeEx 2.0 Spin kit.
4. 80% ethanol (room temperature, prepared with 95% ethanol and nuclease-free water, see Subheading 2.2).
5. NanoDrop® ND-1000 spectrometer.
3. Methods
3.1. Fresh-Frozen Tissue Block Preparation and Storage
1. Set cryostat temperature to −20°C. Position the heat extractor for perpendicular movement to the pallet holding platform. 2. Prepare several tissue collection pallets by applying OCT compound directly to the cooled aluminum cryostat pallets. 3. Once the OCT compound has become partially opaque (the circles on the surface of the pallet will disappear, see Note 1), use the heat extractor cylinder to put a flat surface on the top of the OCT pallet. 4. Separate the OCT compound and pallet from the heat extractor using a sharp straight-edged razor blade.
Cancer Gene Profiling in Prostate Cancer
5. Repeat steps 1–3 for each sample pallet necessary to collect the tissue sample. 6. Place the pallets in either a pallet holder surrounded with dry ice or simply place the pallets on dry ice (see Note 1). Keeping the pallets cold is essential to freezing the fresh OCT compound. Bring OCT compound, gloves, and a small syringe needle to the collection. 7. At the collection, place tissue samples directly onto the pallets of OCT. The tissue will freeze on contact so align the tissue prior to making contact. 8. Surround the tissue with OCT compound, forming a perimeter around the fresh tissue. Slowly add more OCT over the top, making sure that the previous layer does not solidify (become opaque) before adding more. Allowing the layers to solidify can cause block fragmenting during sectioning. 9. Cover the tissue specimen completely with OCT and wait for the compound to solidify. If the tissue specimen is too large to cover on the pallet, cover as much as possible and recoat the specimen on the cryostat. 10. At the cryostat, separate the original OCT layer on the pallet from the layer surrounding the frozen tissue using a razor blade. The layers should separate easily. 11. Invert the tissue block so that the exposed tissue is facing up. Place just enough OCT compound over the tissue to cover the top of the block once the heat spreader is lowered. 12. Lower the heat spreader to flatten the surface and freeze the OCT. Separate the OCT from the extractor with a razor blade as before. 13. The frozen tissue should now be completely surrounded with OCT. Place the tissue block into a labeled Mega Cassette® cartridge and store in a freezer box (9 × 9 cardboard grid with every other vertical divider removed) in vapor phase liquid nitrogen. Storage at −80°C is considered adequate for short-term storage (6,000 µs) to maintain melting and cap adhesion reliability.
Ablation is also a key consideration with RNA work because it requires extra time and introduces UV damage to tissue neighboring ablated cells. A balance must be achieved between eliminating unwanted tissue and minimizing time. If more than a minute of ablation is necessary to clean up a cap, there may be other methods of prepping tissue that are more efficient. Some of these methods are mentioned later in this section.
9. Preparing tissue for LCM. For most tissue types, preparation beyond dehydration during staining is not necessary. Folded tissue can provide a challenge for cap placement, but can be dealt with simply by using a straight-edged razor blade to cut and scrape the raised part of the section off of the slide. Tissue sections involving bone can be particularly challenging for LCM. Bone fragments often create large discrepancies in tissue depth and cells of interest may have to be targeted in different layers. LCM is primarily a two-dimensional process once the cap has been lowered, so, before loading the slide, steps should be taken to ensure that your targeted cells are in one accessible plane. The following methods can be used to clear a tissue section of bone fragments and other unwanted, raised tissue:
Macrodissection: Using a sharp straight-edged razor blade or a scalpel (and magnifying glass, if necessary), carefully cut off and scrape away sections of dense bone that rise above the 7-µm (tissue section depth) tissue height. Also cut away any pieces of tissue that are detached from the slide because these can hinder cap melting performance or become stuck to the cap when LCM is completed and the cap is raised off of the tissue.
Adhesive Paper: Bone fragments and other loose pieces of tissue can be removed with a sticky piece of paper such as the bottom of Post-It® notes. This is a sensitive process because a very small amount of adhesion is needed to pull up tissue. With a gloved hand, tap the adhesive section of the paper repeatedly to decrease the adhesiveness. Little or no pressure is required to stick loose tissue to the paper, so start as lightly as possible. It is very easy to accidentally remove the majority of the section with this method; however, it is fast and effective at removing bone. If this method is removing too much tissue, try the cap clearing method below.
Cap Clearing Method: Another method of clearing unwanted tissue from a slide is to use an extra macro cap. When using this method, be sure to have extra caps loaded for each session. Complete the LCM method through step 7. Instead of using the cap to target cells of interest, use it to adhere to loose tissue fragments and to pull up raised regions of tissue. It is very common for bone fragments to stick to the cap membrane even if those areas are not targeted with the IR laser because bone does not adhere to the glass slide surface as well as other tissues. By using a clearing cap, most fragments that would contaminate the actual sample cap will be removed, resulting in a more pure sample (Fig. 1). Starting from step 7, set up the LCM laser using a higher power than normal (e.g., 85–100 mW). Given the rough surface of tissue prior to clearing, membrane melting performance can be inconsistent and higher laser power will increase the chance of sticking to fragments. Use either the line tool or spot tool to quickly mark areas for removal. Regions do not need to be
covered, just tagged with LCM spots. When tagging is complete, simply drag the cap to an offload bay, return to the tissue, and resume the LCM protocol at step 6.
10. IR laser settings. The IR laser parameters will more than likely have to be adjusted for every sample. It is helpful to become familiar with trends in adjusting laser power and pulse duration in particular. This is best done through practice rather than reading a protocol, but the following notes address some basic trends.
Laser Power: The default setting for laser power is 70 mW. For a well-prepared, dry, and flat sample, this is too high. An ideal laser spot will be a clearly defined black circle with a very clear center region. The thick border represents the pour formed in the melted membrane. If the spot appears blurry in the center, the pour did not dip low enough to hit tissue and thus power should be increased. If a black dot appears in the center of the melted spot, this represents overmelted membrane, either from splashing back toward the cap or from prolonged melting in the center region (Fig. 8). If this appears, lower the laser power. With inconsistent melting on uneven tissue, it is often necessary to allow some overmelting to achieve any melting at all in other regions of tissue. Overmelted membrane spots often still capture well and the tissue will not be adversely affected.
Fig. 8. The top image is a comparison of varying pulse duration with constant power (single 70-mW shot). The spot diameter increases consistently with increased pulse duration up to approximately 4,500 µs, with diminishing returns thereafter. The lower image is a comparison of varying power settings with constant pulse duration (single 2,500-µs shot). Overmelting is seen at 100 mW, while the power was not enough to completely melt the membrane at 40 mW.
Pulse Duration: Pulse duration represents the length of time the IR laser is on during each fire. The default value is 2,500 µs and this is often a good value for a medium spot size. On dry tissue, settings of 4,000–5,500 µs can yield very large spot sizes (>60 µm), which can be excellent for larger areas of cells. Generally, the higher the pulse duration, the larger the melted spot. Laser power typically has to be lowered as pulse duration is increased to avoid an excessively long, powerful laser pulse. Short durations (70 mW) to achieve spot sizes below 7 µm (Fig. 8).
Pulse: This is the number of times the laser diode will fire for a single spot on the target window. The default setting is one pulse. Using two pulses will often increase the reliability of spot size and has a negligible effect on capture time. More than two pulses will have an effect similar to that of increased pulse duration (increased spot size), but can increase capture time depending on pulse duration.
11. Setting spot size. Setting a proper spot size is crucial for LCM accuracy, timing, and sample yield. There is a certain amount of variability in each cap membrane as well as in the surface of the tissue section that will lead to inconsistencies in spot melting. The spot size observed when the IR laser is test fired during setup may change during the capturing process. For this reason, a margin of error should be used when targeting tissue and recording the spot size. When marking regions, allow a buffer between the tissue of interest and the adjacent unwanted tissue. This will account for any spot size increase during capture and prevents overlap that can contaminate your sample. For the same reason, mark the edge of the spot inside the actual edge of the melted circle when measuring spot size. This will adjust how the LCM software marks the tissue by recording a smaller than actual spot diameter.
If the melted spot size decreases during the capture process, there will be less tissue lost to gaps between spots. Spot overlap can also be adjusted from the capture groups control box. Select the properties button in the bottom right corner of the control box. Select the capture properties tab. Under the glass slide column, horizontal and vertical overlap can be adjusted. A value of 40 for each results in a spot overlap that increases reliability of capture. Increasing the value on samples with a high number of melting inconsistencies could improve LCM results, but adjusting IR laser settings may prove more helpful.
12. Picking regions for LCM. Region selection often determines the length of capturing time. Numerous small regions often take longer to capture than one large region, and smaller spot sizes yield longer capture times.
Especially when working with RNA, capture time should be kept to a minimum to reduce RNA degradation. When selecting an area for capture, keep an eye on the time spent targeting as well as the area. Brief test experiments comparing capture area with RNA yield showed diminishing returns in RNA yield once the targeted area of malignant cells surpassed 3.0 mm². Yields vary tremendously with tissue type, condition, and many other factors. However, we typically target roughly 2–3 mm² of cells and find that with larger regions of metastatic disease, this area is the best balance of capture time and RNA yield.
13. Combined UV/IR laser use and alternatives to UV ablation. There are generally two ways in which unwanted tissue adheres to a cap membrane. LCM performed on a glass slide is more of an adhesion and tissue tearing process than a cutting process. The melted membrane attaches to the tissue surface, but once the cap is lifted, the region of tissue is torn from the original section. Depending on tissue type, a border of tissue near the edge of the targeted region can be pulled up with the cap. This issue can be addressed by using the cut and capture LCM tools instead of simply the capture tools. When using a glass slide, this will add a perimeter cut via the UV laser to separate the targeted region of tissue from surrounding tissue. If using the cut and capture tools does not solve the problem, use this method in conjunction with the “cap clearing” method above. Instead of doing both the cutting and LCM on the same cap, perform the cutting on the first cap, discard, then perform LCM using a clean cap. The unwanted tissue near the border of the UV-guided perimeter should be stuck to the first cap. Another way of picking up unwanted cells happens when tissue that is not well fixed to the glass slide sticks to the cap and is lifted when the cap is removed. This can be handled using the cap clearing method mentioned above.
Often in the case of poorly fixed pieces of tissue, little or no LCM on the clearing cap is necessary. Simply placing a cap down over the region of interest and lifting the cap can pull unwanted tissue off the area; however, using the IR laser to create tags on the unwanted sections will increase the effectiveness of the method.
14. Troubleshooting. Getting the laser to melt a reliable spot is the key to successful LCM and thus is also the source of most problems. There is no single trick to achieving a reliable cap melt, but there are several approaches that can increase the chances of success. The following list addresses possible solutions in the order in which they should be addressed. The first several involve adjusting the laser and LCM cap. These attempt to fix problems while salvaging the tissue section and LCM session. Once problems arise, keep an eye on the time spent tinkering with LCM settings. A cutoff time will likely need to be set to ensure good RNA quality if the run
ultimately works. A good amount of information can be learned from botched LCM runs, so even if you determine you will not get RNA from a given section, continue working the laser and cap adjustments so that a remedy may be found more quickly during future sessions.
Increase Laser Power: Typically, a misfiring IR laser just needs more power to melt the cap down to the tissue. This should be the first adjustment made if a test fire does not melt properly.
Increase Laser Pulse Duration: For the same reasons mentioned with laser power, sometimes the membrane will melt if given a longer laser pulse. Pulse duration typically adjusts spot size (longer durations yield larger spot diameters).
Refocus IR Laser: If the IR laser is out of focus, the full power is not being channeled to the cap membrane. This will greatly affect cap melting. Note that the actual laser seen on the screen is in fact a 650-nm visible light laser diode that is set to mimic the IR laser and facilitate targeting and focus. The actual IR laser used for LCM is not visible.
Reposition the Cap: Cap positioning has a major impact on LCM performance. With good positioning, a huge range of laser settings will yield good LCM performance. If test fires of the IR laser are not working at all, chances are quite good that the cap is either positioned over a fold in the tissue or an uneven region of the section. The cap is essentially dropped by a robotic arm over the tissue section, so any unevenness on the surface of the section (even 1-µm differences) will result in varying gaps between the cap membrane and the tissue. There is no user-enabled function that can “level” a cap once it has been placed, so the only alternative is to find a flat region of tissue. The cap does not need to be centered over the targeted region of tissue. Try placing the cap off center so that the majority of the cap mass is over a flat region.
Keep in mind that the section itself is a 7-µm plateau, so placing the cap over the edge can lead to inconsistent spot sizes. If you must place a cap over the edge of the section, make sure more than 50% of the surface area of the cap is on the tissue section. That will not guarantee successful cap melting, but it will increase the likelihood that the cap will stay somewhat level over the tissue.
Replace the Cap: Caps can have defects that prevent them from melting. It is rare, but sometimes a membrane will separate from the cap. Other times, a cap may just refuse to melt properly. It is unlikely that a faulty cap is the cause of poor melting, but it is a possibility and should be considered if other options fail to solve the problem.
Problems with Tissue Preparation: Tissue dryness is one of the most common causes of LCM problems. Any moisture in the tissue will either prevent the cap from melting properly or it can prevent the membrane from adhering to the tissue. See Note 6 for steps on further dehydration of tissue sections.
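Notes 11 and 12 tie spot size to capture time and to the targeted area. A rough sketch of that relationship is given below, assuming spots sit on a simple square grid; the function name and the `overlap_fraction` parameter are hypothetical and do not correspond to the overlap value entered in the LCM software. It only estimates how many laser spots a given target area would need, as a stand-in for relative capture time.

```python
import math


def estimate_spot_count(target_area_mm2, spot_diameter_um, overlap_fraction=0.0):
    """Estimate how many IR laser spots are needed to tile a target area.

    Assumption: spots lie on a square grid whose pitch is the spot
    diameter reduced by the overlap fraction (0 = edge to edge).
    """
    if not 0 <= overlap_fraction < 1:
        raise ValueError("overlap_fraction must be in [0, 1)")
    pitch_um = spot_diameter_um * (1.0 - overlap_fraction)
    area_um2 = target_area_mm2 * 1_000_000  # 1 mm^2 = 10^6 um^2
    return math.ceil(area_um2 / pitch_um ** 2)


# Compare spot diameters for a 2.5 mm^2 target (within the 2-3 mm^2 range of Note 12).
for diameter_um in (20, 40, 60):
    print(f"{diameter_um} um spots: ~{estimate_spot_count(2.5, diameter_um)} spots")
```

Because the count scales with the inverse square of the spot diameter, halving the spot size roughly quadruples the number of laser fires, which is consistent with the observation that smaller spots lengthen capture.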
15. Combining small samples for higher RNA yield. Some samples will have fewer than 1,000 mm² of cells that can be captured. A higher RNA yield can be achieved by combining lysed cells from numerous LCM caps during the isolation protocol. For these samples, use a total volume of 50 µL for the lysis buffer mix and freeze separately (freeze each tube immediately after LCM). During isolation, up to 200 µL (four caps of tissue) of lysed cells can be passed through the filter cup at a time (total capacity of 400 µL, 1:1 mixture with 70% EtOH). With multiple 200-µL passes during isolation, there is technically no limit on the number of LCM caps that can be combined into one isolation tube. However, even with the smallest bone biopsy samples, our protocol has never required more than three passes.
16. Macrodissection. When coarse cutting a frozen tissue block, be careful to control the tissue on either side of the razor blade. Simply cutting down into the block will often cause one or both pieces to shoot away from the blade. Place the razor between your thumb and index finger in such a way that the fingertips are in contact with the tissue. Advancing the blade between the fingers (by a rolling/squeezing motion) will lower the razor through the tissue slowly while maintaining fingertip contact with both pieces of tissue. Be sure to minimize the time that fingers are in contact with the tissue block to avoid melting. If the block becomes soft, allow it to equilibrate to the cryostat chamber temperature before continuing any macrodissection.
17. Cell agitation. During the agitation process, it is normal for the sample to become foamy. The two spins are essential to achieve proper separation of the beads from the lysed tissue suspension. After each spin, the beads will form the bottom layer and the supernatant will contain the lysed cells in buffer.
18. Kit selection. The Stratagene kit provides both the ability to get RNA from very few cells and a very low elution volume requirement, which increases yield concentration for the small sample sizes associated with LCM. The Stratagene Absolutely RNA® kits do not isolate microRNAs and this should be considered if analysis of microRNA is to be included in your project.
19. Stratagene® RNA isolation notes.
RNA Handling: Every precaution must be taken to preserve the small amount of RNA present when isolating RNA from LCM samples. Be sure every step of the isolation process is done wearing gloves that have been washed with RNaseZap or an equivalent RNase inhibitor wash. All bench-top surfaces, centrifuge rotors and buttons, and heating block surfaces should be washed
with the inhibitor as well. All plasticware should be handled with gloves and stored in closed containers to avoid dust and human contact that can introduce or activate RNase enzymes.
Elution: The elution step is important for determining the resulting concentration of RNA, which is critical for the amplification protocol to follow. Elution into 15 µL allows for one 10-µL aliquot for amplification and roughly 4 µL for running quantification and quality assays (RiboGreen®, Molecular Probes Inc., and Agilent RNA 6000 Pico Assay, respectively). Smaller elution volumes will create higher RNA concentrations; however, less than 10 µL is not recommended and will likely not provide enough sample to run the necessary assays in addition to amplification.
20. Filter cartridge loading. The filter cartridge can only handle 700 µL of liquid per spin. If combining samples for higher RNA yield, or if the lysate/ethanol mix of one sample exceeds this volume, serial spins can be performed. Load ≤700 µL at a time, perform the spin, and discard the filtrate. Repeat until the entire sample has been loaded.
21. Preparing the RNA ladder. Remove the ladder from the reagent kit immediately upon arrival. Transfer the contents (10 µL) to a certified nuclease-free 1.5-mL tube. Heat denature the tube for 2 min at 70°C. Immediately cool on ice. Add 90 µL of certified RNase-free water and mix thoroughly. Prepare aliquots from this stock. Three-microliter aliquots will allow enough ladder for two RNA 6000 Pico chips. Store the aliquots at −80°C.
22. Preparing filtered gel aliquots. Pipet 550 µL of RNA 6000 Pico gel matrix (red tube) into one of the provided spin filters. Spin at 1,500 × g for 10 min at room temperature. Aliquot 65 µL of the filtered gel into 1.5-mL microcentrifuge tubes. Store aliquots at 4°C. One aliquot is enough gel to run two RNA 6000 Pico chips.
23. Verifying RNA quality: 18S and 28S peaks. The electropherogram should yield three distinctive peaks for a good sample of RNA.
The first peak occurs quickly (~20–25 s into the run) and represents the marker. After running for approximately 35–40 s, the 18S peak should appear. The height of the 28S peak that follows is the best indicator of RNA quality. Robust RNA samples from cultured cells should yield 28S peaks that are double the height of the 18S peak. Given the stress of the LCM process on RNA quality, 18S and 28S peaks of equal height indicate a higher-quality RNA sample. 28S peaks as low as half the height of the corresponding 18S peak can often yield good results after amplification (Fig. 5). The cutoff for a “good” sample will depend on the specific project, the quality and success
of amplification, and the genomic data provided by the microarray chip.
24. Troubleshooting. The RNA 6000 Pico kit is much more sensitive than the RNA 6000 Nano kit and thus errors are more likely to arise.
Keeping Electrodes Clean: Thorough cleaning of the electrodes before and after each run will reduce failed runs. Limiting RNA concentration to within the range of the assay is also important for keeping clean electrodes (no more than 5 ng/µL of RNA). Higher concentrations can be run with the Pico chip, but more extensive and frequent electrode cleaning will be necessary and sample dilution may be a simpler alternative.
Identifying a Failed Run: The entire run is dependent on the RNA ladder. This is the first sample to load and has a distinctive pattern of successive peaks. If the ladder peaks appear too late in the electropherogram or if the peaks are not present, none of the samples will run properly. It is therefore very important to use a fresh aliquot of ladder for each run and be sure to heat denature the original ladder stock.
Air Bubbles in Chip Wells: Each sample must be loaded without introducing air bubbles for the electrodes to get a proper read of each sample. Pipet into the wells with the tip touching the bottom of the well, but at an angle to allow the liquid to dispense slowly. Do not take the pipet plunger into the “blow out” range; simply stop the plunger at the set volume and slowly retract the pipet tip. This is especially important for loading the gel-dye mix and the ladder because these wells affect each sample.
25. Reagent preparations for RiboGreen®.
Ribosomal RNA Standard: The RNA standards included in the RiboGreen® kit are 100 µg/mL each. Dilute 10 µL of the stock RNA standard in 500 µL of 1× TE buffer to create a 2.0 µg/mL standard. Create 50-µL aliquots of this standard for use with the assay and store at −80°C.
Dye Reagent: The RiboGreen® dye reagent is light sensitive.
Overhead lighting should be turned off for the assay and both the stock dye vial and any dilution tubes should be covered with foil to eliminate light-induced degradation. The dye also has a high freezing point relative to RNA samples diluted in water or elution buffers. The dye should be thawed wrapped in foil and placed just outside of an ice bucket to reduce thawing time. The dye can be thawed at room temperature but should be returned to ice once thawed.
26. Assay sensitivity and troubleshooting. Due to the repeated dilutions and sensitivity of the dye reagent, the RiboGreen® assay is extremely sensitive to errors and deviations in timing. Time differential between adding dye and reading on the
fluorometer will cause discrepancies between samples. Therefore, sample reading should be done quickly and with as much consistency as possible. Slight errors in performing dilutions can also lead to misread samples or inaccurate results. It is helpful to perform the standard readings and make a decision on assay reliability before running samples. The R² value of the linear regression will be available before diluting RNA samples if the Excel file is created in advance. If the standards are inaccurate, create a new set of standard dilutions in row B of the plate. The quantity of 2 µg/mL RNA standard, 1:2,000 dye reagent, and 1× TE created is enough to run two sets of standards per assay as a precaution.
27. Amplification kit selection. NuGEN’s Ovation amplification and labeling system offers an all-in-one solution for preparing biotin-labeled cDNA for use with Affymetrix GeneChip® microarrays from small amounts of RNA (5 ng total RNA or less). The kit provides a reliable, isothermal linear DNA amplification. Although most of our work has been done using the Ovation version 1 system, the WT-Ovation™ amplification system has since been released and can allow for smaller initial RNA concentrations as well as providing amplification to cDNA without a 3′ bias. This kit must be combined with the FL-Ovation™ Biotin V2 kit to offer a complete solution from total RNA to biotin-labeled cDNA.
28. RNA quality and quantity requirements. RNA should be analyzed using an Agilent 2100 Bioanalyzer for quality control prior to amplification. See Subheading 3.7.1 for more information. RNA of low quality will typically not amplify with an accurate representation of the genome, leading to skewed array results. RNA quantity is a little more flexible than quality, but running an amplification with total RNA levels lower than 5 ng is taking a risk. Sometimes less than 5 ng of RNA will amplify to enough cDNA to place on an array chip.
Keeping the 5 ng threshold in mind is a good indicator of amplification success, but it is not a black and white boundary that should deter running an important yet lower concentration sample.
29. Qiagen DyeEx 2.0 purification. The DyeEx columns have fragile resin blocks once spun down. These columns must not be spun at more than 750 × g. If possible, try spinning them at 700 × g since the resin blocks can crack even at the recommended centrifugation settings. Using a fixed angle rotor will result in an angled resin surface for sample loading, whereas a pivoting bucket rotor will create a flat surface. It is very important that a pipet tip does not come into contact with the resin block or it can crack or fragment. If this happens to a column, use a new one if possible.
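Note 29 states the DyeEx limit as a relative centrifugal force (× g), whereas many benchtop centrifuges are set in rpm. The sketch below applies the standard conversion RCF = 1.118 × 10⁻⁵ × r × rpm², with r the rotor radius in centimeters. The 8-cm radius is an assumed placeholder, not a value from this protocol; substitute the radius given for your own rotor.

```python
import math

# Standard conversion constant for radius in cm and speed in rpm.
RCF_CONSTANT = 1.118e-5


def rcf_from_rpm(rpm, radius_cm):
    """Relative centrifugal force (x g) for a given speed and rotor radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_from_rcf(rcf, radius_cm):
    """Rotor speed (rpm) that produces the requested RCF at the given radius."""
    if rcf <= 0 or radius_cm <= 0:
        raise ValueError("rcf and radius_cm must be positive")
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


ASSUMED_RADIUS_CM = 8.0  # placeholder; check the rotor documentation
for limit_g in (700, 750):
    rpm = rpm_from_rcf(limit_g, ASSUMED_RADIUS_CM)
    print(f"{limit_g} x g corresponds to about {rpm:.0f} rpm at r = {ASSUMED_RADIUS_CM} cm")
```

Rounding the computed speed down rather than up keeps the run below the stated force limit.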
References 1. Jemal, A., R. Siegel, E. Ward, T. Murray, J. Xu, and M.J. Thun, (2007). Cancer statistics, 2007. CA Cancer J Clin, 57(1), p. 43–66. 2. Tomlins, S.A., D.R. Rhodes, S. Perner, S.M. Dhanasekaran, R. Mehra, X.W. Sun, et al., (2005). Recurrent fusion of TMPRSS2 and ETS transcription factor genes in prostate cancer. Science, 310(5748), p. 644–8. 3. Chen, C.D., D.S. Welsbie, C. Tran, S.H. Baek, R. Chen, R. Vessella, et al., (2004). Molecular determinants of resistance to antiandrogen therapy. Nat Med, 10(1), p. 33–9. 4. Stanbrough, M., G.J. Bubley, K. Ross, T.R. Golub, M.A. Rubin, T.M. Penning, et al., (2006). Increased expression of genes converting adrenal androgens to testosterone in androgen-independent prostate cancer. Cancer Res, 66(5), p. 2815–25. 5. Holzbeierlein, J., P. Lal, E. LaTulippe, A. Smith, J. Satagopan, L. Zhang, et al., (2004). Gene expression analysis of human prostate carcinoma during hormonal therapy identifies androgen-responsive genes and mechanisms of therapy resistance. Am J Pathol, 164(1), p. 217–27. 6. Tomlins, S.A., R. Mehra, D.R. Rhodes, X. Cao, L. Wang, S.M. Dhanasekaran, et al., (2007). Integrative molecular concept modeling of prostate cancer progression. Nat Genet, 39(1), p. 41–51. 7. Dhanasekaran, S.M., T.R. Barrette, D. Ghosh, R. Shah, S. Varambally, K. Kurachi, et al., (2001). Delineation of prognostic biomarkers in prostate cancer. Nature, 412(6849), p. 822–6. 8. Lapointe, J., C. Li, J.P. Higgins, M. van de Rijn, E. Bair, K. Montgomery, et al., (2004).
Gene expression profiling identifies clinically relevant subtypes of prostate cancer. Proc Natl Acad Sci USA, 101(3), p. 811–6. 9. Luo, J., S. Zha, W.R. Gage, T.A. Dunn, J.L. Hicks, C.J. Bennett, et al., (2002). Alphamethylacyl-CoA racemase: a new molecular marker for prostate cancer. Cancer Res, 62(8), p. 2220–6. 10. Luo, J., D.J. Duggan, Y. Chen, J. Sauvageot, C.M. Ewing, M.L. Bittner, et al., (2001). Human prostate cancer and benign prostatic hyperplasia: molecular dissection by gene expression profiling. Cancer Res, 61(12), p. 4683–8. 11. Singh, D., P.G. Febbo, K. Ross, D.G. Jackson, J. Manola, C. Ladd, et al., (2002). Gene expression correlates of clinical prostate cancer behavior. Cancer Cell, 1(2), p. 203–9. 12. Welsh, J.B., L.M. Sapinoso, A.I. Su, S.G. Kern, J. Wang-Rodriguez, C.A. Moskaluk, et al., (2001). Analysis of gene expression identifies candidate markers and pharmacological targets in prostate cancer. Cancer Res, 61(16), p. 5974–8. 13. Rubin, M.A., M. Zhou, S.M. Dhanasekaran, S. Varambally, T.R. Barrette, M.G. Sanda, et al., (2002). Alpha-Methylacyl coenzyme A racemase as a tissue biomarker for prostate cancer. JAMA, 287(13), p. 1662–70. 14. Febbo, P.G., A. Thorner, M.A. Rubin, M. Loda, P.W. Kantoff, W.K. Oh, et al., (2006). Application of oligonucleotide microarrays to assess the biological effects of neoadjuvant imatinib mesylate treatment for localized prostate cancer. Clin Cancer Res, 12(1), p. 152–8.
Chapter 16
Cancer Gene Profiling for Response Prediction
B. Michael Ghadimi and Marian Grade
Summary
Preoperative treatment strategies are now recommended for a variety of human cancers. Unfortunately, the response of individual tumors to a preoperative treatment is not uniform, and ranges from complete regression to resistance. This poses a considerable clinical dilemma, because patients with a priori resistant tumors could either be spared exposure to radiation or DNA-damaging drugs, i.e., referred to primary surgery, or be treated with dose-intensified protocols. Because the response of an individual tumor as well as therapy-induced side effects represent the major limiting factors of current treatment strategies, identifying molecular markers of response or of treatment toxicity has become exceedingly important. However, complex phenotypes such as tumor responsiveness to multimodal treatments probably do not depend on the expression levels of just one or a few genes and proteins. Therefore, methods that allow comprehensive interrogation of genetic pathways and networks hold great promise in delivering such tumor-specific signatures, because expression levels of tens of thousands of genes can be monitored simultaneously. During the past few years, microarray technology has emerged as a central tool in addressing pertinent clinical questions, the answers to which are critical for the realization of a personalized genomic medicine, in which patients will be treated based on the biology of their tumor and their genetic profile (1–4).
Key words: Gene expression profiling, Microarrays, Rectal cancer, Preoperative chemoradiotherapy, Response prediction, Personalized medicine
1. Introduction
The major advantage of microarray technology over other techniques that study expression levels of genes is that tens of thousands of genes can be studied simultaneously in one single experiment. It has been shown that gene expression profiles of cancer cell lines correlate with drug activity (5–7) or radiosensitivity (8).
Robert Grützmann and Christian Pilarsky (eds.), Cancer Gene Profiling: Methods and Protocols, Methods in Molecular Biology, vol. 576, DOI 10.1007/978-1-59745-545-9_16, © Humana Press, a part of Springer Science + Business Media, LLC 2010
It has also been demonstrated that gene expression signatures predicting sensitivity to chemotherapeutic drugs in vitro can accurately predict clinical response in patients treated with these drugs in vivo (9). In analogy to these model systems, gene expression signatures have been identified that predict the response of breast cancers to preoperative chemotherapy (for review, see ref. 10), of esophageal carcinomas to preoperative chemoradiotherapy (11), and of colon cancers to postoperative chemotherapy (12). In addition, prognostic signatures have been established for patients with breast cancer (13–15) and non-small cell lung cancer (16), leading to the initiation of multicenter trials to test the clinical effectiveness of these gene sets (17, 18). We recently demonstrated that gene expression profiling might be useful for predicting the response of locally advanced rectal cancers to preoperative chemoradiotherapy (19). These results led us to initiate prospective profiling of tumor samples from patients enrolled in the ongoing CAO/ARO/AIO-04 trial of the German Rectal Cancer Study Group, which is integrated into a Clinical Research Unit (KFO 179) funded by the German Research Foundation (DFG). This chapter describes the general principles of a microarray experiment. First, total RNA is isolated from frozen tumor samples. Second, the messenger RNA (mRNA) is amplified; during amplification, amino allyl UTP nucleotides are incorporated, which are later chemically coupled to an N-hydroxysuccinimidyl (NHS) ester dye. After purification to remove unincorporated nucleotides, the labeled sample is combined with a differentially labeled common-reference sample and subsequently hybridized onto a spotted oligonucleotide microarray slide (see Fig. 1).
2. Materials
2.1. Sample Accrual and Storage
• RNAlater (Ambion, Austin, TX). Make aliquots of 1 ml in polypropylene tubes and store at room temperature for up to 6 months.
2.2. RNA Isolation
1. TRIzol (Invitrogen, Carlsbad, CA). Cover the bottle with aluminum foil.
2. Anatomical forceps that can be sterilized. You need one forceps per sample.
3. Handheld homogenizer, e.g., Polytron (Kinematica, Littau, Switzerland).
4. Chloroform (Mallinckrodt Baker, Phillipsburg, NJ).
Fig. 1. Principle of a two-color microarray experiment. RNA samples from tumor and control tissues are individually amplified, labeled with different fluorescent dyes, and hybridized to a single DNA microarray. The fluorescence intensity is measured for each probe in both samples, and relative gene expression levels can be calculated.
5. Glycogen, 20 mg/ml (Invitrogen, Carlsbad, CA).
6. Isopropyl alcohol (Mallinckrodt Baker, Phillipsburg, NJ).
7. 200-proof ethyl alcohol (Warner-Graham, Cockeysville, MD).
8. DEPC-treated water (Research Genetics, Huntsville, AL).
9. Recommended: Spectrophotometer (Nanodrop, Rockland, DE).
10. Recommended: Bioanalyzer 2100 (Agilent Technologies, Palo Alto, CA).
11. RNA-, DNA-, RNase-, DNase-free sterile, cotton-plugged pipette tips.
12. RNA-, DNA-, RNase-, DNase-free microcentrifuge tubes.
13. Always wear gloves!
2.3. RNA Amplification
• Amino Allyl MessageAmp aRNA kit (Ambion, Austin, TX)
2.4. Indirect Labeling and Hybridization
1. The NHS ester dyes are provided as dried samples (Amersham Biosciences, Piscataway, NJ). Dissolve one vial each of Cy3 ester and Cy5 ester in 73 µl dimethyl sulfoxide (Amino Allyl MessageAmp aRNA kit). Prepare 5-µl aliquots, and store them at −80°C (seal tubes with Parafilm).
2. Nuclease-free water (Amino Allyl MessageAmp aRNA kit).
3. Coupling buffer (Amino Allyl MessageAmp aRNA kit).
4. Hydroxylamine (Amino Allyl MessageAmp aRNA kit).
5. Antisense RNA (aRNA) binding buffer (Amino Allyl MessageAmp aRNA kit).
6. 200-proof ethyl alcohol.
7. aRNA filter cartridge (aRNA collection tube; Amino Allyl MessageAmp aRNA kit).
8. aRNA wash buffer (Amino Allyl MessageAmp aRNA kit).
9. aRNA collection tube (Amino Allyl MessageAmp aRNA kit).
10. Microcon YM-30 columns (Millipore, Billerica, MA).
11. 10× fragmentation buffer (Amino Allyl MessageAmp aRNA kit).
12. Stop solution (Amino Allyl MessageAmp aRNA kit).
13. LifterSlips (25 × 60 mm; Erie Scientific, Portsmouth, NH).
14. Spotted oligonucleotide microarray glass slides (Hs-Operon; Advanced Technology Center of the National Cancer Institute, Gaithersburg, MD).
15. Deionized formamide (Ambion).
16. Prepare prehybridization solution (5× saline sodium citrate [SSC], 1% bovine serum albumin [BSA], 0.1% sodium dodecyl sulfate [SDS]) and warm up to 42°C.
17. Prepare 2× hybridization buffer (50% deionized formamide, 10× SSC, 0.2% SDS) and warm up to 48°C.
18. Hybridization cassettes (TeleChem International, Sunnyvale, CA).
19. Prepare washing solutions: Solution 1: 2× SSC, 0.1% SDS (200 ml total); solution 2: 1× SSC (200 ml total); solution 3: 0.2× SSC (200 ml total).
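The three washing solutions in item 19 are plain dilutions of concentrated stocks. The following is a minimal sketch of that arithmetic, assuming 20× SSC and 10% SDS stock solutions; the chapter does not state stock strengths, and the function names are illustrative only.

```python
def stock_volume_ml(stock_conc, final_conc, final_volume_ml):
    """Volume of stock needed for a dilution (C1 * V1 = C2 * V2)."""
    if final_conc > stock_conc:
        raise ValueError("final concentration exceeds stock concentration")
    return final_conc * final_volume_ml / stock_conc


def wash_recipe(ssc_x, sds_percent, total_ml=200.0,
                ssc_stock_x=20.0, sds_stock_percent=10.0):
    """Volumes (ml) of SSC stock, SDS stock, and water for one wash solution.

    The 20x SSC and 10% SDS stock strengths are assumptions, not values
    taken from the protocol.
    """
    ssc_ml = stock_volume_ml(ssc_stock_x, ssc_x, total_ml)
    sds_ml = stock_volume_ml(sds_stock_percent, sds_percent, total_ml)
    return {"ssc_stock_ml": ssc_ml,
            "sds_stock_ml": sds_ml,
            "water_ml": total_ml - ssc_ml - sds_ml}


if __name__ == "__main__":
    # Compositions of solutions 1-3 as listed in the Materials section.
    for name, ssc_x, sds in [("Solution 1", 2.0, 0.1),
                             ("Solution 2", 1.0, 0.0),
                             ("Solution 3", 0.2, 0.0)]:
        print(name, wash_recipe(ssc_x, sds))
```

If your stocks differ, pass their strengths through `ssc_stock_x` and `sds_stock_percent`.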
3. Methods
The setup of gene expression microarray experiments largely depends on two factors: the amount of RNA available from a given sample, and the microarray platform. Depending on the samples that are used, RNA amplification may be required. Some microarray manufacturers, such as Agilent Technologies, have already included an RNA amplification step in their protocol. If RNA amplification is necessary to obtain sufficient amounts of RNA (e.g., for repeat hybridizations), many companies provide special kits. We have experience with Arcturus’ RiboAmp RNA amplification kit (Mountain View, CA) and Ambion’s Amino Allyl MessageAmp aRNA kit, and both yielded good results. However, it should be noted that some kits generate sense RNA, while others generate antisense RNA. This is not a factor when hybridizing to complementary DNA (cDNA) arrays, but must be taken into consideration for oligonucleotide arrays. Additionally, some kits enable two rounds of amplification for higher yield. The decision of which kit to use is therefore based on the design of the microarray platform, because (spotted) microarrays can represent single- or double-stranded sequences. Another very important aspect to consider is whether one-color or dual-color hybridizations should be performed. Both techniques are accepted in the microarray field for use with specific platforms, and each has advantages and disadvantages, which are discussed elsewhere (20, 21). As mentioned above, we performed dual-color hybridizations. Accordingly, each tumor sample is hybridized against a common reference aRNA pool. It should also be noted that Cy5 is very sensitive to high environmental ozone concentrations (22). This problem can be avoided by performing one-color hybridizations with Cy3. If dual-color hybridizations are required, chemical preservatives can be added to the washing solutions.
Other alternatives are to install carbon filters in the air handling system, or to perform the hybridization and washing steps in a closed hood in which the air is purified through an activated charcoal filter (23). Finally, because of the dynamic nature of this technology, it should be noted that this protocol is optimized for the microarrays that we purchased. Commercially available microarrays obviously require different protocols. The general considerations outlined here, and the protocol for RNA isolation, however, hold true for those as well.
3.1. Sample Accrual and Storage
The time interval between sampling and storage is very important because even partial degradation dramatically impairs microarray analyses. For gene expression profiling, we therefore strongly recommend accruing tissue samples directly in the operating room or in an endoscopic unit. The samples should be immediately stored in an RNA stabilization reagent or frozen directly in liquid nitrogen. We and others have good experience with RNAlater from Ambion. The advantage of RNA stabilization reagents is that they are ready for use, and can be stored in cups or tubes at room temperature for months.
3.2. RNA Isolation
The single most critical factor for a successful microarray experiment is the RNA, i.e., its purity and integrity. Many different protocols are available for RNA isolation. Because we have focused not only on the cellular transcriptome, but also on genomic and proteomic analysis, we have primarily been using TRIzol. The isolation protocol described here is based on the manufacturer’s recommendation with minor modifications according to our experience. In our hands, sufficient amounts of RNA could be isolated from cancer biopsies with weights ranging from 5 to 150 mg.
1. Thaw tumor samples that have been stored in RNAlater and, using a sterile forceps, transfer the tissue immediately into a 15-ml polypropylene tube containing 4 ml of the TRIzol reagent (see Note 1).
2. Thoroughly homogenize samples to disrupt cells and dissolve components, which usually takes approximately 30 s, and incubate for 5 min at room temperature to dissociate nucleoprotein complexes.
3. Add 0.8 ml chloroform (0.2 ml chloroform per 1 ml of TRIzol), tightly cap tubes, and shake vigorously for 30 s.
4. Allow the phases to separate for 15 min on ice, and centrifuge at 12,000 × g for 15 min at 4°C.
5. Transfer very carefully only the upper aqueous phase (colorless), containing mostly RNA, to a new 15-ml polypropylene tube (see Note 2).
6. Add 1 µl glycogen and mix briefly (see Note 3).
7. Add 2.0 ml isopropyl alcohol (0.5 ml isopropyl alcohol per 1 ml TRIzol) to precipitate the RNA.
8. Vortex tube and incubate for at least 1 h at −20°C (see Note 4).
9. Centrifuge at 12,000 × g for 30 min at 4°C.
10. Remove the supernatant (see Note 5), and add 4 ml of 75% ethanol (1 ml ethanol per 1 ml TRIzol) to wash off residual TRIzol.
11. Break up the pellet by pipetting up and down and vortex for a few seconds.
12. Wash the pellet for 10 min at room temperature on a rotator (see Note 6).
13. Centrifuge at 7,500 × g for 15 min at 4°C to pellet the RNA.
14. Remove the supernatant, and add 1 ml of 75% ethanol.
15. Break up the pellet by pipetting up and down, and transfer it to a new RNase-free 1.5-ml microcentrifuge tube.
16. Vortex, and wash the pellet for 10 min at room temperature on a rotator.
17. Centrifuge at 7,500 × g for 15 min at 4°C.
18. Carefully remove the supernatant, and briefly air-dry the pellet at room temperature (see Note 7).
19. Resuspend the pellet in 20–100 µl DEPC-treated water (see Note 8), and incubate at 65°C for 5 min on a shaking Thermomixer.
20. Cool down the sample on ice, and determine the quantity, purity, and integrity of your RNA (see Note 9).
21. Store RNA at −80°C (see Note 10).
3.3. RNA Amplification
The basic principle of the Amino Allyl MessageAmp aRNA amplification procedure is as follows: First, total RNA is reverse transcribed into cDNA using an oligo(dT) primer, to which a T7 promoter is attached. Second, T7 RNA polymerase is used to transcribe the cDNA into antisense RNA (aRNA), which represents the actual amplification step. The Ambion protocol is straightforward, and all required reagents are included in the kit: First-strand and second-strand cDNA synthesis (reverse transcription), cDNA purification, in vitro transcription (aRNA synthesis), aRNA purification, dye-coupling reaction, and purification of labeled aRNA. The handbook is in principle designed like a cookbook, and we followed the protocol without any modifications. Therefore, we simply refer to Ambion’s website (http://www.ambion.com/techlib/prot/fm_1752.pdf). There are, however, several points to consider, which are discussed below:
1. It is recommended to use 100–2,000 ng of total RNA. Since we needed 5 µg aRNA for subsequent hybridizations, we decided to start with 5 µg of total RNA (which is the maximum suggested RNA input).
2. The in vitro transcription can be performed for 6–14 h. For convenience, we therefore incubated overnight.
3. The Amino Allyl MessageAmp aRNA kit allows one round or two rounds of amplification.
4. A DNase I treatment is optional, and we recommend including this step (especially if further validation using real-time polymerase chain reaction [PCR] is intended).
5. Common-reference pool: For dual-color hybridizations, we strongly recommend creating a pool of amplified reference RNA. Because it is mandatory to use aliquots of the same reference pool for all microarray analyses within a given experiment, estimate the total number of anticipated hybridizations and amplify sufficient quantities of reference RNA. To monitor the stability of this aRNA pool over time, quality controls using a Bioanalyzer 2100 should be performed routinely.
3.4. Microarray Hybridization
3.4.1. NHS Ester Coupling
During in vitro transcription of cDNA into aRNA, amino allyl UTP nucleotides (aaUTP) have been incorporated. Within the coupling reaction, NHS ester dyes form a chemical bond to the reactive primary amino group of the aaUTP (C5 position of uracil).
1. Thaw resuspended NHS–Cy dyes in the dark.
2. Heat up the frozen RNA sample for 5 min at 65°C and put on ice (see Note 11).
3. Transfer 5 µg aRNA into a new tube and vacuum dry in a Speed Vac (see Note 12).
4. Add 9 µl coupling buffer, pipette up and down, vortex, and spin down briefly.
5. Incubate for 10 min at 37°C.
6. Add 5 µl NHS–Cy dyes, flick, and spin down briefly.
7. Incubate for 1 h at room temperature in the dark.
8. Add 4.5 µl of 4 M hydroxylamine, flick, and spin down briefly, and incubate for 15 min at room temperature in the dark.
3.4.2. Purification
1. Heat up nuclease-free water to 60°C.
2. Add 80.5 µl preheated nuclease-free water to each tube, resulting in a final volume of 100 µl per tube, and mix well by pipetting up and down.
3. Add 350 µl aRNA binding buffer, and mix well by pipetting up and down.
4. Add 250 µl of 100% ethanol, and mix well by pipetting up and down.
5. Immediately transfer the entire volume to an aRNA filter cartridge (placed in an aRNA collection tube), and spin for 1 min at 10,000 × g.
6. Discard the flow-through, and place the filter cartridge back in the original tube, taking care not to touch the tip of the cartridge.
7. Add 650 µl aRNA wash buffer, and spin for 1 min at 10,000 × g.
8. Discard the flow-through and spin again for 1 min at 10,000 × g.
9. Transfer the filter cartridge to a new aRNA collection tube.
10. Add 50 µl preheated nuclease-free water to the middle of the filter, incubate for 2 min at room temperature, and spin for 2 min at 10,000 × g.
11. Repeat step 10, and place the samples on ice (see Note 13).
12. Measure the incorporated dye concentration for the labeled aRNA with a Nanodrop.
3.4.3. Fragmentation
1. Combine tumor and reference sample (see Note 14) and transfer the entire volume to a Microcon YM-30 column.
2. Spin down for 6 min at 8,000 × g to decrease the reaction volume.
3. Flip the column, place it into a new vial, and spin for 3 min at 1,000 × g.
4. If necessary, add nuclease-free water to end up with a final volume of 9 µl (see Note 15).
5. Add 1 µl of the 10× fragmentation buffer, flick, and spin down briefly.
6. Incubate for 15 min at 70°C.
7. Spin down briefly, add 1 µl stop solution, and place the tube on ice.
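Combining tumor and reference sample in step 1 relies on the dye concentrations measured after purification, because equal dye amounts (not equal aRNA amounts) are mixed (see Note 14). The sketch below illustrates one way to do this arithmetic. It assumes commonly cited molar extinction coefficients for Cy3 and Cy5 and an absorbance reading normalized to a 1-cm path; neither value is given in this chapter, and the function names are illustrative.

```python
# Assumed molar extinction coefficients (M^-1 cm^-1); check your dye data sheet.
EXTINCTION = {"Cy3": 150_000.0, "Cy5": 250_000.0}


def dye_pmol_per_ul(absorbance, dye, path_cm=1.0):
    """Dye concentration from absorbance at the dye maximum (Beer-Lambert law).

    A / (epsilon * path) gives mol/l; 1 mol/l equals 1e6 pmol/ul.
    """
    return absorbance * 1e6 / (EXTINCTION[dye] * path_cm)


def equal_dye_volumes(conc_a, volume_a_ul, conc_b, volume_b_ul):
    """Volumes of two labeled samples that contain the same amount of dye.

    The limiting sample is used completely; the other one is scaled down.
    Returns (pmol dye per sample, volume of A in ul, volume of B in ul).
    """
    pmol = min(conc_a * volume_a_ul, conc_b * volume_b_ul)
    return pmol, pmol / conc_a, pmol / conc_b


if __name__ == "__main__":
    tumor = dye_pmol_per_ul(0.30, "Cy3")      # hypothetical reading
    reference = dye_pmol_per_ul(0.375, "Cy5")  # hypothetical reading
    print(equal_dye_volumes(tumor, 100.0, reference, 100.0))
```

Using the limiting sample completely is a design choice of this sketch; a fixed dye amount per hybridization can be substituted if preferred.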
3.4.4. Prehybridization of Array Slides
1. Wash LifterSlips and array slide with isopropyl alcohol.
2. Place LifterSlip onto the array and carefully apply 80 µl warm prehybridization solution at one end. The solution should be drawn under the LifterSlip onto the hybridization area.
3. Place the array slide into a hybridization chamber and incubate for 30–60 min in a 42°C waterbath.
4. Wash the array slide in DEPC-treated water and isopropyl alcohol by dipping it into the corresponding solutions, and air-dry for no longer than 1 h at room temperature.
3.4.5. Preparation of Hybridization Solution
1. Add 29 µl nuclease-free water to bring the final volume to 40 µl.
2. Denature the probe mix at 90°C for 2 min and immediately place on ice.
3. Add 40 µl preheated hybridization buffer, mix thoroughly, and spin down briefly.
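The volumes in these steps can be checked with simple bookkeeping: the fragmented probe (9 µl aRNA, 1 µl fragmentation buffer, 1 µl stop solution) plus 29 µl water gives 40 µl, and adding 40 µl of 2× hybridization buffer halves the buffer components. The following sketch, with illustrative names, works this through using the buffer composition from the Materials section.

```python
# Components of the probe mix (ul), taken from the fragmentation and
# hybridization-solution steps above.
PROBE_MIX_UL = {
    "fragmented aRNA": 9.0,
    "10x fragmentation buffer": 1.0,
    "stop solution": 1.0,
    "nuclease-free water": 29.0,
}

# 2x hybridization buffer as listed in the Materials section.
HYB_BUFFER_2X = {"formamide_percent": 50.0, "ssc_x": 10.0, "sds_percent": 0.2}


def final_concentrations(buffer, buffer_ul, sample_ul):
    """Concentration of each buffer component after mixing with the sample."""
    total_ul = buffer_ul + sample_ul
    return {name: conc * buffer_ul / total_ul for name, conc in buffer.items()}


probe_ul = sum(PROBE_MIX_UL.values())
hyb_mix = final_concentrations(HYB_BUFFER_2X, 40.0, probe_ul)

if __name__ == "__main__":
    print(f"probe mix: {probe_ul} ul, total: {probe_ul + 40.0} ul")
    print(hyb_mix)
```

The result (25% formamide, 5× SSC, 0.1% SDS in 80 µl) matches the volume applied to the slide in the hybridization step.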
3.4.6. Hybridization
1. Place the LifterSlip onto the array slide.
2. Spin down the probe briefly, and apply the entire volume (80 µl) to one end of the LifterSlip, allowing it to be drawn under to the hybridization area.
3. Add 20 µl of nuclease-free water to the hybridization cassette to prevent evaporation of the hybridization solution.
4. Tightly seal the chamber and incubate submerged at the bottom of a 42°C waterbath for 16 h.
3.4.7. Washing
1. Wash slides for 2 min in solution 1 (shaking).
2. Wash slides for 2 min in solution 2 (shaking).
3. Wash slides for 2 min in solution 3 (shaking).
4. Spin array slides in a centrifuge at 650 rpm for 3 min to dry, and store them in the dark.
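The drying spin is given in rpm, whereas the other centrifugation steps in this chapter are given in × g. Converting between the two requires the rotor radius, which the chapter does not state. A minimal sketch of the standard conversion, with a hypothetical 10-cm radius in the example:

```python
import math

# RCF = 1.118e-5 * radius (cm) * rpm^2 is the usual conversion formula.
RCF_CONSTANT = 1.118e-5


def rcf_from_rpm(rpm, radius_cm):
    """Relative centrifugal force (x g) for a given speed and rotor radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_from_rcf(rcf, radius_cm):
    """Rotor speed (rpm) needed to reach a given relative centrifugal force."""
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


if __name__ == "__main__":
    # The 10-cm radius is an assumption; read the value for your own rotor.
    print(round(rcf_from_rpm(650, 10.0), 1), "x g at 650 rpm")
```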
3.4.8. Scanning
Scan the slides within 24 h, preferably directly after the washes. Use settings according to the recommendations in the scanner manual.
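After scanning, the two fluorescence channels are usually turned into relative expression values for each probe (see Fig. 1). The chapter does not describe the analysis, so the following is only a sketch under stated assumptions: background-corrected intensities are already available, low values are floored to avoid taking the logarithm of zero, and the log2 ratios are centered on their median as a simple global normalization. The function name and the flooring value are illustrative.

```python
import math
from statistics import median


def log2_ratios(tumor, reference, floor=1.0):
    """Median-centered log2(tumor / reference) for paired probe intensities.

    `tumor` and `reference` are equally long sequences of background-corrected
    intensities. Global median centering is an assumption of this sketch.
    """
    if len(tumor) != len(reference):
        raise ValueError("channels must contain the same number of probes")
    raw = [math.log2(max(t, floor) / max(r, floor))
           for t, r in zip(tumor, reference)]
    center = median(raw)
    return [value - center for value in raw]


if __name__ == "__main__":
    print(log2_ratios([200.0, 400.0, 800.0], [100.0, 100.0, 100.0]))
```

Dedicated microarray packages offer intensity-dependent normalization, which is generally preferable for real data.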
4. Notes
1. TRIzol is toxic and should be handled under a fume hood. We recommend weighing the tissue samples at this point, because the weight gives a rough estimate of the amount of RNA to expect. Additionally, you can freeze your samples at this point at −80°C.
2. Three phases should be visible: a lower red phenol–chloroform phase (proteins), an interphase (DNA), and a colorless upper aqueous phase (RNA). Be careful not to disturb the interphase when removing the upper phase; it is better to lose some RNA than to risk contamination with DNA. If you wish to subsequently isolate DNA and proteins as well, you need to keep the phenol–chloroform phase and the interphase (please read the manufacturer’s protocol).
3. Glycogen serves as an inert co-precipitant and increases nucleic acid recovery. It also helps to visualize the RNA pellet after precipitation, and does not inhibit further reactions.
4. One may wish to stop at this point and leave the tubes overnight at −20°C.
5. Depending on the size of the tissue samples, this pellet might be very small and difficult to see (see Note 3).
6. You can stop at this point, and store your sample at −20°C (months).
7. Do not vacuum-dry the RNA pellet, and do not air-dry it completely; otherwise, its solubility will be decreased.
8. The volume of DEPC-treated water to be added is strongly influenced by the amplification protocol that you wish to use. For our purposes, we needed an RNA concentration of >0.5 µg/µl.
9. To evaluate RNA quantity and purity, perform spectrophotometric readings at wavelengths of 260 and 280 nm. Depending on the quality of the RNA, expect 260/280 ratios between 1.9 and 2.1. For subsequent microarray experiments, we strongly recommend analyzing your samples with Agilent’s Bioanalyzer or a similar technique. A spectrophotometer does not provide information about the RNA integrity, and even RNA with a perfect 260/280 ratio can be degraded. Even though a Bioanalyzer is not capable of determining the percentage of full-length mRNA, it is, in our opinion, more reliable than a conventional denaturing agarose gel and requires a smaller amount of RNA.
10. There is an ongoing discussion about long-term storage of RNA in DEPC-treated water, which is slightly acidic and may result in RNA degradation. In our own experiments, and by comparing RNA samples over time, this has not been a specific problem. Alternatives for long-term storage are 70% ethanol or 0.1× TE, although the EDTA may need to be removed to prevent inhibition of subsequent enzymatic reactions in the protocol.
11. Heating samples at 65°C destroys secondary structures. Even though it is well known that repeated freezing and thawing impairs RNA integrity, this is a necessary step.
12. Make sure not to over-dry RNA samples. Clean the Speed Vac prior to use.
13. It usually takes time to perform the subsequent measurements and calculations; therefore, the labeled aRNA should be placed on ice.
14. Make sure to use equal amounts of the two samples to be hybridized based on the dye concentration, not the aRNA concentration. In general, measured dye concentrations of 1.5 pmol/µl result in good hybridization signals. If you wish, you may stop at this point and store the labeled probes at −80°C.
15. If the volume remains >9 µl, vacuum dry briefly.
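The spectrophotometric checks in Notes 8 and 9 can be written down as a short calculation. The sketch below assumes the common conversion of one A260 unit to 40 µg/ml of RNA at a 1-cm path length, which is not stated in this chapter; the purity window (1.9 to 2.1), the minimum concentration (0.5 µg/µl), and the 5-µg input are taken from the text. Function names are illustrative. As Note 9 stresses, passing these checks says nothing about RNA integrity.

```python
# Assumed conversion factor: 1 A260 unit = 40 ug/ml RNA (1-cm path length).
RNA_UG_PER_ML_PER_A260 = 40.0


def rna_conc_ug_per_ul(a260, dilution_factor=1.0):
    """RNA concentration in ug/ul from an A260 reading."""
    return a260 * RNA_UG_PER_ML_PER_A260 * dilution_factor / 1000.0


def purity_ok(a260, a280, low=1.9, high=2.1):
    """True if the 260/280 ratio lies in the range expected in Note 9."""
    return low <= a260 / a280 <= high


def input_volume_ul(conc_ug_per_ul, mass_ug=5.0):
    """Volume that contains the desired RNA mass (5 ug was used here)."""
    return mass_ug / conc_ug_per_ul


if __name__ == "__main__":
    conc = rna_conc_ug_per_ul(25.0)  # hypothetical reading
    print(conc, purity_ok(25.0, 12.5), conc > 0.5, input_volume_ul(conc))
```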
Acknowledgments
The authors thank Drs. Michael J. Difilippantonio and Jochen Gaedcke, and Mr. Patrick Hörmann for their advice. This work was supported by the Deutsche Krebshilfe and the Deutsche Forschungsgemeinschaft (KFO 179).
References
1. Quackenbush J. (2006) Microarray analysis and tumor classification. N Engl J Med. 354, 2463–2472.
2. Jensen EH, McLoughlin JM, Yeatman TJ. (2006) Microarrays in gastrointestinal cancer: is personalized prediction of response to chemotherapy at hand? Curr Opin Oncol. 18, 374–380.
3. Bol D, Ebner R. (2006) Gene expression profiling in the discovery, optimization and development of novel drugs: one universal screening platform. Pharmacogenomics. 7, 227–235.
4. Nevins JR, Potti A. (2007) Mining gene expression profiles: expression signatures as cancer phenotypes. Nat Rev Genet. 8, 601–609.
5. Ross DT, Scherf U, Eisen MB, Perou CM, Rees C, Spellman P, Iyer V, Jeffrey SS, Van de Rijn M, Waltham M, Pergamenschikov A, Lee JC, Lashkari D, Shalon D, Myers TG, Weinstein JN, Botstein D, Brown PO. (2000) Systematic variation in gene expression patterns in human cancer cell lines. Nat Genet. 24, 227–235.
6. Scherf U, Ross DT, Waltham M, Smith LH, Lee JK, Tanabe L, Kohn KW, Reinhold WC, Myers TG, Andrews DT, Scudiero DA, Eisen MB, Sausville EA, Pommier Y, Botstein D, Brown PO, Weinstein JN. (2000) A gene expression database for the molecular pharmacology of cancer. Nat Genet. 24, 236–244.
7. Mariadason JM, Arango D, Shi Q, Wilson AJ, Corner GA, Nicholas C, Aranes MJ, Lesser M, Schwartz EL, Augenlicht LH. (2003) Gene expression profiling-based prediction of response of colon carcinoma cells to 5-fluorouracil and camptothecin. Cancer Res. 63, 8791–8812.
8. Torres-Roca JF, Eschrich S, Zhao H, Bloom G, Sung J, McCarthy S, Cantor AB, Scuto A, Li C, Zhang S, Jove R, Yeatman T. (2005) Prediction of radiation sensitivity using a gene expression classifier. Cancer Res. 65, 7169–7176.
9. Potti A, Dressman HK, Bild A, Riedel RF, Chan G, Sayer R, Cragun J, Cottrill H, Kelley MJ, Petersen R, Harpole D, Marks J, Berchuck A, Ginsburg GS, Febbo P, Lancaster J, Nevins JR. (2006) Genomic signatures to guide the use of chemotherapeutics. Nat Med. 12, 1294–1300.
10. Lønning PE, Knappskog S, Staalesen V, Chrisanthar R, Lillehaug JR. (2007) Breast cancer prognostication and prediction in the postgenomic era. Ann Oncol. 18, 1293–1306.
11. Luthra R, Wu TT, Luthra MG, Izzo J, Lopez-Alvarez E, Zhang L, Bailey J, Lee JH, Bresalier R, Rashid A, Swisher SG, Ajani JA. (2006) Gene expression profiling of localized esophageal carcinomas: association with pathologic response to preoperative chemoradiation. J Clin Oncol. 24, 259–267.
12. Del Rio M, Molina F, Bascoul-Mollevi C, Copois V, Bibeau F, Chalbos P, Bareil C, Kramar A, Salvetat N, Fraslon C, Conseiller E, Granci V, Leblanc B, Pau B, Martineau P, Ychou M. (2007) Gene expression signature in advanced colorectal cancer patients select drugs and response for the use of leucovorin, fluorouracil, and irinotecan. J Clin Oncol. 25, 773–780.
13. van ‘t Veer LJ, Dai H, van de Vijver MJ, He YD, Hart AA, Mao M, Peterse HL, van der Kooy K, Marton MJ, Witteveen AT, Schreiber GJ, Kerkhoven RM, Roberts C, Linsley PS, Bernards R, Friend SH. (2002) Gene expression profiling predicts clinical outcome of breast cancer. Nature. 415, 530–536.
14. van de Vijver MJ, He YD, van’t Veer LJ, Dai H, Hart AA, Voskuil DW, Schreiber GJ, Peterse JL, Roberts C, Marton MJ, Parrish M, Atsma D, Witteveen A, Glas A, Delahaye L, van der Velde T, Bartelink H, Rodenhuis S, Rutgers ET, Friend SH, Bernards R. (2002) A gene-expression signature as a predictor of survival in breast cancer. N Engl J Med. 347, 1999–2009.
15. Buyse M, Loi S, van’t Veer L, Viale G, Delorenzi M, Glas AM, d’Assignies MS, Bergh J, Lidereau R, Ellis P, Harris A, Bogaerts J, Therasse P, Floore A, Amakrane M, Piette F, Rutgers E, Sotiriou C, Cardoso F, Piccart MJ. (2006) TRANSBIG Consortium. Validation and clinical utility of a 70-gene prognostic signature for women with node-negative breast cancer. J Natl Cancer Inst. 98, 1183–1192.
16. Potti A, Mukherjee S, Petersen R, Dressman HK, Bild A, Koontz J, Kratzke R, Watson MA, Kelley M, Ginsburg GS, West M, Harpole DH Jr, Nevins JR. A genomic strategy to refine prognosis in early-stage non-small-cell lung cancer. N Engl J Med. 355, 570–580.
17. Bogaerts J, Cardoso F, Buyse M, Braga S, Loi S, Harrison JA, Bines J, Mook S, Decker N, Ravdin P, Therasse P, Rutgers E, van ‘t Veer LJ, Piccart M; TRANSBIG consortium. Gene signature evaluation as a prognostic tool: challenges in the design of the MINDACT trial. Nat Clin Pract Oncol. 3, 540–551.
18. Anguiano A, Potti A. (2007) Genomic signatures individualize therapeutic decisions in non-small-cell lung cancer. Expert Rev Mol Diagn. 7, 837–844.
19. Ghadimi BM, Grade M, Difilippantonio MJ, Varma S, Simon R, Montagna C, Füzesi L, Langer C, Becker H, Liersch T, Ried T. (2005) Effectiveness of gene expression profiling for response prediction of rectal adenocarcinomas to preoperative chemoradiotherapy. J Clin Oncol. 23, 1826–1838.
20. de Reyniès A, Geromin D, Cayuela JM, Petel F, Dessen P, Sigaux F, Rickman DS. (2006) Comparison of the latest commercial short and long oligonucleotide microarray technologies. BMC Genomics. 7, 51.
21. Patterson TA, Lobenhofer EK, Fulmer-Smentek SB, Collins PJ, Chu TM, Bao W, Fang H, Kawasaki ES, Hager J, Tikhonova IR, Walker SJ, Zhang L, Hurban P, de Longueville F, Fuscoe JC, Tong W, Shi L, Wolfinger RD. (2006) Performance comparison of one-color and two-color platforms within the MicroArray Quality Control (MAQC) project. Nat Biotechnol. 24, 1140–1150.
22. Fare TL, Coffey EM, Dai H, He YD, Kessler DA, Kilian KA, Koch JE, LeProust E, Marton MJ, Meyer MR, Stoughton RB, Tokiwa GY, Wang Y. (2003) Effects of atmospheric ozone on microarray data quality. Anal Chem. 75, 4672–4675.
23. Branham WS, Melvin CD, Han T, Desai VG, Moland CL, Scully AT, Fuscoe JC. (2007) Elimination of laboratory ozone leads to a dramatic improvement in the reproducibility of microarray gene expression measurements. BMC Biotechnol. 7, 8.
Chapter 17
The EGFR Pathway as an Example for Genotype: Phenotype Correlation in Tumor Genes
Ulrike Mogck, Eray Goekkurt, and Jan Stoehlmacher
Summary
Tumor-specific and germ-line variations of DNA contribute significantly to tumor growth and its ability to develop resistance. Among several mechanisms that cause resistance to cancer treatment, the genotype of certain growth factor receptors, such as the epidermal growth factor receptor (EGFR), is critical. EGFR relays signals for proliferation and survival toward the nucleus of the cancer cell. Several polymorphic DNA sequences of EGFR and the mutational status of the Kirsten-Ras (KRAS) gene appear to be determinants of response to new drugs that inhibit EGFR. We describe the correlation between the EGFR genotype, including the KRAS mutation, and the consequences of the resulting genotype for anti-EGFR therapy in colorectal cancer.
Key words: KRAS, EGFR polymorphism, Genotype, Colorectal cancer
1. Introduction
The phenotypes of genes involved in cancer development, growth, and resistance to chemotherapy have been shown to be highly relevant in the treatment of solid tumors. Most of these processes are determined by the individual genetic makeup. Tumor-specific aberrations of the genome may lead to significant changes in the expression and/or functionality of the encoded protein. The efficacy of cell replication and the response to cancer treatment are linked to the current state of functionality of essential tumor genes such as growth factors or DNA repair genes. Differences in this genetically determined phenotype may translate directly into variations of tumor behavior. These differences are a daily clinical phenomenon (1).
Robert Grützmann and Christian Pilarsky (eds.), Cancer Gene Profiling: Methods and Protocols, Methods in Molecular Biology, vol. 576 DOI 10.1007/978-1-59745-545-9_17, © Humana Press, a part of Springer Science + Business Media, LLC 2010
Changes within these genes may occur as polymorphic DNA sequences or mutations. Most of these variations have no effect on the further development of the cancer cell. However, some genetic aberrations have a significant impact on protein function (2). These tumor-specific changes are the basis for the treatment responsiveness of the tumor (1). In addition, DNA sequence variations among individuals that are inherited and transmitted from one generation to the next, so-called germ-line polymorphisms, are responsible for many differences in phenotype with respect to cancer treatment-related side effects. Thus, genotype-dependent differences are responsible for both the main and side effects of current antitumor therapies. The epidermal growth factor receptor (EGFR) pathway has become of major interest for the treatment of several solid tumors, including colorectal cancer, lung cancer, and cancers of the head and neck (3, 4). It has been demonstrated that the activation of processes downstream of EGFR is related to genetic changes in the coding sequence of the receptor, which include different polymorphisms and mutations (5). The effects of EGFR activation on growth, survival, and apoptosis are transmitted into the nucleus by processes involving the Kirsten-Ras (KRAS) pathway (6). The activating character of point mutations within codons 12 and 13 in exon 1 of the KRAS gene has been known for many years to contribute to the carcinogenesis of different cancers. Because several drugs in modern oncology treatment regimens target EGFR or its tyrosine kinases, these mutations have become very important. Recent data in colorectal cancer demonstrated that the strategy of EGFR inhibition is ineffective in patients possessing a KRAS mutation (7, 8). Therefore, a constitutively activated EGFR pathway caused by a single KRAS point mutation determines the efficacy of cancer treatment and tumor growth.
Acknowledging these close genotype–phenotype interactions, the European Medicines Agency (EMEA) approved panitumumab, an anti-EGFR antibody, for the treatment of colorectal cancer only in patients whose tumors have been shown to carry wild-type KRAS. In addition to the KRAS status, the EGFR polymorphisms R497K, C-191A, G-216T, A61G, and the (CA)n VNTR in intron 1 have been linked to tumor growth under EGFR inhibition.
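The decision rule described above (anti-EGFR antibody only for tumors with wild-type KRAS at codons 12 and 13) can be illustrated with a small genotype check. This is a sketch for illustration, not a diagnostic procedure: it assumes an aligned, in-frame coding sequence that starts at codon 1, and the wild-type codons listed in the code are an assumption that should be verified against a reference sequence before any real use. All names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Assumed wild-type KRAS codons 1-17 (verify against a reference sequence).
KRAS_WT_CODONS = ["ATG", "ACT", "GAA", "TAT", "AAA", "CTT", "GTG", "GTA",
                  "GTT", "GGA", "GCT", "GGT", "GGC", "GTA", "GGC", "AAG", "AGT"]


def kras_hotspot_calls(coding_sequence, hotspots=(12, 13)):
    """Return amino acid changes (e.g. 'G12D') found at the hotspot codons.

    Synonymous changes are ignored because they do not alter the protein.
    """
    seq = coding_sequence.upper()
    calls = []
    for number in hotspots:
        start = 3 * (number - 1)
        observed = seq[start:start + 3]
        if observed not in CODON_TABLE:
            raise ValueError(f"codon {number} is not readable")
        wild_type = KRAS_WT_CODONS[number - 1]
        if CODON_TABLE[observed] != CODON_TABLE[wild_type]:
            calls.append(
                f"{CODON_TABLE[wild_type]}{number}{CODON_TABLE[observed]}")
    return calls


def is_kras_wild_type(coding_sequence):
    """True if no amino acid change is found at codons 12 and 13."""
    return not kras_hotspot_calls(coding_sequence)
```

For example, a sequence in which codon 12 reads GAT instead of GGT is reported as `G12D`, and `is_kras_wild_type` returns `False` for it.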
2. Materials
2.1. DNA Extraction from Blood
1. Lysis buffer: 1 mM NH4HCO3, 115 mM NH4Cl in water.
2. White cell lysis buffer: 100 mM Tris–HCl (pH 7.6), 40 mM EDTA (pH 8.0), 50 mM NaCl, 0.05% sodium acetate, 0.2% SDS in water. Note: autoclave the buffer before adding SDS.
3. Saturated NaCl (~6 M) in water.
4. Ethanol absolute; store at 4°C.
5. TE buffer: 10 mM Tris base, 1 mM EDTA in water.
2.2. DNA Extraction from Formalin-Fixed, Paraffin-Embedded (FFPE) Tissue
1. Xylene.
2. Lysis buffer: 10 mM Tris–HCl (pH 7.6), 1 mM EDTA (pH 8.0), 0.01% SDS in water.
3. 20 mg/ml proteinase K in water (Sigma); store in single-use aliquots at −20°C.
4. Phenol:chloroform:isoamyl alcohol (25:24:1); store in a glass bottle at 4°C.
5. Chloroform; store in a glass bottle at 4°C.
6. Sodium acetate: 3 M, pH 4.6, in water.
7. Isopropanol.
2.3. Direct Sequencing
1. Sodium acetate: 3 M, pH 4.6, in water.
2. Ethanol absolute; store at 4°C.
3. AmpliTaq BigDye terminator premix (Applied Biosystems).
4. Sequence-specific primer.
2.4. Polymerase Chain Reaction (PCR)-Restriction Fragment Length Polymorphism (RFLP)
1. Taq DNA polymerase (Invitrogen) or Hot Start Taq DNA polymerase (Qiagen).
2. 100 mM dNTP stock solution (Invitrogen).
3. Q-solution (Qiagen).
4. Suitable restriction enzyme (New England Biolabs).
5. 3% (w/v) agarose (Invitrogen) in 1× TBE; prepare a 10× TBE stock solution: 108 g Tris base, 55 g boric acid, 9.3 g EDTA per liter in water, pH 8.3.
2.5. GeneScan Analyses
1. TBE buffer (10×): 108 g Tris base, 55 g boric acid, 9.3 g EDTA per liter in water, pH 8.3.
2. Urea.
3. Long Ranger (Bio-Rad).
4. Ammonium persulfate: 10% solution in water; store in single-use aliquots at −20°C.
5. TEMED (Sigma).
6. Formamide.
7. Loading buffer: 50% glycerol, 0.05% bromophenol blue, 100 mM EDTA.
8. ROX standard 500 (Applied Biosystems).
Mogck, Goekkurt, and Stoehlmacher
3. Methods
3.1. DNA Extraction
DNA, which is the basis of genotype–phenotype analysis, can be extracted from host leukocytes (normal cells) or from tumor cells obtained from either FFPE or fresh-frozen tissue. For germline polymorphisms, genotyping from normal host cells might be sufficient but might not always reflect the tumor genotype. Indeed, tumor-specific changes of the genome (e.g., chromosomal aberrations) might lead to discrepancies between genotyping results from normal cells and from tumor cells. Therefore, in the case of tumor-specific mutations, DNA from tumor cells has to be analyzed. As an alternative to the following DNA extraction protocols, several commercial kits for DNA extraction from all tissue types are available.
3.1.1. DNA Extraction from Blood
1. Collect 5 ml of whole blood in a 15-ml Falcon tube and add 10 ml of lysis buffer. Mix completely by inversion.
2. Spin for 10 min in a tabletop centrifuge.
3. Discard the supernatant, resuspend the cell pellet in 10 ml lysis buffer, and repeat the centrifugation step.
4. Discard the supernatant and resuspend the cell pellet in 1.8 ml white cell lysis buffer. The cell/buffer solution should be gelatinous.
5. The cell lysate can be stored at 4°C.
6. For final DNA extraction, add 150 µl saturated NaCl solution to 400 µl white cell lysate.
7. Mix well by inverting and vortexing, and incubate on ice for 10 min.
8. Pipet the supernatant into a new 1.5-ml tube and add 1 ml of absolute ethanol. Mix by inversion; a DNA precipitate should become visible.
9. Spin in a microcentrifuge for 1 min at maximum speed and discard the supernatant.
10. Wash the pellet with 1 ml of 70% ethanol, air-dry, and resuspend in TE buffer (see Note 1).
3.1.2. DNA Extraction from FFPE Tissue by Microdissection
1. One 3-µm-thick slice is needed for hematoxylin and eosin (HE) staining, and two to five 10-µm-thick slices are needed for DNA isolation.
2. The HE-stained thin section is reviewed by a pathologist, who outlines the areas of interest (see Note 2).
3. Compare the HE-stained slice with the 10-µm-thick slices and, with a blade, mark the areas of interest on the thick slices.
4. With a blade, carefully scrape the tissue from the slide and put it into a 1.5-ml tube. Use one tube per sample.
5. For deparaffinization, add 1.2 ml xylene to the tube and mix thoroughly. Centrifuge for 5 min at maximum speed.
6. Carefully remove the supernatant with a pipet and add 1.2 ml absolute ethanol to remove the xylene. Mix carefully and centrifuge for 5 min at maximum speed.
7. Carefully remove the supernatant and repeat once.
8. Add 1 ml of 70% ethanol, mix carefully, and centrifuge for 5 min at maximum speed; remove the supernatant with a pipet and air-dry the pellet at 37°C until the ethanol is completely removed.
9. Resuspend the pellet in 500 µl lysis buffer supplemented with proteinase K at a concentration of 2 mg/ml.
10. Mix well by vortexing and incubate overnight at 56°C until all tissue fragments are dissolved completely.
11. On the next day, add 500 µl phenol:chloroform:isoamyl alcohol and mix thoroughly by vortexing.
12. Centrifuge at 13,000 × g for 10 min and, with a 100-µl pipet, transfer the upper phase into a new tube. Add 1 volume of chloroform and mix thoroughly by vortexing. Centrifuge for 5 min at 13,000 × g.
13. Carefully remove the upper aqueous phase and transfer it into a new tube. Add 0.1 volume sodium acetate and 1 volume ice-cooled isopropanol and incubate overnight at −20°C.
14. Centrifuge at 13,000 × g at 4°C for 15 min; the DNA should be visible as a small pellet at the bottom of the tube.
15. Carefully discard the supernatant and wash the pellet once with 70% ethanol. Centrifuge for 5 min at 13,000 × g at 4°C and remove the supernatant. Air-dry the pellet.
16. Dissolve the DNA in 50 µl (or an appropriate volume) of water.
17. For optimal quality of DNA, storage is recommended at −20°C.
3.2. Genotyping Methods
Depending on the DNA variation of interest, various methods for genotyping can be considered. Allele-specific direct sequencing provides a method for detecting all kinds of mutations and polymorphisms. The most frequently observed DNA variations are point mutations, either as germline single-nucleotide polymorphisms (SNPs) or as tumor-specific mutations. In these cases, PCR-based RFLP techniques or real-time PCR strategies (not shown) provide easy-to-perform genotyping methods. In the case of tandem repeat or insertion/deletion polymorphisms, genotyping can be performed by GeneScan analyses using a genetic analyzer (sequencer), which is able to detect fragment length differences of 1 bp. Comprehensive SNP analyses may be carried out using SNP chip technology, either with whole-genome SNP chip arrays (screening) or customized chip arrays (e.g., metabolic pathways).
3.2.1. Direct Sequencing
1. For sequencing, PCR is performed in a total volume of 25 µl with specific primers and suitable salt and dNTP conditions.
2. The PCR product is purified by ethanol precipitation. The volume is adjusted to 150 µl with water; 15 µl sodium acetate (3 M, pH 4.6) and 375 µl absolute ethanol (4°C) are added, and the mixture is centrifuged at 15,000 × g for 15 min at room temperature. After centrifugation, the supernatant is carefully removed by pipetting, the pellet is washed with 250 µl of 70% ethanol, and a second centrifugation is performed at 15,000 × g for 5 min.
3. The pellet is air-dried and resuspended in water.
4. 2 ng DNA is mixed with 4 µl AmpliTaq BigDye terminator premix and 0.2 µM sequencing primer (the same primer as used for PCR) in a total volume of 10 µl.
5. The sequencing reaction is performed in a thermal cycler with the following conditions: denaturation for 10 s at 95°C, followed by a combined annealing and extension step for 90 s at 58°C. This is repeated 19 times (see Note 3).
6. To remove unincorporated ddNTPs, a second ethanol precipitation must be performed (10 µl reaction mix, 140 µl water, 15 µl sodium acetate, 375 µl ethanol). The dry pellet is resuspended in 20 µl HPLC-quality water and can be analyzed in a genetic sequencer (see Note 4).
7. Data interpretation is carried out with the ABI Sequence Analysis 5.2 software (Fig. 1).
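The two ethanol precipitations above use the same proportions: 0.1 volume of 3 M sodium acetate and 2.5 volumes of ethanol relative to the adjusted sample volume (150 : 15 : 375). The following Python sketch scales these proportions to other sample volumes. It assumes the ratios remain appropriate at other volumes, which the protocol does not state; the function name is illustrative, and volumes are in microliters (written `ul` in the code).

```python
def ethanol_precipitation_volumes(sample_ul, acetate_ratio=0.1, ethanol_ratio=2.5):
    """Return (sodium acetate, ethanol) volumes in microliters.

    The default ratios are inferred from the 150 : 15 : 375 example in the
    protocol; applying them to other volumes is an assumption.
    """
    if sample_ul <= 0:
        raise ValueError("sample volume must be positive")
    return sample_ul * acetate_ratio, sample_ul * ethanol_ratio


# Volumes for the 150-microliter adjusted sample used in the protocol.
acetate_ul, ethanol_ul = ethanol_precipitation_volumes(150)
print(f"sodium acetate: {acetate_ul:.0f} ul, ethanol: {ethanol_ul:.0f} ul")
```

Keeping the ratios as parameters makes it easy to adapt the sketch if a laboratory uses different precipitation conditions.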
3.2.2. PCR-RFLP
1. PCR is performed in a total reaction volume of 25–50 µl containing 50 ng template DNA, 0.4 µM specific primer, 2 mM MgCl2, 2 mM each dNTP, and 1 U Taq DNA polymerase.
2. PCR is performed in a thermal cycler under suitable conditions (Table 1).
3. Because of the high GC content of the promoter region, an optimized PCR is used: Q-solution is added to the reaction mix, and a Hot Start Taq DNA polymerase is used instead of Taq DNA polymerase. The Hot Start polymerase is activated by a 15-min incubation step at 95°C.
4. For RFLP, mix 15 µl PCR product with 3 U enzyme and suitable buffer in a 25-µl reaction volume.
5. The mix is incubated for 3 h at 37°C and loaded onto a 3% agarose gel. DNA fragments are visualized under UV light and documented by taking a picture (Fig. 2).
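The logic of RFLP genotyping can be illustrated with a small in-silico digest: an allele that contains the recognition site is cut into shorter fragments, whereas an allele without the site stays full length, so a heterozygote shows both patterns. The sketch below uses the Alu I site (AGCT, blunt cut between G and C). The two 16-nt sequences are invented toy alleles, not the real amplicons, and the simple top-strand search only suits palindromic sites such as this one.

```python
def digest(sequence, site, cut_offset):
    """Return fragment lengths after cutting `sequence` at every `site`.

    `cut_offset` is the number of bases of the site that remain on the
    left-hand fragment (2 for Alu I, which cuts AG^CT).
    """
    sequence = sequence.upper()
    site = site.upper()
    cut_positions = []
    start = sequence.find(site)
    while start != -1:
        cut_positions.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    edges = [0] + cut_positions + [len(sequence)]
    return [right - left for left, right in zip(edges, edges[1:])]


def expected_bands(allele_a, allele_b, site, cut_offset):
    """Distinct fragment lengths expected on the gel for a diploid sample."""
    fragments = digest(allele_a, site, cut_offset) + digest(allele_b, site, cut_offset)
    return sorted(set(fragments), reverse=True)


# Hypothetical toy alleles: one carries the AGCT site, the other does not.
ALLELE_WITH_SITE = "TTGACCAGCTGGATCC"
ALLELE_WITHOUT_SITE = "TTGACCAGTTGGATCC"

print(expected_bands(ALLELE_WITH_SITE, ALLELE_WITH_SITE, "AGCT", 2))
print(expected_bands(ALLELE_WITH_SITE, ALLELE_WITHOUT_SITE, "AGCT", 2))
print(expected_bands(ALLELE_WITHOUT_SITE, ALLELE_WITHOUT_SITE, "AGCT", 2))
```

On a real gel, the three band patterns correspond to the two homozygous genotypes and the heterozygote, as shown for the EGFR promoter polymorphisms in Fig. 2.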
Fig. 1. Direct sequencing of parts of exon 1 of KRAS with examples of mutations in codon 12 and codon 13.
3.2.3. GeneScan Analyses
1. The instructions assume the use of a Genetic Analyzer 377 XL from Applied Biosystems.
2. Prepare a 0.7-mm-thick 10% gel by mixing 18 g urea, 25.5 g distilled water, 5 ml of 10× TBE buffer, 5 ml Long Ranger, 250 µl ammonium persulfate solution, and 35 µl TEMED.
3. Clean the glass plates with Alconox, rinse with water, and dry with special tissues.
4. Pour the gel and allow it to polymerize for at least 2 h or, preferably, overnight (see Note 5).
5. Prepare the running buffer by diluting the concentrated TBE buffer tenfold.
6. Prepare the samples by mixing 2 µl PCR product with 5 µl FLS (formamide and loading buffer, 4:1) and 0.55 µl ROX 500 Standard. Denature the samples for 2 min at 95°C and store on ice immediately after denaturation (see Note 6).
7. Rinse the wells with running buffer using a needle before loading 1.8 µl of sample.
8. The run is performed at a gel temperature of 51°C, a laser power of 200 mV, and a collection time of 2 h.
9. Data are analyzed with the ABI PRISM GeneScan software (Fig. 3).
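Once GeneScan reports fragment sizes, a (CA)n genotype follows from two observations: alleles differ by multiples of the 2-bp repeat unit, and one peak indicates a homozygote while two peaks indicate a heterozygote. The sketch below converts peak sizes into repeat numbers relative to a reference allele. The calibration used in the example (a 120-bp fragment corresponding to 18 repeats) is purely hypothetical and would have to be established with a sequenced control.

```python
def repeats_from_size(size_bp, ref_size_bp, ref_repeats, unit_bp=2):
    """Convert a fragment size into a repeat number via a reference allele."""
    offset = size_bp - ref_size_bp
    if offset % unit_bp != 0:
        raise ValueError(f"{size_bp} bp does not fit a {unit_bp}-bp repeat ladder")
    return ref_repeats + offset // unit_bp


def call_genotype(peak_sizes_bp, ref_size_bp, ref_repeats):
    """Return ((allele 1, allele 2), zygosity) from one or two called peaks."""
    alleles = sorted(
        {repeats_from_size(round(size), ref_size_bp, ref_repeats) for size in peak_sizes_bp}
    )
    if len(alleles) == 1:
        return (alleles[0], alleles[0]), "homozygote"
    if len(alleles) == 2:
        return (alleles[0], alleles[1]), "heterozygote"
    raise ValueError("expected one or two allele peaks")


# Hypothetical calibration: 120 bp corresponds to 18 CA repeats.
print(call_genotype([120.1, 123.9], ref_size_bp=120, ref_repeats=18))
print(call_genotype([120.0], ref_size_bp=120, ref_repeats=18))
```

Rounding the measured sizes to whole base pairs is a simplification; real sizing data may need binning against the ROX standard.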
Table 1
Primer and PCR conditions for genotyping the EGFR pathway
Polymorphism | Primer sequence | Fragment length | Enzyme | Cycle conditions
KRAS 12/13 | F 5′ TAGTGGTGGAGTATTTGATAGT 3′; R 5′ CATGAAAATGGTCAGAGAAACC 3′ | 275 bp | – | 95°C 30 s, 54°C 30 s, 72°C 45 s, 35 cycles
EGFR (CA)n | F_FAM 5′ GTTTGAAGAATTTGAGCCAACC 3′; R 5′ TTCTTCTGCACACTTGGCAC 3′ | 116–128 bp | – | 94°C 30 s, 55°C 30 s, 72°C 30 s, 30 cycles
EGFR C-191A | F 5′ TCTGCTCCTCCCGATCCCTCCT 3′; R 5′ GAGGTGGCCTGTCGTCCGGTCT 3′ | 224 bp | Sac II | 98°C 5 s, 59°C 10 s, 72°C 20 s, 38 cycles
EGFR G-216T | F 5′ TCTGCTCCTCCCGATCCCTCCT 3′; R 5′ GAGGTGGCCTGTCGTCCGGTCT 3′ | 224 bp | BseR I | 98°C 5 s, 59°C 10 s, 72°C 20 s, 38 cycles
EGFR R497K | F 5′ TGCTGTGACCCACTCTGTCT 3′; R 5′ CCAGAAGGTTGCACTTGTCC 3′ | 155 bp | BstN I | 94°C 60 s, 62°C 45 s, 72°C 30 s, 35 cycles
EGF A61G | F 5′ TGTCACTAAAGGAAAGGA 3′; R 5′ TTCACAGAGTTTAACAGCCC 3′ | 150 bp | Alu I | 95°C 30 s, 51°C 30 s, 72°C 30 s, 35 cycles
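The high GC content mentioned for the EGFR promoter assays can be checked directly from the primer sequences in Table 1. The sketch below computes the GC fraction and a rough melting temperature with the Wallace rule, Tm = 2(A+T) + 4(G+C). This rule is only a coarse approximation intended for short oligonucleotides and is used here for illustration; it is not the method the authors used to choose the annealing temperatures.

```python
def clean(primer):
    """Remove spaces and normalize case so grouped sequences can be pasted in."""
    return primer.replace(" ", "").upper()


def gc_content(primer):
    """Fraction of G and C bases in the primer."""
    seq = clean(primer)
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(primer):
    """Rough melting temperature in degrees Celsius (Wallace rule)."""
    seq = clean(primer)
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


# Forward primers taken from Table 1.
primers = {
    "EGFR promoter F": "TCTGCTCCTCCCGATCCCTCCT",
    "EGF A61G F": "TGTCACTAAAGGAAAGGA",
}
for name, seq in primers.items():
    print(f"{name}: GC = {gc_content(seq):.0%}, Tm (Wallace) = {wallace_tm(seq)} C")
```

The promoter primer is markedly more GC-rich than the EGF primer, which is consistent with the use of Q-solution and a Hot Start polymerase for the promoter assays.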
Fig. 2. RFLP of EGFR promoter polymorphisms (−191 and −216). Lanes 1 and 8: 50-bp ladder; lane 2: EGFR-191 wild type; lane 3: EGFR-191 mutant; lane 4: EGFR-191 heterozygote; lane 5: EGFR-216 wild type; lane 6: EGFR-216 mutant; lane 7: EGFR-216 heterozygote.
Fig. 3. Detection of the EGFR (CA)n intron 1 polymorphism using the GeneScan method, (a) heterozygote (18/20), (b) homozygote (18/18).
4. Notes
1. DNA should not be resuspended by pipetting. Store the solution overnight at 4°C to resuspend the DNA completely. For optimal quality of DNA, storage is recommended at −20°C.
2. The surface area of the tumor tissue correlates with the DNA yield. To assess the surface area of the tissue to be dissected, use a grid based on millimeter paper.
3. There are different chemistries for the sequencing PCR. For short fragments, use a mix with a high concentration of ddNTPs.
4. When using a capillary sequencer, choose the appropriate polymer (e.g., POP4 is better for sequencing short fragments; POP7 is recommended for high throughput).
5. It is very important that there are no air bubbles in the gel. Pour the gel while tapping on the glass plates.
6. FLS is not stable; it can be stored at 4°C for 1 week. For better results, always prepare fresh solution.
References
1. McLeod H. (2006) Individualizing cancer chemotherapy. Clin Adv Hematol Oncol 4(4), 259–61.
2. Evans W. E., and Relling M. V. (2004) Moving towards individualized medicine with pharmacogenomics. Nature 429, 464–8.
3. Overmann M. J., and Hoff P. M. (2007) EGFR-targeted therapies in colorectal cancer. Dis Colon Rectum 50, 1259–70.
4. Reuter C. W., Morgan M. A., and Eckardt A. (2007) Targeting EGF-receptor-signalling in squamous cell carcinomas of the head and neck. Br J Cancer 96, 408–16.
5. Gebhardt F., Bürger H., and Brandt B. (2000) Modulation of EGFR gene transcription by secondary structures, a polymorphic repetitive sequence and mutations - a link between genetics and epigenetics. Histol Histopathol 15, 929–36.
6. McCubrey J. A., Steelman L. S., Chappell W. H., Abrams S. L., Wong E. W., Chang F., et al. (2007) Roles of the Raf/MEK/ERK pathway in cell growth, malignant transformation and drug resistance. Biochim Biophys Acta 1773, 1263–84.
7. Lievre A., Bachet J. B., LeCorre D., Boige V., Laudi B., Emile J. F., et al. (2006) KRAS mutation status is predictive of response to cetuximab therapy in colorectal cancer. Cancer Res 66, 3992–5.
8. Amado R. G., Wolf M., Freeman D., Peeters M., Van Cutsem E., Siena S., et al. (2007) Wild-type KRAS is required for panitumumab efficacy in patients with metastatic colorectal cancer: results from a randomized, controlled trial. ECCO 2007, LBA#7.
Chapter 18
Quantitation of CD39 Gene Expression in Pancreatic Tissue by Real-Time Polymerase Chain Reaction
Martin Loos, Beat Künzli, and Helmut Friess
Summary
Within the past decade, the field of gene expression analysis has constantly evolved, with numerous technologies being available for RNA quantification, including differential display, serial analysis of gene expression (SAGE), quantitative real-time (qRT) polymerase chain reaction (PCR), and microarrays. Although every technique has its specific application, the high levels of accuracy, reproducibility, sensitivity, and specificity have established qRT-PCR as a standard method for detection and quantification of gene expression. In this chapter, all steps of the qRT-PCR procedure, including purification of total RNA from animal tissues, reverse transcription to complementary DNA (cDNA), and quantification of relative gene expression are discussed. We chose qRT-PCR analysis of CD39 in pancreatic tissue as an example that is applicable to any gene of interest. CD39/ecto-nucleoside triphosphate diphosphohydrolase-type-1 (ENTPD1) is the dominant vascular ecto-nucleotidase that hydrolyzes extracellular nucleotides to integrate purinergic signaling responses. It has recently been associated with tumor growth and proliferation in melanoma cells and linked to pancreatic cancer progression.
Key words: Gene expression analysis, Quantitative real-time polymerase chain reaction (qRT-PCR), Pancreatic cancer, CD39
Robert Grützmann and Christian Pilarsky (eds.), Cancer Gene Profiling: Methods and Protocols, Methods in Molecular Biology, vol. 576, DOI 10.1007/978-1-59745-545-9_18, © Humana Press, a part of Springer Science + Business Media, LLC 2010
1. Introduction
Gene expression analysis is widely used in biological and biomedical research. In cancer research, quantitative RNA analysis plays a fundamental role in the identification of aberrant gene expression, which is responsible not only for the development and progression of malignant diseases but also for their resistance to treatment. DNA microarrays are one of the most popular techniques for high-throughput transcriptional profiling.
The power of microarrays lies mostly in the simultaneous quantification of thousands of different target genes, allowing extensive comparisons of gene expression changes between different groups, e.g., healthy and cancer tissues. Although the reproducibility of microarrays has been improved, other, more accurate techniques have been established. Therefore, microarrays are mostly used as a broad messenger RNA (mRNA) “fishing” strategy to narrow down potential gene targets that should be further validated. Quantitative real-time (qRT) polymerase chain reaction (PCR) represents a powerful tool for detection and quantification of gene expression (1, 2). It is a refinement of the original PCR developed by Kary Mullis and co-workers in the mid-1980s (3). In addition to the amplification of a specific DNA template, qRT-PCR allows the investigator to quantify the amount of template present before the start of the PCR process. All qRT-PCR systems depend on the detection and quantitation of a fluorescent reporter signal, which increases in direct proportion to the amount of PCR product in a reaction. Unlike conventional PCR, data are collected at each cycle, making it possible to monitor the PCR during the exponential amplification phase, where the amount of PCR product reflects the initial amount of target template. The most commonly used methods for detecting target templates include 5′ nuclease assays. The 5′ nuclease assay uses an oligonucleotide probe (TaqMan probe, Applied Biosystems, Foster City, CA, USA), which specifically anneals to a complementary sequence between the forward and reverse primer sites (4). For detection, the oligonucleotide probe carries a fluorescent reporter dye such as 6-carboxyfluorescein (6-FAM) at the 5′ end and a quencher such as 6-carboxy-tetramethyl-rhodamine (TAMRA) at the 3′ end.
When the probe is intact, the proximity of the reporter dye to the quencher dye results in suppression of the reporter fluorescence, primarily by Förster-type energy transfer (5). As the Taq DNA polymerase extends the PCR primer in the 5′ to 3′ direction, the 5′ exonuclease activity of the polymerase degrades the oligonucleotide probe. As a result, the reporter dye is separated from the quencher dye, resulting in increased fluorescence of the reporter (Fig. 1). Because the fluorescent signal is generated only if the oligonucleotide probe hybridizes with its complementary target, nonspecific amplification is not detected. The increase in fluorescence can be plotted on a graph through a single reading at the end of each PCR cycle (amplification plot, Fig. 2).
Relative quantitation can be used for quantitative measurement. In this method, a comparison within a sample is made between the gene of interest and a control gene: the cycle threshold (Ct) of the control gene is subtracted from the Ct of the gene of interest. The Ct value is calculated based on the time (measured in PCR cycle numbers) at which the reporter fluorescence emission increases beyond a threshold level (based on background levels). The Ct value is inversely correlated with the level of starting mRNA: the higher the starting mRNA level, the lower the Ct value, because fewer PCR cycles are required for the reporter fluorescence emission intensity to reach the threshold (6). The resulting difference in cycle number (ΔCt) is the exponent of the base 2 (due to the doubling function of PCR), representing the fold difference of template for these two genes. A prerequisite for the application of the relative quantitation method is that the genes analyzed have similar PCR efficiencies, preferably >95%. The PCR efficiency can be measured by performing a tenfold serial dilution of a positive control template and plotting the Ct as a function of the log10 concentration of template (standard curve, Fig. 3). The resulting slope of the line is a function of the PCR efficiency. A slope of −3.32 indicates that the PCR is 100% efficient, meaning that the amount of template is doubled after each cycle.
In this chapter, we describe the isolation of total RNA from frozen tissues, the reverse transcription to complementary DNA (cDNA), and the analysis of mRNA expression using qRT-PCR. We chose qRT-PCR analysis of CD39 in pancreatic tissues as an example that is applicable to any gene of interest (7). Relative mRNA expression of CD39 for normal pancreas, chronic pancreatitis, and pancreatic cancer is shown in Fig. 4. CD39 mRNA expression is significantly upregulated in chronic pancreatitis and pancreatic cancer, compared with healthy pancreas.
Fig. 1. 5′ Nuclease assay using a TaqMan probe. This probe hybridizes to the template between the standard PCR primers. The oligonucleotide probe carries a fluorescent reporter (6-FAM) (a) at the 5′ end and a quencher (TAMRA) (b) at the 3′ end. During the extension phase, degradation of the probe by the 5′ exonuclease activity of the Taq DNA polymerase cleaves the reporter dye from the probe and generates a fluorescent signal that can be detected.
Fig. 2. Amplification plot. The threshold is calculated as ten times the standard deviation of the baseline fluorescent signal. A fluorescent signal above the threshold is considered a real signal that can be used to define the threshold cycle (Ct). The Ct is calculated based on the time (measured in PCR cycle numbers) at which the reporter fluorescent emission increases beyond the threshold.
Fig. 3. Standard curve plot. The standard curve is used for calculation of PCR efficiency and quantification (PCR efficiency = 10^(−1/S), where S is the slope of the standard curve). The resulting Ct values for each input amount of template are plotted as a function of the log10 concentration of input amounts (cross marks), and a linear trendline is fit to the data.
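The relationship between the standard-curve slope and the PCR efficiency can be sketched in a few lines of Python. The example fits a straight line to Ct versus log10 input and converts the slope into an amplification factor per cycle, 10^(−1/slope), where a factor of 2 corresponds to 100% efficiency. The dilution series and Ct values are hypothetical and were chosen to give a slope of −3.32; expressing efficiency as (factor − 1) × 100 is a common convention that the chapter does not spell out.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two matching (x, y) points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def amplification_factor(slope):
    """Fold increase of template per cycle, 10^(-1/slope)."""
    return 10 ** (-1.0 / slope)


def efficiency_percent(slope):
    """Efficiency as a percentage; a factor of 2 corresponds to 100%."""
    return (amplification_factor(slope) - 1.0) * 100.0


# Hypothetical tenfold dilution series (log10 of relative input) and Ct values.
log10_input = [5, 4, 3, 2, 1]
ct_values = [18.00, 21.32, 24.64, 27.96, 31.28]

slope, intercept = fit_line(log10_input, ct_values)
print(f"slope = {slope:.2f}")
print(f"amplification factor = {amplification_factor(slope):.3f}")
print(f"efficiency = {efficiency_percent(slope):.1f}%")
```

Running the same fit for the gene of interest and the control gene gives a quick check of whether their efficiencies are similar enough for relative quantitation.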
Fig. 4. mRNA expression of CD39 in human pancreas. Relative mRNA expression of CD39 is shown for healthy pancreas, chronic pancreatitis, and pancreatic cancer. Expression is shown as number of copies. The asterisk indicates statistically significant overexpression of CD39 in chronic pancreatitis and pancreatic cancer, compared with healthy pancreas.
2. Materials
2.1. Isolation of Total RNA
For isolation of total RNA, we use the Qiagen RNeasy® Protect Mini Kit (Cat. #74124).
1. Steel mortars and pestles.
2. Sterile pipette tips.
3. Microcentrifuge (with rotor for 2-ml tubes).
4. Disposable gloves.
5. RNeasy Mini Spin Columns.
6. QIAshredder Spin Columns.
7. 1.5-ml Collection tubes.
8. 2-ml Collection tubes.
9. Buffer RW1. Store at room temperature (RT).
10. Buffer RLT. Store at RT.
11. Buffer RLC. Store at RT.
12. Buffer RPE. Store at RT.
13. 14.3 M β-mercaptoethanol (β-ME). Store at 4°C.
14. RNase-free water. Store at RT.
15. 96–100% Ethanol. Store at RT.
16. 70% Ethanol. Store at RT.
2.2. Reverse Transcription
For the preparation of cDNA, we use TaqMan® Reverse Transcription Reagents (Cat. #4304134).
1. MgCl2.
2. 25× dNTPs.
3. 10× RT buffer.
4. RT random primers.
5. RNase inhibitor.
6. Reverse transcriptase.
7. RNase-free water.
8. Total RNA.
9. Spectrophotometer, e.g., GeneQuant II RNA/DNA calculator (Pharmacia Biotech, Amersham Biosciences).
10. RNAsecure™ reagent (Ambion).
2.3. PCR
1. Double-distilled H2O (RNase free and DNase free).
2. 10× Taq buffer A (Applied Biosystems). Store at −20°C.
3. 10 mM dNTP mix.
4. 5 U/µl Taq DNA polymerase. Store at −20°C.
5. 200 µM Forward primer (e.g., of CD39 and 18S, Applied Biosystems). Store at 4°C.
6. 200 µM Reverse primer (e.g., of CD39 and 18S, Applied Biosystems). Store at 4°C.
7. 100 µM Probe (e.g., of CD39 and 18S, labeled with FAM and TAMRA, Applied Biosystems). Store at 4°C.
2.4. Determination of CD39 by RT-PCR
1. 96-Well optical reaction plates with optical caps (eight caps/strip; both Applied Biosystems).
2. ABI PRISM 7700 Sequence Detection System (Applied Biosystems) or compatible real-time cycler with sequence-detection software.
3. Methods
3.1. Purification of Total RNA from Mouse Pancreatic Tissues
We use the Qiagen RNeasy® Protect Mini Kit (Cat. #74124) according to the manufacturer's protocol, with individual modifications.
1. β-ME must be added to Buffer RLT before use. Add 10 µl β-ME per 1 ml Buffer RLT. Dispense in a fume hood. The reagent can be stored at RT for up to 1 month.
2. Before using Buffer RPE, add 4 volumes of 96–100% ethanol to obtain a working solution.
3. Place 30 mg of frozen, stabilized pancreatic tissue in the cooled mortar. Add liquid nitrogen to the mortar.
4. Grind the sample thoroughly by twisting the pestle. To obtain high RNA yields, the tissue must be finely pulverized (but not thawed).
5. Transfer the suspension of pulverized tissue and liquid nitrogen into an RNase-free, liquid nitrogen-cooled, 2-ml microcentrifuge tube. Place the tube on dry ice and allow the liquid nitrogen to evaporate, but do not allow the tissue to thaw.
6. Add 600 µl of Buffer RLT and immediately start homogenizing the tissue lysate. Load up to 700 µl of lysate onto a QIAshredder spin column placed in a 2-ml collection tube and spin for 2 min at 20,000 × g in a microcentrifuge. The lysate is homogenized as it passes through the spin column.
7. Centrifuge the lysate for 3 min at 20,000 × g at RT. Carefully remove the supernatant by pipetting, and transfer it to a new microcentrifuge tube. Use only this supernatant in subsequent steps.
8. Add 1 volume of 70% ethanol to the cleared lysate, and mix immediately by pipetting. Do NOT centrifuge.
9. Transfer up to 700 µl of the sample, including any precipitate that may have formed, to an RNeasy spin column placed in a 2-ml collection tube. Close the lid gently, and centrifuge for 15 s at >10,000 × g. Discard the flow-through. Reuse the collection tube in step 10.
10. Add 700 µl Buffer RW1 to the RNeasy spin column. Close the lid gently, and centrifuge for 15 s at >10,000 × g to wash the spin column membrane. Discard the flow-through. Reuse the collection tube in step 11.
11. Add 500 µl Buffer RPE to the RNeasy spin column. Close the lid gently, and centrifuge for 15 s at >10,000 × g to wash the spin column membrane. Discard the flow-through. Reuse the collection tube in step 12.
12. Add 500 µl Buffer RPE to the RNeasy spin column. Close the lid gently, and centrifuge for 2 min at >10,000 × g to wash the spin column membrane. Carefully remove the RNeasy spin column from the collection tube so that the column does not contact the flow-through, to avoid carryover of ethanol.
13. Place the RNeasy spin column in a new 1.5-ml collection tube. Add 30–50 µl RNase-free water directly to the spin column membrane. Close the lid gently, and centrifuge for 1 min at 10,000 × g to elute the RNA. The RNA is now ready for reverse transcription. It can be stored at −20°C for several weeks.
3.2. RNA Concentration Measurement
The purity and quantity of total RNA must be determined spectrophotometrically. Here, we describe the use of a spectrophotometer designed for this purpose, the GeneQuant II RNA/DNA calculator (Pharmacia Biotech, Amersham Biosciences). The sample is taken up into a quartz capillary tube, which allows the measurement of sample volumes as small as 2 µl. The machine is referenced with nuclease-free dH2O containing 1× RNAsecure™ reagent. The spectral absorption at 260 and 280 nm is measured, and the purity of the RNA is determined from the A260/A280 ratio. Values between 1.8 and 2.0 are observed for high-quality RNA, whereas lower values correspond to poor-quality RNA. The concentration of RNA is calculated from the spectral absorption at 260 nm using the Beer–Lambert law: C = A/(εl), where C is the RNA concentration (in µg/ml); A is the absorption (260 nm); ε is the RNA extinction coefficient (38 µg/ml); and l is the path length (0.05 cm).
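The purity check and the Beer–Lambert calculation can be sketched as follows. Because the units of the extinction coefficient quoted above are not clear from the text, the sketch swaps in the widely used approximation that an A260 of 1.0 at a 1-cm path length corresponds to about 40 µg/ml RNA, i.e., ε ≈ 0.025 (µg/ml)⁻¹ cm⁻¹. This default, the function names, and the example readings are assumptions for illustration; the instrument's own calibration takes precedence.

```python
def purity_ratio(a260, a280):
    """A260/A280 ratio used as a purity indicator."""
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    return a260 / a280


def is_high_quality(a260, a280, low=1.8, high=2.0):
    """True if the ratio falls in the 1.8-2.0 range given in the text."""
    return low <= purity_ratio(a260, a280) <= high


def rna_concentration(a260, epsilon=0.025, path_cm=0.05):
    """Beer-Lambert law, C = A / (epsilon * l), returned in micrograms per ml.

    epsilon = 0.025 per (ug/ml) per cm is an assumed default (A260 of 1.0
    at 1 cm corresponds to roughly 40 ug/ml RNA), not the chapter's value.
    """
    if epsilon <= 0 or path_cm <= 0:
        raise ValueError("epsilon and path length must be positive")
    return a260 / (epsilon * path_cm)


# Hypothetical readings taken with the 0.05-cm capillary.
a260, a280 = 0.05, 0.026
print(f"A260/A280 = {purity_ratio(a260, a280):.2f}")
print(f"high quality: {is_high_quality(a260, a280)}")
print(f"concentration = {rna_concentration(a260):.1f} ug/ml")
```

Passing the path length explicitly matters here because the short capillary path scales the reading by a factor of 20 relative to a standard 1-cm cuvette.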
3.3. Reverse Transcription of Total RNA to cDNA
1. For preparation of the Mastermix for a single 100-µl reaction, thaw all reagents on ice. Mix all reagents indicated in Table 1 (except reverse transcriptase) thoroughly by gentle pipetting and spin down.
2. Combine all reagents (except RNA) and mix thoroughly by gentle pipetting. Then spin down.
3. Aliquot the Mastermix into PCR tubes.
4. Add individual sample RNA to the appropriate PCR tubes.
Table 1
Preparation of the Mastermix for a single 100-µl reaction
Reagent | Volume (µl) | Final concentration
25 mM MgCl2 | 22.0 | 5.5 mM
dNTP mixture | 20.0 | 500 µM/dNTP
10× TaqMan RT buffer | 10.0 | 1×
Random hexamer (50 µM) | 5.0 | 2.5 µM
RNase inhibitor (20 U/µl) | 2.0 | 0.4 U/µl
Reverse transcriptase (50 U/µl) | 2.5 | 1.25 U/µl
Total RNA content of each individual sample | 0.5 µg |
RNase-free water | Up to 100 µl of total volume |
5. Mix the contents by pipetting and spin down to remove any air bubbles.
6. Place the PCR tube into a thermal cycler.
7. Thermal cycling parameters:
   Incubation: 10 min at 25°C
   Reverse transcription: 45 min at 42°C
   Inactivation: 5 min at 95°C
8. Store all cDNA at −20°C.
3.4. RT-PCR Using the 7700 Sequence Detector
3.4.1. Initial Preparation
1. Remove the TaqMan universal PCR Mastermix (TaqMan®, Cat. #4304437), the specific primer of interest (Applied Biosystems), and the individual cDNA sample from the −20°C freezer and thaw on ice before beginning the preparation of the Mastermix.
2. Dilute cDNA 1:5 in PCR water at RT and mix by gentle pipetting.
3. Prepare the Mastermix according to the following protocol:
   – 12.5 µl/well of TaqMan universal PCR Mastermix (TaqMan®, Cat. #4304437).
   – 1.25 µl/well of specific primer of interest (Applied Biosystems).
   – 6.25 µl/well of H2O.
   – 5 µl/well of 1:5 diluted individual cDNA sample.
4. For an internal control, we used the following primers from Applied Biosystems:
   – 18S ribosomal RNA (rRNA), probe dye VIC-MGB (Cat. #4319413E-0312010).
5. Make sure to mix the contents sufficiently by gentle pipetting.
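The per-well volumes in step 3 add up to a 25-µl reaction and can be scaled to any number of wells. The sketch below assumes that the shared Mastermix contains everything except the cDNA, which is pipetted into each well separately, and that a 10% pipetting reserve is added. Both assumptions, as well as the dictionary keys, are illustrative and not prescribed by the protocol.

```python
# Per-well volumes in microliters, taken from step 3 of the protocol.
SHARED_PER_WELL_UL = {
    "TaqMan universal PCR Mastermix": 12.5,
    "primer of interest": 1.25,
    "H2O": 6.25,
}
CDNA_PER_WELL_UL = 5.0


def master_mix_volumes(n_wells, overage=0.10):
    """Scale the shared components to n_wells plus a pipetting reserve."""
    if n_wells < 1:
        raise ValueError("need at least one well")
    factor = n_wells * (1.0 + overage)
    return {name: round(volume * factor, 2) for name, volume in SHARED_PER_WELL_UL.items()}


def reaction_volume_ul():
    """Total volume per well, including the separately added cDNA."""
    return sum(SHARED_PER_WELL_UL.values()) + CDNA_PER_WELL_UL


# Example: one gene measured in 12 wells with a 10% reserve.
for name, volume in master_mix_volumes(12).items():
    print(f"{name}: {volume} ul")
print(f"volume per well: {reaction_volume_ul()} ul")
```

Setting `overage=0.0` returns the exact theoretical volumes, which is useful for checking the arithmetic by hand.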
3.4.2. Basic PCR Plate Set Up
The final 96-well PCR plate should include the following samples (Table 2):
1. No-template control samples (NTC), including all PCR components except the individual template (at least three NTC per gene).
2. Unknown samples (S1, S2, S3, …), each of which should be run at least in triplicate.
3. Control samples without reverse transcriptase (NRT), containing RNA instead of cDNA; at least one NRT per sample (S1, S2, S3, …).
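The plate arrangement can also be generated programmatically, which helps when the sample list changes between runs. The sketch below mirrors the layout of Table 2: controls in column 1 (NTC in rows A–C, NRT in rows D–F), and for each sample one column with the gene of interest in rows A–C and the 18S reference in rows D–F. The function name and the limit of 11 samples per plate are consequences of this particular layout, not requirements stated in the chapter.

```python
def plate_layout(samples):
    """Map well names (e.g. 'A2') to contents, following the Table 2 scheme."""
    if len(samples) > 11:
        raise ValueError("this layout holds at most 11 samples per plate")
    layout = {}
    for row in "ABC":
        layout[f"{row}1"] = "NTC"  # no-template controls
    for row in "DEF":
        layout[f"{row}1"] = "NRT"  # controls without reverse transcriptase
    for column, sample in enumerate(samples, start=2):
        for row in "ABC":
            layout[f"{row}{column}"] = sample  # gene of interest, triplicate
        for row in "DEF":
            layout[f"{row}{column}"] = f"18S({sample})"  # reference, triplicate
    return layout


layout = plate_layout(["S1", "S2", "S3"])
for row in "ABCDEF":
    print(row, [layout.get(f"{row}{column}", "") for column in range(1, 5)])
```

Rows G and H remain empty in this scheme, as in Table 2.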
3.4.3. Data Analysis
1. To read the prepared plate on the 7700 Sequence Detector, click on “Analyze” in the application menu of the sequence detection software and scroll to “Analyze data” to get the amplification curves.
Table 2
96-well plate with no-template control (NTC), control sample without reverse transcriptase (NRT), individual samples S1, S2, S3, …, and 18S RNA expression of sample 1 (18S(S1)), sample 2 (18S(S2)), and sample 3 (18S(S3)). Triplicates of samples are mandatory for a precise analysis
  | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12
A | NTC | S1 | S2 | S3 | … | | | | | | |
B | NTC | S1 | S2 | S3 | … | | | | | | |
C | NTC | S1 | S2 | S3 | … | | | | | | |
D | NRT | 18S(S1) | 18S(S2) | 18S(S3) | … | | | | | | |
E | NRT | 18S(S1) | 18S(S2) | 18S(S3) | … | | | | | | |
F | NRT | 18S(S1) | 18S(S2) | 18S(S3) | … | | | | | | |
G | | | | | | | | | | | |
H | | | | | | | | | | | |
Table 3
Example for calculating the ΔCt value of a randomly chosen sample standardized to the expression of 18S. Standardization of the ΔCt value to 18S (STDZ Δ18S) is demonstrated
Sample type | Detector gene | Ct value (well) | Ct value (well) | Ct value (well) | Average Ct value | STDZ Δ18S | Relative expression
S1 | CD39 | 32.07 (A2) | 32.06 (B2) | 32.09 (C2) | 32.07 | 20.98 | 48.35
18S (S1) | 18S | 11.03 (D2) | 11.15 (E2) | 11.07 (F2) | 11.09 | |
2. Use the same threshold level when comparing the Ct values of the standards with one another or when comparing the Ct values of the unknown samples.
3. The PCR efficiency needs to be at least 95%.
4. Generate a report file and export it in Microsoft Excel format for further analysis, as displayed in Table 3.
5. To calculate the ΔCt value of the detector gene CD39, the relative RNA expression is standardized to a defined housekeeping gene (e.g., 18S). The standardized Δ18S value (STDZ Δ18S) equals the average Ct value of sample 1 minus the average Ct value of 18S of the same sample (STDZ Δ18S = average Ct [S1] − average Ct [18S {S1}]). The relative RNA expression level of CD39 in sample 1 (S1) is finally calculated by the formula POWER(2, −STDZ Δ18S) multiplied by a correction factor of 1e8 (in our example, 48.35). The correction factor is chosen arbitrarily to allow further analysis with easily manageable numbers.
6. An important note has to be made about the comparison of data points: relative RNA expression of samples standardized to a housekeeping gene can only be compared within related sample populations, ideally run on the same 96-well plate.
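The spreadsheet calculation in step 5 translates directly into a few lines of Python. The sketch reproduces the worked example from Table 3 using the rounded average Ct values given there (32.07 for CD39 and 11.09 for 18S) and the arbitrary scaling factor of 1e8; the function names are illustrative.

```python
def delta_ct(ct_target, ct_reference):
    """Standardized value: average Ct of the target minus that of the reference."""
    return ct_target - ct_reference


def relative_expression(ct_target, ct_reference, scale=1e8):
    """2^(-delta Ct) multiplied by an arbitrary scaling factor."""
    return 2 ** (-delta_ct(ct_target, ct_reference)) * scale


# Rounded average Ct values from Table 3.
cd39_ct = 32.07
rna18s_ct = 11.09

print(f"delta Ct = {delta_ct(cd39_ct, rna18s_ct):.2f}")
print(f"relative expression = {relative_expression(cd39_ct, rna18s_ct):.2f}")
```

Because the result depends exponentially on ΔCt, small rounding differences in the averaged Ct values shift the relative expression noticeably, so averages should be rounded consistently across samples.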
4. Notes
1. Changes in the gene expression pattern can occur due to specific and nonspecific RNA/DNA degradation. RNA is easily degraded by ribonucleases (RNases), which are abundant in the environment and difficult to eliminate. Sterile, RNase-free microcentrifuge tubes and pipet tips should be used at all times. Gloves should be changed regularly, and designated pipets should be used for RNA work. All solutions containing RNA need to be kept on ice at all times to minimize RNA degradation by contaminating ribonucleases.
2. As with all enzymatic reactions, mix all non-enzymatic components first and then add the enzymatic components.
3. Always wear a suitable lab coat, disposable gloves, and safety goggles when working with chemicals. Buffer RLC containing guanidine hydrochloride, Buffer RLT containing guanidine thiocyanate, and β-mercaptoethanol are harmful chemicals. Inhalation, ingestion, and skin and eye contact should be avoided at all times.
4. The use of a Mastermix markedly reduces the number of reagent transfers per sample and minimizes reagent loss and sample-to-sample variations. In addition, the use of multichannel pipettes is essential to minimize pipetting errors.
References
1. Higuchi, R., Fockler, C., Dollinger, G., Watson, R. (1993) Kinetic PCR analysis: real-time monitoring of DNA amplification reactions. Biotechnology 11, 1026.
2. Gibson, U.E., Heid, C.A., Williams, P.M. (1996) A novel method for real time quantitative RT-PCR. Genome Res. 6, 995–1001.
3. Mullis, K.B., Faloona, F.A. (1987) Specific synthesis of DNA in vitro via polymerase-catalyzed chain reaction. Methods Enzymol. 155, 335–350.
4. Holland, P.M., Abramson, R.D., Watson, R. (1991) Detection of specific polymerase chain reaction product by utilizing the 5′–3′ exonuclease activity of Thermus aquaticus DNA polymerase. Proc. Natl. Acad. Sci. USA 88, 7276–7280.
5. Foerster, V.T. (1948) Intermolecular energy transfer and fluorescence. Ann. Phys. 2, 55–75.
6. Winer, J., Jung, C.K., Shackl, I., Williams, P.M. (1999) Development and validation of real-time quantitative reverse transcriptase-polymerase chain reaction for monitoring gene expression in cardiac myocytes in vitro. Anal. Biochem. 270, 41–49.
7. Künzli, B.M., Berberat, P.O., Giese, T., Csizmadia, E., Kaczmarek, E., Baker, C., Halaceli, I., Büchler, M.W., Friess, H., Robson, S.C. (2007) Upregulation of CD39/NTPDases and P2 receptors in human pancreatic disease. Am. J. Physiol. Gastrointest. Liver Physiol. 291, G223–G230.
Chapter 19
Functional profiling methods in cancer
Joaquín Dopazo
Summary
The introduction of new high-throughput methodologies such as DNA microarrays constitutes a major breakthrough in cancer research. The unprecedented amount of data produced by such technologies has opened new avenues for interrogating living systems although, at the same time, it has demanded the development of new data analysis methods as well as new strategies for testing hypotheses. A history of early successful applications in cancer boosted the use of microarrays and fostered further applications in other fields. Keeping pace with these technologies, bioinformatics offers new solutions for data analysis and, more importantly, permits the formulation of a new class of hypotheses inspired by systems biology, more oriented to pathways or, in general, to modules of functionally related genes. Although these analytical methodologies are new, some options are already available and are discussed in this chapter.
Key words: Functional profiling, Functional enrichment, Gene-set analysis, Pathway, Gene ontology, Systems biology, Microarray
Robert Grützmann and Christian Pilarsky (eds.), Cancer Gene Profiling: Methods and Protocols, Methods in Molecular Biology, vol. 576, DOI 10.1007/978-1-59745-545-9_19, © Humana Press, a part of Springer Science + Business Media, LLC 2010
1. Introduction
Among the battery of high-throughput methodologies that are revolutionizing cancer research, DNA microarrays can be considered the standard due to their popularity and characteristics. Although many different questions can be addressed through microarray experiments, there are usually three types of objective in this context: “class comparison,” “class prediction,” and “class discovery” (1, 2). The first two objectives usually involve the application of tests to define differentially expressed genes, or the use of different procedures to predict class membership on the basis of the values observed for a number of “key” genes.
Clustering methods belong to the last category, also known as unsupervised analysis, because no previous information about the class structure of the data set is used in the study.
When strategies for microarray data analysis are considered from a historical perspective, an initial period can be distinguished in which almost all publications were related to reproducibility and sensitivity issues. Many classic microarray papers dating from the late 1990s were mainly proof-of-principle experiments (3, 4). Consequently, the methodological approaches used for analysis were mainly related to clustering and, in general, unsupervised approaches. This subsequently caused confusion about the choice of the appropriate methodology for a proper data analysis, as noted by some authors (5). Later, sensitivity became a main concern as a natural reaction against very liberal interpretations of microarray experiments, such as the use of fold-change criteria to select differentially expressed genes. It was soon obvious that genome-scale experiments should be carefully analyzed because many apparent associations happened merely by chance (6). In this scenario, methods for the adjustment of p-values, which are considered standard today, started to be extensively used (7, 8). The increasing use of microarrays as predictors of clinical outcomes (9), despite not being free of criticism (5), fueled the use of the methodology because of its practical implications. Comparative studies show that, although intra-platform reproducibility seems to be high, cross-platform and cross-laboratory coherence is still an issue (10). Another aspect that soon became of major importance was the interpretation of microarray experiments in terms of their biological implications, rather than restricting them to a mere comparison of lists of gene identifiers (11, 12).
Thus, a number of methods that essentially search for the overrepresentation of functional modules within groups of genes previously defined in the experiment were developed. Examples of repositories widely used to define gene modules are Gene Ontology (GO) (13), KEGG pathways (14), and Biocarta (http://www.biocarta.com). Programs such as GOMiner (15), FatiGO (16), etc., can be considered representatives of a family of methods that use these functional definitions of gene modules to conjecture about the interpretation of the results of microarray experiments (17). The difficulty of defining repeatable lists of genes of interest across laboratories and platforms, even using common experimental and statistical methods (18), has led several groups to propose different approaches that aim to select genes taking into account their functional properties. Gene Set Enrichment Analysis (GSEA) (19, 20) has pioneered a family of methods devised not to find individual genes but to search for groups of functionally related genes with a coordinated (although not necessarily high) overexpression or underexpression across a list of genes ranked by differential expression between the two classes compared in microarray experiments. Different tests have recently been proposed for microarray data,
with this aim in mind (12, 21–25) and also for expressed sequence tags (ESTs) (26), and some of them are available on web servers (12, 27). In particular, the FatiScan procedure (12, 27) can deal with ordered lists of genes independently of the type of data that originated them. This interesting property allows for its application to a broad range of experimental designs (case–control, multiclass, survival, etc.) as well as to other types of high-throughput data apart from microarrays.
Thus, in addition to the conventional study of individual genes and proteins, genome-wide approaches based on high-throughput methodologies have helped to uncover fundamental principles of tumorigenesis, and increasing evidence points to cooperative, systems-level events as important factors in understanding the mechanisms by which cancer gene products coordinately promote cellular transformation (28, 29). Moreover, modern trends in the pharmaceutical industry also point toward the use of functional genomics and systems biology-oriented studies (30, 31) as fundamental steps of the drug discovery pipeline.
2. Materials
2.1. Definition of Gene Modules: Sources of Information
Any functional analysis relies on the definition of gene modules related by biological properties of interest. Probably the most widely used source of definition of functional modules is the Gene Ontology (GO) catalog (13). GO represents the biological knowledge as a tree (more precisely as a directed acyclic graph [DAG], in which a node can have more than one parent) where functional terms near the root of the tree make reference to more general concepts while deeper functional terms near the leaves of the tree make reference to more specific concepts. If a gene is annotated to a given level then it is automatically considered to be annotated at all of the upper levels (all of the parent levels) up to the root. Because genes are annotated at different levels of the GO hierarchy, it is common to use this abstraction to choose a predefined level in the hierarchy instead of using directly the original levels of annotation of the genes (11, 32), which increases the power of the enrichment tests (11, 12, 33, 34). The KEGG pathways database (14) or the Biocarta pathways (http://www.biocarta.com) are two extensively used sources of functional information. There are also databases that contain functional motifs mapped to proteins, such as the Interpro database (35) and many others. In addition, other types of modules, such as transcriptional ones, can be defined as groups of genes under the same regulatory control.
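The rule that a gene annotated to a term is also annotated to every parent term up to the root can be sketched in Python. This is a toy illustration under stated assumptions: the term names, the gene name, and the `parents` mapping are hypothetical and are not real GO identifiers; a real analysis would load the ontology and the annotations from the GO distribution files.

```python
def ancestors(term, parents):
    """Return all terms reachable from `term` by following parent links."""
    seen = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def propagate(annotations, parents):
    """Extend the direct annotations of each gene with all ancestor terms."""
    full = {}
    for gene, terms in annotations.items():
        closed = set(terms)
        for term in terms:
            closed |= ancestors(term, parents)
        full[gene] = closed
    return full


# Hypothetical DAG: "kinase_signaling" has two parents, so this is not a simple tree.
parents = {
    "kinase_signaling": ["signaling", "phosphorylation"],
    "signaling": ["biological_process"],
    "phosphorylation": ["biological_process"],
    "biological_process": [],
}
direct = {"geneX": {"kinase_signaling"}}

full = propagate(direct, parents)
print(sorted(full["geneX"]))
```

Testing at a predefined level of the hierarchy then amounts to keeping, for each gene, only the propagated terms that sit at that level.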
Databases that collect regulatory motifs are available. Among the most popular are CisRed (36) and Transfac, which contains predictions of transcription factor binding sites (37). In addition, negative regulation mediated by microRNAs has recently gained relevance. The miRBase (38) contains putative gene targets of such microRNAs. Genes sharing one or more of these regulatory motifs can be considered a putative regulatory module. Other ways of defining modules of a different nature include the use of information obtained using text-mining procedures (39), chromosomal location (40, 41), protein–protein interactions, etc.
2.2. Bioinformatics Tools
Beyond other technical or statistical considerations, the approximate level of acceptance of different gene-set analysis (GSA) methods among the scientific community is reported in Tables 1 and 2. Table 1 presents an exhaustive list of bioinformatics tools available for functional profiling that implement tests for functional enrichment. Here the number of Google Scholar citations has been used as an approximate popularity index, given that it reflects the number of academic documents (mostly papers) citing a particular paper. Following this criterion, the most popular tools, with more than 200 citations each, are EASE (42), DAVID (43), GOMiner (15), Babelomics/FatiGO (12, 16, 34), MAPPFinder (44), GOStat (45), and Ontotools (46). In the case of GSA methods, Table 2 shows that more than 75% of the Google Scholar citations are monopolized by two tools: GSEA and Babelomics.
3. Methods
3.1. Functional Enrichment Methods
In the conventional approach for the functional annotation of microarray experiments, known as functional enrichment analysis, the functional interpretation of the data is performed in two steps: in a first step, genes of interest are selected using different procedures. In a subsequent step, the selected genes of interest are compared with a background (usually the rest of the genes) to find enrichment in any gene module. This comparison with the background is essential because an apparently high proportion of a given functional module could easily be nothing but a reflection of a high proportion of this particular module in the whole genome but not a proper enrichment. Actually, both enrichments and depletions of gene modules are potentially of interest. Therefore, unless there is a specific reason not to consider enrichment or depletion, two-sided tests are appropriate (47). This comparison between the selected genes and the background can be carried out
Table 1
Functional enrichment data analysis tools with at least ten Google Scholar citations

Tool | Application type or URL for web servers | References | Citations (a)
EASE | Windows application | (42) | 603
DAVID | http://www.DAVID.niaid.nih.gov | (43) | 504
GOMiner | http://discover.nci.nih.gov/gominer/ | (15, 55) | 408
Babelomics | http://www.babelomics.org | (12, 16, 34, 50, 56) | 402
MAPPFinder | http://www.GenMAPP.org | (44) | 379
FatiGO | http://www.fatigo.org | (16) | 341
GOStat | http://gostat.wehi.edu.au/ | (45) | 249
Ontotools | http://vortex.cs.wayne.edu/ontoexpress/ | (32, 46, 57–59) | 223
GOTM | http://genereg.ornl.gov/gotm/ | (60) | 164
GO::TermFinder | Perl script | (61) | 152
FunSpec | http://funspec.med.utoronto.ca | (62) | 100
GeneMerge | http://www.oeb.harvard.edu/hartl/lab/publications/GeneMerge.html | (63) | 96
FuncAssociate | http://llama.med.harvard.edu/Software.html | (64) | 91
BINGO | Cytoscape plugin | (65) | 75
GOToolBox | http://gin.univ-mrs.fr/GOToolBox | (66) | 74
GFINDer | http://www.medinfopoli.polimi.it/GFINDer/ | (67, 68) | 49
WebGestalt | http://bioinfo.vanderbilt.edu/webgestalt/ | (69) | 46
GOSurfer | R package | (70) | 45
CLENCH | Perl script | (71) | 26
Pathway Explorer | https://pathwayexplorer.genome.tugraz.at/ | (72) | 25
Ontology Traverser | R package | (73) | 24
THEA | Java standalone | (74) | 11
WebBayGO | http://blasto.iq.usp.br/~tkoide/BayGO/ | (75) | 10
GOStat | R package | (76) | 10

(a) Citations are taken from Google Scholar (as of January 2008). The Google Scholar count is only an indirect estimate of citations in papers, but it gives an idea of the impact on the scientific community.
Table 2
Tools available for functional profiling by gene-set analysis with at least ten Google Scholar citations

Tool | Application type or URL for web servers | References | Test | Citations (a)
GSEA | http://www.broad.mit.edu/gsea/ | (19, 20) | GS, C | 1,013
Babelomics (FatiGO + FatiScan) | http://www.babelomics.org | (12, 16, 34, 50, 56) | FE/GS, C | 402
FuncAssociate | http://llama.med.harvard.edu/Software.html | (64) | FE/GS, C | 91
Global test | R package | (22) | GS, SC | 89
PAGE | Python script | (25) | GS, C | 42
ErmineJ | Java | (77) | GS, C | 35
FatiScan | http://www.babelomics.org | (50) | GS, C | 34
GO-mapper | Windows, Perl script | (24) | GS, C | 33
SAFE | R package | (49) | GS, C | 27
GOAL | http://microarrays.unife.it | (78) | GS, C | 25
Catmap | Perl script | (79) | GS, C | 19
PLAGE | http://dulci.biostat.duke.edu/pathways/ | (80) | GS, SC | 18
GODist | MATLAB program | (81) | GS, SC | 17
t-Profiler | http://www.t-profiler.org/ | (82) | GS, C | 12

Type of test: GS, gene set; C, competitive; FE, functional enrichment; SC, self-contained.
(a) Citations are taken from Google Scholar (as of January 2008). The Google Scholar count is only an indirect estimate of citations in papers, but it gives an idea of the impact on the scientific community.
by means of the application of different tests, such as the hypergeometric, Fisher’s exact, χ², and binomial tests, which are considered to give similar results (47). Because many tests are conducted to check all the gene modules, adjustment for multiple testing, such as the false discovery rate (FDR) (7) or others, must be used.
3.2. Gene-Set Analysis Methods
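For reference, the two-step functional enrichment test of Subheading 3.1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used by any of the tools in Table 1: the gene and module names are hypothetical, the two-sided Fisher’s exact p-value is computed directly from the hypergeometric distribution, and the multiple-testing adjustment is the Benjamini–Hochberg step-up procedure, used here for brevity instead of the dependency-aware variant cited in (7).

```python
from math import comb


def fisher_two_sided(k, n, K, N):
    """Two-sided Fisher's exact p-value for module enrichment or depletion.

    k: selected genes in the module, n: selected genes,
    K: module genes in the universe, N: genes in the universe.
    """
    def pmf(x):
        # Hypergeometric probability of x module genes among the n selected.
        return comb(K, x) * comb(N - K, n - x) / comb(N, n)

    lo, hi = max(0, n - (N - K)), min(n, K)
    p_obs = pmf(k)
    # Sum every outcome that is no more probable than the observed one.
    total = sum(p for p in map(pmf, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def enrichment(selected, universe, modules):
    """Map each module name to (p-value, adjusted p-value)."""
    universe = set(universe)
    selected = set(selected) & universe
    names = sorted(modules)
    pvals = []
    for name in names:
        members = set(modules[name]) & universe
        k = len(selected & members)
        pvals.append(fisher_two_sided(k, len(selected), len(members), len(universe)))
    return dict(zip(names, zip(pvals, benjamini_hochberg(pvals))))


# Hypothetical data: 20 genes, two modules, five selected genes.
universe = [f"g{i}" for i in range(20)]
modules = {"module_A": universe[0:5], "module_B": universe[10:15]}
selected = ["g0", "g1", "g2", "g3", "g15"]

results = enrichment(selected, universe, modules)
for name, (p, q) in results.items():
    print(f"{name}: p={p:.4f}, adjusted p={q:.4f}")
```

The background here is the whole universe of genes, so the same function reports both enrichment and depletion of a module among the selected genes.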
The interpretation of a genome-scale experiment using the two-step functional enrichment approach is far from optimal, given that the thresholds imposed in the first step, which assume independence among genes, preclude the detection of many gene modules. Methods directly inspired by systems biology focus on collective properties
of the genes more than on individual gene expression values. Modules of genes related by common functionality, regulation, or other interesting biological properties will simultaneously fulfill their roles in the cell and, consequently, they are expected to display a coordinated expression. In its simplest formulation, the GSA method uses a ranking of values derived from the experiment analyzed. Mootha et al. (19) ranked the genes according to their differential expression when two predefined classes (diabetic patients versus healthy controls) were compared by means of any appropriate statistical test (48). The position of the genes (that cooperatively act to define modules) within this ranked list is related to their participation in the trait studied in the experiment. Consequently, each module that is a causative agent of the differences between the classes compared will most probably be found at the extremes of the ranked list. Thus, instead of testing differential activities of genes, which implicitly assumes independent behavior (an aspect often ignored by the researchers applying the test), and later searching for enrichment in gene modules among the selected genes, GSA directly tests for gene modules significantly accumulated at the extremes of a ranked list of genes. In this way, arbitrary prior thresholds, which inadvertently change the meaning of the hypothesis-testing scheme, are avoided. Different methods have been proposed for this purpose, such as the GSEA (19, 20) or the SAFE (49) methods, which use a nonparametric version of a Kolmogorov–Smirnov test. Other strategies proposed are the direct analysis of functional terms weighted with experimental data (24) or model-based methods (22). Methods with similar accuracy, although conceptually simpler and quicker, have also been proposed, for instance, the parametric counterpart of the GSEA, PAGE (25), or the segmentation test, FatiScan (50).
3.3. Functional Profiling in Array–CGH Experiments
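The threshold-free idea behind the gene-set methods of Subheading 3.2 can be illustrated with a short Python sketch. It is a simplified, unweighted running-sum statistic in the spirit of a Kolmogorov–Smirnov test and should not be read as a reimplementation of GSEA, SAFE, or FatiScan. The gene names are hypothetical, and the p-value is obtained here by permuting gene positions, which is only one of several possible null models (permuting sample labels is another).

```python
import random


def running_sum_score(ranked_genes, gene_set):
    """Signed maximum deviation of a running sum along a ranked gene list.

    The sum goes up by 1/(number of set genes) at each set member and down by
    1/(number of other genes) otherwise, so it starts and ends at zero.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(hits) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("the gene set must cover part, but not all, of the list")
    running, best = 0.0, 0.0
    for is_hit in hits:
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


def permutation_pvalue(ranked_genes, gene_set, n_perm=1000, seed=0):
    """Two-sided p-value from random permutations of the gene order."""
    rng = random.Random(seed)
    observed = abs(running_sum_score(ranked_genes, gene_set))
    genes = list(ranked_genes)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(genes)
        if abs(running_sum_score(genes, gene_set)) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)


# Hypothetical list of 20 genes ranked by differential expression.
ranked = [f"g{i}" for i in range(20)]
top_module = {"g0", "g1", "g2", "g3"}

score = running_sum_score(ranked, top_module)
pvalue = permutation_pvalue(ranked, top_module)
print(score, pvalue)
```

A positive score indicates accumulation toward the top of the list and a negative score toward the bottom; no gene-level threshold is applied at any point.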
Genetic alterations, such as losses (deletions), gains (amplifications), or losses of heterozygosity (LOH) of genetic material that affect certain regions of the genome, have been shown to be the basis of many types of cancer (51). New technologies such as array–CGH, along with the use of expression arrays, offer for the first time the opportunity to accurately characterize the alterations in genomic copy number and the dependence of gene expression on these alterations (52). Despite the obvious fact that such alterations affect a large number of genes, most of the research is still focused on finding only one or a few genes responsible for a disease or a trait and ignores the chromosomal context (52). In particular, the putative impact that the local distribution of functions could have on the symptomatology of diseases that harbor copy number alterations or, in general, on gene regulation and/or silencing is largely unexplored. Actually, only a few attempts at
analyzing copy number alterations in terms of gains or losses of whole or parts of gene teams have been made to date (40, 41). Programs such as ISACGH (41) detect copy number alterations using conventional algorithms and allow a functional enrichment analysis of the regions with detected alterations.
3.4. Gene-Set Analysis in Genotyping
Another field in which a gene set-based approach could be very useful is genotyping. In association and linkage studies, chips of increasing density have the frustrating effect of decreasing the power of the tests, because of the strict multiple-testing corrections that must be applied. Most genetic disorders have a complex inheritance and can be considered the combined result of variants in many genes, each contributing only weak effects to the disease. Given that, in any disorder, most of the disease genes will be involved in only a few different molecular pathways, the knowledge of the relationships (functional, regulatory, interactions, etc.) between the genes can help in the assessment of possible candidates (which may reside in different loci) with a joint basis for the disease etiology. The use of different gene module definitions (GO, KEGG, protein interactions, and coexpression) in an integrated network was recently applied to interrelate positional candidate genes from different disease loci and then to test 96 heritable disorders in the Online Mendelian Inheritance in Man database (53). This gene set-based strategy resulted in a 2.8-fold increase over random selection.
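The loss of power with increasing chip density can be made concrete with a small calculation. The sketch below assumes a Bonferroni correction, which is only one of the possible strict corrections and is not named in the text; the numbers of markers and gene modules are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Hypothetical numbers of simultaneous tests.
scenarios = {
    "10,000 markers": 10_000,
    "500,000 markers": 500_000,
    "200 gene modules": 200,
}
thresholds = {label: bonferroni_threshold(0.05, n) for label, n in scenarios.items()}

for label, threshold in thresholds.items():
    print(f"{label}: per-test threshold {threshold:.1e}")
```

Testing a few hundred gene modules instead of hundreds of thousands of individual markers relaxes the per-test threshold by several orders of magnitude, which is one way to read the motivation for gene set-based strategies in this setting.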
3.5. Conclusion
As cancer research increasingly benefits from the introduction of high-throughput technologies, new hypotheses, inspired by systems biology concepts, can be addressed and checked (54). Bioinformatics has become an essential tool, not merely as an instrument for managing the huge amount of data produced by these new technologies, but also for implementing a new generation of algorithms and concepts that are opening the doors to the understanding of cancer as a system (28, 29). Biomedicine is becoming more computational, and cancer research is pioneering this transformation.
Acknowledgments
This work is supported by grants from the Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), ISCIII, projects BIO 2008-04212 from the Spanish Ministry of Education and Science and National Institute of Bioinformatics (http://www.inab.org), a platform of Genoma España. EA is supported by a fellowship from the FIS of the Spanish Ministry of Health (FI06/00027).
References 1. Golub, T.R., Slonim, D.K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J.P., Coller, H., Loh, M.L., Downing, J.R., Caligiuri, M.A., et al. (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science, 286, 531–537. 2. Allison, D.B., Cui, X., Page, G.P. and Sabripour, M. (2006) Microarray data analysis: from disarray to consolidation and consensus. Nat Rev Genet, 7, 55–65. 3. Perou, C.M., Jeffrey, S.S., van de Rijn, M., Rees, C.A., Eisen, M.B., Ross, D.T., Pergamenschikov, A., Williams, C.F., Zhu, S.X., Lee, J.C., et al. (1999) Distinctive gene expression patterns in human mammary epithelial cells and breast cancers. Proc Natl Acad Sci USA, 96, 9212–9217. 4. Alizadeh, A.A., Eisen, M.B., Davis, R.E., Ma, C., Lossos, I.S., Rosenwald, A., Boldrick, J.C., Sabet, H., Tran, T., Yu, X., et al. (2000) Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature, 403, 503–511. 5. Simon, R., Radmacher, M.D., Dobbin, K. and McShane, L.M. (2003) Pitfalls in the use of DNA microarray data for diagnostic and prognostic classification. J Natl Cancer Inst, 95, 14–18. 6. Ge, H., Walhout, A.J. and Vidal, M. (2003) Integrating ‘omic’ information: a bridge between genomics and systems biology. Trends Genet, 19, 551–560. 7. Benjamini, Y. and Yekutieli, D. (2001) The control of false discovery rate in multiple testing under dependency. Ann Stat, 29, 1165–1188. 8. Storey, J.D. and Tibshirani, R. (2003) Statistical significance for genomewide studies. Proc Natl Acad Sci USA, 100, 9440–9445. 9. van ‘t Veer, L.J., Dai, H., van de Vijver, M.J., He, Y.D., Hart, A.A., Mao, M., Peterse, H.L., van der Kooy, K., Marton, M.J., Witteveen, A.T., et al. (2002) Gene expression profiling predicts clinical outcome of breast cancer. Nature, 415, 530–536. 10. Moreau, Y., Aerts, S., De Moor, B., De Strooper, B. and Dabrowski, M. 
(2003) Comparison and meta-analysis of microarray data: from the bench to the computer desk. Trends Genet, 19, 570–577. 11. Al-Shahrour, F. and Dopazo, J. (2005) In Azuaje, F. and Dopazo, J. (eds.), Data analysis and visualization in genomics and proteomics. Wiley, pp. 99–112. 12. Al-Shahrour, F., Minguez, P., Vaquerizas, J.M., Conde, L. and Dopazo, J. (2005) BABELOMICS: a suite of web tools for functional
annotation and analysis of groups of genes in high-throughput experiments. Nucleic Acids Res, 33, W460–W464. 13. Ashburner, M., Ball, C.A., Blake, J.A., Botstein, D., Butler, H., Cherry, J.M., Davis, A.P., Dolinski, K., Dwight, S.S., Eppig, J.T., et al. (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet, 25, 25–29. 14. Kanehisa, M., Goto, S., Kawashima, S., Okuno, Y. and Hattori, M. (2004) The KEGG resource for deciphering the genome. Nucleic Acids Res, 32, D277–D280. 15. Zeeberg, B.R., Feng, W., Wang, G., Wang, M.D., Fojo, A.T., Sunshine, M., Narasimhan, S., Kane, D.W., Reinhold, W.C., Lababidi, S., et al. (2003) GoMiner: a resource for biological interpretation of genomic and proteomic data. Genome Biol, 4, R28. 16. Al-Shahrour, F., Diaz-Uriarte, R. and Dopazo, J. (2004) FatiGO: a web tool for finding significant associations of Gene Ontology terms with groups of genes. Bioinformatics, 20, 578–580. 17. Khatri, P. and Draghici, S. (2005) Ontological analysis of gene expression data: current tools, limitations, and open problems. Bioinformatics, 21, 3587–3595. 18. Bammler, T., Beyer, R.P., Bhattacharya, S., Boorman, G.A., Boyles, A., Bradford, B.U., Bumgarner, R.E., Bushel, P.R., Chaturvedi, K., Choi, D., et al. (2005) Standardizing global gene expression analysis between laboratories and across platforms. Nat Methods, 2, 351–356. 19. Mootha, V.K., Lindgren, C.M., Eriksson, K.F., Subramanian, A., Sihag, S., Lehar, J., Puigserver, P., Carlsson, E., Ridderstrale, M., Laurila, E., et al. (2003) PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat Genet, 34, 267–273. 20. Subramanian, A., Tamayo, P., Mootha, V.K., Mukherjee, S., Ebert, B.L., Gillette, M.A., Paulovich, A., Pomeroy, S.L., Golub, T.R., Lander, E.S., et al. (2005) Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. 
Proc Natl Acad Sci USA, 102, 15545–15550. 21. Goeman, J.J., Oosting, J., Cleton-Jansen, A.M., Anninga, J.K. and van Houwelingen, H.C. (2005) Testing association of a pathway with survival using gene expression data. Bioinformatics, 21, 1950–1957. 22. Goeman, J.J., van de Geer, S.A., de Kort, F. and van Houwelingen, H.C. (2004) A global test for groups of genes: testing association
with a clinical outcome. Bioinformatics, 20, 93–99. 23. Tian, L., Greenberg, S.A., Kong, S.W., Altschuler, J., Kohane, I.S. and Park, P.J. (2005) Discovering statistically significant pathways in expression profiling studies. Proc Natl Acad Sci USA, 102, 13544–13549. 24. Smid, M. and Dorssers, L.C. (2004) GOMapper: functional analysis of gene expression data using the expression level as a score to evaluate Gene Ontology terms. Bioinformatics, 20, 2618–2625. 25. Kim, S.Y. and Volsky, D.J. (2005) PAGE: parametric analysis of gene set enrichment. BMC Bioinformatics, 6, 144. 26. Chen, Z., Wang, W., Ling, X.B., Liu, J.J. and Chen, L. (2006) GO-Diff: mining functional differentiation between EST-based transcriptomes. BMC Bioinformatics, 7, 72. 27. Al-Shahrour, F., Minguez, P., Tarraga, J., Montaner, D., Alloza, E., Vaquerizas, J.M., Conde, L., Blaschke, C., Vera, J. and Dopazo, J. (2006) BABELOMICS: a systems biology perspective in the functional annotation of genome-scale experiments. Nucleic Acids Res, 34, W472–W476. 28. Khalil, I.G. and Hill, C. (2005) Systems biology for cancer. Curr Opin Oncol, 17, 44–48. 29. Kitano, H. (2004) Cancer as a robust system: implications for anticancer therapy. Nat Rev Cancer, 4, 227–235. 30. Butcher, E.C. (2005) Can cell systems biology rescue drug discovery? Nat Rev Drug Discov, 4, 461–467. 31. Searls, D.B. (2005) Data integration: challenges for drug discovery. Nat Rev Drug Discov, 4, 45–58. 32. Khatri, P., Sellamuthu, S., Malhotra, P., Amin, K., Done, A. and Draghici, S. (2005) Recent additions and improvements to the OntoTools. Nucleic Acids Res, 33, W762–W765. 33. Al-Shahrour, F., Arbiza, L., Dopazo, H., Huerta-Cepas, J., Minguez, P., Montaner, D. and Dopazo, J. (2007) From genes to functional classes in the study of biological systems. BMC Bioinformatics, 8, 114. 34. Al-Shahrour, F., Minguez, P., Tarraga, J., Montaner, D., Alloza, E., Vaquerizas, J.M., Conde, L., Blaschke, C., Vera, J. and Dopazo, J. 
(2006) BABELOMICS: a systems biology perspective in the functional annotation of genome-scale experiments. Nucleic Acids Res, 34, W472–W476. 35. Mulder, N.J., Apweiler, R., Attwood, T.K., Bairoch, A., Bateman, A., Binns, D., Bradley, P., Bork, P., Bucher, P., Cerutti, L., et al. (2005) InterPro, progress and status in 2005. Nucleic Acids Res, 33, D201–D205.
36. Robertson, G., Bilenky, M., Lin, K., He, A., Yuen, W., Dagpinar, M., Varhol, R., Teague, K., Griffith, O.L., Zhang, X., et al. (2006) cisRED: a database system for genome-scale computational discovery of regulatory elements. Nucleic Acids Res, 34, D68–D73. 37. Wingender, E., Chen, X., Hehl, R., Karas, H., Liebich, I., Matys, V., Meinhardt, T., Pruss, M., Reuter, I. and Schacherer, F. (2000) TRANSFAC: an integrated system for gene expression regulation. Nucleic Acids Res, 28, 316–319. 38. Griffiths-Jones, S., Grocock, R.J., van Dongen, S., Bateman, A. and Enright, A.J. (2006) miRBase: microRNA sequences, targets and gene nomenclature. Nucleic Acids Res, 34, D140–D144. 39. Minguez, P., Al-Shahrour, F., Montaner, D. and Dopazo, J. (2007) Functional profiling of microarray experiments using text-mining derived bioentities. Bioinformatics, 23, 3098–3099. 40. Conde, L., Montaner, D., Burguet-Castell, J., Tarraga, J., Al-Shahrour, F. and Dopazo, J. (2007) Functional profiling and gene expression analysis of chromosomal copy number alterations. Bioinformation, 1, 432–435. 41. Conde, L., Montaner, D., Burguet-Castell, J., Tarraga, J., Medina, I., Al-Shahrour, F. and Dopazo, J. (2007) ISACGH: a web-based environment for the analysis of Array CGH and gene expression which includes functional profiling. Nucleic Acids Res, 35, W81–W85. 42. Hosack, D.A., Dennis, G., Jr., Sherman, B.T., Lane, H.C. and Lempicki, R.A. (2003) Identifying biological themes within lists of genes with EASE. Genome Biol, 4, R70. 43. Dennis, G., Jr., Sherman, B.T., Hosack, D.A., Yang, J., Gao, W., Lane, H.C. and Lempicki, R.A. (2003) DAVID: Database for Annotation, Visualization, and Integrated Discovery. Genome Biol, 4, P3. 44. Doniger, S.W., Salomonis, N., Dahlquist, K.D., Vranizan, K., Lawlor, S.C. and Conklin, B.R. (2003) MAPPFinder: using Gene Ontology and GenMAPP to create a global gene-expression profile from microarray data. Genome Biol, 4, R7. 45. Beissbarth, T. and Speed, T.P.
(2004) GOstat: find statistically overrepresented Gene Ontologies within a group of genes. Bioinformatics, 20, 1464–1465. 46. Khatri, P., Bhavsar, P., Bawa, G. and Draghici, S. (2004) Onto-Tools: an ensemble of web-accessible, ontology-based tools for the functional design and interpretation of high-throughput gene expression experiments. Nucleic Acids Res, 32, W449–W456. 47. Rivals, I., Personnaz, L., Taing, L. and Potier, M.C. (2007) Enrichment or depletion of a
GO category within a class of genes: which test? Bioinformatics, 23, 401–407. 48. Cui, X. and Churchill, G.A. (2003) Statistical tests for differential expression in cDNA microarray experiments. Genome Biol, 4, 210. 49. Barry, W.T., Nobel, A.B. and Wright, F.A. (2005) Significance analysis of functional categories in gene expression studies: a structured permutation approach. Bioinformatics, 21, 1943–1949. 50. Al-Shahrour, F., Diaz-Uriarte, R. and Dopazo, J. (2005) Discovering molecular functions significantly related to phenotypes by combining gene expression data and biological information. Bioinformatics, 21, 2988–2993. 51. Albertson, D.G. and Pinkel, D. (2003) Genomic microarrays in human genetic disease and cancer. Hum Mol Genet, 12(Spec No 2), R145–R152. 52. Pinkel, D. and Albertson, D.G. (2005) Array comparative genomic hybridization and its applications in cancer. Nat Genet, 37(Suppl), S11–S17. 53. Franke, L., van Bakel, H., Fokkens, L., de Jong, E.D., Egmont-Petersen, M. and Wijmenga, C. (2006) Reconstruction of a functional human gene network, with an application for prioritizing positional candidate genes. Am J Hum Genet, 78, 1011–1025. 54. Kitano, H. (2002) Computational systems biology. Nature, 420, 206–210. 55. Zeeberg, B.R., Qin, H., Narasimhan, S., Sunshine, M., Cao, H., Kane, D.W., Reimers, M., Stephens, R.M., Bryant, D., Burt, S.K., et al. (2005) High-throughput GoMiner, an ‘industrial-strength’ integrative gene ontology tool for interpretation of multiple-microarray experiments, with application to studies of Common Variable Immune Deficiency (CVID). BMC Bioinformatics, 6, 168. 56. Al-Shahrour, F., Minguez, P., Tarraga, J., Medina, I., Alloza, E., Montaner, D. and Dopazo, J. (2007) FatiGO+: a functional profiling tool for genomic data. Integration of functional annotation, regulatory motifs and interaction data with microarray experiments. Nucleic Acids Res, 35, W91–W96. 57. Draghici, S., Khatri, P., Bhavsar, P., Shah, A., Krawetz, S.A. 
and Tainsky, M.A. (2003) Onto-Tools, the toolkit of the modern biologist: Onto-Express, Onto-Compare, OntoDesign and Onto-Translate. Nucleic Acids Res, 31, 3775–3781. 58. Khatri, P., Desai, V., Tarca, A.L., Sellamuthu, S., Wildman, D.E., Romero, R. and Draghici, S. (2006) New Onto-Tools: PromoterExpress, nsSNPCounter and Onto-Translate. Nucleic Acids Res, 34, W626–W631.
59. Khatri, P., Voichita, C., Kattan, K., Ansari, N., Khatri, A., Georgescu, C., Tarca, A.L. and Draghici, S. (2007) Onto-Tools: new additions and improvements in 2006. Nucleic Acids Res, 35, W206–W211. 60. Zhang, B., Schmoyer, D., Kirov, S. and Snoddy, J. (2004) GOTree Machine (GOTM): a web-based platform for interpreting sets of interesting genes using Gene Ontology hierarchies. BMC Bioinformatics, 5, 16. 61. Boyle, E.I., Weng, S., Gollub, J., Jin, H., Botstein, D., Cherry, J.M. and Sherlock, G. (2004) GO::TermFinder – open source software for accessing Gene Ontology information and finding significantly enriched Gene Ontology terms associated with a list of genes. Bioinformatics, 20, 3710–3715. 62. Robinson, M.D., Grigull, J., Mohammad, N. and Hughes, T.R. (2002) FunSpec: a webbased cluster interpreter for yeast. BMC Bioinformatics, 3, 35. 63. Castillo-Davis, C.I. and Hartl, D.L. (2003) GeneMerge – post-genomic analysis, data mining, and hypothesis testing. Bioinformatics, 19, 891–892. 64. Berriz, G.F., King, O.D., Bryant, B., Sander, C. and Roth, F.P. (2003) Characterizing gene sets with FuncAssociate. Bioinformatics, 19, 2502–2504. 65. Maere, S., Heymans, K. and Kuiper, M. (2005) BiNGO: a Cytoscape plugin to assess overrepresentation of gene ontology categories in biological networks. Bioinformatics, 21, 3448–3449. 66. Martin, D., Brun, C., Remy, E., Mouren, P., Thieffry, D. and Jacq, B. (2004) GOToolBox: functional analysis of gene datasets based on Gene Ontology. Genome Biol, 5, R101. 67. Masseroli, M., Galati, O. and Pinciroli, F. (2005) GFINDer: genetic disease and phenotype location statistical analysis and mining of dynamically annotated gene lists. Nucleic Acids Res, 33, W717–W723. 68. Masseroli, M., Martucci, D. and Pinciroli, F. (2004) GFINDer: Genome Function INtegrated Discoverer through dynamic annotation, statistical analysis, and mining. Nucleic Acids Res, 32, W293–W300. 69. Zhang, B., Kirov, S. and Snoddy, J. 
(2005) WebGestalt: an integrated system for exploring gene sets in various biological contexts. Nucleic Acids Res, 33, W741–W748. 70. Zhong, S., Storch, K.F., Lipan, O., Kao, M.C., Weitz, C.J. and Wong, W.H. (2004) GoSurfer: a graphical interactive tool for comparative analysis of large gene sets in Gene Ontology space. Appl Bioinformatics, 3, 261–264.
374
Dopazo
71. Shah, N.H. and Fedoroff, N.V. (2004) CLENCH: a program for calculating Cluster ENriCHment using the Gene Ontology. Bioinformatics, 20, 1196–1197. 72. Mlecnik, B., Scheideler, M., Hackl, H., Hartler, J., Sanchez-Cabo, F. and Trajanoski, Z. (2005) PathwayExplorer: web service for visualizing high-throughput expression data on biological pathways. Nucleic Acids Res, 33, W633–W637. 73. Young, A., Whitehouse, N., Cho, J. and Shaw, C. (2005) OntologyTraverser: an R package for GO analysis. Bioinformatics, 21, 275–276. 74. Pasquier, C., Girardot, F., Jevardat de Fombelle, K. and Christen, R. (2004) THEA: ontologydriven analysis of microarray data. Bioinformatics, 20, 2636–2643. 75. Vencio, R.Z., Koide, T., Gomes, S.L. and Pereira, C.A. (2006) BayGO: Bayesian analysis of ontology term enrichment in microarray data. BMC Bioinformatics, 7, 86. 76. Falcon, S. and Gentleman, R. (2007) Using GOstats to test gene lists for GO term association. Bioinformatics, 23, 257–258. 77. Lee, H.K., Braynen, W., Keshav, K. and Pavlidis, P. (2005) ErmineJ: tool for functional
analysis of gene expression data sets. BMC Bioinformatics, 6, 269. 78. Volinia, S., Evangelisti, R., Francioso, F., Arcelli, D., Carella, M. and Gasparini, P. (2004) GOAL: automated Gene Ontology analysis of expression profiles. Nucleic Acids Res, 32, W492–W499. 79. Breslin, T., Eden, P. and Krogh, M. (2004) Comparing functional annotation analyses with Catmap. BMC Bioinformatics, 5, 193. 80. Tomfohr, J., Lu, J. and Kepler, T.B. (2005) Pathway level analysis of gene expression using singular value decomposition. BMC Bioinformatics, 6, 225. 81. Ben-Shaul, Y., Bergman, H. and Soreq, H. (2005) Identifying subtle interrelated changes in functional gene categories using continuous measures of gene expression. Bioinformatics, 21, 1129–1137. 82. Boorsma, A., Foat, B.C., Vis, D., Klis, F. and Bussemaker, H.J. (2005) T-profiler: scoring the activity of predefined groups of genes using gene expression data. Nucleic Acids Res, 33, W592–W595.
Chapter 20
Calibration of Microarray Gene-Expression Data
Hans Binder, Stephan Preibisch, and Hilmar Berger
Summary
Calibration of microarray measurements aims at removing systematic biases from the probe-level data to get expression estimates that linearly correlate with the transcript abundance in the studied samples. The improvement of calibration methods is an essential prerequisite for estimating absolute expression levels, which, in turn, are required for quantitative analyses of transcriptional regulation, for example, in the context of gene profiling of diseases. We address hybridization on microarrays as a reaction process in a complex environment and express the measured intensities as a function of the input quantities of the experiment. Popular calibration methods such as MAS5, dChip, RMA, gcRMA, vsn, and PLIER are briefly reviewed and assessed in light of the hybridization model and of previous benchmark studies. We present our hook method, a new calibration approach that is based on a graphical summary of the actual hybridization characteristics of a particular microarray. Although single-chip related, hook performs as well as the multi-chip-related gcRMA, presently one of the best state-of-the-art methods for estimating expression values. The hook method, in addition, provides a set of chip summary characteristics that evaluate the performance of a given hybridization. The algorithm of the method is briefly described and its performance is exemplified.
Key words: Gene expression, Microarray calibration, Preprocessing methods, Transcript concentration, Hook curve, Hybridization, Langmuir isotherm
1. Introduction
In this chapter, we focus on GeneChip microarray data analysis after the chips have been hybridized and scanned and the images have been summarized into hundreds of thousands of probe intensity values. With this enormous amount of data, we need standardized systems and tools for data management to analyze the results in a proper and sound way, as well as to be able to benefit from other publicly available gene expression data sets.
Robert Grützmann and Christian Pilarsky (eds.), Cancer Gene Profiling: Methods and Protocols, Methods in Molecular Biology, vol. 576, DOI 10.1007/978-1-59745-545-9_20, © Humana Press, a part of Springer Science + Business Media, LLC 2010
The basic principle of microarray experiments relies on the fluorescence intensity measurement for an individual probe to infer the transcript abundance specific for a selected gene. Extracting the expression degree properly from the measured intensity raises several difficult issues. Calibration of microarray measurements aims at removing consistent and systematic sources of variation to allow mutual comparison of measurements acquired from different probes, arrays, and experimental settings. Calibration is also called preprocessing because it usually constitutes the first step in the microarray analysis pipeline. It potentially influences the results of all subsequent steps of “higher-level” analyses as well as the biological interpretation of these results, and is therefore a crucial step in the processing of microarray data.
The chapter is organized into three parts: (1) As the essential premise for evaluating existing and developing new calibration methods, we treat hybridization on microarrays as a reaction process in a complex environment and express the measured intensities as a function of input quantities of the experiment. (2) Over the past years, microarray preprocessing has adopted a few generally accepted methodologies. In the second part of the chapter, we briefly review these options with regard to the underlying hybridization process and we judge advantages and disadvantages in the light of previous benchmark studies. Because we focus on Affymetrix GeneChip arrays, special attention is dedicated to the question of whether a mismatch-based chip design provides benefits for intensity calibration. (3) Finally, we present our hook method, a new calibration approach that is based on a graphical summary of the hybridization characteristics of each microarray. It uses a kind of natural metric for intensity calibration, with the potential to estimate expression values on an absolute scale. We briefly describe the algorithm and exemplify its performance.
2. Calibration of Microarrays
Microarray experiments aim to estimate the “expression degree” of thousands of specific target sequences using the integral intensity response of the respective probe spots on the chip surface. The detected intensity is affected by parasitic effects owing to the “technical” variability of repeated measurements and systematic biases that disturb the one-to-one relationship between the input and the output quantity of the measurement (1). The task of making estimates of the input quantity of a measurement from observations of its output is called calibration. First, it requires the determination of the model describing the basic relationship between the probe intensity and the specific transcript concentration, with consideration given to all relevant parasitic effects that should be corrected for. Second, the magnitude of these effects should be estimated using the intensity information of a given chip or of a series of chips. Third, one needs practicable algorithms that estimate the expression degree from the intensity values.
2.1. The Langmuir Hybridization Model
The hybridization of a microarray probe (P) can be described by the following reversible second-order reactions referring to specific (S) and nonspecific (N) target binding, respectively:
$$P^{f} + S^{f} \rightleftharpoons PS \quad\text{and}\quad P^{f} + N^{f} \rightleftharpoons PN. \tag{1}$$
Accordingly, free complementary RNA (cRNA; or cDNA) fragments with completely complementary ($S^{f}$) and partly complementary ($N^{f}$) sequences in solution compete for duplex formation (PS and PN) with free DNA probe oligonucleotides attached to the chip surface ($P^{f}$). The equilibrium constants for specific and nonspecific hybridization characterize the “affinity” of the respective targets for duplexing with the probe,
$$K^{S} \equiv \frac{[PS]}{[P^{f}]\cdot[S^{f}]} \quad\text{and}\quad K^{N} \equiv \frac{[PN]}{[P^{f}]\cdot[N^{f}]}. \tag{2}$$
The brackets denote the respective concentrations. Making use of the condition of material balance for the probe oligonucleotides, $[P] = [PS] + [PN] + [P^{f}]$, and assuming an excess of free species, $[N] = [N^{f}] + [PN] \approx [N^{f}]$ and $[S] = [S^{f}] + [PS] \approx [S^{f}]$, one obtains the fraction of occupied probe oligonucleotides after insertion into Eq. 2 and rearrangement,
$$\Theta \equiv \frac{[PS] + [PN]}{[P]} = \frac{K^{S}\cdot[S] + K^{N}\cdot[N]}{1 + \left(K^{S}\cdot[S] + K^{N}\cdot[N]\right)}. \tag{3}$$
Typically, the hybridization solution contains a very large number of nonspecific fragments of different lengths and sequences. For the sake of simplicity, we subsume this diversity by the term $K^{N}\cdot[N] \equiv \sum_i K_i^{N}\cdot[N]_i$ referring to a single, effective species.
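To make the behavior of Eq. 3 concrete, the following minimal Python sketch evaluates the occupied fraction Θ for a series of specific transcript concentrations in the presence of a constant nonspecific background. The function name `occupancy` and all numerical values (binding constants and concentrations) are illustrative assumptions and do not describe any particular probe or chip.

```python
def occupancy(k_s, s, k_n, n):
    """Fraction of occupied probe oligonucleotides according to Eq. 3.

    k_s, k_n: binding constants for specific and nonspecific hybridization
    s, n:     concentrations of specific and (effective) nonspecific targets
    """
    x = k_s * s + k_n * n  # combined binding term K^S*[S] + K^N*[N]
    return x / (1.0 + x)


# Hypothetical values chosen only for illustration
K_S, K_N = 1.0e9, 1.0e5   # binding constants (liter per mole)
N_CONC = 1.0e-6           # nonspecific background (mole per liter)

for s_conc in (0.0, 1e-11, 1e-10, 1e-9, 1e-8, 1e-7):
    theta = occupancy(K_S, s_conc, K_N, N_CONC)
    print(f"[S] = {s_conc:.0e} mol/L  ->  theta = {theta:.3f}")
```

In the absence of nonspecific binding, the function returns 0.5 when the product of binding constant and concentration equals one, and it approaches 1 as the specific concentration grows, which is the saturation behavior that the later sections correct for.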
Our reaction scheme, Eq. 1, considers only the bimolecular duplexing between probes and targets for the sake of simplicity. Note, however, that the available concentrations of free probes and targets are reduced by parasitic reactions such as bulk dimerization between different targets and intramolecular folding of probes and targets. These effects can be taken into account by substituting the equilibrium constants for the bimolecular binding reactions (Eq. 2) with effective reaction constants depending on the reaction constants of the additional processes (see, e.g., ref. 1 for details). Typically, the effective binding rates are decreased compared with their values in the absence of parasitic reactions.
After the hybridization step, free targets are removed by washing, and bound targets are labeled with fluorescent markers that attach to biotinylated nucleotides inserted into the target sequences before hybridization. Finally, the fluorescence emission of the probe spot is scanned and processed into one intensity value. Assuming good gridding, it directly relates to the fraction of occupied probe duplexes, Θ, i.e.,
$$I = M\cdot\Theta + O. \tag{4}$$
Here, M denotes the proportionality constant in intensity units and O the “optical” background, referring to the residual intensity measured in the absence of bound transcripts owing to, e.g., adsorbed free labels or the dark current of the detector.
2.2. Probe and Chip Design
On GeneChip expression arrays, each gene is interrogated by a set of $N_{set}$ = 11–20 probe pairs. Each pair consists of a perfect match (PM) and a mismatch (MM) probe. The PM sequence perfectly matches a segment of the target gene with a length of 25 nucleotides. The MM sequence is identical to that of the corresponding PM probe except for the middle (13th) base, which is changed to its Watson–Crick complement. The MM probes are intended to provide an estimate of the background of the respective PM. The probe set forms a series of pseudo-replicates probing the same target with different probe sequences to increase the certainty of the expression estimate.
GeneChip microarrays can be viewed as a sort of multi-photometer chip, each of which assembles approximately $10^5$–$10^6$ virtually independent dual-channel micro-photometers on an area of approximately 1 cm². This analogy implies that each PM probe spot constitutes the “sample” channel for detecting RNA fragments of a given sequence, whereas the MM spot serves as the “reference” channel for nonspecific background correction. The apparatus function given by Eq. 4 applies to each of these “micro-photometers,” however, with different sequence- and transcript-specific values of the parameters. With Eq. 3 one obtains:
$$I_{p,c}^{P} = \frac{L_{p,c}^{P}}{1 + M_c^{-1}\cdot L_{p,c}^{P}} + O_c \quad\text{with}\quad L_{p,c}^{P} = S_{p,c}^{P} + N_{p,c}^{P}. \tag{5}$$
The concentration dependence of the intensity of a PM/MM probe pair is illustrated in Fig. 1. In Eq. 5, probe-related properties are indexed by the superscripts P = PM, MM to account for the probe type, and by the subscripts $p = 1, \ldots, N_{probe}$ and $c = 1, \ldots, N_{chip}$ for the probe and chip effects in terms of the probe number on the chip and the chip number in a series of microarray hybridizations, respectively. Each gene/transcript is subsumed in the chip effect because its expression degree is a sample- and, thus, chip-related property. $L_{p,c}^{P}$ is the linear approximation of the amount of target binding in intensity units. It additively decomposes into contributions due to nonspecific and specific hybridization, $N_{p,c}^{P}$ and $S_{p,c}^{P}$, respectively. The latter term can be further split into factors characterizing the affinity ($A_p^{P}$) and the expression degree ($E_c$) according to Eq. 3,
$$S_{p,c}^{P} = A_p^{P}\cdot E_c = M_c\cdot K_p^{P,S}\cdot[S]_c. \tag{6}$$
The right-hand side of Eq. 6 refers to an absolute scale where the expression degree is given in concentration units (material per volume, e.g., mole per liter). Note that the binding constant defines the concentration of “half occupancy,” at which 50% of the probe oligonucleotides become occupied in the absence of nonspecific hybridization (see Eq. 3 with $K_p^{S}\cdot[S] = 1$ and $[N] = 0$). In contrast, the middle part of Eq. 6 defines the expression degree and affinity in arbitrary units with an uncertainty of a constant factor. Microarray calibration experiments using sets of spiked-in transcripts at different concentrations confirmed the predicted nonlinear intensity response to a good approximation (2–6). This hyperbolic function levels off into an intensity asymptote of $I_{p,c}^{P} \to M_c + O_c$ upon saturation of the probe spots with bound transcripts at high transcript concentrations (see Fig. 1). It can be “linearized,” provided the asymptotic and optical background values are known,
$$L_{p,c}^{P} = \frac{I_{p,c}^{P} - O_c}{1 - M_c^{-1}\cdot\left(I_{p,c}^{P} - O_c\right)}. \tag{7}$$
This transformation is illustrated in the right part of Fig. 1.
2.3. Calibration Error: Linear or Logarithmic Scale
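The raw intensity data are highly “noisy.” Application of simple error propagation formalisms to Eq. 5 provides the intensity error in the linear and logarithmic scales:
$$\begin{aligned} \varepsilon &\equiv \delta(I - O) \approx \pm\sqrt{(\delta b)^2 + \left((I - O)\cdot\delta g\right)^2} \\ \log\varepsilon &\approx \delta\log(I - O) = \pm\sqrt{\left(\frac{\delta b}{I - O}\right)^2 + (\delta g)^2}. \end{aligned} \tag{8}$$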
Fig. 1. Langmuir hybridization isotherm (left part, Eq. 5) and linearized isotherm (right part, Eq. 7) of PM and MM probes. The row of figures below shows the “hook” plot in Δ vs. Σ coordinates. Error limits are shown by dashed lines (Eq. 8). The hybridization regimes are indicated in the left part of the figure (see text). The optical background is omitted for the sake of clarity.
It splits into an additive contribution due to fluctuations of the transcript concentrations and the optical background, $\delta b \propto \delta[S] \propto \delta[N] \propto \delta O \in N(0, \sigma_b)$, and into a multiplicative term caused by variations of the binding affinity, $\delta g \propto \delta\log K \in N(0, \sigma_g)$. The former term dominates at small intensities, whereas the multiplicative contribution is the most significant source of variation at higher intensities. Most of the available data analysis algorithms assume a homoscedastic, intensity-independent gaussian error. The linear scale meets this assumption at small intensities but progressively underestimates the error with increasing signal. In turn, log-transformed data underestimate the error at low intensities. Relevant expression values mostly refer to medium and higher intensity levels. Therefore, for most purposes, the data analysis is more adequately performed in log scale than in the linear scale. An apparently better alternative makes use of the so-called generalized logarithm, $\mathrm{glog}(x) \equiv \log\left(\tfrac{1}{2}\left(x + \sqrt{x^2 + c}\right)\right)$. It behaves linearly at small and logarithmically at high arguments, ensuring a
virtually constant error width, $\mathrm{glog}\,\varepsilon \approx \delta g$. However, its proper use requires scaling of the argument and of the parameter c (7, 8). Note that the standard deviations of the considered distributions are constant only in the absence of saturation. Otherwise, the error width decreases with progressive saturation at high intensities according to:
$$\sigma_g \approx \left(1 - M^{-1}\cdot(I - O)\right)\cdot\sigma_g^{0} \quad\text{and}\quad \sigma_b \approx \sqrt{\left(\left(1 - M^{-1}\cdot(I - O)\right)\cdot\sigma_g^{0}\right)^2 + \left(\sigma_O^{0}\right)^2}. \tag{9}$$
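The scale dependence of the error described in this subheading can be illustrated numerically. The sketch below evaluates the two forms of Eq. 8 and the generalized logarithm, written here as log(½(x + √(x² + c))); the function names and the values of the additive and multiplicative error widths are assumptions for illustration only, and the saturation correction of Eq. 9 is deliberately left out.

```python
import math


def error_linear(net, db, dg):
    """Eq. 8, linear scale: error of the net intensity (I - O)."""
    return math.sqrt(db ** 2 + (net * dg) ** 2)


def error_log(net, db, dg):
    """Eq. 8, logarithmic scale: error of log(I - O)."""
    return math.sqrt((db / net) ** 2 + dg ** 2)


def glog(x, c):
    """Generalized logarithm: linear near zero, logarithmic for large x."""
    return math.log(0.5 * (x + math.sqrt(x * x + c)))


DB, DG = 30.0, 0.2  # hypothetical additive and multiplicative error widths

for net in (10.0, 100.0, 1000.0, 10000.0):
    print(f"I - O = {net:7.0f}  linear error = {error_linear(net, DB, DG):7.1f}"
          f"  log error = {error_log(net, DB, DG):.3f}")
```

The printed values show the linear-scale error growing with the signal while the log-scale error falls toward the multiplicative width, which is the reason neither scale is homoscedastic over the full intensity range.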
The error limits of the hybridization isotherm are illustrated in Fig. 1 by dashed lines.
2.4. Reference Probes: MM or Half-Price Solution
The use of MM probes as background references for the PM probes, as originally intended, brings up two practical problems: (1) For a considerable fraction of probe pairs, the MM fluoresce brighter than the PM. This observation appears “unphysical” because MM probes are assumed to bind transcripts at most in equal, but never in higher, amounts than the PM. (2) The MM probe intensities, on average, scatter more strongly than those of the PM. As a consequence, calibration algorithms either empirically attenuate the MM intensity values to ensure strictly positive PM–MM intensity differences or dispense with the MM data altogether (see Subheading 3.1. below). Half-price solutions for chips without MM are proposed to replace the “superfluous” MM by additional PM probes (9). New GeneChip generations such as the Exon 1.0 arrays are designed as PM-only chips without MM probes.
However, intensity calibration of microarray data is still a challenging task, and the question of whether the use of internal reference probes such as MM can bring real benefit to chip analysis has not yet been answered. For example, the problem of bright MM can be rationalized in terms of the “reversed” base pairings that form the complementary middle bases of the PM and MM probe sequences upon nonspecific hybridization and of the purine–pyrimidine asymmetry of binding strengths of RNA/DNA interactions (10, 11). Additionally, the variability problem of the MM probes can be, at least partially, explained on the level of base pairings of the middle base: In the MM, it changes from a complementary Watson–Crick pairing in the nonspecific duplexes into a mismatched pairing in the specific duplexes, whereas the respective pairing of the PM remains virtually unchanged (10, 11).
Below, we present a new calibration algorithm that explicitly accounts for these effects. Moreover, this “hook” method uses the MM probes not only as a background reference, but also interprets them as a sort of “weak” PM that also responds to specific hybridization according to Eq. 5. In this approach, the MM operate as a hybridization reference over the full range of transcript concentrations. In this way, they enable the scaling of the intensities in a natural metric system. We suggest that this idea opens a new view on the potential design and use of mismatched reference probes.
2.5. The Calibration Tasks
The intensity contribution due to specific hybridization, S, measures the expression degree on a relative scale. Consequently, the inversion of Eq. 5 with respect to S and the solution of Eq. 6 with respect to E (or [S]) furnish a starting point to discuss the essential tasks for calibrating probe-level data. It implies the need for estimating:
1. The background contributions, N and O.
2. The sequence-specific affinities $K^{S}$/A and $K^{N}$ affecting S and N, respectively; and
3. The degree of saturation in terms of the saturation parameter M for correcting the intensity of each probe.
Microarray intensity data are noisy with non-gaussian frequency distributions. Proper calibration also requires, therefore, the consideration of:
4. Appropriate error models based on the frequency distribution of the intensities and of their specific and nonspecific contributions (Eq. 5).
The special design of GeneChip arrays raises two additional tasks for probe intensity calibration, namely:
5. The aggregation of the individual probe-level expression values of one probe set into one transcript-related expression value; and
6. The proper use of the MM probes to adjust the PM data.
Usually, the expression measure E is given in arbitrary units that are related to the special conditions of a particular hybridization. For comparison with other chips, calibration therefore finally requires:
7. Adjustment of the chip-related expression measures into one common scale, which is, ideally, the absolute scale of transcript abundance in concentration units.
3. Preprocessing: State of the Art
Microarray data calibration is usually called preprocessing because it is performed prior to higher-level statistical analysis, such as the selection of differentially expressed genes. A preprocessing method for GeneChips typically consists of three basic “ingredients”: background correction, normalization, and summarization. The background correction step is typically done in an attempt to remove nonspecific binding and the optical background; the normalization step reduces systematic variation between chips; and the summarization step generates an expression value for each gene/probe set. Background correction typically uses information only from one array, normalization makes a series of arrays comparable, and summarization can be performed alternatively on the basis of single-chip or multi-chip data. Numerous algorithms exist for the steps dealing with one or several of the calibration tasks specified in the previous section. Many of them can be applied in different combinations and order, providing numerous potential preprocessing methods with apparently little consensus regarding which is the most suitable. In the next sections, we give a short overview of some of the most popular methods and review their performance on the basis of the results of different benchmark studies.
3.1. Linear Approximations
The linear approximation of Eq. 5, $I_{p,c}^{P} \approx L_{p,c}^{P} + O_c = S_{p,c}^{P} + N_{p,c}^{P} + O_c$, neglects saturation at high transcript concentrations. It is used in basically all popular preprocessing methods: Microarray Suite 5 (MAS5, (12)), robust multiarray analysis (RMA, (13, 14)), gcRMA (15), dChip (16), probe logarithmic intensity error (PLIER, (17)), and variance stabilization normalization (vsn, (7)). The kernel of these methods, except vsn, essentially deals with the baseline correction and summarization steps, which, in principle, can be independently combined with stand-alone normalization algorithms such as quantile (18), global mean (19), loess, or invariant probe set normalizations (16) (see below). In contrast, vsn provides baseline-corrected and normalized probe-level expression values that can be further processed with any stand-alone summarization algorithm, such as median polish (see below).
To clarify, by “method,” we mean the complete processing pipeline starting from raw intensity data and ending up with transcript-related expression values. Available algorithms can be roughly divided into global and probe-specific baseline-correction algorithms (see Table 1 for an overview). RMA and vsn, referring to the former group, correct all probe intensities of a selected microarray by one common background, whereas the other algorithms estimate a specific background value for each probe, partly, using the MM probe intensities. For summarization, all methods, except MAS5, process a series of chips in parallel. The obtained expression values are consequently context sensitive and require reprocessing upon elimination, substitution, or addition of arrays in the respective series. The methods can also differ with respect to the used error model that fits the data either in linear, log, or glog scale.
Table 1
Comparison of preprocessing methods with respect to background correction, scaling of the expression values, and chip processing. The asterisks indicate adequate and useful approaches with respect to probe-specific effects, error propagation, and single-chip analysis

             vsn      RMA     gcRMA      PLIER      dChip      MAS5       Hook
Background   Global*  Global  Specific*  Specific*  Specific*  Specific*  Specific*
Scale        glog*    log*    log*       glog*      Lin        log*       glog*
# of chips   Multi    Multi   Multi      Multi      Multi      Single*    Single*
In the following, we outline the algorithmic backbone of the selected methods:
3.1.1. Microarray Analysis Suite 5
MAS5 is a single-chip background and summarization method. It performs background correction in two steps: First, the optical background is estimated by dividing the chip surface into a 4 × 4 grid, taking the average over the 2% weakest intensities within each zone, and subtracting an interpolated background depending on the x–y position of each probe to account for spatial inhomogeneities. Second, the MM intensities serve as estimates for the N contribution, $S^{MAS5} = I^{PM} - I^{MM*}$, where, however, “bright” MM are substituted by “representative” values $I^{MM*}$, which transform negative differences ($I^{PM} - I^{MM}$) into small positive ones, ($I^{PM} - I^{MM*}$) ≥ 0, to obtain strictly positive specific signals for each probe, $S^{MAS5} \geq 0$. Finally, the $S^{MAS5}$ values are transformed into log scale and summarized for each probe set using one-step Tukey’s biweight median, which effectively removes signals with large median absolute deviations. In addition to the expression measure, MAS5 calculates the detection call, a useful qualitative value, which indicates whether a transcript is reliably detected (present) or not detected (absent). MAS5 uses global normalization as standard, which simply rescales the log intensities of each probe by a chip-specific factor that ensures agreement between all chip averages in the considered series.
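The summarization step can be illustrated with a minimal sketch of a one-step Tukey biweight estimate applied to the log-transformed probe signals of one probe set. This is not the Affymetrix implementation: the function name, the tuning constants `c = 5` and `eps = 1e-4`, the use of base-2 logarithms, and the example signals are assumptions made for illustration.

```python
import math
import statistics


def tukey_biweight(values, c=5.0, eps=1e-4):
    """One-step Tukey biweight: a robust, outlier-resistant average.

    Values far from the median (in units of the median absolute
    deviation, MAD) receive a weight of zero.
    """
    center = statistics.median(values)
    mad = statistics.median([abs(v - center) for v in values])
    weights = []
    for v in values:
        u = (v - center) / (c * mad + eps)  # scaled distance from the median
        weights.append((1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Hypothetical background-corrected signals of one probe set (one outlier)
signals = [1024.0, 1100.0, 950.0, 1010.0, 990.0, 16000.0]
log_signals = [math.log2(s) for s in signals]
print(2 ** tukey_biweight(log_signals))
```

In this example the very bright probe lies many MAD units away from the median and therefore drops out of the average, so the summarized value stays close to the bulk of the probe set.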
3.1.2. dChip
Two alternative versions of this method provide either PM-only or PM–MM estimates of the expression degree using the equations $I_{p,c}^{PM} = A_p^{PM}\cdot E_c + B_p + \varepsilon$ or $I^{PM} - I^{MM} = A_p^{PM-MM}\cdot E_c + \varepsilon$ to fit the respective intensities by nonlinear least squares ($\varepsilon$ is the additive error term). The model assumes equal background contributions of the PM and MM on all chips of a series, and includes the optical contribution, $B_p = N_p + O$ with $N_p = N_p^{PM} = N_p^{MM}$. The method constrains the set average of the squared affinity to unity, $\langle A_p^2\rangle_{p\in set} = 1$, with the consequence that the expression degree is obtained as the affinity-weighted average of the specific signal over the probe set, $E_c = \langle S_{c,p}\cdot A_p\rangle_p$, with larger weights given to high-affinity probes. dChip uses invariant-set normalization as standard: This method selects a subset of PM probes with small rank differences of their intensities in a series of arrays, and calculates an intensity-dependent correction curve from this subset, which is then applied to all probes.
3.1.3. Robust Multiarray Analysis
To get strictly positive expression estimates ($S \geq 0$), RMA decomposes the frequency distribution of the intensities into an exponential signal distribution ($P^{S}(S) \sim \exp(-\alpha\cdot S)$) and a gaussian background distribution ($P^{B}(B) \sim N(B, \mu_c, \sigma_c)$): $P^{I}(I_p) = P^{B}(B)\cdot P^{S}(S_p)$. The distribution parameters $\alpha$, $\mu_c$, and $\sigma_c$ are estimated from the chip data. The background-corrected signal referring to a given intensity is then obtained as the weighted average over the background and signal distributions, with the constraint $S_p^{RMA} = I_p - B_c^{RMA} \geq 0$: $B_c^{RMA} = \mu_c + \sigma_c\cdot(\sigma_c\cdot\alpha - \Delta\phi)$ ($\Delta\phi$ is the difference of normalized error functions). Summarization is performed by the fit of the log-transformed specific data of each probe set in a series of chips to the additive model, $\log(S_{p,c}^{RMA}) = \log E_c^{RMA} + \log A_p^{RMA} + \log\varepsilon$, using median polish to minimize the residual log error. The used constraint $\mathrm{Median}(\log A_p^{RMA})_{p\in set} = 0$ results in expression measures that are roughly related to the median of the log signal, i.e., $\log E_c \sim \mathrm{median}(\log S_{p,c})_{p\in set}$. RMA uses quantile normalization as standard. This algorithm transforms the different intensity distributions of a chip series into one “average” distribution.
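The two multi-chip steps named above, quantile normalization and median polish, can be sketched with the Python standard library as follows. Both functions are simplified illustrations under stated assumptions and not the RMA reference implementation: ties in the quantile step are broken by probe order, the median polish runs a fixed number of sweeps, and the function names and toy data are hypothetical.

```python
import statistics


def quantile_normalize(chips):
    """Give every chip the same ("average") intensity distribution.

    chips: list of equal-length lists, one list of probe values per chip.
    """
    n_probes = len(chips[0])
    sorted_chips = [sorted(chip) for chip in chips]
    # Reference distribution: mean of the r-th smallest value over all chips
    reference = [sum(s[r] for s in sorted_chips) / len(chips)
                 for r in range(n_probes)]
    normalized = []
    for chip in chips:
        order = sorted(range(n_probes), key=lambda i: chip[i])
        out = [0.0] * n_probes
        for rank, probe in enumerate(order):
            out[probe] = reference[rank]  # replace value by reference quantile
        normalized.append(out)
    return normalized


def median_polish(log_signal, n_iter=10):
    """Fit log_signal[p][c] ~ log_a[p] + log_e[c] by alternating medians.

    Returns (log_e, log_a) with median(log_a) = 0.
    """
    resid = [row[:] for row in log_signal]
    n_p, n_c = len(resid), len(resid[0])
    log_a = [0.0] * n_p
    log_e = [0.0] * n_c
    for _ in range(n_iter):
        for p in range(n_p):  # sweep out probe (row) medians
            med = statistics.median(resid[p])
            log_a[p] += med
            resid[p] = [v - med for v in resid[p]]
        for c in range(n_c):  # sweep out chip (column) medians
            med = statistics.median([resid[p][c] for p in range(n_p)])
            log_e[c] += med
            for p in range(n_p):
                resid[p][c] -= med
    shift = statistics.median(log_a)  # enforce median(log_a) = 0
    return [e + shift for e in log_e], [a - shift for a in log_a]


# Toy probe set: three probes (rows) measured on three chips (columns)
table = [[a + e for e in (5.0, 7.0, 6.0)] for a in (-1.0, 0.0, 2.0)]
print(median_polish(table))
print(quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]))
```

Because the median polish result for one chip depends on the residuals of all chips in the series, the expression values change when arrays are added or removed, which is the context sensitivity of multi-chip summarization noted earlier.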
3.1.4. gcRMA
This method is essentially identical to the RMA method, except for the background correction step. Here gcRMA accounts for the sequence specificity of nonspecific hybridization using the intensity of pseudo-MM as “representatives” taken from a subset of the MM possessing the same GC content as the PM probe of interest. Then the logarithm of the specific signal, $\log S^{gcRMA}$, is calculated as the weighted average over the gaussian background distribution and a signal distribution following a power law. As in RMA, the center of the background distribution is shrunken with respect to that of the pseudo-MM due to correlations with the PM, i.e., $B_{p,c}^{gcRMA} = \exp\left(\rho\cdot\ln I_p^{MM} + (1 - \rho)\cdot\mu_c\right)$ ($\rho$ is the coefficient of correlation between the PM and the MM data and $\mu_c$ is the center of the MM distribution).
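The shrinkage expressed by the background formula above can be made explicit with a short sketch. It assumes that the center of the MM distribution is given on the natural-log scale, so that the background is a weighted geometric mean of the individual MM intensity and the chip-wide center; the function name and the numbers are illustrative assumptions only.

```python
import math


def gcrma_background(i_mm, rho, mu_c):
    """Background estimate exp(rho * ln(I_MM) + (1 - rho) * mu_c).

    i_mm: (pseudo-)MM intensity of the probe
    rho:  correlation coefficient between PM and MM data (0..1)
    mu_c: center of the MM distribution, assumed to be on the ln scale
    """
    return math.exp(rho * math.log(i_mm) + (1.0 - rho) * mu_c)


# Hypothetical values: a bright MM probe on a chip with an MM center of 100
mu_center = math.log(100.0)
for rho in (0.0, 0.5, 1.0):
    print(rho, gcrma_background(400.0, rho, mu_center))
```

With a correlation of zero the estimate collapses to the chip-wide center, with a correlation of one it equals the individual MM intensity, and intermediate values pull a bright MM toward the center.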
3.1.5. Variance Stabilization Normalization
The vsn approach shifts and rescales the intensity of a series of chips to transform their intensity-dependent heteroscedastic error distribution into an intensity-independent homoscedastic distribution. Instead of the logarithm, it uses the arcsinh function as a special case of the glog transformation, arcsinh(x) = glog(x) with c = 4 (see above), to get the background-corrected signal, $\mathrm{arcsinh}(S_{p,c}^{vsn}) = \mathrm{arcsinh}\left((I_{p,c} - B_c^{vsn})/F_c^{vsn}\right)$. The chip-specific parameters $B_c^{vsn}$ and $F_c^{vsn}$ are obtained via maximum likelihood optimization for a subset of virtually invariant genes in a series of chips. The arcsinh-transformed probe-level expression values can then be summarized using, e.g., median polish, according to $\mathrm{arcsinh}(S_{p,c}^{vsn}) \approx \log A_p^{vsn} + \log E_c^{vsn} + \log\varepsilon$ (for $S_{p,c}^{vsn} > 1$).
3.1.6. Probe Logarithmic Intensity Error
This method uses the MM probes for background correction and the glog transformation for appropriate error handling. It fits the equation $S_{p,c}^{PLIER} = A_p^{PLIER} \cdot E_c^{PLIER} = \varepsilon \cdot I_{p,c}^{PM} - \varepsilon^{-1} \cdot I_{p,c}^{MM}$ using an outlier-resistant nonlinear least squares algorithm that minimizes the error term $\log(\varepsilon) = \operatorname{glog}(S^{PLIER}) - \operatorname{glog}(I^{PM} - I^{MM})$ with $c = 4 I^{PM} \cdot I^{MM}$. The fit returns non-negative signals, $S^{PLIER} \geq 0$, for all nonnegative intensities independently of the relation between the PM and MM values, i.e., also for bright MM, $I^{MM} > I^{PM}$.
For the sake of completeness, we note the existence of alternative and in part interesting approaches, such as the position-dependent nearest neighbor (PDNN) method, which uses a nonlinear, sequence-specific model (20); TM, a very simple but effective method based on the trimmed mean of PM–MM differences (21); factor analysis for robust microarray summarization (FARMS), a probe-specific, RMA-like multivariate approach (22); and a method for expression detection based on strict signal deconvolution (23).
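The relation between the PLIER model equation and its glog error term can be checked numerically. The sketch below assumes the glog form $\operatorname{glog}(y) = \log((y + \sqrt{y^2 + c})/2)$, which is defined outside this section and is therefore an assumption here; the function names and the numerical values are purely illustrative.

```python
import numpy as np

def glog(y, c):
    """Generalized logarithm, assumed form: log((y + sqrt(y^2 + c)) / 2)."""
    return np.log((y + np.sqrt(y * y + c)) / 2.0)

def plier_error(signal, pm, mm):
    """Positive root eps of the model equation eps * PM - MM / eps = signal."""
    return (signal + np.sqrt(signal**2 + 4.0 * pm * mm)) / (2.0 * pm)

# Hypothetical probe with a bright MM (MM > PM) and a positive model signal.
pm, mm, signal = 120.0, 300.0, 50.0
eps = plier_error(signal, pm, mm)
c = 4.0 * pm * mm
lhs = np.log(eps)
rhs = glog(signal, c) - glog(pm - mm, c)   # equals lhs for this choice of c
```

Under this assumption, $\operatorname{glog}(I^{PM} - I^{MM})$ with $c = 4 I^{PM} \cdot I^{MM}$ reduces to $\log I^{PM}$, which is why the error term stays finite even for bright MM probes.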
3.2. Benchmark Criteria and Calibration Data
In the preceding section, we briefly outlined some of the most popular preprocessing methods. The diversity of competing algorithmic approaches implies profound effects on the derived expression measures with consequences for subsequent higher-level statistical analysis. The correct choice of a method might depend on the scientific question being asked and on the particular experimental design and microarray data structure. Here, benchmark studies might permit users to judge each method using scientifically meaningful summaries. Two basic benchmark criteria, precision and accuracy, are essential for judging calibration methods. The accuracy specifies the systematic bias of the method in terms of the deviation of the expression estimates from their true (usually unknown) values. In turn, the precision characterizes the resolution (or “uncertainty”) of the expression estimates. It is inversely related to their variability in replicate measurements. Different test scenarios are used for calibration/benchmark studies to model different experimental situations: In the Latin-square spiked-in experiment, the concentrations of a small set of ~15–40 transcripts are varied in defined concentration steps in a hybridization solution containing a cell extract as a constant background (24). These calibration data are suited to assess the concentration dependence of the intensity and the
performance of the background correction and summarization steps. The small number of variable transcripts affecting less than 1% of the available probe sets and the Latin-square design of the experiment, which cyclically permutes the spikes among the chips, give rise to a rather small inter-chip variability. It makes the data not optimal for judging normalization algorithms. On the contrary, in the golden spike experiment, a relatively high number of transcripts referring to ~25% of all probe sets are hybridized on the chips without special background addition (25). The concentration of approximately one half of these spikes is varied in a “treatment versus control” design. Experiments of the golden spike type might help to develop new, improved normalization algorithms because the basic assumptions of global normalization methods are violated in many expression studies. Particularly, normalization methods such as quantile and global mean normalizations presume that only a small fraction of genes is differentially expressed, and that there are roughly equal numbers of upregulated and downregulated genes. These assumptions are rather restrictive and prevent the exploration of global changes of the expression level (see below). In dilution experiments, the total amount of RNA in the hybridization solution is changed in definite steps (24). In the closely related mixing experiments, two RNA extracts are mixed in different proportions, leaving the total amount of RNA constant (26). These types of experiments provide a good basis for studying the effect of the mutual interference between different transcript fractions in the hybridization solution on the performance of preprocessing methods. Another approach uses quantitative real-time PCR(27) as the gold standard method of measuring gene expression in tissue samples for the evaluation of microarray calibration. 
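The two benchmark criteria introduced at the beginning of this subsection can be computed from replicate measurements of a transcript with known concentration, such as a spike. The following sketch is one simple way to do this, assuming log-scale estimates; the function name and the choice of the mean and the sample standard deviation are illustrative, not prescribed by the benchmark studies cited.

```python
import numpy as np

def accuracy_and_precision(log_estimates, log_truth):
    """Return (bias, sd) of replicate log-expression estimates of one transcript."""
    est = np.asarray(log_estimates, dtype=float)
    bias = est.mean() - log_truth     # systematic deviation: small |bias| = accurate
    sd = est.std(ddof=1)              # replicate scatter: small sd = precise
    return bias, sd

# Hypothetical replicates: tightly clustered (precise) but offset from the truth.
bias, sd = accuracy_and_precision([2.0, 2.2, 1.8], log_truth=2.5)
```

A method can thus score well on one criterion and poorly on the other, which is the trade-off discussed for the individual preprocessing methods.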
Alternative studies analyze statistical characteristics, such as the false discovery rate (21), correlations between genes (28, 29), or sources of variation between samples (30) to validate preprocessing methods on “real” data sets collected in a biomedical context. The practical relevance and consistency of the used criteria must be checked as the case arises: For example, correlation-based criteria favor methods that produce, on the average, zero correlations between randomly selected genes (28). Here methods are preferred that remove biases but, unfortunately, also the “valuable” expression signals. In addition, computer simulations are an interesting option to compare preprocessing methods. However, there is the problem of avoiding inherent circularities, e.g., if the data model relies on assumptions used in the analysis algorithm. For example, it is not surprising that methods ignoring probe-specific background levels perform well on data synthesized without probe-specific background contributions (31). Therefore, results from simulation
studies must be critically assessed in the context of the actual simulation design.
3.3. Which Method is the Best?
Numerous studies have assessed preprocessing methods in a wide range of conditions to benchmark their performance. In a general sense, there is apparently no “best” method that clearly outperforms the others under all circumstances. Moreover, all of these methods have been proven in numerous applications to provide reasonable results. For example, in patient-cohort studies, researchers typically select sets of genes that are differentially expressed between certain known conditions (supervised approach) or they attempt to detect biological relations between samples or genes by grouping them according to their expression profiles (unsupervised approach). Often the goal is to obtain predictors for prognostically relevant categories. It has been argued that the choice of the preprocessing method has less influence on the final outcome, especially in studies based on large numbers of arrays, whereas it can have important effects on the results of smaller studies (29). The existence of a certain minimum number of differentially expressed genes is obviously sufficient for predictor selection without the need of exact quantification of the observed changes. Clearly, the reliability of such analyses will improve with the number of samples and/or with the significance level for detecting differential expression. On the other hand, genomic regulation is governed by the specificity of molecular interactions between genomic, transcriptomic, and proteomic factors, and their mutual relations and levels. Particularly, the estimation of transcript levels on an absolute scale using microarrays is a challenging task that becomes necessary for exploring mechanisms of gene regulation. For these issues, exact calibration and the choice of appropriate methods is an essential prerequisite. 
Calibration data reproducing the basic concentration dependence of the intensity without complex inter-chip variations of the hybridizations clearly show that the nonspecific hybridization background correction is the main factor that explains differences between the methods (25, 27, 28, 32). Global background correction algorithms such as vsn and RMA obviously underestimate the level of nonspecific hybridization, leading to attenuated estimates of differential expression with strong negative biases, especially at low expression levels. Methods with MM corrections such as MAS5, PLIER, and dChip outperform methods discarding MM data at medium and higher expression levels, providing much better accuracies. On the other hand, MM corrections give rise to highly variable expression estimates at low intensity levels with partly high false-positive detections. The correct balance between accuracy and precision depends on the signal intensity,
with the problem that the gain in precision at low intensities must be paid for by a penalty in accuracy and vice versa. It seems that RMA and vsn, with their much lower variability, estimate the expression of low-abundance genes in a biased but very precise manner. Minimizing variability for biased estimates, however, produces a dangerous sense of confidence in potentially incorrect data. On the contrary, a higher variability at low intensities at least circumvents such incorrect conclusions as long as the variability exceeds the bias. Here, the sequence-based background adjustment of gcRMA emerges as the method that may come closest to the optimum across the whole intensity range (25, 27, 32). Generally, one has to keep in mind that the precision of expression measures can be improved by replicate measurements and also by further developing statistical concepts, e.g., by explicit consideration of the measurement error derived from the hybridization mechanism in a probe-dependent fashion (see above). On the contrary, the accuracy of calibration methods cannot be improved by replicates. It requires the understanding of the essential factors that govern microarray hybridization and their implementation into feasible algorithms. All considered methods systematically underestimate the expression level at high RNA concentrations because they neglect saturation. Here, nonlinear hybridization models such as the two-species Langmuir isotherm provide a more adequate concept to account for this effect. Other important challenges for the improvement of calibration methods are the need for better probe-specific background corrections, for normalization algorithms that conserve differential expression between the samples on an absolute scale, and also for better affinity corrections for more precise data. Note that the most highly expressed genes are not necessarily the key players in genomic regulation.
Hence, better background and affinity corrections should increase the resolution of the method to also detect relatively small changes of the expression level.
4. Hook Calibration: Toward Absolute Expression Measures
Our hook calibration method analyzes the intensity data of a given GeneChip microarray in terms of the two-species Langmuir isotherm (Eq. 5). The method uses the MM probe intensities as reference for the PM over the whole concentration range to discern typical hybridization regimes, namely those of predominant nonspecific binding (N), mixed hybridization (mix), predominant specific binding (S), saturation (sat), and asymptotic binding (as) as illustrated in Fig. 2. The intensity data are
Fig. 2. The hook method. The raw intensity data of one GeneChip microarray are plotted into the D = log(PM/MM) vs. S = 1/2 log(PM ∙ MM) coordinate system and smoothed to get the raw hook curve. Then, probes from the N and S hybridization regimes are used to calculate four sets of 16 position-dependent nearest-neighbor sensitivity profiles of the affinity model (nonspecific and specific for the PM and MM each). After affinity correction of the intensities, one obtains the corrected hook curve. It is used to get improved sensitivity profiles in a second iteration step. The mix, S, and sat ranges of the corrected hook are well fitted using the two-species Langmuir hybridization model. The dimensions of the hook, its width and height, provide hybridization characteristics of the chip such as the binding strength of nonspecific hybridization and the mean PM/MM gain of the binding affinity, respectively.
aggregated into one mean hybridization characteristic called a hook curve because of its characteristic shape, which is predicted by the Langmuir model (see Figs. 1 and 2). The method uses the position-dependent nearest neighbor model to account for the probe-specific binding affinity on specific and nonspecific hybridization. It corrects the probe intensities for probe-specific background, affinity, and saturation limit. Note that our model differs from that of Zhang et al. (20) who restrict the positional dependence by a common weight function for the nearest-neighbor free energy terms. Our positional-dependent terms are freely
adjusted (see Eq. 10 below). The hook method is a single-chip approach, which provides essential hybridization summaries such as the fraction of not-expressed probe sets (%N), the mean background intensity ($N_c^{PM}$), and the PM/MM sensitivity gain on specific binding ($s_c$).
4.1. Algorithm
The algorithm consists of the following basic steps (see also Fig. 2):
1. The intensity data are corrected for the optical background using the Affymetrix zone algorithm (19).
2. The PM and MM probe intensity data are plotted into a special type of M–A plot, where the ordinate value is the log difference, $D = \log I^{PM} - \log I^{MM}$, and the abscissa value is the set-averaged log sum, $S = 0.5\,\langle \log I^{PM} + \log I^{MM} \rangle_{set}$.
3. The data are smoothed using a sliding window over ~100 probe sets along the abscissa. The obtained D vs. S relationship is called a raw hook curve because of its characteristic shape. It divides into four characteristic parts: the N range referring to the relatively flat starting region, the subsequent mix range of positive slope, the S range near the maximum, and the sat range with a negative slope beyond the maximum.
4. The intensities of the probes from the N and S ranges are used to fit the positional-dependent nearest neighbor model. It decomposes the log intensity variation about its set average into a sum of additive sensitivity terms, $\delta\varepsilon_k^{P,h}(BB')_p$, where $BB'$ is the couple of adjacent bases at positions $k$ and $k + 1$ of the probe sequence ($k = 1, \ldots, 24$; $BB' = AA, AT, \ldots, CC$). The model is parameterized separately for nonspecific and specific binding ($h = N, S$) of the PM and MM ($P = PM, MM$), respectively (10, 11), thus providing four sets of 16 $BB'$ sensitivity profiles, which, in turn, are used to calculate the affinity correction in a sequence-specific fashion:
$\log A_{p,c}^{P,h} = \sum_{k=1}^{24} \delta\varepsilon_{k,c}^{P,h}(BB')_p$. (10)
5. Next, the probe intensities are corrected for sequence-specific affinities using the model adjusted in the previous step. In the mix range, we use a weighted superposition of the N and S contributions, $I_{p,c}^{P,corr} = I_{p,c}^{P} \cdot 10^{-\left(x_{p,c}^{P,S} \cdot \log A_{p,c}^{P,S} + (1 - x_{p,c}^{P,S}) \cdot \log A_{p,c}^{P,N}\right)}$, where $x_{p,c}^{P,S} = \max(1 - N_c^{P} \cdot A_{p,c}^{P,N} / L_{p,c}^{P},\ 0)$ is the fraction of specific hybridization contributing to the intensity.
6. The affinity-corrected intensities are used to get the corrected version of the hook curve with the coordinates $S^{hook}$ and $D^{hook}$ and an improved set of sensitivity profiles by reiteration of steps 2–5. Note the significant differences between the raw and the corrected hooks: Affinity correction clearly reduces the width of the N range and also the scattering of the data in the remaining hybridization regimes.
7. The mix, S, and sat ranges of the corrected hook curve are fitted using the two-species Langmuir isotherm (see Subheading 4.2). The fit and the separate analysis of the N range provide chip characteristics such as the mean background level ($N_c^{PM}$), the saturation intensity ($M_c$), the width and correlation coefficient of the background distribution ($\sigma$ and $\rho$), and the mean PM/MM sensitivity gains ($n_c$ and $s_c$) that are used for calibration of the probe-level intensity data in the next step.
8. The probe intensities are linearized using Eq. 7. Then, the probe-level expression degree is estimated as the weighted glog average of the total signal minus the respective nonspecific background contribution according to Eq. 5:
$\operatorname{glog}(S_{p,c}^{PM}) = \int N(N_c^{PM}, \sigma^{N}) \cdot \operatorname{glog}\left(L_{p,c}^{PM} - 10^{x} \cdot N_c^{PM} \cdot A_{p,c}^{PM,N}\right) \cdot dx$, (11)
where $N(N_c^{PM}, \sigma^{N})$ is the Gaussian distribution of the nonspecific, affinity-corrected PM signal. Alternatively, we also calculate a PM–MM version by substituting the glog term in the integral of Eq. 11 with $\operatorname{glog}\left((L_{p,c}^{PM} - L_{p,c}^{MM}) - 10^{x} \cdot N_c^{PM} \left(A_{p,c}^{PM,N} - (n_c^{-1} \cdot N_c^{PM})^{-\rho} \cdot A_{p,c}^{MM,N} \cdot 10^{(\rho - 1) \cdot x}\right)\right)$. This approach uses the bivariate marginal distribution of the PM–MM background, where $\rho$ denotes the coefficient of correlation between the PM and MM background intensity values.
9. The probe-level specific signals are affinity corrected according to $E_{p,c}^{PM} = S_{p,c}^{PM} \cdot (A_{p,c}^{PM,S})^{-1}$ for the PM-only and $E_{p,c}^{PM-MM} = S_{p,c}^{PM-MM} \cdot (A_{p,c}^{PM,S} - s_c^{-1} \cdot A_{p,c}^{MM,S})^{-1}$ for the PM–MM estimates, and then summarized by means of the Tukey biweight median to get robust transcript-level expression estimates.
4.2. Natural Metrics of Expression Values
The hook-like shape of the D vs. S dependence can be reproduced using the two-species Langmuir isotherm (see Fig. 1). First, we applied Eq. 5 separately to the intensities of the PM and MM and then transformed the predicted intensities into D vs. S coordinates. The obtained theoretical function fits the experimental data to a good approximation (Fig. 2). The hook curve considers all probes of a given chip. It consequently summarizes the properties of a particular hybridization into a sort of mean binding isotherm. The hook curve is divided into five characteristic ranges, which can be assigned to different hybridization regimes (see step 3 of Subheading 4.1 and also Fig. 2): In the N regime, the probes hybridize almost exclusively nonspecifically owing to
the absence or low concentrations of specific transcripts. In the subsequent mix regime, both specific and nonspecific transcripts significantly contribute to the observed intensity of the probes. In the S regime, the probes predominantly hybridize with specific transcripts. In the sat regime, the probes become progressively saturated with bound transcripts. This effect first and foremost affects the PM due to their higher specific-binding constant. As a consequence, the concentration dependence of the intensity progressively becomes nonlinear and D starts to decrease. In the “as” range, the intensities of the PM and MM reach their asymptotic values owing to complete saturation. In typical hybridizations, this region is usually not reached. Note that the D vs. S coordinates are simply linear combinations of the logarithms of the PM and MM intensities. Hence, the hook curve can be interpreted as a special representation of the binding isotherm where the explicit dependence of the probe intensities on the (usually unknown) transcript concentrations is replaced by the (experimentally available) relation between the PM and the MM probe intensities. Here, the MM probes serve as an internal reference subjected essentially to the same hybridization law as the PM, however, with modified characteristics. Particularly, one expects to find different binding constants for specific and, possibly, also nonspecific binding. Let us denote the respective PM/MM ratios with $s_c \equiv K_c^{PM,S} / K_c^{MM,S}$ and $n_c \equiv K_c^{PM,N} / K_c^{MM,N}$, respectively. Other hybridization characteristics are the mean background intensity of the PM due to nonspecific binding, $N_c^{PM}$, and the maximum intensity, $M_c$, referring to completely saturated probe spots. The coordinates of the start and end points of the hook curve, and, to a good approximation, also its maximum, can be directly related to basic hybridization characteristics.
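The transformation of probe intensities into hook coordinates and the subsequent smoothing (steps 2 and 3 of Subheading 4.1) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, D is averaged over each probe set in the same way as S (a simplification of this sketch), and a plain moving average stands in for the sliding-window smoothing.

```python
import numpy as np

def hook_coordinates(pm, mm, set_ids):
    """Probe-set averages D = <log PM - log MM> and S = 0.5 <log PM + log MM>."""
    log_pm, log_mm = np.log10(pm), np.log10(mm)
    sets, inverse = np.unique(set_ids, return_inverse=True)
    counts = np.bincount(inverse)                       # probes per probe set
    d = np.bincount(inverse, weights=log_pm - log_mm) / counts
    s = np.bincount(inverse, weights=0.5 * (log_pm + log_mm)) / counts
    return sets, d, s

def smooth_hook(d, s, window=100):
    """Moving average of D over probe sets sorted along the abscissa S."""
    window = min(window, len(s))                        # guard for very small inputs
    order = np.argsort(s)
    kernel = np.ones(window) / window
    s_smooth = np.convolve(s[order], kernel, mode="valid")
    d_smooth = np.convolve(d[order], kernel, mode="valid")
    return s_smooth, d_smooth
```

Because both coordinates are built from log intensities only, the curve can be computed for a single chip without knowledge of the transcript concentrations.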
For example, the S coordinates of the start and end points, $S(0) \approx \log(N_c^{PM}) - \frac{1}{2}\log(n_c)$ and $S(\infty) \approx \log(M_c)$, estimate the mean nonspecific background and the saturation intensity, respectively. The D coordinates of the start point and of the maximum, $D_s(0) \approx \log(n_c)$ and $D_{max} \approx \log(s_c) + \log(n_c)$, are measures of the mean log difference between binding constants of the PM and MM for nonspecific and specific binding, respectively. Making use of these data, one obtains the “width” and the “height” of the hook curve, which estimate the mean binding strength of nonspecific hybridization, $S_{as}(\infty) - S_s(0) \approx \log(X_c^{PM,N}) = -\log(K_c^{PM,N} \cdot [N])$, and the mean affinity gain for specific binding of the PM relative to the MM, $D_{max} - D_s(0) \approx \log(s_c)$, respectively. The binding strength, $X_c^{PM,N}$, is a dimensionless measure of the concentration in units of the respective binding constant. A value of unity refers to a surface coverage of $\Theta = 0.5$ in the absence of specific transcripts. The mean affinity gain is directly related to the free energy difference due to the replacement of the complementary Watson–Crick
pairing with a mismatched base pairing in the respective probe/transcript duplexes (11). In summary, the hook curve spans a sort of natural metrics system for the expression estimates. It reflects essential hybridization characteristics in terms of its geometric dimensions: width, height, and “start” coordinates.
4.3. Examples: Chip Characteristics
Figure 3 shows a collection of representative hook curves taken from six hybridizations of human genome chips of different generations, a plant chip (Arabidopsis thaliana chip ATH1-121501) and alternative hybridizations with cRNA and cDNA. Along the chip generations, the spot size of the probes decreases from 20 µm (U95), to 18 µm (U133A and U133Av2), and to 11 µm (U133-plus2). The reduction of spot size has enabled the number of probe sets per chip to be increased from 16,000 to 22,000, and to 54,000, respectively (33, 34). In addition, this development is accompanied by modifications of the reagent kits and the scanning technique. Importantly, probe selection has also been improved by applying more sophisticated genomic and thermodynamic criteria, especially after the U95 generation. The different shapes of the uncorrected hook curves of the U95 and U133 chips, particularly the broader N range of the former, can be explained by the partially suboptimal probe quality of the U95 generation, which contains a relatively high number of weak-affinity probes. For the U133 series, the N range narrows considerably, essentially due to the better quality of the probes. It is important to note that our affinity correction largely levels out this difference, providing corrected hook curves of very similar shape for the U95 and U133 chips. The width of the fitted hook curves estimates the binding strength of the nonspecific background in “intrinsic” units of the respective binding constant (see above). A wider hook curve is equivalent to a lower level of nonspecific background and, thus, to an increased dynamic measurement range of the probe spots. The widths of the fits shown in Fig. 3 indicate that this range slightly increases with the chip generations (see also Table 2).
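The relation between the hook width and the chip characteristics given in Subheading 4.2 can be illustrated with a few lines of Python. The function name is hypothetical, and the inputs (log N = 1.06, log M = 4.22, log n = 0.04) are used only as example values on the scale of the medians reported in Table 2.

```python
def hook_start_end(log_n_bg, log_m, log_gain_n):
    """Start and end abscissa of the hook and its width, all in log10 units.

    Uses S(0) ~ log N - 0.5 * log n and S(inf) ~ log M from Subheading 4.2.
    """
    s_start = log_n_bg - 0.5 * log_gain_n
    s_end = log_m
    return s_start, s_end, s_end - s_start

# Example inputs on the scale of Table 2.
s_start, s_end, width = hook_start_end(log_n_bg=1.06, log_m=4.22, log_gain_n=0.04)
```

For these inputs the width is about 3.18, i.e., a lower nonspecific background at constant saturation intensity directly widens the hook and thus the usable measurement range.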
In general, microarray technology takes advantage of either of two types of chemical entities as the labeled target, cRNA or cDNA, considered to be virtually equivalent for the purpose of expression analysis. Here we compare both options for illustrating the effect of the two binding “chemistries” on the chip characteristics. The substitution of cRNA by cDNA gives rise to essentially two effects (see Fig. 3): First, it increases the dynamic range by reducing the background level, and, second, it reduces the variability of the uncorrected background intensity. Among the two options, affinity correction to a much less extent improves the hook curve of the DNA hybridization. The higher nonspecific background level and variability of the RNA hybridization were
Fig. 3. Hook curves of six different microarray hybridizations: raw hook (lower panel), affinity-corrected hook and number distribution (middle), and the fit of the specific part of the hook (upper panel) for human genome GeneChips of different generations (upper row of figures, Affymetrix HG-U95, HG-U133, and HG_U133_plus2) taken from the spiked-in data sets (37) and mixing series (26); and of a plant genome (lower row; Arabidopsis thaliana, ATH1_121501 array) and of hybridization with cRNA and cDNA (24). The vertical dotted line indicates the “break” of the hook curve that was used to estimate the number of “absent” probe sets given in percent for each hybridization in the figures. See also Table 2 for the mean hybridization characteristic of the respective experimental series.
Table 2 Mean hybridization characteristics of GeneChips estimated from different experimental series. The values are given as MED ± MAD, where MED is the median and MAD the median absolute deviation calculated from the respective values over the experimental series in logarithmic scale (log10)

| Data set | Ref. | Affymetrix chip (# of chips) | log O (optical background) | log N (nonspecific background) | log M (saturation intensity) | log X N (N binding strength) | log s (PM/MM gain, s) | log n (PM/MM gain, n) | σN (standard deviation of N-BG) |
|---|---|---|---|---|---|---|---|---|---|
| Calibration data sets | | | | | | | | | |
| GeneLogic dilution | (36) | HG-U95A (74) | 1.74 ± 0.13 | 1.54 ± 0.22 | 4.27 ± 0.15 | 2.75 ± 0.20 | 1.00 ± 0.05 | 0.10 ± 0.015 | 0.28 ± 0.008 |
| Affymetrix spiked-in | (37) | HG-U95A (59) | 1.93 ± 0.06 | 1.70 ± 0.05 | 4.14 ± 0.09 | 2.44 ± 0.10 | 0.89 ± 0.04 | 0.07 ± 0.006 | 0.30 ± 0.008 |
| Affymetrix spiked-in | (37) | HG-U133A (42) | 1.47 ± 0.02 | 1.54 ± 0.05 | 4.20 ± 0.04 | 2.66 ± 0.05 | 0.85 ± 0.04 | 0.08 ± 0.005 | 0.29 ± 0.003 |
| Barnes dilution | (26) | HG-U133_plus2 (12) | 1.62 ± 0.01 | 1.47 ± 0.08 | 4.48 ± 0.03 | 3.01 ± 0.10 | 1.02 ± 0.03 | 0.08 ± 0.005 | 0.32 ± 0.0013 |
| Eklund spiked-in (cRNA) | (24) | HG-U133Av2 (6) | 1.63 ± 0.01 | 1.55 ± 0.14 | 4.51 ± 0.13 | 2.96 ± 0.15 | 1.08 ± 0.04 | 0.07 ± 0.01 | 0.36 ± 0.04 |
| Eklund spiked-in (cDNA) | (24) | HG-U133Av2 (6) | 1.58 ± 0.02 | 1.06 ± 0.01 | 4.22 ± 0.03 | 3.16 ± 0.02 | 0.93 ± 0.03 | 0.04 ± 0.002 | 0.29 ± 0.003 |
| Patient cohort and cell line studies | | | | | | | | | |
| Frontal brain | (38) | HG-U95Av2 (6) | 1.81 ± 0.13 | 1.87 ± 0.18 | 4.80 ± 0.28 | 2.93 ± 0.25 | 0.91 ± 0.02 | 0.10 ± 0.01 | 0.30 ± 0.006 |
(40)
(41)
(42)
(43)
(44)
(45)
(46)
Colon cancer
Lymphocytic leukemia
Renal carcinoma
Mouse
Arabidopsis
Yeast
Rice
Rice (25)
1.62 ± 0.03
1.85 ± 0.06
1.84 ± 0.09
ATH1–121501 (16)
Yeast-2 (41)
1.86 ± 0.05
1.80 ± 0.11
1.60 ± 0.05
1.88 ± 0.13
1.84 ± 0.06
MOE430A (33)
HG-U133_plus2 (47)
HG-U133_plus2 (20)
HGU133Av2(20)
HG-U133A (221)
1.31 ± 0.06
1.44 ± 0.07
1.41 ± 0.15
1.55 ± 0.11
1.99 ± 0.09
1.29 ± 0.08
1.62 ± 0.12
1.96 ± 0.15
4.51 ± 0.09
4.60 ± 0.07
4.46 ± 0.06
4.42 ± 0.12
4.73 ± 0.09
4.32 ± 0.14
4.63 ± 0.04
4.49 ± 0.10
3.20 ± 0.10
3.16 ± 0.10
3.01 ± 0.15
2.87 ± 0.11
2.72 ± 0.10
3.03 ± 0.15
3.01 ± 0.12
2.43 ± 0.15
1.0 ± 0.03
1.05 ± 0.04
0.99 ± 0.06
0.98 ± 0.03
0.82 ± 0.03
0.87 ± 0.04
0.96 ± 0.05
0.85 ± 0.06
0.03 ± 0.008
0.002 ± 0.03
0.03 ± 0.007
0.06 ± 0.01
0.10 ± 0.006
0.06 ± 0.006
0.06 ± 0.01
0.09 ± 0.01
Median (med(x)) and median absolute deviation: MAD = 1.4·med(|x − med(x)|) (the factor accounts for asymptotic normal consistency)
(39)
Malignant lymphomas
0.30 ± 0.01
0.31 ± 0.03
0.26 ± 0.004
0.29 ± 0.008
0.38 ± 0.02
0.30 ± 0.009
0.31 ± 0.01
0.35 ± 0.03
attributed to relatively stable mismatched “G·u wobble” base pairings in the RNA/DNA duplexes, which give rise to less specific binding compared with DNA/DNA hybridizations without such stable mismatch pairings (24). To generalize the discussed single-chip-related results, we collect mean values of these characteristics over experimental series taken from different studies dealing with calibration issues, biological samples, cancer specimens, different chip generations, and species (Table 2). In essence, most of the chip characteristics provide relatively similar values for the different series, despite the very heterogeneous origin of the data. The maximum intensity and the optical and nonspecific background levels vary roughly over three orders of magnitude. The PM affinity gain parameter for specific hybridization shows that the central mismatch of the MM causes, on the average, a tenfold (s ~ 7–11) increased affinity of the PM compared with that of the MM. On the contrary, for nonspecific binding, one expects, on the average, the same affinity for the PM and MM. The respective PM/MM gain parameter, however, indicates a small but significantly increased PM affinity, n ~ 1.05–1.25. We tentatively attribute this effect to false-positive detections in the N range, i.e., to a certain amount of specific hybridization among the absent probes (see below). The relatively narrow distributions of hybridization characteristics reflect the common physicochemical basis of the method, for example, the oligonucleotide density and size of the probe spots, the common MM probe design, and hybridization conditions. The positional-dependent sensitivity terms, $\delta\varepsilon$ (Eq. 10), represent another type of chip characteristic because they are used to adjust the intensities of each microarray. Figure 4 shows the sensitivity profiles of the MM probes for three of the chips taken from Fig. 3; note the similar profiles of the two selected RNA hybridizations.
Generally, one observes C>G>T>A for most of the sequence positions. On the contrary, for the DNA hybridization, this order changes to G>C>A≈T. The positional-dependent sensitivity terms, $\delta\varepsilon$, are directly related to the binding strength of base pairings in the probe/target duplexes (10, 11, 35), which are basically independent of a particular hybridization but change with the chemical entity. In Fig. 4, we aggregated the 16 nearest-neighbor profiles into four single-base profiles for the sake of clarity. In addition, the maximum and minimum NN profiles are shown. For the RNA hybridizations, for example, adjacent CC provide the strongest intensity increment, whereas for DNA hybridization, GG and CG do. Note also the “dents” in the middle of the specific MM profiles. They reflect the effect of the mismatches on the binding strength with “molecular resolution.”
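How a set of nearest-neighbor sensitivity profiles translates into a probe affinity (Eq. 10), and how 16 couple profiles can be condensed into four single-base profiles, is sketched below. The array layout (24 positions by 16 couples), the function names, and the aggregation rule (averaging over the second base of each couple) are assumptions of this sketch; the text does not specify how the aggregation for Fig. 4 was done.

```python
import numpy as np

BASES = "ACGT"
PAIRS = [a + b for a in BASES for b in BASES]        # the 16 couples BB'
PAIR_INDEX = {pair: i for i, pair in enumerate(PAIRS)}

def log_affinity(sequence, profiles):
    """Eq. 10: sum over k of the sensitivity term of the couple at positions k, k+1.

    profiles has shape (24, 16): sequence position k by nearest-neighbor couple.
    """
    if len(sequence) != profiles.shape[0] + 1:
        raise ValueError("sequence length must equal the number of positions + 1")
    return sum(profiles[k, PAIR_INDEX[sequence[k:k + 2]]]
               for k in range(len(sequence) - 1))

def single_base_profiles(profiles):
    """Average the couple profiles over the second base (assumed aggregation rule)."""
    return profiles.reshape(profiles.shape[0], 4, 4).mean(axis=2)
```

A 25-mer probe contributes 24 couples, so a uniform CC increment of 0.1 per position would, in this toy setting, raise the log affinity of an all-C probe by 2.4.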
Fig. 4. Sensitivity profiles of three chips shown in Fig. 3. Only the MM profiles for nonspecific (above) and specific (below) hybridization are shown. The PM profiles look similar to the nonspecific MM profiles. The 16 nearest-neighbor (NN) profiles are aggregated into four single-base profiles for the sake of clarity (symbols). In addition, each figure shows the two NN profiles with the largest positive and negative values. The profiles of the RNA hybridizations differ from those of the DNA hybridization due to the different binding chemistry.
4.4. Examples: Expression Values
For further validation of the method, we analyzed the Affymetrix Latin-square spiked-in and the GeneLogic dilution data sets (24). The corrected hook curves of selected chips of these series are shown in Figs. 5 and 7, respectively. The hook curves of the spiked-in series mainly reflect the hybridization of the cell extract, which was added in equal amounts to all hybridizations (Fig. 5). In addition, each chip contains a set of “spiked-in” probes covering the whole concentration range of the spikes (0–512 pM). The D vs. S coordinates of these spikes spread over the full range of the hook curve (see circles in Fig. 5). Their positions shift along the hook to the right with increasing transcript concentration. Probes without specific transcripts and probes with only tiny spiked-in concentrations accumulate mainly within the N range of the hook curve. In a simple approximation, we classify these probes as “absent” in analogy with the absent calls calculated by MAS5 (19). The inset in Fig. 5 shows that both methods, hook and MAS5, provide very similar absent rates for the spikes. Note that the vertical shift between the MAS5 and hook data is due to the somewhat arbitrary choice of the threshold parameters used in both methods. It can be reduced simply by adjusting these thresholds. Figure 6 shows the expression measures obtained from selected preprocessing methods as a function of the spiked-in concentration.
Binder, Preibisch, and Berger
Fig. 5. Hook curve of one spiked-in hybridization (HGU-133A). The open circles refer to the spiked-in probes. Their positions move along the hook to the right with increasing spiked-in concentration of the respective specific transcripts. The vertical line indicates the breakpoint between the N and mix regimes, which classifies the probes into absent and present ones. The inset shows the fraction of absent probes as a function of the spiked-in concentration obtained from the hook and the MAS5 methods.
Perfect calibration consequently corresponds to a diagonal line of slope unity in this double-logarithmic plot. The hook and gcRMA methods clearly outperform MAS5 and RMA with respect to this criterion. Note that the reduced slope of the RMA curve indicates a systematic bias, which underestimates differential expression roughly by the square root of the true change, $FC_{\mathrm{RMA}} \approx (FC_{\mathrm{true}})^{0.5}$. Figure 6 also reveals that saturation gives rise to the flattening of all curves at high concentrations except that of the hook method, which corrects the data for this effect. Dilution of the hybridization solution in the dilution series gives rise to the progressive shift of the N range of the hook curve toward smaller abscissa values, leaving the position of the asymptotic “as” range unchanged (Fig. 7). The associated “widening” of the curve is compatible with the global decrease of the transcript concentration in this experiment (see above). This trend is also paralleled by the disappearance of the “sat” range, i.e., dilution globally decreases the occupancy of the probes.
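The flattening at high concentrations and the reduced probe occupancy on dilution both follow from Langmuir-type binding. The following Python sketch illustrates this with a single-species Langmuir isotherm. It is a simplified illustration and not the hook algorithm, which uses a two-species isotherm together with sequence-specific corrections. The function names, the binding constant K, and the maximum intensity I_MAX are hypothetical and were chosen for illustration only.

```python
# Single-species Langmuir isotherm: a simplified sketch, NOT the hook algorithm.
# K and I_MAX are assumed, illustrative values (not taken from the chapter).

I_MAX = 50000.0  # assumed asymptotic (saturation) intensity
K = 0.01         # assumed binding constant in 1/pM


def langmuir_occupancy(conc, k):
    """Fraction of occupied probe oligomers at transcript concentration conc."""
    x = k * conc
    return x / (1.0 + x)


def desaturate(intensity, i_max):
    """Invert the isotherm; the result (k * conc) is linear in concentration."""
    if not 0.0 <= intensity < i_max:
        raise ValueError("intensity must lie in [0, i_max)")
    return intensity / (i_max - intensity)


if __name__ == "__main__":
    # Halving the concentration does not halve the intensity near saturation,
    # but the desaturated value recovers the linear concentration scale.
    for conc in (512.0, 256.0, 128.0):
        intensity = I_MAX * langmuir_occupancy(conc, K)
        recovered = desaturate(intensity, I_MAX) / K
        print(conc, round(intensity), round(recovered, 3))
```

Inverting the isotherm yields a quantity proportional to the transcript concentration, which is the basic idea behind a saturation correction; dilution lowers the concentration and therefore the occupancy.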
Calibration of Microarray Gene-Expression Data
Fig. 6. Mean expression degree of all spiked-in probe sets as a function of the spiked-in concentration. The comparison of different preprocessing methods shows that the single-chip hook method performs roughly as well as the multi-chip method gcRMA. The diagonal lines of slope one refer to optimum calibration. The dotted diagonals indicate fivefold changes with respect to the dashed diagonal line. The smaller slope of MAS5, and especially of RMA, compared with that of hook and gcRMA indicates the accuracy penalty of these methods. Note that the MAS5 and gcRMA curves are vertically shifted for the sake of clarity.
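The calibration criterion can be illustrated numerically. The following is a minimal Python sketch under stated assumptions: the concentrations and expression values are synthetic and are not taken from the spiked-in experiment, and the function names are hypothetical. The slope of log expression versus log concentration is estimated by least squares, and a slope of 0.5 is shown to compress a true fold change to its square root.

```python
import math

# Synthetic, illustrative data (assumed; not from the spiked-in data set).
CONCS = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
IDEAL = [10.0 * c for c in CONCS]                 # perfectly calibrated measure
COMPRESSED = [10.0 * math.sqrt(c) for c in CONCS]  # measure with slope 0.5


def loglog_slope(concentrations, expressions):
    """Least-squares slope of log2(expression) versus log2(concentration)."""
    xs = [math.log2(c) for c in concentrations]
    ys = [math.log2(e) for e in expressions]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def apparent_fold_change(true_fc, slope):
    """Fold change reported by a method whose log-log calibration slope is `slope`."""
    return true_fc ** slope


if __name__ == "__main__":
    print(loglog_slope(CONCS, IDEAL))
    print(loglog_slope(CONCS, COMPRESSED))
    print(apparent_fold_change(16.0, 0.5))
```

A slope of one leaves fold changes unchanged, whereas a slope of 0.5 turns a true 16-fold change into an apparent fourfold change.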
Figure 7b shows that the background intensity indeed changes almost linearly with dilution. The mean nonspecific background (N) is the log intensity average over the N range of the respective hooks. The optical background (O) referring to 2% of the darkest probes is obtained in step 1 of the algorithm. The total background (N + O) is independently obtained by omitting this optical background correction in the hook algorithm. The relation between the background levels indicates that the optical contribution gradually decreases with increasing transcript concentrations. Moreover, the residual slope of the O data shows that the “optical” background correction probably also comprises small contributions from nonspecific hybridization. Simple dilution does not change the component composition of the hybridization solution. Consequently, the amount of absent probe sets is expected to remain invariant in the different dilution steps. The respective fractions of absent probes obtained from the hook curves confirm this expectation (Fig. 7c). On the contrary, MAS5 provides an increasing amount of absent probes at smaller transcript concentrations, probably because the underlying algorithm converts probes with smaller intensities progressively into absent ones. The hook method uses the N region as classificatory criterion for absent probes. Obviously, it is more robust
Fig. 7. (a) Hook curves of the dilution experiments for different amounts of RNA (see figure). The dashed curves are calculated using the two-species Langmuir isotherm assuming a common asymptotic maximum intensity value. On dilution, the position of the left branch of the hook shifts to smaller abscissa values, indicating the decrease of nonspecific hybridization. (b) Background level on dilution. The total background (N + O) decomposes into contributions due to the optical effects (O) and nonspecific hybridization (N). (c) The hook method provides a virtually constant fraction of absent probes on dilution, whereas MAS5 progressively overestimates absent calls.
against dilution effects than the probe intensity criterion used by MAS5 (19). Figure 8 illustrates the effect of dilution on the expression levels of selected probe sets. The expression data obtained from the hook algorithm correctly reflect the linear decrease of transcript concentration on dilution, in contrast to the MAS5 and RMA expression levels, which remain virtually constant. The latter effect results from the normalization algorithms used,
Fig. 8. Expression values of selected probes and methods on dilution. The concentration of the specific transcripts linearly decreases as reflected by the hook estimate. The other methods provide different, mostly constant expression estimates owing to normalization. Note that AFFXBioB3 is a hybridization control that is spiked into the hybridization solution with constant concentration. Again, the hook method reproduces this behavior well.
which, for MAS5 (global mean normalization) and RMA (quantile normalization), balance the probe-level data relative to a mean characteristic over all dilution steps. This relative scale remains virtually invariant in this type of experiment. In contrast, the hook method uses an absolute scale, which responds sensitively to dilution effects. A set of special probes, the so-called hybridization controls, is spiked into the hybridizations at equal concentrations. The global normalizations produce spuriously variable expression degrees for these probes over the dilution series (e.g., AFFXBioB3_at, see Fig. 8), whereas the hook expression values remain virtually constant, as expected. Note that another effect is also revealed in the expression data shown in Fig. 8: The mean expression levels of the selected transcripts differ by more than three orders of magnitude. These absolute changes are accompanied by distinct variations between the expression levels provided by the different methods. For example, one gets RMA > hook at intermediate expression levels (31432_at in Fig. 8) but partly hook > RMA at high (31463_at) and low (31491_at) levels. These trends can be attributed to the
better linearity of the hook method over the whole concentration range, which reduces systematic biases due to background and saturation effects compared, e.g., with RMA (see also Fig. 6).
4.5. Download
The beta version of the hook program can be downloaded from http://www.izbi.de. The stand-alone JAVA program processes single chips and chip series in a batch mode according to the scheme given in Fig. 2. Chip and probe set-related characteristics such as expression degrees, hook curves, and sensitivity profiles are exported in tabular form and .jpg graphics. The detailed description of the method and selected applications are given in refs. (47, 48).
5. Conclusions
The improvement of microarray calibration methods is an essential prerequisite for obtaining absolute expression estimates, which, in turn, are required for quantitative analysis of transcriptional regulation. Benchmark studies indicate that the correction for nonspecific background intensity contributions is the crucial preprocessing step. Here, mismatched (MM) probes provide essential information not available from PM-only approaches. Among established linear calibration approaches, gcRMA emerges as the method that makes the best compromise between accuracy and precision across the whole intensity range. The Langmuir hybridization model provides a physically adequate and computationally feasible approach for microarray intensity calibration, with the potential to improve existing linear methods. Our hook calibration method uses this model together with the position-dependent nearest-neighbor affinity correction. Although it is a single-chip analysis, the hook method performs roughly as well as the multi-chip gcRMA method in estimating expression values. In addition, the hook method provides a set of chip summary characteristics that evaluate the performance of a given hybridization in terms of simple parameters such as the mean nonspecific background intensity, its saturation value, the mean PM/MM sensitivity gain, and the fraction of absent probes.
Acknowledgments
We thank Anke Wendschlag for performing some of the data calculations. The work was supported by the Deutsche Forschungsgemeinschaft under grant no. BIZ 6/4. H. Berger was supported by the Molecular Mechanisms in Malignant Lymphomas Network Project of the Deutsche Krebshilfe (grant no. 70-3173-Tr3), to which we are grateful for the use of the MMML gene expression data.
References
1. Binder, H. (2006), Thermodynamics of competitive surface adsorption on DNA microarrays – theoretical aspects, Journal of Physics Condensed Matter 18, S491–523.
2. Hekstra, D., Taussig, A. R., Magnasco, M., and Naef, F. (2003), Absolute mRNA concentrations from sequence-specific calibration of oligonucleotide arrays, Nucleic Acids Research 31, 1962–68.
3. Burden, C. J., Pittelkow, Y. E., and Wilson, S. R. (2004), Statistical analysis of adsorption models for oligonucleotide microarrays, Statistical Applications in Genetics and Molecular Biology 3, 35.
4. Binder, H., Kirsten, T., Loeffler, M., and Stadler, P. (2004), The sensitivity of microarray oligonucleotide probes – variability and the effect of base composition, Journal of Physical Chemistry B 108, 18003–14.
5. Binder, H., and Preibisch, S. (2006), GeneChip microarrays – signal intensities, RNA concentrations and probe sequences, Journal of Physics Condensed Matter 18, S537–66.
6. Burden, C. J., Pittelkow, Y. E., and Wilson, S. R. (2006), Adsorption models of hybridization and post-hybridization behaviour on oligonucleotide microarrays, Journal of Physics Condensed Matter 18, 5545–65.
7. Huber, W., von Heydebreck, A., Sueltmann, H., Poustka, A., and Vingron, M. (2002), Variance stabilization applied to microarray data calibration and to the quantification of differential expression, Bioinformatics 1, 1–9.
8. Durbin, B. P., Hardin, J. S., Hawkins, D. M., and Rocke, D. M. (2002), A variance-stabilizing transformation for gene-expression microarray data, Bioinformatics 18, 105–10.
9. Wu, Z., and Irizarry, R. A. (2005), A statistical framework for the analysis of microarray probe-level data, Johns Hopkins University, Dept. of Biostatistics Working Paper 73, 1–31.
10. Binder, H., and Preibisch, S. (2005), Specific and non-specific hybridization of oligonucleotide probes on microarrays, Biophysical Journal 89, 337–52.
11. Binder, H., Preibisch, S., and Kirsten, T. (2005), Base pair interactions and hybridization isotherms of matched and mismatched oligonucleotide probes on microarrays, Langmuir 21, 9287–302.
12. Affymetrix (2001), Affymetrix Microarray Suite 5.0, in “User Guide”, Affymetrix, Inc., Santa Clara, CA.
13. Irizarry, R. A., Bolstad, B. M., Collin, F., Cope, L. M., Hobbs, B., and Speed, T. P. (2003), Summaries of Affymetrix GeneChip probe level data, Nucleic Acids Research 31, e15.
14. Irizarry, R. A., Hobbs, B., Collin, F., Beazer-Barclay, Y. D., Antonellis, K. J., Scherf, U., and Speed, T. P. (2003), Exploration, normalization, and summaries of high density oligonucleotide array probe level data, Biostatistics 4, 249–64.
15. Wu, Z., Irizarry, R. A., Gentleman, R., Murillo, F. M., and Spencer, F. (2003), A model based background adjustment for oligonucleotide expression arrays, Johns Hopkins University, Dept. of Biostatistics Working Paper 1.
16. Li, C., and Wong, W. H. (2001), Model-based analysis of oligonucleotide arrays: Expression index computation and outlier detection, Proceedings of the National Academy of Sciences of the United States of America 98, 31–36.
17. Affymetrix (2005), Guide to probe logarithmic intensity error (PLIER) estimation.
18. Bolstad, B. M., Irizarry, R. A., Astrand, M., and Speed, T. P. (2003), A comparison of normalization methods for high density oligonucleotide array data based on variance and bias, Bioinformatics 19(2), 185–93.
19. Affymetrix (2002), Statistical Algorithms Description Document, Santa Clara.
20. Zhang, L., Miles, M. F., and Aldape, K. D. (2003), A model of molecular interactions on short oligonucleotide microarrays, Nature Biotechnology 21, 818–28.
21. Shedden, K., et al. (2005), Comparison of seven methods for producing Affymetrix expression scores based on False Discovery Rates in disease profiling data, BMC Bioinformatics 6, 26.
22. Hochreiter, S., Clevert, D.-A., and Obermayer, K. (2006), A new summarization method for Affymetrix probe level data, Bioinformatics 22, 943–49.
23. Havilio, M. (2005), Signal deconvolution based expression-detection and background adjustment for microarray data, Journal of Computational Biology 13, 63–80.
24. Eklund, A. C., Turner, L. R., Chen, P., Jensen, R. V., deFeo, G., Kopf-Sill, A. R., and Szallasi, Z. (2006), Replacing cRNA targets with cDNA reduces microarray cross-hybridization, Nature Biotechnology 24, 1071–73.
25. Choe, S., Boutros, M., Michelson, A., Church, G., and Halfon, M. (2005), Preferred analysis methods for Affymetrix GeneChips revealed by a wholly defined control dataset, Genome Biology 6, R16.
26. Barnes, M., Freudenberg, J., Thompson, S., Aronow, B., and Pavlidis, P. (2005), Experimental comparison and cross-validation of the Affymetrix and Illumina gene expression analysis platforms, Nucleic Acids Research 33, 5914–23.
27. Qin, L.-X., Beyer, R., Hudson, F., Linford, N., Morris, D., and Kerr, K. (2006), Evaluation of methods for oligonucleotide array data via quantitative real-time PCR, BMC Bioinformatics 7, 23.
28. Ploner, A., Miller, L., Hall, P., Bergh, J., and Pawitan, Y. (2005), Correlation test to assess low-level processing of high-density oligonucleotide microarray data, BMC Bioinformatics 6, 80.
29. Verhaak, R., Staal, F., Valk, P., Lowenberg, B., Reinders, M., and de Ridder, D. (2006), The effect of oligonucleotide microarray data preprocessing on the analysis of patient-cohort studies, BMC Bioinformatics 7, 105.
30. Zakharkin, S., Kim, K., Mehta, T., Chen, L., Barnes, S., Scheirer, K., Parrish, R., Allison, D., and Page, G. (2005), Sources of variation in Affymetrix microarray experiments, BMC Bioinformatics 6, 214.
31. Freudenberg, J., Boriss, H., and Hasenclever, D. (2004), Comparison of preprocessing procedures for oligo-nucleotide microarrays by parametric bootstrap simulation of spike-in experiments, Methods of Information in Medicine 5, 434–38.
32. Irizarry, R. A., Wu, Z., and Jaffee, H. A. (2006), Comparison of Affymetrix GeneChip expression measures, Bioinformatics 22, 789–94.
33. Affymetrix (2001), Array Design for the GeneChip Human Genome U133 Set.
34. Affymetrix (2003), GeneChip Human Genome U133 Arrays.
35. Binder, H., Kirsten, T., Hofacker, I., Stadler, P., and Loeffler, M. (2004), Interactions in oligonucleotide duplexes upon hybridisation of microarrays, Journal of Physical Chemistry B 108, 18015–25.
36. GeneLogic dilution data: http://www.GeneLogic.dilution.com/.
37. Affymetrix spiked-in data set: http://www.affymetrix.com/support/technical/sample_data/datasets.affx.
38. Deng, V., et al. (2007), FXYD1 is an MeCP2 target gene overexpressed in the brains of Rett syndrome patients and Mecp2-null mice, Human Molecular Genetics 16, 640–50.
39. Hummel, M., et al. (2006), A biologic definition of Burkitt’s lymphoma from transcriptional and genomic profiling, The New England Journal of Medicine 354, 2419–30.
40. Juhasz, A., Markel, S., Gaur, S., Wu, X., and Doroshow, J. (2007), Inhibition of NOX1 Gene Expression with shRNA in Human Colon Cancer, Gene Expression Omnibus GSE4561.
41. Malek, S. N., and Ouilette, P. N. (2007), Chronic lymphocytic leukemia (CLL) gene expression comparison, Gene Expression Omnibus GSE 9250.
42. Furge, K. A., Chen, J., Koeman, J., Swiatek, P., Dykema, K., Lucin, K., Kahnoski, R., Yang, X. J., and Teh, B. T. (2007), Detection of DNA copy number changes and oncogenic signaling abnormalities from gene expression data reveals MYC activation in high-grade papillary renal cell carcinoma, Cancer Research 67, 3171–76.
43. zur Nieden, N. I., Price, F. D., Davis, L. A., Everitt, R. E., and Rancourt, D. E. (2007), Gene profiling on mixed embryonic stem cell populations reveals a biphasic role for β-catenin in osteogenic differentiation, Molecular Endocrinology 21, 674–85.
44. Stepanova, A. N., Yun, J., Likhacheva, A. V., and Alonso, J. M. (2007), Multilevel interactions between ethylene and auxin in Arabidopsis roots, The Plant Cell 19, 2169–85.
45. Li, C. M., and Klevecz, R. R. (2006), From the cover: A rapid genome-scale response of the transcriptional oscillator to perturbation reveals a period-doubling path to phenotypic change, Proceedings of the National Academy of Sciences of the United States of America 103, 16254–59.
46. Jain, M., Nijhawan, A., Arora, R., Agarwal, P., Ray, S., Sharma, P., Kapoor, S., Tyagi, A. K., and Khurana, J. P. (2007), F-box proteins in rice. Genome-wide analysis, classification, temporal and spatial gene expression during panicle and seed development, and regulation by light and abiotic stress, Plant Physiology 143, 1467–83.
47. Binder, H., Krohn, K., and Preibisch, S. (2008), “Hook” calibration of GeneChip-microarrays: chip characteristics and expression measures, Algorithms for Molecular Biology 3:11.
48. Binder, H., and Preibisch, S. (2008), “Hook” calibration of GeneChip-microarrays: Theory and algorithm, Algorithms for Molecular Biology 3:12.
Chapter 21
Meta-analysis of Cancer Gene-Profiling Data
Xinan Yang and Xiao Sun
Summary
DNA microarray profiles are plagued by a large number of variables combined with a small number of samples, and are notorious for their low signal-to-noise ratio in clinical applications. There is therefore a growing need for meta-analysis techniques that yield more valid and informative results than each experiment analyzed separately. By combining several studies in a single analysis, meta-analysis of cancer gene-profiling data increases the statistical power to detect differentially expressed genes and allows heterogeneity to be assessed. OrderedList is a method proposed specifically for the meta-analysis of cancer gene expression data. Its advantage over other methods is that it does not rely on strong differential expression effects in a single study but on genes that are consistently regulated across multiple studies. This chapter introduces the R implementation of this methodology on real data sets to identify biomarkers for lung adenocarcinoma.
Key words: Microarray, Gene-list comparison, Expression, Meta-analysis
1. Introduction
With high-dimensional variables (thousands of genes), microarray data suffer from small numbers of samples and are often notorious for their low signal-to-noise ratio. However, microarray technologies are becoming more prevalent in cancer research, and it is now usual to find several gene expression data sets from different laboratories employing the same or different technologies to identify genes related to the same condition. Meta-analysis is a statistical technique for combining these quantitative findings from independent studies. Therefore, meta-analysis of gene-profiling data is increasingly required to integrate data sets that investigate
Robert Grützmann and Christian Pilarsky (eds.), Cancer Gene Profiling: Methods and Protocols, Methods in Molecular Biology, vol. 576 DOI 10.1007/978-1-59745-545-9_21, © Humana Press, a part of Springer Science + Business Media, LLC 2010
a common theme or disorder, and to yield more valid and informative results than each experiment separately (1). Meta-analysis is a way to enlarge the sample size of microarray data by integrating studies of the same theme. Moreover, it makes it possible to discover measurable commonalities between independent studies of related topics. The danger is that, in amalgamating a large set of different studies, the construct definitions can become imprecise, making the results difficult to interpret meaningfully. The general assumption of meta-analysis is that many studies of one topic can be combined into one large study in terms of “effect sizes.” The effect size encodes the selected research findings (effects) on a numeric scale. It indicates how much of an expression change is evident either across all studies or for a subset of studies (2). Two common strategies for modeling the effect size of microarray data are transforming gene expression measures across studies and generating summaries such as p values, probabilities, or ranks (3, 4). This chapter describes the problems of meta-analysis, e.g., comparisons between different chip platforms, and the methods used to solve them. Moreover, it introduces the related Bioconductor package OrderedList (5) and useful links. It focuses on combining summary measures of expression rather than the expression measures themselves, which better overcomes the difficulty of incorporating data across multiple platforms and laboratories. By accessing the original raw data (if available) that yielded the initial results, and by observing the orderings of two independent statistics, OrderedList reveals whether two lists of differentially expressed genes are significantly similar. This similarity is evaluated by a weighted sum of the overlap sizes in the top ranks. The method is demonstrated using public data from the same as well as different platforms. Corresponding R example code is given in Subheading 3.3.
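The idea of comparing orderings of summary statistics rather than raw expression values can be sketched in a few lines of Python. This is an illustration only and not the OrderedList implementation. It uses an ordinary Welch t statistic, which is an assumption made for simplicity (the example in Subheading 3.1.3 uses a t test with regularized variances), and the function names, gene names, and expression values are invented.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch t statistic for the difference of two class means."""
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


def ranked_genes(expression, labels, class_a, class_b):
    """Order genes from most up-regulated to most down-regulated in class_a."""
    scores = {}
    for gene, values in expression.items():
        a = [v for v, lab in zip(values, labels) if lab == class_a]
        b = [v for v, lab in zip(values, labels) if lab == class_b]
        scores[gene] = welch_t(a, b)
    return sorted(scores, key=scores.get, reverse=True)


# Invented toy study: log2 expression values for three tumor and three healthy samples.
LABELS = ["tumor", "tumor", "tumor", "healthy", "healthy", "healthy"]
STUDY = {
    "geneA": [8.0, 8.2, 7.9, 5.0, 5.1, 4.9],  # higher in tumor
    "geneB": [6.0, 6.1, 5.9, 6.0, 6.2, 5.8],  # essentially unchanged
    "geneC": [3.0, 3.1, 2.9, 7.0, 7.2, 6.8],  # lower in tumor
}

if __name__ == "__main__":
    print(ranked_genes(STUDY, LABELS, "tumor", "healthy"))
```

Because only the resulting order of genes is carried forward, studies measured on different scales or platforms can be compared once their identifiers have been matched.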
2. Materials (Public Microarray Databases)
The Minimum Information About a Microarray Experiment (MIAME) guidelines outline the minimum information that should be included when describing a microarray experiment. Popular public repositories that store cancer microarray data and support the MIAME requirements include:
1. Gene Expression Omnibus (GEO) – A database at NCBI for the public use and dissemination of gene expression data (6) (http://www.ncbi.nlm.nih.gov/geo/) (see Note 1).
2. ArrayExpress – A public repository for microarray-based gene expression data maintained by the European Bioinformatics Institute (7) (http://www.ebi.ac.uk/arrayexpress/).
3. Oncomine – A collection of publicly available cancer microarray studies and data mining tools to efficiently query genes and data sets of interest (8). Oncomine Research Edition is freely available to academic and nonprofit organizations at http://www.oncomine.org.
4. Stanford Microarray Database (SMD) – Stores two-color, spotted DNA microarrays for the entire scientific community (9) (genome-www5.stanford.edu/).
5. Others.
3. Methods
The validity of a meta-analysis depends on the quality of the systematic review on which it is based. Good meta-analyses aim at complete coverage of all relevant studies, look for heterogeneity among studies, and explore the robustness of the main themes using sensitivity analysis. Thus, the design of the meta-analysis is the core step and requires clinical and biological knowledge. In each study, samples are divided into at least two distinct classes. To evaluate multiple independent data sets that investigate a common theme or disorder in an integrated way, it is important to choose the classes one is interested in comparing. Here, we simply compare healthy lung and adenocarcinoma (a type of lung cancer) as an example. Multiple comparisons are built on pair-wise comparisons. The methods of the main package, OrderedList, provide a comparison of two summary measures, i.e., effect sizes. Each effect size is generated separately for each study from the two chosen conditions of a certain clinical/biological theme or disorder (see Note 2). Importantly, although a single study might not necessarily reveal significant changes, one might still observe considerable overlap in the top-ranking genes. Moreover, consensus changes that reflect the addressed theme or disorder would be expected to rank consistently near the top of the orderings in different studies. Hence, the number of overlapping genes is first computed for the pair of lists along their ranks. Then a similarity score is assigned to this comparison of two ranked (ordered) gene lists (see Note 3). In principle, the similarity score is a weighted sum of the size of overlap in the top ranks, with more weight placed on the top ranks (10).
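The weighted overlap score described above can be sketched as follows. This is a minimal Python illustration and not the code of the OrderedList package. The exponentially decaying weight exp(-alpha * n) and the inclusion of both ends of the lists (up- and downregulated genes) are assumptions about the form of the weighting; the function name, the gene identifiers, and the value of alpha are hypothetical.

```python
import math


def similarity_score(list1, list2, alpha, max_rank=None):
    """Weighted sum of overlap sizes along the ranks of two ordered gene lists.

    For each rank n, the overlap of the top-n genes and the overlap of the
    bottom-n genes are counted and weighted by exp(-alpha * n), so that
    agreement at the extreme ranks contributes most (assumed weighting form).
    """
    n_max = max_rank if max_rank is not None else min(len(list1), len(list2))
    score = 0.0
    for n in range(1, n_max + 1):
        top_overlap = len(set(list1[:n]) & set(list2[:n]))
        bottom_overlap = len(set(list1[-n:]) & set(list2[-n:]))
        score += math.exp(-alpha * n) * (top_overlap + bottom_overlap)
    return score


# Hypothetical ordered lists of matched gene identifiers from two studies.
STUDY_A = ["g1", "g2", "g3", "g4", "g5", "g6"]
STUDY_B_SIMILAR = ["g2", "g1", "g3", "g4", "g6", "g5"]
STUDY_B_REVERSED = list(reversed(STUDY_A))

if __name__ == "__main__":
    alpha = 0.5  # assumed value; a smaller alpha lets deeper ranks contribute
    print(similarity_score(STUDY_A, STUDY_B_SIMILAR, alpha))
    print(similarity_score(STUDY_A, STUDY_B_REVERSED, alpha))
```

Two lists that agree at their extreme ranks obtain a higher score than lists in opposite order. In practice the observed score would still have to be compared with scores from randomized lists to judge its significance.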
3.1. For Gene Lists with Expression Levels
3.1.1. Data Selection and Collection
Select the experiments addressing the same clinical/biological problem. For example, to compare the differential gene expression between human adenocarcinoma and healthy lung tissues, we select two GEO data sets and four Oncomine gene lists. The two GEO data sets are based on different microarray technologies and used as examples for meta-analysis data with expression levels. The other gene lists from Oncomine provide an example of performing meta-analysis based directly on gene lists.
1. GSE1987 (see Subheading “Download and Convert Expression Data”): GSE1987 is a non-small cell lung cancer data set with 37 cases, based on Affymetrix Hgu95av2 gene chips. It includes seven samples of adenocarcinoma, seven samples of healthy lung tissues adjacent to the tumors, two commercial samples of normal lung RNA (see Note 4), and others.
2. GDS619 (see Subheading “Download, Impute, and Convert Expression Data”): GDS619 collects high-grade human lung tumor groups, including 12 adenocarcinoma and 19 healthy lung samples, based on two-color spotted complementary DNA (cDNA) microarrays.
3. Four whole expressed gene lists (see Table 1) of t statistics on healthy versus adenocarcinoma, downloaded from Oncomine Research Edition, version 3.5 (see Note 5, Subheading “Download and Read Gene Lists into R”).
3.1.2. Data Preprocessing
We recommend preprocessing all data for which raw profiles are available (11), either separately or together (see Note 6). For the data stored in GEO, one can simply download the preprocessed data in the Simple Omnibus Format in Text (SOFT).
1. Preprocessed expression values are then base-two log-transformed where applicable (see Note 7, Subheading “Get the GSE Data You Wanted”).
Table 1
The gene lists downloaded from Oncomine (Research Edition version 3.5) for comparisons of healthy lung tissue versus lung adenocarcinoma

ID   Author          Platform                       No. healthy   No. cancer (ref.)
G1   Beer            HumanGeneFL Array              10            86 (17)
G2   Bhattacharjee   Human Genome U95Av2 Array      17            139 (13)
G3   Stearman        Human Genome U95Av2 Array      19            20 (14)
G4   Garber          Spotted Array                  6             40 (18)
2. Process the phenoData of each study and decide on the features (conditions) to be compared across studies (see Subheadings “Build phenoType Table from the Description of Each Sample” and “Observe and Process the phenoData”).
3. Build the ExpressionSet object for each study, respectively (see Subheading “Convert to ExpressionSet Object” in two occurrences under Subheading 3.3.1).
4. Microarray features with more than half of the values missing across all arrays per study are not considered for further analysis (see Subheadings “Remove the Probesets with NA Values Across Samples” and “Keep the Probesets with Less Than 50% NA Values”).
5. To use the package OrderedList, the data with missing values are imputed by replacing them via nearest-neighbor averaging (see Note 8, Subheading “Impute Missing Expression Values”).
6. Variables (genes) are matched for across-studies comparison. If studies are based on different platforms and only a subset of genes can be mapped from one chip to the other, one must provide this information via the argument mapping in the function prepareData. Here, we used a mapping between the manufacturer’s identifiers and UniGene identifiers as example (see Note 9, Subheading “Observe the Common Identifiers” under Subheading 3.3.1).
7. Prepare a collection of two expression sets of class exprSet by calling the function prepareData (see Notes 10 and 11, Subheading “Combine Two Studies Into One Expression Set”). These data sets are then merged into one exprSet together with the rearranged phenoData, and the argument mapping if the studies are based on different platforms.
3.1.3. Evaluate the Significance of the Similarity Score
1. Within each study, a gene-wise test on the difference of class means is conducted to obtain the effect size (12). As an example, we performed a t test with regularized variances (z test) (see Note 12).
2. Decide the parameter beta ∈ {0.5, 1}:
   beta = 1: The class labels of the two studies match each other. That is, the first class label of study A has the same interpretation as the first class label of study B. The same principle applies to the second class labels (see Subheading “Detect Similarities of Two Expression Studies”).
   beta = 0.5: The class labels do not match. For example, study A compares different tumor grades whereas study B compares different tissues. In this case, the orientation of the two lists is not clear. Thus, both the similarity of the originally ordered lists and the similarity of one list to the other list in flipped orientation are taken into account.
Fig. 1. A data-driven parameter alpha helps to provide the best signal-to-noise separation for similarity scores. The optimal alpha is chosen where the pAUC score maximally separates the distributions of observed and random similarity scores (11). A vertical line marks the optimal alpha.
3. Choose the parameters B and alpha, which determine how many ranks are taken into account (10):
   B is the number of internal subsamplings needed to achieve an optimized alpha*. An example result is shown in Fig. 1.
   alpha is a vector of weighting parameters. If set to NULL (the default), the parameters are computed such that the top 100–2,500 ranks receive weights above min.weight = 1 × 10^−5. A smaller alpha takes more ranks of the ordered gene list into account when calculating the similarity score. The optimal alpha is the one with the highest pAUC score, i.e., the one that best separates the random scores from the alternative scores over the B subsamplings and thus provides the best signal-to-noise separation (see Fig. 2).
3.1.4. Get the Contributing Identifiers that Drive the Similarity
The output of the function OrderedList also includes a vector of sorted probe IDs of the overlapping genes that contribute a fraction percent (95% by default) of the overall similarity score (see Subheading “Get the Contributing Identifiers that Drive the Similarity” under Subheading 3.3.1).
Fig. 2. The red (right) curve corresponds to simulated observed scores and the black (left) curve corresponds to simulated random scores. The vertical red line denotes the actually observed similarity score. These two kernel density estimates of score distributions underlie the pAUC score for the optimal alpha, as shown in Fig. 1. The bottom rugs mark the simulated values.
3.2. For Gene Lists Without Expression Levels (for Each Sample)
3.2.1. Compare Between the Same Technological Platforms
Examples of comparisons between the same platforms are given in Subheading “Compare Between the Same Affymetrix Chips” for Affymetrix Hgu95av2 (GPL91). Two preprocessed gene lists, one reported by Bhattacharjee et al. (13) and another reported by Stearman et al. (14), are used (see Table 1). In addition, Subheading “Compare Between Different Affymetrix Chips” gives R code for a comparison between different Affymetrix chips as an example.
1. Use probeset IDs as identifiers if comparing between the same platform; otherwise, use the UniGene ID (see Note 13).
2. Find the common identifiers between the two studies (see Note 14).
Fig. 3. The numbers of overlapping genes in the two gene lists generated from different platforms but for the same comparison of healthy versus adenocarcinoma lung tissue. The overlap size is drawn as a step function over the respective ranks. The top ranks correspond to upregulated and the bottom ranks to downregulated genes. In addition, the expected overlap and 95% confidence intervals derived from a hypergeometric distribution are shown as filled background. The similarity is also significant here (p = 0).
3. Order the lists of identifiers to be compared.
4. Compare the ordered lists using the weighted overlap score by calling the function compareLists.
5. Get the contributing identifiers that drive the similarity by calling the function getOverlap (see Fig. 3).
3.2.2. Comparison Between Different Technological Platforms
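The same-platform comparison of two ordered lists (finding common identifiers, ordering, and computing a weighted overlap score) can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the implementation used by compareLists: the probeset IDs are made up, the weights are assumed to be exp(-alpha * n), and the helpers top_overlap and weighted_overlap_score are hypothetical names.

```r
# Toy gene lists (hypothetical probeset IDs), each ordered from the most
# upregulated to the most downregulated identifier.
list1 <- c("p01", "p02", "p03", "p04", "p05", "p06", "p07", "p08")
list2 <- c("p02", "p01", "p09", "p03", "p08", "p05", "p04", "p06")

# Keep only identifiers present in both studies, preserving each ordering.
common <- intersect(list1, list2)
l1 <- list1[list1 %in% common]
l2 <- list2[list2 %in% common]

# Size of the overlap among the top n ranks, for n = 1, ..., length(a).
top_overlap <- function(a, b) {
  vapply(seq_along(a),
         function(n) length(intersect(a[1:n], b[1:n])),
         integer(1))
}

# Simplified weighted overlap score (assumption): overlaps at the top and at
# the bottom of the lists, weighted by exp(-alpha * n).
weighted_overlap_score <- function(a, b, alpha) {
  n <- seq_along(a)
  sum(exp(-alpha * n) * (top_overlap(a, b) + top_overlap(rev(a), rev(b))))
}

observed <- weighted_overlap_score(l1, l2, alpha = 0.5)

# Rough significance check: compare with scores of randomly shuffled lists.
set.seed(1)
random_scores <- replicate(200, weighted_overlap_score(l1, sample(l2), alpha = 0.5))
p_value <- mean(random_scores >= observed)
```

In practice, compareLists and getOverlap from the OrderedList package perform these steps, including the choice of alpha and the assessment of significance; the sketch is only meant to show the idea behind the score.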
Examples of comparisons between different microarray technological platforms, i.e., cDNA arrays and Affymetrix oligonucleotide arrays, are given in Subheading “Comparison Between Different Microarray Technological Platforms.” Although the issue is still debated, the log ratios of highly expressed genes have been reported to be strongly correlated, especially between Affymetrix and cDNA arrays (15).
1. Reduce each list to one statistic per gene. This step is required because many genes are detected by multiple probesets, which are difficult to map between different microarray technological platforms. Each such gene is first represented by the probeset with the highest statistic or the highest variance (see Note 15).
2. Check the one-to-one relationship between the two lists to be compared.
3. Identify the common one-to-one mapped identifiers.
4. Order the lists of identifiers to be compared.
5. Compare the ordered lists with the weighted overlap score by calling the function compareLists.
6. Get the contributing identifiers that drive the similarity, if the similarity is significant.
3.3. Examples
3.3.1. R Examples for Comparing Gene Lists with Expression Data
3.3.1.1. Download and Convert Expression Data
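The one-gene-one-statistic reduction used for cross-platform comparisons (Subheading 3.2.2, step 1) can be sketched in base R as follows. The probesets, gene symbols, and statistics are made up, collapse_to_gene is a hypothetical helper, and "highest statistic" is interpreted here as the largest absolute value, which is an assumption.

```r
# Toy data (hypothetical probesets, gene symbols, and test statistics).
probes <- data.frame(
  probeset  = c("ps1", "ps2", "ps3", "ps4", "ps5"),
  gene      = c("EGFR", "EGFR", "TP53", "KRAS", "KRAS"),
  statistic = c(2.1, -3.4, 1.2, 0.5, 0.9),
  stringsAsFactors = FALSE
)

# For each gene, keep the probeset with the largest absolute statistic
# (assumption: "highest statistic" is read as largest absolute value).
collapse_to_gene <- function(df) {
  df <- df[order(-abs(df$statistic)), ]
  df[!duplicated(df$gene), ]
}

per_gene <- collapse_to_gene(probes)

# Each gene now appears exactly once, so a one-to-one mapping can be checked.
stopifnot(!any(duplicated(per_gene$gene)))

# Order genes from the most upregulated to the most downregulated.
ordered_genes <- per_gene$gene[order(per_gene$statistic, decreasing = TRUE)]
print(ordered_genes)
```

Selecting by highest variance instead would only require replacing the ordering criterion in collapse_to_gene with a per-probeset variance computed from the expression matrix.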
# adenocarcinoma (AC) vs. healthy human lung samples
# R version 2.6.0 (2007-10-03)
library("GEOquery")
library("impute") # (see Note 5)
library("OrderedList")
library("hu6800")
library("hgu95av2")
require("Biobase")
3.3.1.1.1. Get the GSE Data You Wanted
gse