Methods in Molecular Biology
Series Editor John M. Walker School of Life Sciences, University of Hertfordshire, Hatfield, Hertfordshire, AL10 9AB, UK
For other titles published in this series, go to www.springer.com/series/7651
Plant MicroRNAs Methods and Protocols
Edited by
Blake C. Meyers and Pamela J. Green Department of Plant & Soil Sciences, and Delaware Biotechnology Institute, University of Delaware, Newark, DE, USA
ISSN 1064-3745 e-ISSN 1940-6029 ISBN 978-1-60327-004-5 e-ISBN 978-1-60327-005-2 DOI 10.1007/978-1-60327-005-2 Springer New York Dordrecht Heidelberg London Library of Congress Control Number: 2009933463 © Humana Press, a part of Springer Science+Business Media, LLC 2010 All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Humana Press, c/o Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights. While the advice and information in this book are believed to be true and accurate at the date of going to press, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein. Printed on acid-free paper Humana Press is part of Springer Science+Business Media (www.springer.com)
Preface We have assembled a set of protocols that we believe represent the state of the art for laboratory and computational analyses of plant microRNAs (miRNAs). These small, noncoding RNA molecules of about 21 nucleotides have made a grand entrance onto the scientific stage with their discovery in the 1990s and their stunning ascent to stardom in the early years of the current decade. Along the way, it has been demonstrated that miRNAs are simply one of several classes of small RNAs produced in plant cells, albeit a particularly important class given the broad phylogenetic conservation and strong regulatory effects of many miRNAs. Plant miRNAs are uniquely interesting for their ancient evolutionary origins and their strong post-transcriptional regulatory effects. Most chapters of this volume focus on the identification, validation, and characterization of the miRNA class of RNAs. However, a topic that cannot avoid mention is the other classes of small RNAs that are biochemically similar in size and composition (although of somewhat genetically distinct origins). For example, in the cloning and characterization of plant miRNAs, the resulting data set is rich with another significant class of small RNAs, substantially more prevalent in plants than other organisms – the heterochromatic, small, interfering RNAs (heterochromatic siRNAs). As you will see, the methods contained in this volume emphasize miRNA analyses, but include ways to distinguish one class of small RNAs from another. One chapter describes how to characterize candidate members of a unique class of siRNAs (trans-acting siRNAs) that are dependent on the action of miRNAs to initiate their formation. The biogenesis of miRNAs is dependent on an increasingly well-defined set of proteins that include enzymes for the precise excision of the mature miRNA from a longer precursor, as well as the modification of this mature miRNA, export from the nucleus, and interaction with its target. 
This volume starts with a chapter that clearly lays out the cellular participants in the production of miRNAs and the utility of studying these genes and gene products. Discussions of the genes that encode these proteins are found throughout the volume, as partial or full loss-of-function mutants in these genes are important components in the toolbox for studying miRNAs. One set of chapters describes the standard approaches for purification of the working material for the study of miRNAs: the small RNA component of the transcriptome. While not always easy, depending on the composition and source of the plant tissue that is used, the protocols described here should cover a broad swath of even the most recalcitrant plant materials, resulting in very high quality RNA that can be used for library construction. An alternative approach described in one of the chapters is the isolation of small RNAs associated with Argonaute proteins. With purified small RNA in hand, one is ready to begin the characterization of small RNAs by any one of numerous experimental approaches, the most common of which is deep sequencing by “next generation” technologies, a process that leads to datasets on the scale of millions of small RNAs per reaction. This volume includes a protocol for the generation of sequencing libraries from purified small RNA. After sequencing, the next significant challenge faced by the experimentalist will be the handling of the data – trimming, mapping, organizing, and analyzing the millions of short sequence reads.
Several chapters describe methods for analyzing miRNA-directed regulation of target RNAs. In its most basic form, the identification of the regulatory targets of plant miRNAs is based on the observation of a near-perfect complementarity between miRNAs and their mRNA targets. This makes computational-based target prediction simpler in plants than in animals. The pairing of miRNAs with an mRNA, in plants, typically results in cleavage of the mRNA target. Such approaches to target prediction in plants are addressed in this volume, as are both standard approaches to validating specific target cleavage events and the exciting development of genome-wide methods to characterize cleaved mRNAs in a single library. This volume includes a series of chapters that discuss approaches to analyzing the functional role of plant miRNAs. This includes computational methods for the prediction of plant miRNA targets and the experimental methods that can supply validation data to support these predictions. Computational methods have also been applied to the study of gene regulatory sequences in promoters, an application that works well to identify promoter elements and potential transcription factor binding motifs in plant miRNA precursors. Regulatory elements contribute to the regulation of expression of miRNAs in response to stresses such as biotic and abiotic stress; under such conditions, miRNAs in turn regulate other transcripts creating variation in both their levels and expression patterns. In situ hybridizations have long been used for the localization of messenger RNAs and proteins, but an exciting recent application of this methodology is the ability to localize miRNAs in plant tissues using a new generation of highly sensitive probes. And when these data have been brought together to infer the function of a miRNA, plants are amenable to an assessment of this predicted function using transient assays. 
All of these topics are the subjects of chapters in this volume, and we believe that these will provide valuable contributions and useful material for our readers’ experimental work. Interestingly, studies of the biology of plant miRNAs have been somewhat turned on their head with the realization that miRNAs can also be used as tools themselves for the study of the biological function of other genes. A chapter in this volume describes the development of artificial miRNAs, a powerful tool for the investigation of gene function with current applications in both forward and reverse genetics experiments. While most of the initial studies and basic approaches have been developed in Arabidopsis, many exciting advances are sure to come from the application of these and other methods to other plant species. Indeed, our goal in editing this book is to provide the community with a set of protocols that will help advance miRNA research for all plant species. To this end, all or nearly all of the protocols could be used for any plant species of interest. We have included several chapters that will be of particular interest to plant biologists working in non-model species, including a set of approaches for RNA purification from quite diverse species and tissues, as well as an overview of computational methods to handle data from a broad set of species. MicroRNA activities and stability are dependent on a series of modifications and processing. One of the most well-characterized modifications is 3′ methylation, an activity carried out by the HUA ENHANCER 1 methyltransferase, which may stabilize the miRNA by preventing uridylation and by diminishing exonuclease-mediated degradation. One chapter in this volume describes the methods by which 3′ methylation can be assessed. Surely there are many additional advances yet to come in this field, contributing new methods in parallel to making additional strides in our understanding of the biology of small RNAs. 
As an example, a collaborative effort of many of the labs that contributed to this volume led to an overhaul of the criteria used to define plant miRNAs (1). That manuscript
came out too late to be fully reflected in this volume, and other rapid changes in the field will keep this as one of the fastest moving fields in plant biology. Much as deep sequencing represented a tremendous advance, dare we say a revolution in the study of small RNAs, there are unquestionably greater things yet to come in the methods for the analysis of plant miRNAs. This may include further advances in understanding the cell specificity, abundance levels, trafficking, modifications, target interactions, or biological roles. In summary, we are excited by the prospect of the experiments that this volume may facilitate or even inspire. As students (broadly speaking), practitioners, or theorists in the field of plant molecular biology, we hope that you will find many or all of these chapters to be of use in your work. The chapters may serve to introduce you to a new field of work, or extend your capabilities in a topic with which you are already quite familiar. While a mastery of the techniques in this volume is not a requisite for success in the field of small RNAs, an incredible group of contributors has contributed a set of protocols. Finally, this book would not have come to fruition without the careful editorial and administrative assistance of Sharon Bancroft, along with additional administrative help by Charlotte McDermitt and Kathy Fleischut. Most of all, we are incredibly grateful to the contributing groups who have taken their time to describe in exquisite detail the methods, tips, and tricks that they use. We hope that you find this unique collection of protocols helpful to your research.
Blake C. Meyers
Pamela J. Green
Newark, DE
Reference
1. Meyers BC, Axtell MJ, Bartel B, Bartel DP, Baulcombe D, Bowman JL, Cao X, Carrington JC, Chen X, Green PJ, Griffiths-Jones S, Jacobsen SE, Mallory AC, Martienssen RA, Poethig RS, Qi Y, Vaucheret H, Voinnet O, Watanabe Y, Weigel D, Zhu JK (2008) Criteria for annotation of plant microRNAs. Plant Cell 20:3186–3190
Contents
Preface
Contributors
1 Piecing the Puzzle Together: Genetic Requirements for miRNA Biogenesis in Arabidopsis thaliana
  Zhixin Xie
2 Prediction of Plant miRNA Genes
  Matthew W. Jones-Rhoades
3 Methods for Isolation of Total RNA to Recover miRNAs and Other Small RNAs from Diverse Species
  Monica Accerbi, Skye A. Schmidt, Emanuele De Paoli, Sunhee Park, Dong-Hoon Jeong, and Pamela J. Green
4 miRNA Target Prediction in Plants
  Noah Fahlgren and James C. Carrington
5 A Method to Discover Phased siRNA Loci
  Michael J. Axtell
6 Directed Gene Silencing with Artificial MicroRNAs
  Rebecca Schwab, Stephan Ossowski, Norman Warthmann, and Detlef Weigel
7 Bioinformatics Analysis of Small RNAs in Plants Using Next Generation Sequencing Technologies
  Kan Nobuta, Kevin McCormick, Mayumi Nakano, and Blake C. Meyers
8 High-Throughput Approaches for miRNA Expression Analysis
  Cheng Lu and Frédéric Souret
9 In Situ Detection of miRNAs Using LNA Probes
  Zoltán Havelda
10 Analysis of miRNA Modifications
  Bin Yu and Xuemei Chen
11 MicroRNA Promoter Analysis
  Molly Megraw and Artemis G. Hatzigeorgiou
12 Computational Methods for Comparative Analysis of Plant Small RNAs
  Gayathri Mahalingam and Blake C. Meyers
13 Biotic Stress-Associated microRNAs: Identification, Detection, Regulation, and Functional Analysis
  Florence Jay, Jean-Pierre Renou, Olivier Voinnet, and Lionel Navarro
14 Abiotic Stress-Associated miRNAs: Detection and Functional Analysis
  Dong-Hoon Jeong, Marcelo A. German, Linda A. Rymarquis, Shawn R. Thatcher, and Pamela J. Green
15 Processing of miRNA Precursors
  Yukio Kurihara and Yuichiro Watanabe
16 Purification of Arabidopsis Argonaute Complexes and Associated Small RNAs
  Yijun Qi and Shijun Mi
17 Transient Assays for the Analysis of miRNA Processing and Function
  Felipe F. de Felippes and Detlef Weigel
Index
Contributors
Monica Accerbi • Delaware Biotechnology Institute, University of Delaware, Newark, DE, USA
Michael J. Axtell • Department of Biology, Pennsylvania State University, University Park, PA, USA
James C. Carrington • Department of Botany and Plant Pathology, Center for Genome Research and Biocomputing, Oregon State University, Corvallis, OR, USA
Xuemei Chen • Department of Botany and Plant Sciences, Institute of Integrative Genome Biology, University of California, Riverside, Riverside, CA, USA
Felipe F. de Felippes • Max Planck Institute for Developmental Biology, Tübingen, Germany
Emanuele De Paoli • Delaware Biotechnology Institute, University of Delaware, Newark, DE, USA
Noah Fahlgren • Department of Botany and Plant Pathology, Center for Genome Research and Biocomputing, and Molecular and Cellular Biology Graduate Program, Oregon State University, Corvallis, OR, USA
Marcelo A. German • Delaware Biotechnology Institute, University of Delaware, Newark, DE, USA
Pamela J. Green • Department of Plant and Soil Sciences, School of Marine Science and Policy, and Delaware Biotechnology Institute, University of Delaware, Newark, DE, USA
Artemis G. Hatzigeorgiou • Department of Genetics, Center for Bioinformatics, School of Medicine, Department of Computer and Information Science, School of Engineering, University of Pennsylvania, Philadelphia, PA, USA; Institute of Molecular Oncology, Biomedical Sciences Research Center “Alexander Fleming”, Vari-Athens, Greece
Zoltán Havelda • Agricultural Biotechnology Center, Gödöllő, Hungary
Florence Jay • Institut de Biologie Moléculaire des Plantes, CNRS UPR2353 – Université Louis Pasteur, Strasbourg Cedex, France
Dong-Hoon Jeong • Delaware Biotechnology Institute, University of Delaware, Newark, DE, USA
Matthew W. Jones-Rhoades • Department of Biology, Knox College, Galesburg, IL, USA
Yukio Kurihara • Department of Life Sciences, University of Tokyo, Tokyo, Japan
Cheng Lu • DuPont Agricultural Biotechnology, RT 141 & Henry Clay, Wilmington, DE, USA
Gayathri Mahalingam • Delaware Biotechnology Institute, University of Delaware, Newark, DE, USA
Kevin McCormick • Delaware Biotechnology Institute, University of Delaware, Newark, DE, USA
Molly Megraw • Department of Genetics, Center for Bioinformatics, School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; Institute for Genome Sciences & Policy, Duke University, Durham, NC, USA
Blake C. Meyers • Department of Plant and Soil Sciences, and Delaware Biotechnology Institute, University of Delaware, Newark, DE, USA
Shijun Mi • National Institute of Biological Sciences, Beijing, China
Mayumi Nakano • Delaware Biotechnology Institute, University of Delaware, Newark, DE, USA
Lionel Navarro • Institut de Biologie Moléculaire des Plantes, CNRS UPR2353 – Université Louis Pasteur, Strasbourg Cedex, France
Kan Nobuta • Delaware Biotechnology Institute, University of Delaware, Newark, DE, USA
Stephan Ossowski • Department of Molecular Biology, Max Planck Institute for Developmental Biology, Tübingen, Germany
Sunhee Park • Delaware Biotechnology Institute, University of Delaware, Newark, DE, USA
Yijun Qi • National Institute of Biological Sciences, Beijing, China
Jean-Pierre Renou • UMR Génomique Végétale INRA-CNRS-UEVE, 2 rue G. Crémieux, Evry, France
Linda A. Rymarquis • Delaware Biotechnology Institute, University of Delaware, Newark, DE, USA
Skye A. Schmidt • School of Marine Science and Policy and Delaware Biotechnology Institute, University of Delaware, Newark, DE, USA
Rebecca Schwab • Institut de Biologie Moléculaire des Plantes (CNRS), Strasbourg Cedex, France
Frédéric Souret • Affymetrix Inc., Cleveland, OH, USA
Shawn R. Thatcher • Chemistry–Biology Interface Program, Department of Chemistry and Biochemistry, and Delaware Biotechnology Institute, University of Delaware, Newark, DE, USA
Olivier Voinnet • Institut de Biologie Moléculaire des Plantes, CNRS UPR2353 – Université Louis Pasteur, Strasbourg Cedex, France
Norman Warthmann • Department of Molecular Biology, Max Planck Institute for Developmental Biology, Tübingen, Germany
Yuichiro Watanabe • Department of Life Sciences, University of Tokyo, Tokyo, Japan
Detlef Weigel • Department of Molecular Biology, Max Planck Institute for Developmental Biology, Tübingen, Germany
Zhixin Xie • Department of Biological Sciences, Texas Tech University, Lubbock, TX, USA
Bin Yu • Department of Botany and Plant Sciences, Institute of Integrative Genome Biology, University of California, Riverside, Riverside, CA, USA; Center for Plant Science Innovation, University of Nebraska, Lincoln, NE, USA
Chapter 1
Piecing the Puzzle Together: Genetic Requirements for miRNA Biogenesis in Arabidopsis thaliana
Zhixin Xie
Abstract
MicroRNAs (miRNAs) are an important class of endogenous small silencing RNAs in both plants and animals. They regulate the expression of a wide range of target genes that are involved in many important biological processes. Biogenesis of plant miRNAs requires a distinct set of proteins, including members that belong to several highly conserved RNA silencing protein families. The framework for miRNA biogenesis in plants was revealed through genetic and biochemical analyses using mutants that are defective in miRNA accumulation. These general miRNA-deficient mutants constitute a set of invaluable genetic resources for the plant miRNA research community. They could be utilized to experimentally validate the candidate miRNAs that are either predicted by a computational program or recovered from a small RNA deep sequencing effort which is becoming a more affordable and widely used approach for small RNA discovery. Starting with a brief introduction on multiple small RNA pathways in plants, this chapter provides basic experimental procedures for the examination of miRNA accumulation from wild type plants and various mutant lines in Arabidopsis.
Key words: miRNA biogenesis, Arabidopsis thaliana, miRNA-deficient mutants, DICER-LIKE1 (DCL1), HUA ENHANCER1 (HEN1), ARGONAUTE1 (AGO1), HYPONASTIC LEAVES1 (HYL1), SERRATE (SE), HASTY (HST), miRNA detection
B.C. Meyers and P.J. Green (eds.), Plant MicroRNAs, Methods in Molecular Biology, vol. 592 DOI 10.1007/978-1-60327-005-2_1, © Humana Press, a part of Springer Science+Business Media, LLC 2010
1. Introduction
Nearly 7 years have passed since microRNAs (miRNAs) and other endogenous small silencing RNAs were discovered in plants (1–4), shortly after the seminal discovery of silencing-associated small RNAs in plants (5), and the cloning of miRNAs from animals (6–8). Much has been learned about the genomic origin, complexity, biogenesis pathway, biological function, and possible mechanisms of evolution for these small regulatory RNA molecules (reviewed in (9–12)). This chapter will describe the reverse genetics approach and the related experimental procedures that have been used to elucidate the miRNA biogenesis pathway in the model plant Arabidopsis thaliana. Essentially, a generic miRNA pathway gene could be established if the loss-of-function mutation leads to the loss of miRNA production, and therefore to the loss of miRNA accumulation. This approach should also be applicable to other plant species as long as the required genomic and genetic resources are available. The method can be applied to experimentally validate the candidate miRNAs that are either predicted by a computational program or recovered from a small RNA deep sequencing effort which is becoming a more affordable and widely used approach for miRNA discovery (13–17). Since the target audience of this methods series includes those who are either new to miRNA research or less familiar with recent developments in the field, this chapter will begin with a brief overview of the multiple small RNA pathways in plants. Detailed protocols will follow for polymerase chain reaction (PCR)-based mutant genotyping as well as miRNA detection by northern blot assays.
1.1. The RNA Silencing Machinery in Plants
The generation of small RNAs of 21–24 nucleotides (nt) is a hallmark of RNA silencing, which involves a set of evolutionarily conserved proteins that are found exclusively in eukaryotes (10). The proteins that belong to the families of DICER (DCR) or DICER-LIKE (DCL), ARGONAUTE (AGO), and RNA-DEPENDENT RNA POLYMERASE (RDR) form the core RNA silencing machinery in plants (9, 10). Both the DCR/DCL and the AGO family proteins are highly conserved across the eukaryotic kingdoms while the RDR family proteins appear to be missing in mammals and insects. Perhaps one of the most remarkable features of the RNA silencing system in plants is the existence of multiple RNA silencing factors that form functionally distinct modules for small RNA generation and function (10, 18). The A. thaliana genome, for example, contains four expressed DCL genes, six RDR genes, and ten AGO genes (9). It has become clear that the high complexity of the regulatory small RNA component in plants is largely attributable to the proliferation and subsequent functional diversification of the RNA silencing machinery on an evolutionary time scale.
1.2. miRNAs and Other Small RNAs in Plants
Although chemically similar, the small silencing RNAs in plants can be classified into two broad categories: miRNAs and small interfering RNAs (siRNAs) (10). The major difference between these two categories lies in their genomic origin and precursor structures. miRNAs arise from defined genetic loci – the MIRNA genes, which are found predominantly within genomic segments previously annotated as intergenic regions (IGRs) (11).
Typically, RNA polymerase II (Pol II) transcription from a MIRNA locus gives rise to a miRNA primary transcript (pri-miRNA) that is capable of forming a characteristic fold-back hairpin structure (19–21). This intra-molecular, imperfect dsRNA structure is recognized and processed by a DCL1-containing complex to give rise to a miRNA/miRNA* duplex, presumably through a multi-step process involving a stem-loop intermediate (pre-miRNA) (19, 22–24). The fate of each strand in a miRNA/miRNA* duplex appears to be predetermined by the thermodynamic features of the duplex (25, 26). That is, the miRNA strand is selectively incorporated into an AGO1-containing, RNA-induced silencing complex (RISC) upon unwinding, whereas the miRNA* strand is often short-lived (27–29). Mature plant miRNAs are typically 21 nt in size. They negatively regulate gene expression by guiding target recognition and cleavage through the AGO1-containing miRISC (11). siRNAs, on the other hand, arise from perfect dsRNA precursors that are formed through various mechanisms, typically involving an RDR activity. The dsRNA precursors are processed into small RNA duplexes by one of the other three DCLs (DCL2, DCL3, or DCL4). Genetic and small RNA deep sequencing analyses have revealed at least three siRNA-generating systems in Arabidopsis, each requiring a distinct set of DCL and RDR proteins (10, 12). (1) Heterochromatin-associated siRNAs: These are typically 24-nt small RNAs associated with genomic repetitive sequences such as transposable elements and pericentromeric repeats (30, 31). Biogenesis and function of these siRNAs require RDR2, DCL3, AGO4, and the plant-specific nuclear RNA polymerase IV (Pol IV) (30–36). The heterochromatin-associated siRNA pathway operates in the nucleus to guide heterochromatin formation (37, 38). (2) Trans-acting siRNAs (ta-siRNAs): Biogenesis of ta-siRNAs is initiated by miRNA-directed cleavage of noncoding RNAs known as TAS primary transcripts (39, 40).
RDR6 converts the cleaved TAS RNA into dsRNA, a process that also requires SUPPRESSOR OF GENE SILENCING 3 (SGS3), a coiled-coil protein with unknown function (41, 42). DCL4 processes the dsRNA into phased 21-nt siRNA arrays (43–45). (3) Natural antisense transcript-associated siRNAs (nat-siRNAs): These are siRNAs originating from dsRNAs formed by convergent transcription of two partially overlapping genes (46, 47). The biogenesis of the primary nat-siRNAs appears to be either DCL1- or DCL2-dependent. Both the ta-siRNAs and nat-siRNAs can direct target cleavage. All classes of functional small RNAs characterized so far in plants are 3′-methylated through the activity of HUA ENHANCER 1 (HEN1), a small RNA methylase (48, 49). The methylation may serve to stabilize small RNAs in vivo by protecting them from nucleolytic activity or other types of terminal modifications (48–50).
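The pathway requirements summarized above can be collected into a small lookup table, which may help when deciding which mutant background is informative for a given small RNA class. The Python sketch below is only an illustration built from the factors named in this section: the dictionary layout and the function name `classes_affected` are hypothetical, the factor lists are deliberately incomplete, and nat-siRNAs are left out because the text describes them as either DCL1- or DCL2-dependent rather than strictly requiring one enzyme.

```python
# Simplified lookup of the small RNA classes described in this section.
# Illustrative only: the structure and names are assumptions, and the
# factor lists contain just the proteins mentioned in the text.
PATHWAYS = {
    "miRNA": {
        "typical_size_nt": 21,
        "factors": {"DCL1", "AGO1", "HEN1"},
        "depends_on": set(),
    },
    "heterochromatin-associated siRNA": {
        "typical_size_nt": 24,
        "factors": {"RDR2", "DCL3", "AGO4", "Pol IV", "HEN1"},
        "depends_on": set(),
    },
    "ta-siRNA": {
        "typical_size_nt": 21,
        "factors": {"RDR6", "SGS3", "DCL4", "HEN1"},
        # ta-siRNA biogenesis is initiated by miRNA-directed cleavage.
        "depends_on": {"miRNA"},
    },
}


def classes_affected(gene):
    """Return the small RNA classes expected to be affected by losing `gene`.

    A class is affected directly if `gene` is one of its listed factors,
    or indirectly if it depends on a class that is itself affected.
    """
    affected = {name for name, p in PATHWAYS.items() if gene in p["factors"]}
    changed = True
    while changed:  # propagate indirect effects until nothing new is added
        changed = False
        for name, p in PATHWAYS.items():
            if name not in affected and p["depends_on"] & affected:
                affected.add(name)
                changed = True
    return sorted(affected)


if __name__ == "__main__":
    for gene in ("DCL1", "DCL3", "RDR6", "HEN1"):
        print(gene, "->", classes_affected(gene))
```

Under these simplifications, a dcl1 mutant is flagged for both miRNAs and ta-siRNAs, whereas a dcl3 mutant is flagged only for heterochromatin-associated siRNAs. Real mutants can behave less cleanly because of partial redundancy among family members.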
1.3. Genetic Requirements for miRNA Biogenesis
Our current knowledge on miRNA biogenesis came mainly from work performed with various miRNA-deficient mutants, a collective effort made by several leading groups. As mentioned earlier, almost all the known miRNA pathway components in plants were identified through this “reverse genetics” approach. DCL1 and HEN1 were among the first genes whose roles in miRNA biogenesis were genetically established (3, 4, 51). Both the dcl1 and the hen1 homozygous mutant plants exhibited severe developmental defects, reflecting the critical roles played by miRNAs in plant development. In fact, multiple dcl1 and hen1 alleles were recovered from genetic screens for mutants that are defective in flower development (51–55). Loss-of-function mutations in AGO1 also cause severe developmental defects (56–59). Similarly, pleiotropic developmental abnormalities have also been observed in mutants harboring loss-of-function mutations in HYPONASTIC LEAVES1 (HYL1) and SERRATE (SE), two proteins required for miRNA processing (23, 24, 60–63). HYL1 belongs to a family of nuclear dsRNA-binding proteins. While HYL1 specifically interacts with DCL1 (64), the DOUBLE-STRANDED RNA-BINDING PROTEIN 4 (DRB4), another member of the HYL1/DRB family appears to interact with DCL4 and function in the ta-siRNA pathway (65, 66), suggesting that members of this family may play distinct roles in plant small RNA biogenesis. SE, a zinc-finger protein, is known to regulate shoot meristem function and leaf polarity in Arabidopsis (62). Recent in vivo evidence shows that DCL1, HYL1, and SE colocalize in nuclear dicing bodies and function in pri-miRNA processing (22, 67). HASTY (HST), an Arabidopsis ortholog of the mammalian Exportin 5, is another protein that has been shown to play a role in plant miRNA biogenesis (39, 68–70). In wild type Arabidopsis, a majority of mature miRNAs accumulate in the cytoplasm. 
Loss-of-function mutations in HST reduced the accumulation of most miRNAs in the cytoplasm, suggesting its role in miRNA transport from the nucleus to the cytoplasm (69, 70). Information on some published miRNA-deficient mutant alleles is presented in Table 1. Among the miRNA-deficient mutants described above, the dcl1 and the hen1 mutants have been by far the most widely used lines in experimentally validating new miRNAs, although HEN1 also functions in siRNA biogenesis (31, 71). When working with these mutant lines, one potential problem could be the poor seed productivity of the homozygous mutants. Although in most cases a sufficient amount of seeds could still be obtained by propagating the homozygous individuals, there are cases where maintenance through the heterozygous individual is necessary. Homozygous mutants harboring either the dcl1-7 or the dcl1-9 allele (51), for example, often fail to produce any viable seeds. For this reason, a protocol for quick genomic DNA extraction and PCR-based genotyping is included in this chapter, taking the dcl1-7 allele (a point mutation) as an example. For genotyping of T-DNA insertion lines using allele-specific PCR, the readers are directed to a recently published protocol (72).

Table 1 Selected miRNA-deficient mutants in Arabidopsis thaliana

Gene name                  Locus ID   Representative alleles Type of mutation      Background  References
DICER-LIKE1 (DCL1)         At1g01040  dcl1-7                 EMS                   La-er       (51, 53)
                                      dcl1-9                 T-DNA                 La-er       (4, 51, 54)
HUA ENHANCER1 (HEN1)       At4g20910  hen1-1                 EMS                   La-er       (3, 52)
                                      hen1-4                 EMS                   Col-0       (71)
                                      hen1-5 (SALK_049197)   T-DNA                 Col-0       (24)
                                      hen1-6 (SALK_090960)   T-DNA                 Col-0       (64)
ARGONAUTE1 (AGO1)          At1g48410  ago1-1                 T-DNA                 Col-0       (56)
                                      ago1-2                 T-DNA                 Ws          (56)
                                      ago1-3                 EMS                   Col-0       (24, 28, 56)
                                      ago1-11 and ago1-12    Transposon insertion  La-er       (28, 59)
                                      ago1-22 to ago1-24     EMS                   Col-0       (57)
                                      ago1-25 to ago1-27     EMS                   Col-0       (58)
                                      ago1-36 (SALK_087076)  T-DNA                 Col-0       (27)
HYPONASTIC LEAVES1 (HYL1)  At1g09700  hyl1-1                 Transposon insertion  La-er       (23, 60)
                                      hyl1-2 (SALK_064863)   T-DNA                 Col-0       (24)
HASTY (HST)                At3g05040  hst-1                  Diepoxybutane         Col-0       (68–70)
                                      hst-15 (SALK_079290)   T-DNA                 Col-0       (39)
SERRATE (SE)               At2g27100  se-1                   X-rays                Col-1       (62, 63)
                                      se-2 (SAIL_44_G12)     T-DNA                 Col-3       (62)
                                      se-3 (SALK_083196)     T-DNA                 Col-0       (62)
                                      se-4 (SALK_059424)     T-DNA                 Col-0       (63)
2. Materials
2.1. Mutant Genotyping
1. dcl1-7 seeds: available for ordering (stock number CS3089 for La-er background and CS6953 for Col-0 background) through the Arabidopsis Biological Resource Center (ABRC) or the Nottingham Arabidopsis Stock Center (NASC). The dcl1-7 allele harbors a single base change (a C to T substitution) at position 1,429 from the ATG in the genomic DNA, resulting in a missense mutation (P415S) in the DCL1 protein (51, 53).
2. 0.1% agarose in distilled water. Sterilize and store at 4°C.
3. Genomic DNA extraction buffer: 50 mM Tris-Cl, pH 8.0, 10 mM EDTA, pH 8.0, 100 mM NaCl, 1.0% SDS, and 10 mM β-mercaptoethanol (add fresh right before use).
4. Neutralization buffer: 3 M potassium acetate, pH 4.8.
5. Isopropanol.
6. 70% ethanol. Store at −20°C.
7. 1X TE buffer: 10 mM Tris-Cl, pH 7.6, 1 mM EDTA.
8. 10X PCR buffer: 500 mM KCl, 100 mM Tris-Cl, pH 9.0, 1.0% Triton X-100, and 15 mM MgCl2.
9. dNTPs: 2.5 mM each. Store at −20°C.
10. Taq DNA polymerase. Store at −20°C.
11. Custom DNA oligonucleotides: 10 µM. Store at −20°C. For dcl1-7 genotyping:
    (1) Mlu I-DCL1_1219F (24mer): 5′-GCCATCTTTGGAATGACTGACGCG-3′
    (2) DCL1_1302R (24mer): 5′-GAGGTTACGTATCTTTATCGCACA-3′
12. Metaphor agarose.
13. DNA size markers: 25 or 50 bp DNA ladder (e.g., Promega catalog number G4511 and G4521, respectively). Store at −20°C.
14. 4X DNA sample loading buffer: 50% glycerol, 0.03% bromophenol blue, 50 mM Tris-Cl, pH 7.7, and 5 mM EDTA.
15. Ethidium bromide: 2 mg/mL. Wrap the bottle with aluminum foil.
16. Mlu I: 10 U/µL. Store at −20°C.
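The name of the forward primer and the inclusion of Mlu I among the materials suggest that the dcl1-7 genotyping assay relies on a restriction site that depends on the mutated base. The short Python sketch below checks this reading under explicit assumptions that are not stated in the list above: that the "1219" in the primer name is the position of its 5′ end in coding-sequence coordinates, that the C to T change falls on the first base of codon 415 (consistent with a proline to serine change), and that this base is the template base immediately 3′ of the primer. The function names are hypothetical. The Mlu I recognition sequence is ACGCGT.

```python
# Sketch: does the forward primer create an Mlu I site only on the mutant
# template? The coordinate interpretation below is an assumption.
MLU_I_SITE = "ACGCGT"                        # Mlu I recognition sequence
FORWARD_PRIMER = "GCCATCTTTGGAATGACTGACGCG"  # Mlu I-DCL1_1219F (item 11)
PRIMER_START = 1219                          # assumed 5' end, CDS coordinates


def first_codon_position(residue_number):
    """1-based coding-sequence position of the first base of a codon."""
    return 3 * (residue_number - 1) + 1


def has_mlu_i_site(primer, next_base):
    """True if the primer plus the next template base contains an Mlu I site."""
    return MLU_I_SITE in (primer + next_base)


if __name__ == "__main__":
    primer_end = PRIMER_START + len(FORWARD_PRIMER) - 1
    print("Primer 3' end (assumed CDS position):", primer_end)
    print("First base of codon 415:", first_codon_position(415))
    print("Site with wild-type C:", has_mlu_i_site(FORWARD_PRIMER, "C"))
    print("Site with dcl1-7 T:", has_mlu_i_site(FORWARD_PRIMER, "T"))
```

If these assumptions hold, only the product amplified from the dcl1-7 allele would carry an Mlu I site, and the digest would remove a fragment roughly the length of the primer. That small size difference would be consistent with the high-resolution Metaphor agarose and the 25 or 50 bp ladders listed above.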
2.2. miRNA Detection by Northern Blot Assay
1. 10% SDS.
2. 95% ethanol.
3. DEPC-treated water.
4. 10X TBE buffer: 900 mM Tris-borate, 20 mM EDTA, pH 8.0.
5. 30% polyacrylamide stock: acrylamide:bis-acrylamide = 37.5:1. Wrap the bottle with aluminum foil and store at 4°C.
6. Urea: DNase-, RNase-, and protease-free.
7. 2% agar.
8. TEMED. Store at 4°C.
9. 10% ammonium persulfate: Make fresh. May be kept at 4°C up to 1 month.
10. 4X RNA sample loading buffer: 50% glycerol, 0.03% bromophenol blue, 50 mM Tris-Cl, pH 7.7, and 5 mM EDTA in DEPC-treated water.
11. RNA size markers (21- and 24-nt): 100 µM each. Store at −80°C.
12. Ethidium bromide stock: 2 mg/mL. Wrap the bottle with aluminum foil.
13. PerfectHyb Plus buffer: Sigma, catalog number H7033-1L.
14. T4 Polynucleotide kinase (PNK): 10 U/µL.
15. 20X SSC: 3 M sodium chloride, 0.3 M sodium citrate, pH 7.0.
16. Custom DNA oligonucleotides for use as radiolabeled probes: 10 µM.
17. ATP (γ-32P): ~6,000 Ci/mmol, 10 mCi/ml.
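The stocks in items 4 to 6 are combined in Subheading 3.2.1 (step 2) to make a gel described as 17% polyacrylamide with 7 M urea in 0.5X TBE. The arithmetic below checks those three figures against the recipe volumes. It is only a sketch: it assumes a final volume of 30 ml once the urea has dissolved (the protocol does not state the final volume) and a molar mass of 60.06 g/mol for urea.

```python
# Sketch: check the denaturing gel recipe from Subheading 3.2.1, step 2.
FINAL_VOLUME_ML = 30.0      # assumption: not stated in the protocol
UREA_G_PER_MOL = 60.06      # molar mass of urea

recipe = {
    "tbe_10x_ml": 1.5,
    "acrylamide_30pct_ml": 17.0,
    "urea_g": 12.6,
}

# Dilution of each stock into the assumed final volume.
acrylamide_pct = 30.0 * recipe["acrylamide_30pct_ml"] / FINAL_VOLUME_ML
tbe_x = 10.0 * recipe["tbe_10x_ml"] / FINAL_VOLUME_ML

# Moles of urea divided by the final volume in liters.
urea_molar = recipe["urea_g"] / UREA_G_PER_MOL / (FINAL_VOLUME_ML / 1000.0)

print(f"acrylamide: {acrylamide_pct:.1f}%")
print(f"TBE: {tbe_x:.2f}X")
print(f"urea: {urea_molar:.2f} M")
```

With a 30 ml final volume all three values come out close to the stated 17%, 0.5X, and 7 M, which suggests the recipe is intended for roughly that volume.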
3. Methods
3.1. Genotyping for the dcl1-7 Allele
3.1.1. Grow the Mutant Plants
1. Prepare seeds for both the mutant line and the wild type control (La-er or Col-0, see Subheading 2.1, item 1) by immersing the seeds in ~10 ml of 0.1% agarose suspension in 14 ml snap-top tubes. Store the tubes at 4°C for 2 to 3 nights.
2. Fill a set of 3-inch pots with commercial soil mix (e.g., SunGrow Loose Mix #1; BWI catalog number WPLC1). Completely wet the soil mix with deionized water. Move the pots into a flat with holes at the bottom. If a growth chamber is being used to grow the plants, set the flat in a second flat with the same dimensions but with no holes at the bottom.
3. Using a P1000 pipette fitted with a wide-bore tip, plant 5–10 mutant seeds in each pot. Make an effort during planting to ensure that the seeds are evenly distributed over the surface to facilitate the subsequent thinning. The number of pots being planted will depend on the experimental needs. To identify heterozygous individuals for seed propagation, 20 pots will yield an adequate number of the desired genotype.
4. Plant the wild type control in the same way, except that 3–5 seeds may be planted at the center of the pots.
5. Cover each flat with a matching plastic dome to maintain the humidity. Move the flats to a growth chamber or a greenhouse room. A setting with 16 h light (24°C)/8 h dark (22°C) works well.
6. Remove the dome shortly after germination.
7. When the seedlings are big enough to allow differentiation between the wild type phenotype and the homozygous mutant phenotype, thin the seedlings in the mutant flat down to 3–4 wild type-looking plants per pot. Two to three weeks after planting are usually required before the homozygous mutant phenotype can be reliably recognized.
3.1.2. Genomic DNA Extraction
1. From each individual plant to be genotyped, collect 2–3 rosette leaves or flower clusters in a microcentrifuge tube. Samples from the wild type and homozygous mutant plants should also be collected to serve as controls.
2. Add 375 µL of extraction buffer to each tube.
3. Using a plastic Pellet Pestle (VWR; catalog number KT749521-1500) attached to a power drill, homogenize the tissue with several short bursts of grinding.
4. Add another 375 µL of extraction buffer to each tube. Mix by brief vortexing.
5. Heat the sample tubes in a 65°C water bath for 10 min.
6. Add 150 µL of neutralization buffer and mix. Incubate the tubes on ice for 20 min.
7. Pellet the cell debris by spinning at 14,000 rpm for 5 min in a benchtop microcentrifuge.
8. Transfer ~700 µL of the aqueous phase into a fresh tube. Add an equal volume of isopropanol and mix.
9. Centrifuge at 14,000 rpm for 2 min to pellet the DNA.
10. Carefully pour off the supernatant and wash the pellet with 500 µL of ice-cold 70% ethanol by centrifuging at 14,000 rpm for 2 min.
11. Air-dry the pellet by standing the tubes upside down on a test tube rack for about 15 min.
12. Resuspend the pellet in 30–50 µL of 1X TE. Store at 4°C.
3.1.3. PCR and Restriction Analysis
The derived cleaved amplified polymorphic sequence (dCAPS) method, which is widely used for PCR-based detection of single nucleotide polymorphisms in plants (73), is adopted to detect the dcl1-7 mutation. Briefly, mismatches are introduced into one of the two DCL1-specific primers (Mlu I-DCL1_1219F) such that a Mlu I site is created if the flanking nucleotide is a “T” (in the mutant allele) instead of a “C” (in the wild type allele). Upon Mlu I digestion, the non-cleavable PCR fragments and the Mlu I-cleaved PCR fragments can be resolved by electrophoresis on a 4% Metaphor agarose gel.
1. Assemble 25 µL PCR reactions in 200 µL tubes by mixing the following components:
–– 10X PCR buffer 2.5 µL
–– 2.5 mM dNTPs 2.0 µL
–– Primer 1 (Mlu I-DCL1_1219F; 10 µM) 1.0 µL
–– Primer 2 (DCL1_1302R; 10 µM) 1.0 µL
–– Distilled water 16.5 µL
–– Taq DNA polymerase (1 U/µL) 1.0 µL
–– Genomic DNA 1.0 µL
2. Run PCR for 25 cycles using the following program:
–– Initial denaturation at 96°C for 1 min
–– Denaturation at 95°C for 1 min
–– Annealing at 58°C for 45 s
–– Extension at 72°C for 20 s
–– Repeat the cycle 24 more times
–– Final extension at 72°C for 5 min
–– Hold at 4°C
3. Prepare a 4% Metaphor agarose gel. Run 3–5 µL of the PCR reaction to confirm successful amplification. A 25 or 50 bp DNA ladder should also be loaded to serve as size markers. A successful amplification should yield a DNA fragment of approximately 160 bp in length (Fig. 1a).
4. Digest the PCR products with Mlu I in 20 µL reactions by mixing the following components in microcentrifuge tubes, and incubate in a 37°C water bath for 1 h.
–– 10X Reaction buffer 2.0 µL
–– Distilled water 7.7 µL
–– PCR products 10.0 µL
–– Mlu I (10 U/µL) 0.3 µL
5. Prepare a second 4% Metaphor agarose gel. Analyze 5–8 µL of the digested PCR reaction on the gel. Mlu I will cut a 20 bp fragment off the PCR products amplified from the mutant, but not the wild type, allele. Samples derived from heterozygous mutant plants are therefore expected to display a doublet consisting of a ~140 bp band and a ~160 bp band, while samples derived from a wild type plant or a homozygous mutant plant will display a single band of ~160 or ~140 bp, respectively (Fig. 1b).
Fig. 1. Genotyping of Arabidopsis dcl1-7 mutant using the dCAPS method. (a) The 162 bp DNA fragments amplified from genomic DNA prepared from individual dcl1-7 mutant plants (lanes 3–9 and 11–17), a wild type control (lane 18), and a homozygous mutant plant (lane 2) were analyzed on a 4% Metaphor agarose gel. (b) The same set of PCR products was analyzed on a 4% Metaphor agarose gel after digestion with Mlu I. Notice that lanes 3, 4, 6, 7, and 8 represent the PCR products amplified from heterozygous dcl1-7 mutant plants while lane 5 represents a segregating wild type. A 25 bp DNA ladder (lanes 1, 10, and 19) was used as a size marker on both gels.
3.2. Detection of miRNAs by Northern Blot Analysis
3.2.1. Resolve Total RNA by Denaturing Polyacrylamide Gel Electrophoresis (PAGE)
1. Clean a set of glass plates, spacers, and comb with 10% SDS. Rinse thoroughly with deionized water. Wipe clean with 95% ethanol for quick drying.
2. Prepare a 17% polyacrylamide gel with 7 M urea in 0.5X TBE by mixing the following components in a clean 250 ml Erlenmeyer flask:
–– DEPC-treated water 2.0 ml
–– 10X TBE 1.5 ml
–– 30% polyacrylamide stock 17.0 ml
–– Urea 12.6 g
3. Mix the contents by slowly swirling the flask a few times and incubate in a 37°C water bath to dissolve the urea.
4. Assemble the pre-cleaned plates and spacers using a set of 6–8 binder clips. Seal the plates with a thin layer of 2% agar.
5. When the urea is completely dissolved, add 15 µL of TEMED and 125 µL of freshly made 10% ammonium persulfate. Mix the contents immediately by swirling the flask a few times. Cast the gel and insert the comb immediately. Allow the gel to polymerize for at least 30 min.
6. When the gel is fully polymerized, pull the comb out slowly and very carefully to avoid breaking the wells. Rinse out the wells with 0.5X TBE using a 10 ml syringe with a needle attached.
7. Assemble the electrophoresis apparatus and pre-run the gel at 180 V in 0.5X TBE buffer for at least 20 min. Rinse out the wells again right before loading the samples.
8. Prepare the RNA samples by mixing 5–20 µg of total RNA (see Note 1) with the 4X RNA sample loading buffer in microcentrifuge tubes. Keep the final sample volume below 25 µL for best resolution.
9. Prepare the RNA size markers (see Note 2) in the same way as for sample preparation.
10. Heat the samples at 65°C for 5 min and immediately cool on ice for 2 min. Spin briefly to collect the sample in the bottom of the tube.
11. Load the samples and size markers with gel loading tips so that the samples form a flat band at the very bottom of the well.
12. Run the gel at 180 V in 0.5X TBE buffer until the bromophenol blue dye reaches the bottom of the gel.
3.2.2. Transfer RNA from the Polyacrylamide Gel to Nylon Membrane
1. Disassemble the electrophoresis apparatus and carefully separate the gel slab from the glass plates. Cut off a small triangle from the upper-right corner of the gel to mark the orientation. Stain the gel in 0.5X TBE containing ethidium bromide for 3–5 min.
2. Get the Bio-Rad semi-dry transfer unit ready for later use. Presoak one piece of nylon membrane (see Note 3) and two pieces of Bio-Rad extra-thick blot paper in 0.5X TBE.
3. Destain the gel briefly in 0.5X TBE to remove excess ethidium bromide from the gel surface.
4. Lay a fresh piece of plastic wrap over the UV transilluminator of a gel imager. Lay the gel on the plastic wrap. Take a picture of the entire gel under UV. The 5S rRNA and tRNAs will show up as bright bands near the top of the gel. The RNA size markers should also be visible.
5. Assemble the gel transfer sandwich on the Bio-Rad semi-dry transfer unit. The order should be (from bottom to top): extra-thick blot paper, nylon membrane, the gel, the second piece of extra-thick blot paper, and the cathode plate. Cut off a small triangle from the upper-right corner of the membrane to mark the orientation. Remove air bubbles that may have been trapped between the gel and the membrane. This can be done by carefully lifting one side of the gel, and then slowly laying it back onto the membrane.
6. Transfer for 1 h under constant current (400 mA).
7. Disassemble the gel transfer sandwich. Air-dry the membrane on a piece of clean Whatman paper.
8. Check the gel under UV to confirm a successful transfer.
9. Crosslink the membrane twice in a UV crosslinker under the “auto” setting.
10. Under a portable UV light, mark the position of the RNA size markers on the membrane with a pencil.
3.2.3. Probing the Membrane with Radiolabeled Oligonucleotide Probe
1. Place the membrane in a hybridization bottle, with the sample side facing the center of the bottle. Prehybridize the membrane in 5 ml PerfectHyb Plus buffer (Sigma) at 38°C for 30 min or longer with slow rotation.
2. End-label an oligonucleotide probe using T4 PNK (see Note 4). Assemble the following labeling reaction in a microcentrifuge tube:
–– Distilled water 15.5 µL
–– 10X PNK buffer 2.5 µL
–– DNA oligonucleotide (10 µM; see Note 5) 1.0 µL
–– ATP (γ-32P) (~6,000 Ci/mmol; 10 mCi/ml) 5.0 µL
–– T4 PNK (10 U/µL) 1.0 µL
3. Incubate the tube at 37°C for 30 min. Inactivate the kinase by incubating at 65°C for 10 min.
4. Purify the probe by passing the reaction through a Bio-Rad P6 spin column to remove the unincorporated ATP (γ-32P) (see Note 6).
5. Heat the purified probe to 70°C for 2 min, then cool on ice for 2 min. Add the probe to the prehybridization buffer. Allow the hybridization to continue overnight (8–12 h) at 38°C with slow rotation.
6. Pour off the hybridization buffer into a radioactive liquid waste container. Wash the membrane at 50°C with fast rotation, using preheated buffers: (1) once with 2X SSC, 0.2% SDS for 20 min; (2) once with 1X SSC, 0.1% SDS for 20 min; (3) once with 0.5X SSC, 0.1% SDS for 20 min; (4) once with 0.1X SSC, 0.1% SDS for 20 min.
7. Wrap the membrane with plastic wrap.
8. Expose the membrane to X-ray film in an autoradiograph cassette at −80°C with two intensifying screens (see Note 7). Tape the wrapped membrane onto the bottom piece of the intensifying screen to facilitate the alignment between the membrane and the film afterwards. Mark the orientation of the film by cutting off a small triangle from the upper-right corner. The optimal exposure time may vary from a few hours to several nights, depending on the abundance of the target miRNA as well as the type of the probe.
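For orientation, the labeling reaction in step 2 can be checked with a little arithmetic: the component volumes should add up to 25 µL, and the molar amounts of oligonucleotide and ATP follow from the stated concentrations. The sketch below is only a back-of-the-envelope calculation. It takes the nominal specific activity (~6,000 Ci/mmol) and concentration (10 mCi/ml) at face value and ignores radioactive decay; the variable names are illustrative.

```python
# Sketch: volumes and molar amounts in the T4 PNK labeling reaction.
volumes_ul = {
    "distilled water": 15.5,
    "10X PNK buffer": 2.5,
    "DNA oligonucleotide (10 uM)": 1.0,
    "ATP (gamma-32P)": 5.0,
    "T4 PNK (10 U/uL)": 1.0,
}
total_ul = sum(volumes_ul.values())

# 1 uM equals 1 pmol/uL, so pmol = concentration (uM) x volume (uL).
oligo_pmol = 10.0 * volumes_ul["DNA oligonucleotide (10 uM)"]

# ATP stock: 10 mCi/ml at ~6,000 Ci/mmol (nominal values, decay ignored).
activity_ci_per_l = 10.0                 # 10 mCi/ml is the same as 10 Ci/L
specific_activity_ci_per_mmol = 6000.0
atp_mm = activity_ci_per_l / specific_activity_ci_per_mmol  # mmol/L
atp_um = atp_mm * 1000.0                                     # umol/L
atp_pmol = atp_um * volumes_ul["ATP (gamma-32P)"]

print(f"total volume: {total_ul:.1f} uL")
print(f"oligonucleotide: {oligo_pmol:.1f} pmol")
print(f"ATP: {atp_pmol:.2f} pmol")
```

On these nominal figures the reaction contains about 8.3 pmol of ATP for 10 pmol of oligonucleotide.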
4. Notes
1. Clean, high-quality total RNA extracts can be used directly for miRNA detection by northern blot assays, without further fractionation for low-molecular-weight (LMW) RNAs. In the past, the column from the Qiagen RNA/DNA midi kit (catalog number 14142) was used to separate the high-molecular-weight (HMW) and LMW RNAs by a stepwise elution procedure, followed by isopropanol precipitation. The recovery of small RNAs through this procedure turned out to be suboptimal. However, the column purification step does significantly improve the quality of RNA, especially for samples prepared from polysaccharide-rich tissues. It is therefore still recommended to purify the total RNA extracts using either the Qiagen RNA/DNA midi kit or other alternative methods, but fractionation of LMW RNA is generally unnecessary.
2. RNA size markers serve as a convenient reference for data interpretation. An equimolar mix of 21- and 24-nt synthetic, 5′ phosphorylated RNA oligonucleotides with a non-plant sequence (e.g., GFP) works well for this purpose. Loading of 250–500 pmol each is usually sufficient for visualizing the marker on the membrane following the gel transfer.
3. For maximum sensitivity, the use of a charged nylon membrane (e.g., Nytran SuPerCharge Membranes by Whatman, formerly Schleicher & Schuell; VWR catalog number 28151318) is recommended.
4. T4 PNK has an intrinsic bias against the efficient labeling of certain oligonucleotides, particularly those with 5′-C ends. Labeling of such oligonucleotides with T4 PNK can therefore be inefficient. The OptiKinase from USB (catalog number 78334X), which is a modified version of T4 PNK and exhibits little or no base discrimination, is one option for solving this problem.
5. For detection of miRNAs with very low abundance, use of oligonucleotides with the locked nucleic acid (LNA) modification instead of regular DNA oligonucleotides could significantly improve the sensitivity of detection (74).
6. The incorporation of the radioactive 32P into the oligonucleotides can be measured experimentally. Take out 0.5 µL of the labeling reaction and make a 1:50 dilution with 1X TE. On each of two Whatman DE81 circular filter papers (catalog number 3658-325), spot a small aliquot (e.g., 2 µL) of the diluted labeling reaction. Air-dry the filter papers. Set one of the circles aside and wash the other circle twice for 5 min each with 50 ml 0.5 M Na2HPO4 (pH 6.8) in a glass beaker with gentle shaking. A final wash in 95% ethanol may follow to facilitate the air-drying of the filter paper. Transfer each of the filter papers to a scintillation vial containing 5 ml of scintillation cocktail (e.g., Fisher Scientific catalog number SX18-4). Measure the radioactivity retained on each filter paper using a liquid scintillation counter (LSC). The incorporation efficiency can be calculated as follows:
\[
\%\ \text{incorporation} = \frac{\text{incorporated cpm}}{\text{total cpm}} \times 100 = \frac{\text{cpm from washed filter}}{\text{cpm from unwashed filter}} \times 100
\]
An incorporation efficiency of 50% or higher is often achieved (see Note 4).
7. If possible, check the radioactive signal on a phosphor imager before exposing the blot to film. This serves as a preview to ensure that a satisfactory signal/noise ratio has been achieved. In case of high background or unknown “hot spots”, an extended final wash can be done before exposing the membrane to film.
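The calculation in Note 6 can be sketched as a small helper. The function name and the example counts are invented for illustration, and the sketch applies no background subtraction because the note does not describe one.

```python
# Sketch: percent incorporation from DE81 filter counts (see Note 6).
def percent_incorporation(washed_cpm, unwashed_cpm):
    """Return 100 x (cpm on washed filter) / (cpm on unwashed filter)."""
    if unwashed_cpm <= 0:
        raise ValueError("unwashed filter cpm must be positive")
    return 100.0 * washed_cpm / unwashed_cpm


# Invented example counts, for illustration only.
example = percent_incorporation(washed_cpm=42_000, unwashed_cpm=70_000)
print(f"{example:.1f}% incorporation")
```

Because both filters receive the same aliquot of the same dilution, the dilution factor cancels and only the ratio of the two counts matters.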
Acknowledgments
I thank Chris Rock for critically reading the manuscript.
References
1. Llave C, Kasschau KD, Rector MA, Carrington JC (2002) Endogenous and silencing-associated small RNAs in plants. Plant Cell 14:1605–1619
2. Mette MF, van der Winden J, Matzke M, Matzke AJ (2002) Short RNAs can identify new candidate transposable element families in Arabidopsis. Plant Physiol 130:6–9
3. Park W, Li J, Song R, Messing J, Chen X (2002) CARPEL FACTORY, a Dicer homolog, and HEN1, a novel protein, act in microRNA metabolism in Arabidopsis thaliana. Curr Biol 12:1484–1495
4. Reinhart BJ, Weinstein EG, Rhoades MW, Bartel B, Bartel DP (2002) MicroRNAs in plants. Genes Dev 16:1616–1626
5. Hamilton AJ, Baulcombe DC (1999) A species of small antisense RNA in posttranscriptional gene silencing in plants. Science 286:950–952
6. Lagos-Quintana M, Rauhut R, Lendeckel W, Tuschl T (2001) Identification of novel genes coding for small expressed RNAs. Science 294:853–858
7. Lau NC, Lim EP, Weinstein EG, Bartel DP (2001) An abundant class of tiny RNAs with probable regulatory roles in Caenorhabditis elegans. Science 294:858–862
8. Lee RC, Ambros V (2001) An extensive class of small RNAs in Caenorhabditis elegans. Science 294:862–864
9. Baulcombe D (2004) RNA silencing in plants. Nature 431:356–363
10. Chapman EJ, Carrington JC (2007) Specialization and evolution of endogenous small RNA pathways. Nat Rev Genet 8:884–896
11. Jones-Rhoades MW, Bartel DP, Bartel B (2006) microRNAs and their regulatory roles in plants. Annu Rev Plant Biol 57:19–53
12. Vaucheret H (2006) Post-transcriptional small RNA pathways in plants: mechanisms and regulations. Genes Dev 20:759–771
13. Axtell MJ, Snyder JA, Bartel DP (2007) Common functions for diverse small RNAs of land plants. Plant Cell 19(6):1750–1769
14. Fahlgren N, Howell MD, Kasschau KD, Chapman EJ, Sullivan CM, Cumbie JS, Givan SA, Law TF, Grant SR, Dangl JL, Carrington JC (2007) High-throughput sequencing of Arabidopsis microRNAs: evidence for frequent birth and death of MIRNA genes. PLoS ONE 2:e219
15. Henderson IR, Zhang X, Lu C, Johnson L, Meyers BC, Green PJ, Jacobsen SE (2006) Dissecting Arabidopsis thaliana DICER function in small RNA processing, gene silencing and DNA methylation patterning. Nat Genet 38:721–725
16. Lu C, Tej SS, Luo S, Haudenschild CD, Meyers BC, Green PJ (2005) Elucidation of the small RNA component of the transcriptome. Science 309:1567–1569
17. Rajagopalan R, Vaucheret H, Trejo J, Bartel DP (2006) A diverse and evolutionarily fluid set of microRNAs in Arabidopsis thaliana. Genes Dev 20:3407–3425
18. Xie Z, Qi X (2008) Diverse small RNA-directed silencing pathways in plants. Biochim Biophys Acta – Gene Regulatory Mechanisms 1779(11):720–724
19. Kurihara Y, Watanabe Y (2004) Arabidopsis micro-RNA biogenesis through Dicer-like 1 protein functions. Proc Natl Acad Sci USA 101:12753–12758
20. Parizotto EA, Dunoyer P, Rahm N, Himber C, Voinnet O (2004) In vivo investigation of the transcription, processing, endonucleolytic activity, and functional relevance of the spatial distribution of a plant miRNA. Genes Dev 18:2237–2242
21. Xie Z, Allen E, Fahlgren N, Calamar A, Givan SA, Carrington JC (2005) Expression of Arabidopsis MIRNA genes. Plant Physiol 138:2145–2154
22. Fang Y, Spector DL (2007) Identification of nuclear dicing bodies containing proteins for microRNA biogenesis in living Arabidopsis plants. Curr Biol 17:818–823
23. Han MH, Goud S, Song L, Fedoroff N (2004) The Arabidopsis double-stranded RNA-binding protein HYL1 plays a role in microRNA-mediated gene regulation. Proc Natl Acad Sci USA 101:1093–1098
24. Vazquez F, Gasciolli V, Crete P, Vaucheret H (2004) The nuclear dsRNA binding protein HYL1 is required for microRNA accumulation and plant development, but not posttranscriptional transgene silencing. Curr Biol 14:346–351
25. Khvorova A, Reynolds A, Jayasena SD (2003) Functional siRNAs and miRNAs exhibit strand bias. Cell 115:209–216
26. Schwarz DS, Hutvagner G, Du T, Xu Z, Aronin N, Zamore PD (2003) Asymmetry in the assembly of the RNAi enzyme complex. Cell 115:199–208
27. Baumberger N, Baulcombe DC (2005) Arabidopsis ARGONAUTE1 is an RNA Slicer that selectively recruits microRNAs and short interfering RNAs. Proc Natl Acad Sci USA 102:11928–11933
28. Qi Y, Denli AM, Hannon GJ (2005) Biochemical specialization within Arabidopsis RNA silencing pathways. Mol Cell 19:421–428
29. Vaucheret H, Mallory AC, Bartel DP (2006) AGO1 homeostasis entails coexpression of MIR168 and AGO1 and preferential stabilization of miR168 by AGO1. Mol Cell 22:129–136
30. Hamilton A, Voinnet O, Chappell L, Baulcombe D (2002) Two classes of short interfering RNA in RNA silencing. EMBO J 21:4671–4679
31. Xie Z, Johansen LK, Gustafson AM, Kasschau KD, Lellis AD, Zilberman D, Jacobsen SE, Carrington JC (2004) Genetic and functional diversification of small RNA pathways in plants. PLoS Biol 2:E104
32. Herr AJ, Jensen MB, Dalmay T, Baulcombe DC (2005) RNA polymerase IV directs silencing of endogenous DNA. Science 308:118–120
33. Onodera Y, Haag JR, Ream T, Nunes PC, Pontes O, Pikaard CS (2005) Plant nuclear RNA polymerase IV mediates siRNA and DNA methylation-dependent heterochromatin formation. Cell 120:613–622
34. Pontier D, Yahubyan G, Vega D, Bulski A, Saez-Vasquez J, Hakimi MA, Lerbs-Mache S, Colot V, Lagrange T (2005) Reinforcement of silencing at transposons and highly repeated sequences requires the concerted action of two distinct RNA polymerases IV in Arabidopsis. Genes Dev 19:2030–2040
35. Zhang X, Henderson IR, Lu C, Green PJ, Jacobsen SE (2007) Role of RNA polymerase IV in plant small RNA metabolism. Proc Natl Acad Sci USA 104:4536–4541
36. Zilberman D, Cao X, Jacobsen SE (2003) ARGONAUTE4 control of locus-specific siRNA accumulation and DNA and histone methylation. Science 299:716–719
37. Li CF, Pontes O, El-Shami M, Henderson IR, Bernatavichute YV, Chan SW, Lagrange T, Pikaard CS, Jacobsen SE (2006) An ARGONAUTE4-containing nuclear processing center colocalized with Cajal bodies in Arabidopsis thaliana. Cell 126:93–106
38. Pontes O, Li CF, Nunes PC, Haag J, Ream T, Vitins A, Jacobsen SE, Pikaard CS (2006) The Arabidopsis chromatin-modifying nuclear siRNA pathway involves a nucleolar RNA processing center. Cell 126:79–92
39. Allen E, Xie Z, Gustafson AM, Carrington JC (2005) microRNA-directed phasing during trans-acting siRNA biogenesis in plants. Cell 121:207–221
40. Axtell MJ, Jan C, Rajagopalan R, Bartel DP (2006) A two-hit trigger for siRNA biogenesis in plants. Cell 127:565–577
41. Peragine A, Yoshikawa M, Wu G, Albrecht HL, Poethig RS (2004) SGS3 and SGS2/SDE1/RDR6 are required for juvenile development and the production of trans-acting siRNAs in Arabidopsis. Genes Dev 18:2368–2379
42. Vazquez F, Vaucheret H, Rajagopalan R, Lepers C, Gasciolli V, Mallory AC, Hilbert JL, Bartel DP, Crete P (2004) Endogenous trans-acting siRNAs regulate the accumulation of Arabidopsis mRNAs. Mol Cell 16:69–79
43. Gasciolli V, Mallory AC, Bartel DP, Vaucheret H (2005) Partially redundant functions of Arabidopsis DICER-like enzymes and a role for DCL4 in producing trans-acting siRNAs. Curr Biol 15:1494–1500
44. Xie Z, Allen E, Wilken A, Carrington JC (2005) DICER-LIKE 4 functions in trans-acting small interfering RNA biogenesis and vegetative phase change in Arabidopsis thaliana. Proc Natl Acad Sci USA 102:12984–12989
45. Yoshikawa M, Peragine A, Park MY, Poethig RS (2005) A pathway for the biogenesis of trans-acting siRNAs in Arabidopsis. Genes Dev 19:2164–2175
46. Borsani O, Zhu J, Verslues PE, Sunkar R, Zhu JK (2005) Endogenous siRNAs derived from a pair of natural cis-antisense transcripts regulate salt tolerance in Arabidopsis. Cell 123:1279–1291
47. Katiyar-Agarwal S, Morgan R, Dahlbeck D, Borsani O, Villegas A Jr, Zhu JK, Staskawicz BJ, Jin H (2006) A pathogen-inducible endogenous siRNA in plant immunity. Proc Natl Acad Sci USA 103:18002–18007
48. Yang Z, Ebright YW, Yu B, Chen X (2006) HEN1 recognizes 21–24 nt small RNA duplexes and deposits a methyl group onto the 2′ OH of the 3′ terminal nucleotide. Nucleic Acids Res 34:667–675
49. Yu B, Yang Z, Li J, Minakhina S, Yang M, Padgett RW, Steward R, Chen X (2005) Methylation as a crucial step in plant microRNA biogenesis. Science 307:932–935
50. Li J, Yang Z, Yu B, Liu J, Chen X (2005) Methylation protects miRNAs and siRNAs from a 3′-end uridylation activity in Arabidopsis. Curr Biol 15:1501–1507
51. Schauer SE, Jacobsen SE, Meinke DW, Ray A (2002) DICER-LIKE1: blind men and elephants in Arabidopsis development. Trends Plant Sci 7:487–491
52. Chen X, Liu J, Cheng Y, Jia D (2002) HEN1 functions pleiotropically in Arabidopsis development and acts in C function in the flower. Development 129:1085–1094
53. Golden TA, Schauer SE, Lang JD, Pien S, Mushegian AR, Grossniklaus U, Meinke DW, Ray A (2002) SHORT INTEGUMENTS1/SUSPENSOR1/CARPEL FACTORY, a Dicer homolog, is a maternal effect gene required for embryo development in Arabidopsis. Plant Physiol 130:808–822
54. Jacobsen SE, Running MP, Meyerowitz EM (1999) Disruption of an RNA helicase/RNAse III gene in Arabidopsis causes unregulated cell division in floral meristems. Development 126:5231–5243
55. Ray A, Lang JD, Golden T, Ray S (1996) SHORT INTEGUMENTS1 (SIN1), a gene required for ovule development in Arabidopsis, also controls flowering time. Development 122:2631–2638
56. Bohmert K, Camus I, Bellini C, Bouchez D, Caboche M, Benning C (1998) AGO1 defines a novel locus of Arabidopsis controlling leaf development. EMBO J 17:170–180
57. Fagard M, Boutet S, Morel JB, Bellini C, Vaucheret H (2000) AGO1, QDE-2, and RDE-1 are related proteins required for posttranscriptional gene silencing in plants, quelling in fungi, and RNA interference in animals. Proc Natl Acad Sci USA 97:11650–11654
58. Morel JB, Godon C, Mourrain P, Beclin C, Boutet S, Feuerbach F, Proux F, Vaucheret H (2002) Fertile hypomorphic ARGONAUTE (ago1) mutants impaired in post-transcriptional gene silencing and virus resistance. Plant Cell 14:629–639
59. Kidner CA, Martienssen RA (2004) Spatially restricted microRNA directs leaf polarity through ARGONAUTE1. Nature 428:81–84
60. Lu C, Fedoroff N (2000) A mutation in the Arabidopsis HYL1 gene encoding a dsRNA binding protein affects responses to abscisic acid, auxin, and cytokinin. Plant Cell 12:2351–2366
61. Yang L, Liu Z, Lu F, Dong A, Huang H (2006) SERRATE is a novel nuclear regulator in primary microRNA processing in Arabidopsis. Plant J 47:841–850
62. Grigg SP, Canales C, Hay A, Tsiantis M (2005) SERRATE coordinates shoot meristem function and leaf axial patterning in Arabidopsis. Nature 437:1022–1026
63. Lobbes D, Rallapalli G, Schmidt DD, Martin C, Clarke J (2006) SERRATE: a new player on the plant microRNA scene. EMBO Rep 7:1052–1058
64. Kurihara Y, Takashi Y, Watanabe Y (2006) The interaction between DCL1 and HYL1 is important for efficient and precise processing of pri-miRNA in plant microRNA biogenesis. RNA 12:206–212
65. Adenot X, Elmayan T, Lauressergues D, Boutet S, Bouche N, Gasciolli V, Vaucheret H (2006) DRB4-dependent TAS3 trans-acting siRNAs control leaf morphology through AGO7. Curr Biol 16:927–932
66. Hiraguri A, Itoh R, Kondo N, Nomura Y, Aizawa D, Murai Y, Koiwa H, Seki M, Shinozaki K, Fukuhara T (2005) Specific interactions between Dicer-like proteins and HYL1/DRB-family dsRNA-binding proteins in Arabidopsis thaliana. Plant Mol Biol 57:173–188
67. Song L, Han MH, Lesicka J, Fedoroff N (2007) Arabidopsis primary microRNA processing proteins HYL1 and DCL1 define a nuclear body distinct from the Cajal body. Proc Natl Acad Sci USA 104:5437–5442
68. Telfer A, Poethig RS (1998) HASTY: a gene that regulates the timing of shoot maturation in Arabidopsis thaliana. Development 125:1889–1898
69. Bollman KM, Aukerman MJ, Park MY, Hunter C, Berardini TZ, Poethig RS (2003) HASTY, the Arabidopsis ortholog of exportin 5/MSN5, regulates phase change and morphogenesis. Development 130:1493–1504
70. Park MY, Wu G, Gonzalez-Sulser A, Vaucheret H, Poethig RS (2005) Nuclear processing and export of microRNAs in Arabidopsis. Proc Natl Acad Sci USA 102:3691–3696
71. Boutet S, Vazquez F, Liu J, Beclin C, Fagard M, Gratias A, Morel JB, Crete P, Chen X, Vaucheret H (2003) Arabidopsis HEN1: a genetic link between endogenous miRNA controlling development and siRNA controlling transgene silencing and virus resistance. Curr Biol 13:843–848
72. Stepanova AN, Alonso JM (2006) PCR-based screening for insertional mutants, in: Arabidopsis Protocols (Salinas J and Sanchez-Serrano J, eds) Humana Press, Totowa, NJ. pp. 163–172
73. Neff MM, Neff JD, Chory J, Pepper AE (1998) dCAPS, a simple technique for the genetic analysis of single nucleotide polymorphisms: experimental applications in Arabidopsis thaliana genetics. Plant J 14:387–392
74. Valoczi A, Hornyik C, Varga N, Burgyan J, Kauppinen S, Havelda Z (2004) Sensitive and specific detection of microRNAs by northern blot analysis using LNA-modified oligonucleotide probes. Nucleic Acids Res 32:e175
Chapter 2
Prediction of Plant miRNA Genes
Matthew W. Jones-Rhoades
Abstract
This chapter presents procedures for the computational identification of plant miRNA genes. In the first procedure, homologs of known miRNAs are identified in a database of genomic or cDNA sequence. In the second procedure, previously unidentified miRNA families are predicted through the analysis of secondary structure, evolutionary conservation, and targeting potential.
Key words: Gene prediction, microRNA discovery, Comparative genomics
1. Introduction
MicroRNAs are short, non-coding, endogenously expressed RNAs that are processed from longer hairpin precursors (see Subheading 2.1, (1) for review). Historically, most plant miRNA genes have been discovered by one of two methods: the molecular cloning of small RNAs (2–8), and computational prediction of miRNA genes based on the conservation of sequence and secondary structure (9–14). While molecular cloning is the most direct way of discovering miRNAs, bioinformatic approaches have provided a useful complement to cloning experiments. For example, as new miRNAs are discovered through molecular cloning, computational approaches can identify homologous miRNAs in databases of genomic or cDNA sequences, thereby establishing the copy number and the range of evolutionary conservation of the miRNAs. Similarly, the accurate identification of known miRNA families in the growing body of genomic and cDNA sequence is critical for the accurate and thorough annotation of gene content in these databases.
B.C. Meyers and P.J. Green (eds.), Plant MicroRNAs, Methods in Molecular Biology, vol. 592 DOI 10.1007/978-1-60327-005-2_2, © Humana Press, a part of Springer Science + Business Media, LLC 2010
Jones-Rhoades
Computational approaches can also complement cloning experiments by identifying miRNA families that are difficult to clone due to low or highly specific expression. For example, the miR395 and miR399 families, both of which were initially identified computationally (9, 11), are difficult to detect experimentally in Arabidopsis under standard growth conditions, due to low expression levels, but are easily detectable in plants starved for sulfate or phosphate, respectively (11, 15, 16). While the development of high-throughput sequencing technologies has vastly improved the sensitivity of molecular cloning experiments (3, 5, 17, 18), it is still difficult to ensure that all possible tissues, developmental stages, and growth conditions are represented in cloned libraries of small RNAs. Therefore, computational methods will remain useful in the discovery of plant miRNAs.
1.1. Identification of Plant miRNA Homologs
Homologs of a known miRNA can be predicted by identifying genomic regions that have (a) sequence similarity to the known miRNA and (b) the potential to form miRNA hairpin precursors. Because the loop regions of miRNA hairpins can be highly divergent between related miRNAs, the most sensitive means of finding potential homologs is to identify near matches to the mature miRNA. The genomic regions containing these candidates are then screened for the ability to fold into hairpin structures that resemble miRNA precursors. While this procedure is relatively straightforward, care must be taken to avoid the spurious mis-annotation of non-miRNA sequences as miRNAs (see below).
1.2. De Novo Identification of Conserved Plant miRNAs
A more difficult problem is to predict miRNA genes that are unrelated to any miRNA with previous experimental evidence. Typically, the first step in such an approach is to identify genomic regions with the potential to form hairpin secondary structures with properties similar to known miRNA precursors. However, a typical genome contains thousands of regions with predicted secondary structures that resemble miRNA precursors (11), of which only a low percentage are likely to represent actual miRNA genes. Therefore, it is necessary to use additional criteria to identify the genomic hairpins with a high probability of encoding actual miRNAs. While several groups have addressed this problem in plants (9–11, 13), all have used the evolutionary conservation of stem-loops and the ability to base pair to potential target RNAs as a means of reducing the number of false positive hairpins under consideration.
Prediction of Plant miRNA Genes
This chapter describes a procedure that identified 18 of the 20 miRNA families that are conserved between Arabidopsis and rice (11). Because future analyses will probably involve different genomes that are separated by different evolutionary distances, I have attempted to highlight the ways in which this approach could be adapted to other comparisons. 1.3. miRcheck and Supporting Scripts
A central step in the prediction of plant miRNA genes is the analysis of secondary structures to identify candidate hairpin precursors. Plant miRNA hairpins are diverse in terms of the length and extent of base pairing in the loop region (defined here as the sequence connecting the miRNA and miRNA*). However, all well-supported miRNA stem-loops are predicted to have extensive base pairing in the region containing the miRNA and miRNA*. If only a moderate number of putative miRNA loci are to be analyzed, it may be possible to manually evaluate each secondary structure. However, for a genome-wide search involving hundreds or thousands of candidate miRNAs, it is desirable to use an automated method to evaluate the hairpin structures. The miRcheck algorithm has been used to evaluate the miRNA-encoding potential of whole genomes (11), to identify miRNA homologs in poplar (12), and to evaluate the miRNA-encoding potential of cloned small RNAs (5, 19). miRcheck compares the local secondary structure of the miRNA/miRNA* against a series of parameters; a putative miRNA will “pass” miRcheck if cutoff values are met for all parameters. The default cutoffs of miRcheck have exhibited a good balance of selectivity and sensitivity in a number of different applications. Although miRcheck was designed based on the characteristics of 11 miRNA families conserved between Arabidopsis and rice, 94% of plant miRNA loci (miRBase 10.0) that are supported by conservation or deep sequencing pass the default parameters. Importantly, the cutoff values for each miRcheck parameter can be adjusted as the requirements and goals of an analysis dictate. The features analyzed by miRcheck are:
1. A consistent orientation of base pairing within the putative miRNA (i.e., all nucleotides pairing to the miRNA must either all be 5′ of their partner, or all be 3′ of their partner).
2. A stem-loop length of at least 54 nucleotides (including putative miRNA and miRNA*). This requirement discriminates against cases in which the putative miRNA and miRNA* are separated by a very short loop, a pattern not observed for well-documented plant miRNA precursors.
3. A maximum of six unpaired nucleotides in the putative miRNA, of which no more than three are consecutive. In addition, no more than two nucleotides within the putative miRNA may be asymmetrically unpaired (i.e., at most two unpaired nucleotides that lack corresponding unpaired nucleotides in the miRNA* are allowed).
4. A maximum of six unpaired nucleotides in the putative miRNA*, of which no more than three are consecutive.
5. The extension of base pairing at least two nucleotides outside of the putative miRNA. This is defined as the existence of an “extended miRNA” (and corresponding “extended miRNA*”) that contains the putative miRNA and which meets all of the above pairing requirements.
6. The extended miRNA* must be no more than three nucleotides longer than the extended miRNA. This requirement discriminates against long bulges in the miRNA*.
7. At least one nucleotide within the extended miRNA or miRNA* must be unpaired. This requirement is designed to discriminate against long, perfect inverted repeats that occur in plant genomes.
In addition to miRcheck itself (implemented as a Perl module), I have also made available supporting Perl scripts that are used in the methods in this chapter. In many cases, these scripts are wrappers that parse an input file, feed the data to another algorithm (such as miRcheck, patscan, or RNAfold), and parse the output into a convenient format. 1.4. Maximizing Sensitivity and Selectivity in miRNA Prediction
In any bioinformatic gene prediction strategy, it is desirable to minimize the number of false negatives (i.e., to correctly identify as many real genes as possible) while also minimizing the number of false positives. In most cases, the investigator must decide how to balance these sometimes opposing goals. In the procedures below, I present methods and cutoffs that I have found to give a good balance of sensitivity and specificity in a variety of different situations. The ideal thresholds and methods for any particular analysis will be dictated by the nature of the search and the goals of the analysis (e.g., an exhaustive list of genes vs. a highly accurate list of genes). Due diligence and skepticism on the part of the investigator, including the careful consideration of the possibility of false positives, are paramount to achieving the meaningful and useful prediction of miRNA genes. Without careful controls, there is a very real danger of muddling the literature with spuriously annotated sequences that are not authentic miRNA genes.
2. Materials 2.1. Identification of Plant miRNA Homologs
1. A computer with a Unix/Linux/OSX operating system. 2. A file of miRNA sequences in Fasta format. 3. A file of genomic/cDNA sequences in Fasta format.
4. Patscan (available at http://www-unix.mcs.anl.gov/compbio/PatScan/) should be installed. 5. RNAfold (available at http://www.tbi.univie.ac.at/~ivo/RNA/RNAfold.html) or other RNA secondary structure prediction software should be installed. 6. miRcheck and supporting scripts (available at http://web.wi.mit.edu/bartel/pub/software.html). 2.2. De Novo Identification of Conserved Plant miRNAs
1. Two or more sets of genomic or cDNA sequences, contained in separate Fasta files. 2. einverted (part of EMBOSS package, available at http://emboss.sourceforge.net/) should be installed. 3. RNAfold, miRcheck, and supporting Perl scripts (as described above).
3. Methods 3.1. Identification of Plant miRNA Homologs
This protocol assumes that the investigator is starting with one or more miRNA sequences to be used as queries in the search, as well as a set of genomic or cDNA sequences in which to search for miRNA homologs.
3.1.1. Identify Potential Homologs Based on Primary Sequence Similarity
% run_patscan.pl miRNAs.fa genome.fa 200 miRNA_matches
This script calls patscan to identify matches to the known miRNAs (contained in miRNAs.fa) in the target genomic sequence (contained in genome.fa) at the specified stringency (in this case, 0–2 substitutions and no insertions/deletions) and writes the matches to miRNA_matches (see Note 1).
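To make the matching criterion concrete, the following Python sketch scans both strands of a sequence for ungapped matches to a mature miRNA with at most two substitutions. It is only an illustration of the criterion, not a reimplementation of patscan or run_patscan.pl; the function names and the in-memory string input are assumptions made for this example.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N)."""
    table = str.maketrans("ACGTN", "TGCAN")
    return seq.upper().translate(table)[::-1]


def hamming(a: str, b: str) -> int:
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_near_matches(mirna: str, genome: str, max_subs: int = 2):
    """Yield (start, strand, site, substitutions) for ungapped near matches.

    start is the 0-based position of the site on the forward strand.
    A "-" hit means the forward strand carries the reverse complement
    of the miRNA, i.e. the candidate gene lies on the opposite strand.
    """
    query = mirna.upper().replace("U", "T")  # compare in the DNA alphabet
    genome = genome.upper()
    k = len(query)
    patterns = (("+", query), ("-", reverse_complement(query)))
    for start in range(len(genome) - k + 1):
        window = genome[start:start + k]
        for strand, pattern in patterns:
            subs = hamming(window, pattern)
            if subs <= max_subs:
                yield start, strand, window, subs
```

A plain sliding window is adequate for short test sequences; for whole genomes, an indexed search tool remains the practical choice.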
3.1.2. Screen Potential Homologs for miRNA-like Secondary Structure
1. % retrieve_genomic_regions.pl 350 miRNA_matches genome.fa miRNA_matches_g
This command retrieves a genomic fragment for each putative miRNA, with 350 flanking nucleotides added to each side of the putative miRNA (see Note 2).
2. % RNAfold < miRNA_matches_g > miRNA_matches_g_f
This command uses RNAfold to predict secondary structures for each putative miRNA locus.
3. % evaluate_miRNA_candidates.pl miRNA_matches_g_f miRNA_matches_g_f_miRNAs
This script calls miRcheck to evaluate the secondary structure of each putative miRNA. The output (miRNA_matches_g_f_miRNAs) contains the position of the miRNA within the hairpin (5′ or 3′) for each candidate that passed miRcheck. Additional output files (miRNA_matches_g_f_mature and miRNA_matches_g_f_hairpin) contain the genomic coordinates and sequences of the mature miRNAs and miRNA hairpins for candidates that passed miRcheck.
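The kind of test that miRcheck applies to a predicted structure can be illustrated with a simplified Python sketch. It covers only three of the criteria listed in Subheading 1.3 (consistent pairing orientation, at most six unpaired nucleotides, and at most three consecutive unpaired nucleotides) and operates on a dot-bracket string of the form printed by RNAfold. The function name, arguments, and return value are assumptions for this example and do not reflect the interface of the miRcheck Perl module.

```python
def check_mirna_pairing(structure, start, length=20,
                        max_unpaired=6, max_consecutive=3):
    """Apply three miRcheck-like tests to a candidate miRNA.

    structure: dot-bracket string for the whole hairpin
    start:     0-based offset of the candidate miRNA within the hairpin
    Returns (passed, arm); arm is "5'" or "3'", or None when the
    orientation test fails.
    """
    segment = structure[start:start + length]
    if len(segment) < length:
        raise ValueError("candidate extends past the end of the structure")

    # Test 1: all paired bases must pair in the same direction.
    opens, closes = segment.count("("), segment.count(")")
    if (opens and closes) or not (opens or closes):
        return False, None
    # "(" pairs with a downstream partner, so the miRNA is on the 5' arm.
    arm = "5'" if opens else "3'"

    # Test 2: total number of unpaired nucleotides.
    unpaired = segment.count(".")

    # Test 3: longest run of consecutive unpaired nucleotides.
    paired_char = "(" if opens else ")"
    longest = max(len(run) for run in segment.split(paired_char))

    passed = unpaired <= max_unpaired and longest <= max_consecutive
    return passed, arm
```

The remaining criteria (asymmetric bulges, the extended miRNA, and the requirements on the miRNA*) need the pairing partner of each base and are omitted here for brevity.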
3.1.3. Quality Control for miRNA Homolog Predictions
Due to the rather vague pairing requirements for plant miRNA hairpins, there is a considerable possibility of recovering false positives when predicting miRNA genes. There are several quality control steps that should be applied to the predicted miRNA genes to help gauge the stringency of the predictions. One potential source of false positives is the mis-annotation of miRNA binding sites in target genes as miRNA hairpins. Because most plant miRNAs are highly complementary to their target mRNAs, high-quality matches (0–2 nucleotide substitutions) to a miRNA in a plant genome primarily fall into one of two categories: miRNA genes and miRNA complementary sites in target genes. In general, the former will have predicted secondary structures that pass miRcheck while the latter will not. However, due to the rather loose pairing requirements of plant miRNA precursors, a fraction of miRNA target sites can be expected to spuriously fold into hairpin structures. Therefore, it is advisable to screen putative miRNA homologs for the potential to encode proteins. A putative miRNA locus that has similarity to the known targets of that miRNA family (e.g., a putative miR164 hairpin that is antisense to a CUC-like gene) is probably a false positive.
A useful check on the validity of the predicted miRNAs is to consider the location of each putative miRNA within its hairpin precursor (i.e., on the 5′ arm or the 3′ arm of the hairpin). Each miRNA family has a characteristic location in which the mature miRNA is always found. For example, all miR156 hairpins contain the mature miRNA on the 5′ arm. An analysis that predicts a substantial number of miRNAs that are located on the opposite arms of their precursors relative to their known homologs (i.e., miR156 homologs on 3′ arms) should be cause for concern.
Another useful control is to repeat the analysis in a context in which there should be no true positives. For example, arbitrary sequences with the same dinucleotide composition as the actual miRNAs can be used as the starting point of the search. The prediction of a substantial number of “homologs” to these arbitrary sequences should be taken as an indication that one or more steps in the procedure are insufficiently stringent.
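One way to generate control queries with the same dinucleotide composition as a real miRNA is sketched below in Python. The approach is a simple heuristic chosen for this illustration (exchanging two segments that share their first and last nucleotides, which leaves every junction dinucleotide unchanged); it is not necessarily how control sequences were generated in the original analysis, and it does not sample uniformly from all possible rearrangements. The function names are hypothetical.

```python
import random
from collections import Counter


def dinucleotide_counts(seq):
    """Count overlapping dinucleotides in a sequence."""
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))


def shuffle_preserving_dinucleotides(seq, rng, attempts=1000):
    """Return a rearranged copy of seq with identical dinucleotide counts.

    Two non-overlapping segments are exchanged only when they begin with
    the same nucleotide and end with the same nucleotide, so the
    dinucleotides at every junction are unchanged by the exchange.
    """
    s = seq
    n = len(s)
    if n < 4:
        return s  # too short to contain two separated segments
    for _ in range(attempts):
        # Four distinct, sorted cut points give segments s[a:b] and s[c:d]
        # that are non-empty and separated by at least one nucleotide.
        a, b, c, d = sorted(rng.sample(range(n + 1), 4))
        first, second = s[a:b], s[c:d]
        if first[0] == second[0] and first[-1] == second[-1]:
            s = s[:a] + second + s[b:c] + first + s[d:]
    return s


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the control set is reproducible
    query = "TGACAGAAGAGAGTGAGCAC"
    control = shuffle_preserving_dinucleotides(query, rng)
    print(control, dinucleotide_counts(control) == dinucleotide_counts(query))
```

For very short or highly constrained sequences the heuristic may return the input unchanged, so it is worth confirming that each control actually differs from its parent miRNA before using it as a query.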
3.1.4. A Worked Example: Identification of Homologs to Arabidopsis miRNAs in Moss
This protocol was designed to identify homologs of Arabidopsis miRNAs in the poplar genome (12). Testing this procedure against the moss (Physcomitrella patens) genome demonstrates its utility, while also illustrating some pitfalls. The recent deep sequencing of moss small RNAs provides an independent measure of the actual miRNA content in the moss genome (19). Using the set of mature Arabidopsis miRNAs (miRBase 10.0, representing 94 families) as queries, 581 genomic loci with sequence similarity to an Arabidopsis miRNA were identified, of
which 46 passed miRcheck. Thirty-two of the 34 conserved miRNA genes with experimental evidence (representing seven families) were identified. For the 87 Arabidopsis miRNA families without experimental evidence in moss, only nine putative homologs that passed miRcheck were identified, of which only four were on the correct hairpin arm (see Note 3). 3.2. De Novo Identification of Conserved Plant miRNAs
This protocol assumes that the investigator is starting with two or more databases of genomic or cDNA sequences in which to search for miRNA genes. The procedure outlined was developed to discover miRNAs conserved between Arabidopsis and rice. Because a comparison across a different evolutionary distance (e.g., comparing Arabidopsis to poplar, or two species of moss to each other) might require a very different analysis, it is not possible to describe a single procedure that will work well in all cases. Accordingly, the protocol describes approaches that worked well in the monocot-dicot analysis so that investigators interested in other analyses can consider adapting them as needed.
3.2.1. Identify Genomic Regions with miRNA-like Secondary Structure
1. % run_einverted.pl genome.fa einverted_output
This script calls einverted to identify imperfect inverted repeats in the genomic sequence. Because einverted cannot run efficiently on large sequences, the genomic sequence is broken up into overlapping 2 kb fragments (see Note 4).
2. % fold_inverted_repeats.pl einverted_output genome.fa einverted_output_f
For each inverted repeat, this script retrieves the genomic sequence containing the repeat (plus 10 flanking nucleotides on either side) and uses RNAfold to predict the secondary structure of each genomic fragment. Two structures are predicted for each inverted repeat, corresponding to theoretical transcripts from either DNA strand.
3. % extract_einverted_20mers.pl einverted_output_f einverted_output_20mers
For each secondary structure, miRcheck identifies 20-mers with local base pairing patterns that are compatible with those typically found in plant miRNA precursors. Because each 20-mer is considered separately, a single stem-loop may contain numerous 20-mer miRNA candidates.
The end result of this step is a list (probably quite lengthy) of candidate miRNA stem-loops, each containing one or more candidate mature miRNAs. (In many cases, the reverse complement stem-loop will also contain candidate miRNAs.) Unless overly stringent cutoffs have been used, it is likely that only a very small fraction of these candidates are actual miRNAs. Therefore, additional filters are important to improve the selectivity of the miRNA predictions (see Notes 5 and 6).
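The enumeration of overlapping 20-mers in step 3, together with the removal of simple sequences mentioned in Note 6, can be pictured with the following Python sketch. It does not evaluate base pairing and is not the logic of extract_einverted_20mers.pl; in particular, the 0.8 cutoff used to decide that a 20-mer consists "primarily" of one or two nucleotides is an assumed value chosen for illustration.

```python
from collections import Counter


def is_simple(kmer, max_fraction=0.8):
    """Return True when the two most common nucleotides dominate the k-mer.

    max_fraction is an illustrative cutoff, not a value taken from the
    published scripts.
    """
    counts = Counter(kmer.upper())
    top_two = sum(n for _, n in counts.most_common(2))
    return top_two / len(kmer) >= max_fraction


def candidate_kmers(seq, k=20, max_fraction=0.8):
    """Yield (start, kmer) for every overlapping k-mer that is not simple."""
    for start in range(len(seq) - k + 1):
        kmer = seq[start:start + k]
        if not is_simple(kmer, max_fraction):
            yield start, kmer
```

Because every offset is considered, a stem-loop of length n contributes up to n - 19 candidate 20-mers, which is why the candidate list grows so quickly at this stage.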
3.2.2. Test Robustness of miRNA Hairpin Folding
For most well-documented plant miRNA genes, the prediction of a hairpin structure by RNAfold is insensitive to the presence or absence of additional flanking nucleotides. This implies that the predicted free energy of the miRNA hairpin is more favorable than that of alternative structures in which the nucleotides of the miRNA hairpin pair with flanking sequences. In the analysis of Arabidopsis and rice genomic hairpins, it was observed that many 20-mers that passed miRcheck when their inverted repeats (as identified by einverted) were “folded” by RNAfold in isolation no longer passed miRcheck when folded in the context of 240 genomic nucleotides flanking either side of the 20-mer. Therefore, simply re-evaluating the secondary structure of each miRNA candidate, as predicted from a genomic sequence of arbitrary length centered on the 20-mer, can serve as a useful screen against candidates with unstable hairpins.
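The bookkeeping for this refolding test is simple but easy to get wrong near the ends of a contig. The Python sketch below computes the window to extract around a 20-mer (240 flanking nucleotides by default, as in the analysis described above) and the offset of the 20-mer inside that window, so that the same candidate can be re-evaluated after the fragment has been folded. The function name and the 0-based, end-exclusive coordinate convention are assumptions for this example.

```python
def context_window(seq_len, start, k=20, flank=240):
    """Return (left, right, offset) for refolding a candidate in context.

    seq_len: length of the chromosome or contig
    start:   0-based start of the candidate k-mer
    The window is seq[left:right]; offset is the position of the k-mer
    within that window. The window is clipped at the sequence ends.
    """
    left = max(0, start - flank)
    right = min(seq_len, start + k + flank)
    return left, right, start - left
```

For a candidate at position 1,000 of a 10,000-nt contig this gives the window 760 to 1,260 with the 20-mer at offset 240; the extracted fragment would then be folded and passed through the same structural checks as before.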
3.2.3. Analyze Conservation of Putative miRNAs to Other Genomes
Because of the extreme level of conservation of some miRNAs (in several cases, 100% nucleotide identity of mature miRNAs between angiosperms and moss), the identification of candidates with conservation of miRNA sequence and secondary structure in more than one genome can be a powerful method to enrich for authentic miRNAs. One drawback, of course, is that any miRNAs not conserved between the genomes under comparison will not be identified. In practical terms, there are two possible schemes for the genome-wide identification of potentially conserved miRNAs. One approach, as was used in the Arabidopsis/rice analysis, is to independently identify candidate miRNAs in each genome being analyzed (i.e., carry out the steps given in Subheadings 3.2.1 and 3.2.2 separately for each genome) and then to compare the candidate 20-mers to identify potential homologs. Alternatively, a list of candidate 20-mers could be identified in one genome, and potential homologs in other genomes could then be identified (using, for example, the first protocol in this chapter). Regardless of the scheme used, an important consideration is the degree of similarity required for two candidates to be considered as potential homologs. Most miRNAs conserved between Arabidopsis and rice have zero to two nucleotide substitutions when compared across genomes. Allowing three substitutions between candidate 20-mer homologs is likely to capture a large number of false positives.
3.2.4. Analyze Conservation of Putative miRNA Hairpins in Other Genomes
In an alignment of two homologous miRNA hairpins from different genomes, the mature miRNA sequences are highly conserved, as are the miRNA* sequences in most cases. The loop region between the miRNA and miRNA* is often highly divergent in both sequence and length. However, plant genomes also contain conserved hairpins (of uncharacterized function) that are
uniformly conserved throughout the hairpin. Therefore, depending on the level of background conservation between the genomes being analyzed, it may be possible to enrich for authentic miRNAs by discriminating against putative miRNA homologs that are not more highly conserved than the loop regions of their putative precursors. 3.2.5. Identify Potential Regulatory Targets of Putative miRNAs
The function of a mature miRNA is to guide a RISC complex to a complementary target RNA. Most plant miRNAs have extensive complementarity to their target RNAs, a fact that greatly facilitates the prediction of plant miRNA targets (20). Importantly, this high degree of complementarity has also facilitated discovery of the miRNA genes themselves through the identification of candidate targets for candidate miRNAs (9, 11). A number of algorithms, all of which substantively agree with each other, have been described for the prediction of plant miRNA targets (11, 21, 22). Any of these methods can be used for the analysis of the targeting potential of candidate 20-mers by searching for targets in annotated genes or EST sequences. The combined analysis of miRNA conservation with the analysis of targeting potential (i.e., identifying conserved miRNA candidates that can base pair to homologous target RNAs in each species) is particularly powerful for enriching for conserved miRNAs. As with other steps, choosing thresholds for target prediction that limit the number of false positive predictions is critical to obtaining meaningful results.
3.2.6. Verification of Predicted miRNAs
Due to the high potential for false positives, miRNA families predicted through genomic analyses should be viewed with skepticism until validated by experimental data. There are several types of experiments that are useful in validating the expression of predicted miRNAs. The detection of the predicted miRNAs on a northern blot can be strong evidence for expression. However, it is worth noting that a faint, fuzzy band on a northern blot is not a reliable indication of expression (see Note 7). A PCR-based assay can detect the presence of rare RNAs in a library of adapter-ligated small RNAs (11, 23). Importantly, this approach can also map the 5′ end of the predicted small RNA. Perhaps the most powerful method of validating a predicted miRNA is to demonstrate an interaction with a predicted target. 5′ RACE has been used by numerous labs to detect the in vivo miRNA-mediated cleavage of target RNAs (5, 11, 19, 21, 24–26), which results in a cleaved target with a 5′ end that aligns to the tenth nucleotide of the miRNA. Ideally, the 5′ end of the miRNA, as identified by PCR analysis, should agree with the observed position of cleavage in the target RNA.
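The expected position of the 5′ RACE product can be worked out directly from the coordinates of the complementary site. The Python sketch below assumes an ungapped miRNA:target duplex in which miRNA nucleotide p (counted from the miRNA 5′ end, starting at 1) pairs with target position site_start + L - p, where L is the miRNA length and site_start is the 0-based coordinate of the 5′-most nucleotide of the site in the target; the function names are hypothetical.

```python
def expected_cleavage_5prime_end(site_start, mirna_length, mirna_position=10):
    """Return the 0-based target coordinate expected at the 5' end of the
    downstream cleavage fragment.

    The miRNA pairs antiparallel to its target, so miRNA nucleotide 1 pairs
    with the last nucleotide of the site and miRNA nucleotide 10 pairs with
    target coordinate site_start + mirna_length - 10.
    Assumes no gaps or bulges in the duplex.
    """
    return site_start + mirna_length - mirna_position


def race_agrees(observed_5prime_end, site_start, mirna_length):
    """Check whether an observed 5' RACE end matches the expected position."""
    return observed_5prime_end == expected_cleavage_5prime_end(
        site_start, mirna_length)
```

For a 21-nt miRNA whose complementary site begins at target coordinate 100, the cleavage product is expected to begin at coordinate 111. If the duplex contains a bulge between the miRNA 5′ end and position 10, the simple arithmetic no longer holds and the alignment must be walked base by base.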
4. Notes
1. Other search/alignment algorithms (e.g., blast) can be used instead of patscan. Regardless of the search algorithm used, an important consideration is the extent of similarity that a genomic region must possess to the query miRNA in order to be considered as a potential homolog. While different thresholds are likely to be appropriate in different situations, the requirement for an ungapped alignment between the mature miRNA and its putative homologs with no more than two nucleotide substitutions seems to give a good balance of selectivity and sensitivity.
2. Plant miRNA hairpins can be quite long. Many hairpins are 200 to 300 nucleotides long (including the miRNA, loop region, and miRNA*), with a few ranging up to 650 nucleotides. Therefore, it is important to retrieve sufficient flanking sequence on either side of the putative miRNA. Adding more flanking sequence will increase the computing time needed to predict the secondary structures of the genomic regions, but does not seem to affect the predicted structures of most miRNA stem-loops.
3. The 12 putative moss miRNA homologs that lacked experimental evidence are illustrative of some of the pitfalls of miRNA prediction. In two cases, the reverse complements of experimentally supported miRNA* sequences were identified as miRNA candidates. (The actual miRNAs were also identified as candidates in these cases.) The identity of the strand encoding the actual miRNA would be ambiguous in these cases without experimental evidence. Three putative members of conserved miRNA families were not detected by the deep sequencing; it is unclear if these are false positives or miRNA genes that were not represented in the library of sequenced small RNAs. Of the nine putative homologs to non-conserved families, seven were to miR414, a miRNA identified through a bioinformatic search (13) that has not been subsequently validated experimentally in any species (3, 5). Because the miR414 sequence is a degenerate triplet repeat (UCA), it has 393 matches with zero to two nucleotide substitutions in the moss genome. With so many matches to a simple sequence, it is not surprising that a few can be predicted to fold into a hairpin structure. The miR414 case illustrates that certain sequences are prone to noise in bioinformatic analyses (which probably contributed to the initial mis-annotation of miR414 as a miRNA), and that given a large enough number of spurious potential miRNA homologs, some can be predicted to have potential miRNA precursors.
4. Using einverted parameters trained on conserved angiosperm miRNAs, the program run_einverted.pl identified 990,192 inverted repeats in the moss genome. Included in this set were inverted repeats corresponding to 95% of 205 experimentally supported moss miRNA genes, most of which are not conserved in angiosperms. An alternative to using einverted to identify genomic inverted repeats could be to use RNAfold to predict secondary structures for genomic fragments that tile the entire genome, which could then be analyzed by miRcheck. This would have the advantage of not losing miRNA loci that are missed by einverted, but may add considerable computational load and noise to the analysis.
5. The parameters passed to miRcheck will have a large impact on the outcome. In the analysis of miRNAs conserved between rice and Arabidopsis, I found that setting parameters to be stringent, so that ~15–20% of actual miRNAs did not pass miRcheck, was helpful in reducing the number of 20-mers to a workable number (11). The use of the same stringent parameters in moss captured 80% of moss miRNAs with experimental evidence. As mentioned for other steps in this chapter, the details, goals, and available computing power may suggest that other cutoffs are more appropriate for other analyses.
6. Plant genomes contain numerous instances of simple sequence repeats, as well as runs of single nucleotides. Therefore, it may be helpful to remove simple sequences from the analysis. By default, extract_einverted_20mers.pl removes 20-mers that consist primarily of any one or two nucleotides.
7. As an example, nine computationally identified miRNA families (miR413–420; miR426) were annotated in miRBase on the basis of weak northern signals (13). Subsequent deep sequencing experiments have not detected evidence for expression of these miRNAs (3, 5).
References
1. Jones-Rhoades M, Bartel D, Bartel B. MicroRNAs and their regulatory roles in plants. Annu Rev Plant Biol. 2006;57:19–53.
2. Llave C, Kasschau KD, Rector MA, Carrington JC. Endogenous and silencing-associated small RNAs in plants. Plant Cell. 2002;14:1605–1619.
3. Lu C, Kulkarni K, Souret F, MuthuValliappan R, Tej S, Poethig R, et al. MicroRNAs and other small RNAs enriched in the Arabidopsis RNA-dependent RNA polymerase-2 mutant. Genome Res. 2006;16:1276–1288.
4. Park W, Li J, Song R, Messing J, Chen X. CARPEL FACTORY, a Dicer homolog, and HEN1, a novel protein, act in microRNA metabolism in Arabidopsis thaliana. Curr Biol. 2002;12:1484–1495.
5. Rajagopalan R, Vaucheret H, Trejo J, Bartel D. A diverse and evolutionarily fluid set of microRNAs in Arabidopsis thaliana. Genes Dev. 2006;20:3407–3425.
6. Reinhart BJ, Weinstein EG, Rhoades MW, Bartel B, Bartel DP. MicroRNAs in plants. Genes Dev. 2002;16:1616–1626.
7. Sunkar R, Girke T, Jain PK, Zhu JK. Cloning and characterization of microRNAs from rice. Plant Cell. 2005;17:1397–1411.
8. Sunkar R, Zhu JK. Novel and stress-regulated microRNAs and other small RNAs from Arabidopsis. Plant Cell. 2004;16:2001–2019.
9. Adai A, Johnson C, Mlotshwa S, Archer-Evans S, Manocha V, Vance V, et al. Computational prediction of miRNAs in Arabidopsis thaliana. Genome Res. 2005;15:78–91.
10. Bonnet E, Wuyts J, Rouzé P, Van de Peer Y. Detection of 91 potential conserved plant microRNAs in Arabidopsis thaliana and Oryza sativa identifies important target genes. Proc Natl Acad Sci USA. 2004;101:11511–11516.
11. Jones-Rhoades M, Bartel D. Computational identification of plant microRNAs and their targets, including a stress-induced miRNA. Mol Cell. 2004;14:787–799.
12. Tuskan GA, Difazio S, Jansson S, Bohlmann J, Grigoriev I, Hellsten U, et al. The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science. 2006;313:1596–1604.
13. Wang X, Reyes J, Chua N, Gaasterland T. Prediction and identification of Arabidopsis thaliana microRNAs and their mRNA targets. Genome Biol. 2004;5:R65.
14. Xie F, Huang S, Guo K, Xiang A, Zhu Y, Nie L, et al. Computational identification of novel microRNAs and targets in Brassica napus. FEBS Lett. 2007;581:1464–1474.
15. Chiou T, Aung K, Lin S, Wu C, Chiang S, Su C. Regulation of phosphate homeostasis by MicroRNA in Arabidopsis. Plant Cell. 2006;18:412–421.
16. Fujii H, Chiou T, Lin S, Aung K, Zhu J. A miRNA involved in phosphate-starvation response in Arabidopsis. Curr Biol. 2005;15:2038–2043.
17. Fahlgren N, Howell M, Kasschau K, Chapman E, Sullivan C, Cumbie J, et al. High-throughput sequencing of Arabidopsis microRNAs: evidence for frequent birth and death of MIRNA genes. PLoS ONE. 2007;2:e219.
18. Lu C, Tej S, Luo S, Haudenschild C, Meyers B, Green P. Elucidation of the small RNA component of the transcriptome. Science. 2005;309:1567–1569.
19. Axtell M, Snyder J, Bartel D. Common functions for diverse small RNAs of land plants. Plant Cell. 2007;19:1750–1769.
20. Rhoades MW, Reinhart BJ, Lim LP, Burge CB, Bartel B, Bartel DP. Prediction of plant microRNA targets. Cell. 2002;110:513–520.
21. Allen E, Xie Z, Gustafson A, Carrington J. microRNA-directed phasing during trans-acting siRNA biogenesis in plants. Cell. 2005;121:207–221.
22. Schwab R, Palatnik J, Riester M, Schommer C, Schmid M, Weigel D. Specific effects of microRNAs on the plant transcriptome. Dev Cell. 2005;8:517–527.
23. Lim LP, Lau NC, Weinstein EG, Abdelhakim A, Yekta S, Rhoades MW, et al. The microRNAs of Caenorhabditis elegans. Genes Dev. 2003;17:991–1008.
24. Kasschau KD, Xie Z, Allen E, Llave C, Chapman EJ, Krizan KA, et al. P1/HC-Pro, a viral suppressor of RNA silencing, interferes with Arabidopsis development and miRNA function. Dev Cell. 2003;4:205–217.
25. Llave C, Xie Z, Kasschau KD, Carrington JC. Cleavage of Scarecrow-like mRNA targets directed by a class of Arabidopsis miRNA. Science. 2002;297:2053–2056.
26. Palatnik JF, Allen E, Wu X, Schommer C, Schwab R, Carrington JC, et al. Control of leaf morphogenesis by microRNAs. Nature. 2003;425:257–263.
Chapter 3
Methods for Isolation of Total RNA to Recover miRNAs and Other Small RNAs from Diverse Species
Monica Accerbi, Skye A. Schmidt, Emanuele De Paoli, Sunhee Park, Dong-Hoon Jeong, and Pamela J. Green
Abstract
For the experimental analysis of miRNAs and other small RNAs in the 20–25 nucleotide (nt) size range, the first and most important step is the isolation of high-quality total RNA. Because RNA degradation products can mask or dilute the presence of true miRNAs, it is important when choosing a method that it efficiently extracts RNA from tissues in a manner that prevents degradation of RNA of both high and low molecular weight. In addition, the presence of polyphenols, polysaccharides, and secondary metabolites may render nucleic acids insoluble, and hinder the recovery of the miRNAs. Finally, and most importantly, the method chosen must be capable of retaining the small RNA component. In this chapter, we will present a set of total RNA isolation methods that can be used to maximize the recovery of high-quality RNA to be used in miRNA analysis for a large number of plant species and tissue types.
Key words: RNA, Total RNA extraction, Small RNA extraction, TriReagent®, TRIzol®, Plant RNA Isolation Reagent®
1. Introduction Plants present a wide range of tissue types, both within and between species, which can often make universal methods for molecular biological techniques difficult to develop. Plant tissue varies greatly, from soft green tissue to dry seeds, flowers, roots, xylem and bark, waxy leaves, spiny needles and hard or juicy fruits.
B.C. Meyers and P.J. Green (eds.), Plant MicroRNAs, Methods in Molecular Biology, vol. 592 DOI 10.1007/978-1-60327-005-2_3, © Humana Press, a part of Springer Science + Business Media, LLC 2010
Accerbi et al.
Total RNA isolation in plant species is often complicated by the presence of recalcitrant tissue and organic compounds (1–4). While many methods exist to extract high-quality RNA from plant tissues, not all are preferred for miRNA analysis. Ion exchange and silica-based separation techniques, including the Qiagen RNeasy spin column, are biased against RNAs less than 200 nt in size, and do not effectively retain the miRNA fraction. Furthermore, while LiCl precipitation is often used for total RNA extraction, this salt is largely incapable of retaining the small RNA population and therefore should be substituted with sodium salt and/or alcohol precipitation (5).
The isolation of high-quality RNA depends greatly on the treatment and handling of the tissue prior to RNA extraction. Cellular ribonucleases act quickly and efficiently to degrade RNA upon cell lysis; therefore, tissue frozen in liquid nitrogen and stored at no more than −70°C will provide the best possible material for miRNA analysis. Commercial reagents such as RNAlater® (Ambion) may increase storage stability without compromising the quality or quantity of the recovered RNA, although the methods we describe do not rely on this reagent.
The process of RNA isolation preferred for miRNA analysis is carried out in three main steps: lysis and denaturation, organic extraction, and RNA precipitation. A chaotropic denaturing agent such as guanidinium, or sometimes a strong reducing agent like 2-mercaptoethanol, will disrupt RNase activity upon cellular disruption and maintain the integrity of the RNA (6–8). In addition to the denaturing activity achieved by the disruption of disulfide bonds, reducing agents in the extraction buffer will prevent the oxidation of polyphenolics, rendering them incapable of binding to nucleic acids (9, 10). Organic extraction by phenol or acid phenol, and chloroform or 1-bromo-3-chloropropane (BCP), followed by centrifugation, will separate the total RNA from DNA and protein components. Finally, alcohol precipitation of the RNA from the upper aqueous layer is capable of retaining both large and small RNAs.
RNA extraction from most green tissues can usually be carried out very successfully using TriReagent® (Molecular Research Center, Inc.) or TRIzol® (Invitrogen), which are the same product marketed by different companies. Based on the key reagent in a method developed by Chomczynski and Sacchi (6), the TriReagent® is a mono-phasic solution of phenol and guanidine isothiocyanate. Upon the addition of chloroform or BCP followed by centrifugation, the solution will separate into an organic and an aqueous phase, the RNA remaining in the clear aqueous phase. This method, when followed by alcohol precipitation, provides a high recovery of the small RNA fraction, as well as high-quality, non-degraded RNA.
Methods for Isolation of Total RNA to Recover miRNAs
For plant tissues that contain a high level of polyphenolics, polysaccharides, and fibrous materials (e.g., conifers and seeds), Plant RNA Isolation Reagent® (Invitrogen) (PRIR) is a better choice for RNA extraction. This proprietary reagent, when compared to TRIzol® or TriReagent® extraction of Arabidopsis leaves in parallel, provides RNA with a reduced A230 and a higher A260/A280 ratio. A description of these absorbances and the ratios characteristic of high-quality RNA can be found in Subheading 3.7.2 below. Highly recalcitrant plant tissues often have a high abundance of secondary metabolites that may interfere with guanidinium extraction (11). Another method of RNA isolation we present in this chapter is based on (11), with modifications by M. Perez-Amador (personal communication), and adapted for the recovery of small RNA molecules. In this instance, a highly saline, high-pH extraction buffer is used in place of guanidinium extraction, and is useful in cases such as orange tissue, where secondary metabolites may interfere with normal RNA extraction. In this chapter, we present three RNA extraction protocols that facilitate successful isolation of high-quality total RNA from a variety of representative plant species and tissue types with minimal addition of proprietary materials. The scheme in Fig. 1 can be used to help deduce an appropriate method and possible modifications that may be successful for a given sample. Table 1 lists the methods and modifications that we have used successfully to isolate total RNA that retains small RNA for nearly 100 different plant samples. These details are meant to provide an example of what has worked for a wide range of species and tissues, rather than suggest that these approaches are the only effective ones. 
While many commercial extraction buffers and published RNA isolation methods are specifically tailored to the type of tissue and use for the RNA desired, the methods outlined below are preferred not only for their universal application to plant species, but also for their relatively low cost in both time and materials. The RNA obtained from any of these methods may be utilized for RT-PCR, gel blotting, and library construction, or to analyze mRNAs, housekeeping RNAs such as rRNAs and tRNAs, and small RNAs such as miRNAs and siRNAs. In addition, the total RNA may be used subsequently to isolate low molecular weight RNA (~200 nt and smaller; an excellent preparation for examining small RNAs with gel blotting) and small RNA (~20–25 nt). Detailed protocols for the isolation of these RNA size fractions and the preparation of small RNA cDNA libraries can be found in Chap. 8 of this volume, and that for detection of small RNAs with gel blotting can be found in Chaps. 13 and 14.
[Flowchart. Starting material: Green tissue; High levels of polyphenolics, polysaccharides, and fibrous tissue; Recalcitrant tissue, high levels of secondary metabolites. Methods: I-A. Basic TriReagent; I-B. Modified TriReagent; II-A. Basic PRIR; II-B. Modified PRIR; III. Guanidinium-Free. Modifications: High Salt; Extra Chloroform; Potassium Acetate (+ High Salt); Extra Chloroform (+ Acid Phenol). Outcomes: High-Quality RNA; Low-Quality RNA (Low Yield; Insoluble Precipitate, Low 260/230; Degraded RNA, Low 260/280).]
Fig. 1. The isolation of high-quality RNA from most plant tissues can be accomplished using a modified version of one of three extraction methods: (I) TriReagent, (II) Plant RNA Isolation Reagent, or (III) Guanidinium-Free. For most green tissues or tissues of unknown properties, the simplest and most cost-effective method is the TriReagent isolation. However, if the RNA from this method is of poor quality, the characteristics of the resultant RNA can be used with the illustrated scheme to determine which method will likely suit the specific tissue. Low-quality RNA can have a variety of characteristics. A low A260/A280 ratio is indicative of protein contamination, whereas a low A260/A230 ratio or an insoluble precipitate indicates polysaccharide, salt, or phenolic contamination. Low yield can be determined by an overall low OD. Although low yield is not synonymous with low quality, it can lead to low quality if multiple preparations must be combined and contaminants become too concentrated. Furthermore, degraded RNA can be seen by visual inspection on an agarose gel. As discussed in the text, these methods and modifications should enable the extraction of high-quality RNA from almost any tissue type.
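The absorbance criteria described for Fig. 1 can be sketched as a small check on spectrophotometer readings. This is only an illustration: the thresholds are those quoted in the PRIR procedure of this chapter (A260/A280 > 1.7 and A260/A230 > 1.4), while the class and function names are hypothetical, and the check does not replace inspection of the RNA on a gel.

```python
from dataclasses import dataclass
from typing import List

# Thresholds quoted in the PRIR procedure (Subheading 3.2.2, step 13).
MIN_A260_A280 = 1.7
MIN_A260_A230 = 1.4


@dataclass
class Absorbance:
    """Blank-corrected absorbance readings of one RNA sample (hypothetical container)."""
    a230: float
    a260: float
    a280: float


def assess_rna(reading: Absorbance) -> List[str]:
    """Return the likely problems; an empty list means both ratios look acceptable."""
    problems = []
    if reading.a260 / reading.a280 <= MIN_A260_A280:
        problems.append("low A260/A280: possible protein contamination")
    if reading.a260 / reading.a230 <= MIN_A260_A230:
        problems.append(
            "low A260/A230: possible polysaccharide, salt, or phenolic contamination"
        )
    return problems
```

A sample that returns an empty list still needs to be checked for degradation on an agarose gel, since the ratios say nothing about RNA integrity.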
Roots Roots Roots Roots Roots inoculated w/ G. intraradices Roots inoculated w/ M. incognita Roots mock inoculated Roots, stripped Root hairs Nodules 14 dpi Nodules 14 dpi Nodules 21 dpi Stolon Rhizoid
Medicago Soybean Soybean Medicago Soybean Common bean Potato Miscanthus, Zostera
TriReagent TriReagent TriReagent TriReagent PRIR PRIR PRIR PRIR Potassium acetate, high salt
Medicago
Extra chloroform
Common Bean, Rice Marsilea, Medicago, Mimulus Zostera Nuphar Medicago
(continued)
Lettuce, orange, pepper, potato, rice, Silene, soybean, tobacco, wheat Amborella, Aristolochia Marsilea, Pumpkin Cycas, Ginkgo, Grapevine, Maize, Mimulus, Miscanthus, Nuphar, Petunia, Poplar, Sorghum, Switchgrass Banana Avocado Zostera Cotton Spruce
Speciesc
PRIR
TriReagent PRIR PRIR Guanidinium-Free TriReagent
PRIR PRIR PRIR PRIR PRIR PRIR
Leaves Leaves Leaves Leaves Leaves Needles
Radical tissues
High salt
TriReagent TriReagent
Leaves Leaves Extra chloroform Potassium acetate Extra chloroform, high salt Acid phenol, proteinase K, Extra chloroform Extra chloroform
Extra chloroform
Modificationsb
TriReagent
Base methoda,b
Leaves
Foliage tissues
Tissue
Table 1 RNA isolation methods that retain small RNA from plant species
Whole flower Whole flower Whole flower Whole flower Whole flower Whole flower Spikelet Ovules Strobilus (cone) Strobilus (cone) Fruit Fruit Fruit Fruit Fruit Grain Seed pod Seeds Seeds Seeds 20 DAA
Reproductive tissues
Tissue
Table 1 (continued)
TriReagent PRIR PRIR PRIR PRIR Guanidinium-free PRIR PRIR PRIR PRIR PRIR PRIR PRIR PRIR Guanidinium-free PRIR PRIR TriReagent PRIR PRIR
Base methoda,b
Potassium acetate, high salt Small scale
Extra chloroform
Extra chloroform Potassium acetate Acid phenol, proteinase K, extra chloroform
Extra chloroform
Extra chloroform Potassium acetate Extra chloroform, high salt
Modificationsb
Lettuce, orange, rice, Silene, soybean, tobacco, pepper Aristolochia, avocado, common bean, petunia, potato Banana, cotton, pumpkin Grapevine Zostera Nuphar Barley, maize, Miscanthus, sorghum, switchgrass, wheat Cycas Cycas, Ginkgo Spruce Avocado Banana Grapevine Pepper Orange Sorghum Tobacco Soybean Common bean Medicago
Speciesc
TriReagent PRIR PRIR TriReagent LS TriReagent PRIR PRIR TriReagent Extra chloroform, high salt Extra chloroform
Potassium acetate
Barley Poplar Poplar Pumpkin Chlamydomonas, Volvox Coleochaete, Klebsormidium, Spirogyra, Ulva Chara Porphyra
a PRIR, Plant RNA Isolation Reagent® (Invitrogen); TriReagent® and TriReagent LS® (Molecular Research Center, Inc.)
b See details in Methods section
c See http://smallrna.udel.edu
Seedlings Xylem Xylem, tension Phloem sap Algal thallus Algal thallus Algal thallus Algal thallus
Other tissues
2. Materials
2.1. General Materials and Reagents Needed to Perform the RNA Extractions Described
1. Instruments:
(a) Centrifuge with rotor (15,000 × g)
(b) Microcentrifuge
(c) Spectrophotometer
2. RNase-free mortars, pestles and spatulas: wrap in foil and bake at 220°C for at least 12 h.
3. RNase-free tips, microcentrifuge tubes and 13 ml sterile centrifuge tubes are commercially available; 30 ml non-disposable centrifuge tubes are treated as follows:
(a) Soak tubes in 3% hydrogen peroxide for at least 1 h; rinse well with DEPC-water and let air dry.
(b) Alternatively, soak tubes in 0.1% DEPC-water overnight and let air dry.
(c) Autoclaving is recommended for tubes being used beyond the extraction buffer and organic solvent steps.
4. RNase-free chemicals: solutions must be made with DEPC-treated distilled water and sterilized.
5. DEPC-water: add 0.05% DEPC to distilled water, stir overnight and autoclave. RNase-free water is also available commercially.
6. Liquid nitrogen.
7. Gloves: make sure you are wearing clean gloves at all times when handling RNA solutions.
2.2. Plant RNA Isolation Reagent®
2.2.1. Materials
1. Plant RNA Isolation Reagent (PRIR) (Invitrogen, Cat. No. 12322-012).
2. 5 M NaCl.
3. Chloroform.
4. Isopropyl alcohol.
5. 75% ethanol. 75% ethanol is stored at −20°C and used cold.
6. RNase-free water.
2.2.2. Optional Materials
1. High salt solution: 0.8 M sodium citrate, 1.2 M NaCl.
2. Acid phenol:chloroform:isoamyl alcohol 125:24:1, pH 4.5 (Ambion, Cat. No. AM9720).
3. 2 M Potassium acetate, pH 5.5.
4. Proteinase K 20 mg/ml.
5. SDS 20%.
6. 3 M Sodium acetate, pH 5.2.
7. Ethanol.
2.3. Plant RNA Isolation Reagent®: Small Scale Procedure for Medicago truncatula Developing Seeds (20 Days After Anthesis, DAA)
1. PRIR, Invitrogen, Cat. No. 12322-012.
2. 5 M NaCl.
3. Chloroform.
4. 2 M Potassium acetate, pH 5.5.
5. High salt solution: 0.8 M sodium citrate, 1.2 M NaCl.
6. Isopropyl alcohol.
7. 75% ethanol.
8. RNase-free water.
9. 3 M Sodium acetate, pH 5.2.
10. Ethanol.
2.4. TriReagent®/TRIzol® RNA Isolation
2.4.1. Materials
1. TriReagent® (Molecular Research Center, Cat. No. TR-118) or TRIzol® (Invitrogen, Cat. No. 15596-018).
2. Chloroform or 1-bromo-3-chloropropane (BCP).
3. Isopropyl alcohol.
4. 75% ethanol.
5. RNase-free water.
2.4.2. Optional Materials
1. Acid phenol:chloroform:isoamyl alcohol 125:24:1, pH 4.5 (Ambion, Cat. No. AM9720).
2. 3 M Sodium acetate, pH 5.2.
3. Ethanol.
2.5. TriReagent LS® RNA Isolation
1. TriReagent LS® (Molecular Research Center, Cat. No. TS-120).
2. 20 mg/ml glycogen or Polyacryl Carrier® (Molecular Research Center, Cat. No. PC-152).
3. Chloroform or BCP.
4. Isopropyl alcohol.
5. 75% ethanol.
6. RNase-free water.
2.6. Guanidinium-Free RNA Isolation
1. Extraction buffer:
(a) 100 mM Tris-HCl, pH 9.0
(b) 200 mM NaCl
(c) 15 mM EDTA, pH 8.0
(d) 0.5% Sarkosyl
(e) 8 µl/ml 2-mercaptoethanol added just prior to use.
2. Tris-HCl saturated phenol, pH 6.7.
3. Phenol:chloroform:isoamyl alcohol, 25:24:1.
4. Chloroform:isoamyl alcohol, 24:1.
5. Isopropyl alcohol.
6. 3 M Sodium acetate, pH 5.2.
7. 70% ethanol.
8. RNase-free water.
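As a worked example of preparing the guanidinium-free extraction buffer, the sketch below converts the final concentrations listed in Subheading 2.6 into stock volumes for a chosen batch size. The stock concentrations (1 M Tris-HCl, 5 M NaCl, 0.5 M EDTA, 20% Sarkosyl) are assumptions made for illustration and are not specified in this chapter, the 2-mercaptoethanol amount is taken here as 8 µl per ml of buffer, and the function names are hypothetical.

```python
# Final concentrations from Subheading 2.6 (molar, or percent for Sarkosyl).
FINAL = {
    "Tris-HCl pH 9.0": 0.100,
    "NaCl": 0.200,
    "EDTA pH 8.0": 0.015,
    "Sarkosyl": 0.5,
}
# ASSUMED stock concentrations, in the same units as FINAL; adjust to your own stocks.
STOCK = {
    "Tris-HCl pH 9.0": 1.0,
    "NaCl": 5.0,
    "EDTA pH 8.0": 0.5,
    "Sarkosyl": 20.0,
}
# Amount of 2-mercaptoethanol, taken as µl per ml of buffer, added just before use.
MERCAPTOETHANOL_UL_PER_ML = 8.0


def buffer_recipe(final_volume_ml):
    """Volume (ml) of each stock, plus RNase-free water, for one batch of buffer."""
    recipe = {
        name: final_volume_ml * FINAL[name] / STOCK[name] for name in FINAL
    }
    recipe["RNase-free water"] = final_volume_ml - sum(recipe.values())
    return recipe


def mercaptoethanol_ul(buffer_volume_ml):
    """Microliters of 2-mercaptoethanol for the portion of buffer used that day."""
    return MERCAPTOETHANOL_UL_PER_ML * buffer_volume_ml
```

With these assumed stocks, a 100 ml batch works out to 10 ml Tris-HCl, 4 ml NaCl, 3 ml EDTA, 2.5 ml Sarkosyl, and 80.5 ml water.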
3. Methods
3.1. Grinding of Plant Material with Mortar and Pestle in Liquid Nitrogen
The following general guidelines apply to any RNA work described in this chapter. The tissue must remain frozen at all times during grinding to avoid release of ribonucleases. Keep adding liquid nitrogen to the mortar as soon as it evaporates.
1. Freeze mortar and pestle at −80°C for at least 1 h.
2. Working on a clean bench, pour some liquid nitrogen in the mortar to chill it.
3. Add the sample to the mortar and start grinding gently to break the bigger pieces. Increase grinding speed and force as the powder becomes finer. Keep adding liquid nitrogen and grind until no tissue particles are visible; typically, sufficiently ground green tissue appears almost white.
4. Chill an RNase-free spatula in liquid nitrogen and transfer the frozen powder to a tube containing the extraction buffer (see step 1 of chosen method).
5. Immediately close the tube and vortex to dissolve any clumps, no more than a few seconds.
6. Quickly unscrew the tube cap to release any pressure which may have arisen due to the evaporation of residual liquid nitrogen. This step is particularly important if the transferred powder looks wet, indicating that the liquid nitrogen has not completely evaporated.
7. Close the tube and continue vortexing for the amount of time indicated in the method of choice (generally in step 1); make sure no clumps of powder are visible.
8. Proceed with the extraction method of choice, generally starting with step 2.
3.2. Plant RNA Isolation Reagent®
3.2.1. General Information
This procedure is adapted from the manufacturer’s protocol (Invitrogen) “Large-Scale RNA Isolation” with the following changes:
1. If you do not know the weight of the sample, estimate the amount of reagent to use by the sample volume: use about 1 ml of loosely packed ground frozen tissue per 10 ml of reagent.
2. All centrifugations are carried out at 10,000–12,000 × g in 13 ml or 30 ml centrifuge tubes.
3. Clarification of the extract is achieved by centrifugation instead of filtering.
3.2.2. Procedure
1. Add the frozen ground tissue to a tube containing PRIR and mix thoroughly by vortexing, about 1 min.
2. Incubate for 5 min at room temperature, laying the tube on its side with gentle shaking.
3. Centrifuge 10 min, 4°C at 12,000 × g to precipitate insoluble material; transfer the clarified supernatant to a clean tube (see Note 1).
4. Per 10 ml of clarified supernatant, add 2 ml of 5 M NaCl and mix by inverting the tube, then add 6 ml of chloroform and mix by vortexing.
5. Centrifuge 10 min, 4°C at 12,000 × g to separate the phases; transfer the top aqueous phase to a clean tube, being careful not to disturb the interface. Recovery should be 8–10 ml (see Note 2).
6. Precipitate the RNA by adding 0.9 volumes of isopropyl alcohol and mix gently by inverting the tube 5–6 times (see Notes 3 and 4).
7. Incubate for 10–15 min at room temperature.
8. Centrifuge 30 min, 4°C at 12,000 × g to precipitate the RNA. Gently decant the supernatant, taking care not to disturb the pellet.
9. Wash the pellet by adding 5–10 ml of cold 75% ethanol and mix by vortexing.
10. Centrifuge 10 min, 4°C at 12,000 × g.
11. Gently decant the ethanol, taking care not to disturb the pellet. Briefly centrifuge to collect all residual liquid and remove it with a pipette; air dry the pellet, but not completely.
12. Dissolve the RNA pellet in RNase-free water and transfer it to a clean microcentrifuge tube (see Notes 5 and 6).
13. Measure the OD of the RNA. Good-quality RNA should have an A260/A280 ratio > 1.7 and an A260/A230 > 1.4 (see Note 6 and Subheading 3.7.2 below).
14. Store the RNA at −80°C.
3.3. Plant RNA Isolation Reagent®: Small Scale Procedure for Medicago truncatula Developing Seeds (20 Days After Anthesis, DAA)
With this plant material, the RNA obtained by pooling numerous small scale extractions was of better quality than any single large-scale preparation.
1. Grind about 30 seeds in liquid nitrogen with mortar and pestle.
2. Add the frozen ground tissue to a tube containing 6 ml of PRIR (0.5 ml PRIR/0.1 g tissue) and mix thoroughly by vortexing, about 1 min.
3. Incubate for 5 min at room temperature, laying the tube on its side with gentle shaking.
4. Divide the buffer-tissue mix in 0.5 ml aliquots into 1.5 ml microcentrifuge tubes.
5. Centrifuge 5 min, 4°C at 12,000 × g.
6. Transfer the supernatant to clean tubes.
7. To each supernatant, add 100 µl of 5 M NaCl and mix by vortexing, then add 300 µl of chloroform and mix by vortexing.
8. Centrifuge 15 min, 4°C at 12,000 × g.
9. Transfer the upper phase to clean tubes.
10. To the upper phase add an equal volume of 2 M KOAc.
11. Incubate on ice for 1 h.
12. Centrifuge 15 min, 4°C at 12,000 × g.
13. Transfer the supernatant to clean tubes.
14. To each supernatant, add ½ volume high salt solution and mix by inverting the tube, then add ½ volume isopropyl alcohol and mix by inverting the tube.
15. Centrifuge 10 min, 4°C at 12,000 × g.
16. Wash the pellet by adding 0.5 ml of cold 75% ethanol and mix.
17. Centrifuge 3 min, 4°C at 12,000 × g.
18. Gently decant the ethanol, taking care not to disturb the pellet. Briefly centrifuge to collect all residual liquid and remove it with a pipette; air dry the pellet, but not completely.
19. Dissolve RNA in 20 µl of RNase-free water.
20. Pool all the RNA fractions and measure the OD of the resulting solution. If the ratios are not optimal (A260/A280 …
… 21 nucleotides, FASTA format) that is unique to the respective splice form should be used as target and the identifier/name of the splice form (e.g., At2g23450.2) should serve as a header.
5. Non-annotated genes and sequence variants: If the gene to be silenced is not contained in the WMD database for the respective species (e.g., GUS, GFP, viral genes, genes that are not yet annotated, etc.), or represents a sequence variant of an annotated gene (e.g., an allele from a different ecotype or cultivar), the target gene sequence(s) should be provided in FASTA format, headed by a custom name that is different from that of any other annotated gene. Silencing of sequence variants also requires the reference allele to be specified as an accepted off-target unless allele-specific silencing is desired.
In all cases, the respective plant species/genome release is to be selected from the drop-down menu on the WMD-Designer page to ensure specificity within this set of sequences. Specification of the “minimal number of included targets” is necessary when more than two genes are to be silenced simultaneously. In addition to finding an amiRNA that silences all genes simultaneously, WMD will attempt to generate amiRNAs that target all possible subgroups of sizes greater than or equal to the number given in “minimal number of included targets.” The computation of amiRNAs will take between a few minutes and several hours. The results will be emailed to the provided email address with the information entered in “Description” as the subject line.
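To make the meaning of “minimal number of included targets” concrete, the sketch below enumerates the subgroups of a target list that contain at least the given number of genes. It only illustrates which subgroups are meant; it is not a description of how WMD works internally, and the function name and gene identifiers are hypothetical.

```python
from itertools import combinations


def target_subgroups(targets, minimal_included):
    """All subgroups of `targets` with at least `minimal_included` members.

    The complete set comes first, followed by progressively smaller subgroups.
    """
    if not 1 <= minimal_included <= len(targets):
        raise ValueError("minimal_included must be between 1 and len(targets)")
    groups = []
    for size in range(len(targets), minimal_included - 1, -1):
        groups.extend(combinations(targets, size))
    return groups


# Hypothetical example with three placeholder gene names.
for group in target_subgroups(["geneA", "geneB", "geneC"], 2):
    print(", ".join(group))
```

For three targets and a minimum of two, this gives the complete set plus the three possible pairs, which shows why the number of candidate subgroups, and hence the computation time, grows quickly with longer target lists.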
Schwab et al.
Fig. 1. Example of a WMD result page. Candidate amiRNA sequences are listed and ordered by efficiency and specificity criteria.
3.1.3. Processing amiRNA Design Results
The amiRNA result email contains a hyperlink to a results webpage on which the amiRNA candidate sequences are listed (Fig. 1). This list can also be downloaded for future reference (Microsoft Excel format). See Note 3 in case the results page is empty and displays the message “Unfortunately WMD3 could not design any microRNAs”. In principle, all amiRNA sequences returned by WMD fulfill the above-described criteria and are expected to silence the predicted target genes successfully. However, they comply to different degrees with the parameters that we consider optimal in terms of base pair composition, hybridization properties to the target gene(s), and specificity criteria (for details, see (6)). The possible amiRNAs are therefore ranked by a respective cumulative score. The highest-ranking amiRNA candidates are presented at the top of the list. Green color indicates a very favorable score, while orange and red often mark amiRNAs with potentially reduced efficiency or, more often, specificity. It is thus recommended to proceed from top to bottom of this list.
The amiRNA sequences in the results page are hyperlinked to visualizations of alignments of the amiRNA to all potential target sequences in the WMD database, ordered by hybridization energy (for illustration see Fig. 2). The intended target gene ideally appears at the top of the list. The WMDTarget Search tool can also show genes that align to the amiRNA with five or fewer mismatches but do not fulfill other empirical rules, as indicated with respective notes in red bars.
Directed Gene Silencing with Artificial MicroRNAs
Fig. 2. Alignment of an amiRNA to its target gene. Explanation of the amiRNA-target alignment presented by WMD.
For the subsequent selection of one (or more) amiRNAs for further experiments, we recommend the following:
1. It is preferable for all intended target genes to not have mismatches to the amiRNA at positions 2 to 12.
2. AmiRNA candidates with one or two mismatches at the 3′ end of the amiRNA (positions 18 to 21) should be preferred, since it has been suggested that perfectly matching amiRNAs might trigger so-called transitive siRNA formation, where amplification of sequences adjacent to the binding site is primed by the miRNA. These sequences could in turn themselves serve as silencing triggers and affect other, unintended genes (7).
3. The absolute hybridization energy of the binding between amiRNA and the target sequence should be lower than −30 kcal/mole, and preferably be in the range between −35 and −40 kcal/mole.
4. The amiRNA binding site should be located within the coding region of the target gene, since UTRs are more likely to be misannotated.
At least two amiRNAs per target gene or group of genes should be selected for experimental work. If several are selected, the amiRNAs should bind the target mRNA at different locations, since secondary structure is suspected to influence miRNA efficacy.
3.2. Construction of aMIRNA Precursors by Site-Directed Mutagenesis
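Before moving on to precursor construction, the four selection recommendations of Subheading 3.1.3 can be expressed as a simple filter over amiRNA-target alignments. This is a minimal sketch under stated assumptions: the `Alignment` container and its field names are hypothetical and do not correspond to any WMD output format, and the recommendations are preferences that are treated here as strict pass/fail rules for the sake of illustration.

```python
from dataclasses import dataclass
from typing import Set


@dataclass
class Alignment:
    """One amiRNA aligned to one intended target (hypothetical container)."""
    mismatch_positions: Set[int]   # 1-based positions within the 21-nt amiRNA
    energy_kcal_per_mole: float    # hybridization energy, a negative number
    in_coding_region: bool         # binding site lies within the coding region


def meets_recommendations(aln: Alignment) -> bool:
    """Apply recommendations 1 to 4 of Subheading 3.1.3 as strict rules."""
    # 1. No mismatches at positions 2 to 12.
    no_mismatch_2_to_12 = not any(2 <= p <= 12 for p in aln.mismatch_positions)
    # 2. One or two mismatches at the 3' end (positions 18 to 21).
    three_prime = sum(1 for p in aln.mismatch_positions if 18 <= p <= 21)
    # 3. Energy in the preferred range between -35 and -40 kcal/mole.
    energy_ok = -40.0 <= aln.energy_kcal_per_mole <= -35.0
    # 4. Binding site within the coding region.
    return (
        no_mismatch_2_to_12
        and three_prime in (1, 2)
        and energy_ok
        and aln.in_coding_region
    )
```

In practice one would relax individual rules rather than discard a candidate outright, for example by accepting energies below −30 kcal/mole when no candidate falls in the preferred range.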
While exogenous small RNA duplexes are often directly used to transfect animal cell cultures and induce gene silencing, their accumulation in plants requires the construction and subsequent expression of a precursor RNA. AmiRNAs are engineered into vectors that contain endogenous MIRNA precursors by site-directed mutagenesis, such that the resulting precursor RNAs are processed by the endogenous miRNA machinery to release the amiRNAs. Several Arabidopsis precursor templates have been used successfully in Arabidopsis (4, 8–11) and other plants (8, 26), while rice osa-MIR528 was specifically engineered for amiRNA production in rice (12) and cre-MIR1157 and cre-MIR1162 have been used successfully in the unicellular green alga Chlamydomonas reinhardtii (24, 25). Please see Note 4 when working with a different plant species.
MIRNA precursors fold back on themselves to form a hairpin structure, and it is important to preserve this structure for successful processing. Therefore, engineering of amiRNAs into MIRNA precursor templates not only requires the exchange of the miRNA by the amiRNA sequence, but also of the pairing region in the hairpin, called the (a)miRNA*, such that pairing positions as well as G:U pairs are retained. The WMD software (WMD-Oligo window) thus generates four oligonucleotides per amiRNA sequence input: I and II to engineer the actual amiRNA, and III and IV for the amiRNA* (with wobbles). Currently, the software supports ath-MIR319a, osa-MIR528, and cre-MIR1157 (selected from a drop-down menu) and others will be included as they become available.
3.2.1. The MIRNA Templates
Endogenous MIRNA precursors that have been cloned into plasmids serve as templates for PCR reactions to exchange miRNA and miRNA*. These precursors include the hairpin and short pieces of flanking sequence on either side, which are known to be part of the longer endogenous MIRNA transcript. Plasmids that are currently available contain ath-MIR319a (plasmid pRS300) and osa-MIR528 (plasmid pNW55), which can be obtained upon sending a request to Detlef Weigel ([email protected]). A schematic representation of these MIRNA containing plasmids is shown in Fig. 3; their complete sequences are available on http://wmd3.weigelworld.org.
3.2.2. Oligonucleotides
Six PCR oligonucleotide primers are needed to produce an aMIRNA transgene. Four primer sequences are generated by WMD (WMD-Oligo window) and are given in 5′→3′ orientation. They are 40 nucleotides long and specific for the intended amiRNA. The 5′-most two and 3′-most 17 nucleotides match the template MIRNA precursor, while the 21 nucleotides in between do not match and will generate the amiRNA and amiRNA* in the amplicon (see Fig. 4). An additional two general oligonucleotides (A and B; see sequences in Table 1) that match the harboring plasmid (a pBluescript derivative) outside of the MIRNA precursor are also required. They have been placed such that the sizes of the resulting PCR products enable convenient purification and handling. Using the six primers, the aMIRNA precursor is amplified in three pieces (a–c) as shown in Fig. 4. The three pieces are subsequently fused to one amplicon (d) in a single PCR reaction.
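The layout of the four amiRNA-specific oligonucleotides (2 template-matching nucleotides, 21 amiRNA-specific nucleotides, 17 template-matching nucleotides) can be checked before ordering with a few lines of code. The sketch below is a hypothetical helper, not part of WMD, and the sequence in the usage example is made up purely to show the split.

```python
FIVE_PRIME_MATCH = 2     # nucleotides matching the template at the 5' end
VARIABLE_REGION = 21     # nucleotides encoding the amiRNA or amiRNA*
THREE_PRIME_MATCH = 17   # nucleotides matching the template at the 3' end
OLIGO_LENGTH = FIVE_PRIME_MATCH + VARIABLE_REGION + THREE_PRIME_MATCH  # 40


def split_oligo(oligo):
    """Split a 40-nt oligo (written 5' to 3') into its three regions."""
    oligo = oligo.strip().upper()
    if len(oligo) != OLIGO_LENGTH:
        raise ValueError(f"expected {OLIGO_LENGTH} nt, got {len(oligo)}")
    start = FIVE_PRIME_MATCH
    end = start + VARIABLE_REGION
    return oligo[:start], oligo[start:end], oligo[end:]


# Made-up sequence, used only to illustrate the 2 + 21 + 17 layout.
example = "GA" + "ACGTACGTACGTACGTACGTA" + "CCCCCCCCCCCCCCCCC"
five_prime, variable, three_prime = split_oligo(example)
print(five_prime, variable, three_prime)
```

Comparing the two outer regions against the sequence of the chosen template plasmid is a quick way to catch oligonucleotides that were designed for a different MIRNA backbone.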
Fig. 3. Template plasmids for construction of the amiRNA precursor, the aMIRNA foldback. (a) Plasmid pRS300 containing the ath-MIR319a precursor in pBluescript SK (cloned via the SmaI site). (b) Plasmid pNW55 containing the osa-MIR528 precursor in pBluescript KS (also cloned via the SmaI site). Complete plasmid sequences are available at http://wmd3.weigelworld.org. Abbreviations: A, B oligonucleotide binding sites; T3, T7 RNA polymerase/oligonucleotide binding sites; Amp Ampicillin resistance gene; MCS multiple cloning site. Sizes of the aMIRNA foldback and surrounding regions are indicated in Fig. 4.
Fig. 4. Schematic representation of PCR reactions that generate aMIRNA precursors. (a) Illustration of the template plasmid (see Fig. 3) with oligonucleotide binding sites indicated. (b) PCR amplicons (a), (b), and (c). (c) (a), (b), and (c) are fused to (d) by PCR. (d) Only the central part encodes the aMIRNA precursor, which is schematically shown at the bottom. Abbreviations: Ath Arabidopsis thaliana; Osa Oryza sativa (rice); A, B, I, II, III, IV oligonucleotide identifiers (see text); MCS multiple cloning site; (a), (b), (c), (d) PCR fragments as indicated in the text.
3.2.3. Generating aMIRNA Precursors by Overlapping PCR
1. Resuspend the template plasmid upon receiving, transform into competent E. coli cells (standard lab strain), spread on ampicillin-containing LB plates, inoculate an overnight culture from a single colony, and isolate the plasmid again using standard plasmid isolation procedures. Prepare a 1:100 dilution.
Table 2 PCR reactions
MIRNA template   Reaction   Forward oligo   Reverse oligo   Template DNA      Length of PCR product (bp)
ath-MIR319a      (a)        A               IV              pRS300            272
                 (b)        III             II              pRS300            171
                 (c)        I               B               pRS300            298
                 (d)        A               B               (a) + (b) + (c)   701
osa-MIR528       (a)        A               II              pNW55             256
                 (b)        I               IV              pNW55             87
                 (c)        III             B               pNW55             259
                 (d)        A               B               (a) + (b) + (c)   555
2. Setting up PCR reactions (a) to (c). All PCR reactions should preferably be carried out with a proofreading polymerase (such as Pfu) to avoid PCR errors. Table 2 shows the oligonucleotide combinations for each PCR reaction together with the expected size of the product (see also Fig. 3).
Reactions (a) to (c):
2.0 µl    10× PCR buffer (with ~25 mM Mg++)
2.0 µl    dNTPs (2 mM)
1.0 µl    each oligonucleotide (10 µM; see PCR scheme)
1.0 µl    template DNA (1:100 dilution of template plasmid)
0.2 µl    polymerase
12.8 µl   water
20 µl     total
Protocol:
95°C, 2 min
35 cycles of: 95°C, 30 s; 52°C, 30 s; 72°C, 40 s
72°C, 7 min
3. Isolate PCR fragments from a 2% agarose gel and purify with standard gel extraction procedures. PCR fragments from reactions (a), (b), and (c) can be pooled already at this step. Elute in 20 µl of water.
4. Reaction (d): fusion of fragments (a), (b), and (c).
2.0 µl    10× PCR buffer (with Mg++)
2.0 µl    dNTPs (2 mM)
1.0 µl    oligonucleotides A and B (10 µM)
0.5 µl    each purified gel fragment (a), (b), and (c) or 1.5 µl of combined gel eluate
0.2 µl    polymerase
12.3 µl   water
Protocol:
95°C, 2 min
35 cycles of: 95°C, 30 s; 52°C, 30 s; 72°C, 90 s
72°C, 7 min
5. Isolate PCR fragment from a 1% agarose gel.
3.3. Cloning
To sequence-verify the fusion-PCR product (d), it can be blunt-end ligated into a standard cloning vector. It is important to keep in mind that this PCR fragment contains the T3 and T7 primer sites and the Multiple Cloning Sites of the template plasmid (see Fig. 2). Using T3 and/or T7 primers for sequencing may cause failed sequencing reactions if the vector of choice also contains T3 and/or T7 sites.
3.3.1. Blunt-End Cloning Using Kits or Linearized Plasmids
(See Note 5 for the use of Gateway-compatible plasmids.) PCR reactions with proofreading polymerases generate blunt-ended products. Some companies offer kits to directly clone blunt-ended DNA fragments (e.g., TOPO kits from Invitrogen), in which case the manufacturer’s instructions should be followed. Another simple and cheap protocol to clone blunt-ended PCR products is based on plasmids that are linearized with a restriction enzyme that produces blunt ends (e.g., SmaI). Since PCR products are not 5′ phosphorylated, the plasmid needs to retain its terminal phospho-groups after restriction and is directly used for ligations without prior dephosphorylation or purification. Re-ligation of the empty plasmid is prevented by addition of SmaI to the ligation mix.
Ligation Reaction:
1.0 µl    10× reaction buffer for SmaI
0.5 µl    ATP (10 mM)
1.0 µl    plasmid cut with SmaI, not dephosphorylated, and not purified
1.0 µl    T4 DNA ligase (10 U/µl)
0.3 µl    SmaI
6.2 µl    purified PCR fragment
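When several aMIRNA precursors are cloned in parallel, the shared ligation components can be combined and dispensed before the individual PCR fragments are added. The sketch below scales the volumes listed above; the 10% pipetting excess is an assumption reflecting common bench practice rather than part of this protocol, and the function names are hypothetical.

```python
# Shared components per ligation (µl), as listed in the ligation reaction above.
SHARED_UL = {
    "10x reaction buffer for SmaI": 1.0,
    "ATP (10 mM)": 0.5,
    "SmaI-cut plasmid": 1.0,
    "T4 DNA ligase (10 U/ul)": 1.0,
    "SmaI": 0.3,
}
# Purified PCR fragment (µl), added to each tube individually.
FRAGMENT_UL = 6.2


def ligation_master_mix(n_ligations, excess=0.1):
    """Volumes (µl) of shared components for n ligations.

    `excess` is an ASSUMED pipetting allowance (10% by default).
    """
    if n_ligations < 1:
        raise ValueError("n_ligations must be at least 1")
    factor = n_ligations * (1.0 + excess)
    return {name: round(vol * factor, 2) for name, vol in SHARED_UL.items()}


def aliquot_per_tube_ul():
    """Volume (µl) of master mix to dispense into each tube before adding the fragment."""
    return round(sum(SHARED_UL.values()), 2)
```

Each tube receives 3.8 µl of the mix plus 6.2 µl of purified fragment, for 10 µl in total; the same approach applies to the PCR set-ups of Subheading 3.2.3 if the dictionary is replaced by the corresponding volumes.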
The ligation mix is incubated at 16°C overnight, followed by ~2 h at 30°C (optimal temperature for SmaI restriction) prior to transformation into standard competent E. coli strains. If possible, blue-white selection for the presence of an insert is recommended. Single colonies are cultured, and the recovered plasmid DNA should be test digested (e.g., with EcoRI and BamHI to yield a 408 bp band with the ath-MIR319a template, 268 bp with the osa-MIR528 template) prior to sequence verification with standard oligonucleotides (depending on the plasmid) or oligonucleotides A or B. Sequencing at this step is strongly recommended to ensure that the new plasmid is indeed transformed. It may also be useful to know that the miRNAs in the template plasmids harbor uniquely occurring restriction sites (SacI in pRS300 (ath-MIR319a) and SphI in pNW55 (osa-MIR528)), which should (in most cases) be eliminated after successful PCR mutagenesis.
3.3.2. Sub-cloning into Binary Plasmids
aMIRNA precursors that are generated by site-directed mutagenesis do not contain a promoter or terminator; both need to be added by subsequent sub-cloning steps. For functionality tests in planta and initial characterizations, strong ubiquitous promoters such as cauliflower mosaic virus (35S) have been proven very helpful. More detailed analyses can be carried out with tissue-specific promoters, since amiRNAs function largely cell-autonomously (4). Because amiRNA-mediated gene silencing is quantitative (stronger promoters induce stronger effects), we do not recommend weak promoters when strongly expressed genes should be silenced efficiently. They might, however, become useful when partial silencing is intended. Inducible and transient aMIRNA expression was successful with ethanol and estrogen inducible systems (4, 13). Promoters are often already contained in binary vectors, or are to be inserted with standard cloning techniques. We did not observe remarkable differences in phenotypic effects with different binary plasmids in A. thaliana, and therefore recommend using a plasmid system that is well-established in the respective plant system.
All restriction sites of the pBluescript Multiple Cloning Sites flanking the aMIRNA precursor in the fusion PCR product (d) can be used to excise the amiRNA precursor (the aMIRNA transcript) from the sequencing plasmid. We frequently use EcoRI and BamHI for the ath-MIR319a backbone, but other enzymes can be used as well. It is, however, necessary to preserve the direction of the aMIRNA precursor, since anti-sense transcripts are not expected to form the same secondary structure. Gateway-assisted cloning is also possible, since the presence of AttB sites adjacent to the amiRNA precursor does not seem to affect its processing (see Note 5).
3.4. Plant Transformation and Analysis of Transgenic Plants
3.4.1. Transformation of Agrobacterium and Plants
Most protocols for the generation of transgenic plants rely on an Agrobacterium strain delivering the above-described binary plasmid. Transformation of competent strains (e.g., GV3101 for A. thaliana, LBA4404 or EHA105 for O. sativa) is carried out with standard transformation protocols. Similarly, transformation of plants with the transgenic Agrobacterium strains should be carried out with established protocols, and primary transformants require selection with appropriate selection markers. Phenotypic variation among primary transformants is expected, and this might, in some cases, resemble an allelic series of the respective mutant. Gene silencing with transgenes is, in many cases, not complete, such that plants resembling null mutants of the respective target gene might not be recovered. See Note 6 in case you do not observe phenotypic changes in primary transformants.
3.4.2. Reduced Abundance of Target Transcripts
To confirm that phenotypic changes are indeed due to reduced abundance of the intended target gene product(s), their levels should be analyzed in pools of primary transformants with similar phenotypes, or in individual plants, and compared to an untransformed or empty-plasmid-transformed control. If available, estimating target protein levels with specific antibodies should be the method of choice. Since plant amiRNAs, like many endogenous miRNAs, typically also affect the accumulation of target mRNA, RT-PCR is often indicative of successful gene silencing. RNA is preferentially isolated from tissues with strong phenotypic effects, either with commercial kits or with TRIzol® reagent (Invitrogen). Commercial reverse transcription kits can be used for cDNA synthesis. RT-PCR products preferentially span the amiRNA-guided cleavage site. See Note 7 when you observe phenotypic abnormalities, but no change in target mRNA levels.
To estimate the specificity of gene silencing, it is recommended to also test for the accumulation of closely related transcripts, which contain regions of partial sequence complementarity to the amiRNA (five or fewer mismatches, determined with the WMD-Target search tool; see Note 2). Reduced levels can be the
result of direct amiRNA targeting, but also of feedback regulation when the two genes participate in the same genetic pathway. To discriminate between the two possibilities, it is necessary to specifically test for the accumulation of cleaved targets by 5′ RACE-PCR, since (a)miRNAs trigger the cleavage of target transcripts, always opposite positions 10 and 11 of the amiRNA.
3.4.3. Cleavage Site Mapping by RACE-PCR
RACE-PCR typically uses mRNA as a starting material, which can be isolated from total RNA with commercial kits. Standard protocols for 5′ RACE typically start by de-capping full-length mRNAs, whereas this step is omitted for cleavage product detection, and mRNA is directly ligated to the RNA linker oligonucleotide. Reverse transcription is typically carried out with an oligo-dT primer. PCR amplification of cleavage products uses forward oligonucleotides that bind the introduced linker sequence and gene-specific reverse oligonucleotides complementary to a region ~200 to 300 nucleotides downstream of the putative amiRNA binding site in the gene of interest. The abundance of cleavage product can be very low, and sometimes a second, nested PCR may be necessary. Amplified products should be ligated into standard cloning vectors and sequenced to determine where the linker had been ligated and hence where the target transcript had been cleaved. Cleavage is predicted to occur at the amiRNA binding site between the two base pairs opposing positions 10 and 11 of the amiRNA.
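When checking sequenced RACE clones, it can help to compute in advance where the linker junction is expected. The following is a minimal sketch, not part of the published protocol: it assumes a perfectly complementary target site, and the function names and the 21-nt sequence are hypothetical placeholders.

```python
# Sketch: predict where an amiRNA-guided cut should fall in a target transcript.
# Assumes a perfectly complementary site; all sequences below are hypothetical.

def reverse_complement(rna):
    """Return the reverse complement of an RNA (or DNA) sequence, as RNA."""
    rna = rna.upper().replace("T", "U")
    return rna.translate(str.maketrans("ACGU", "UGCA"))[::-1]


def expected_cleavage_index(transcript, amirna):
    """Return the 0-based index of the first nucleotide of the 3' cleavage
    fragment, or None if no perfectly complementary site is found."""
    transcript = transcript.upper().replace("T", "U")
    site = reverse_complement(amirna)      # target site, written 5'->3'
    start = transcript.find(site)
    if start == -1:
        return None
    # amiRNA position i (counted from its 5' end) pairs with site index
    # len(amirna) - i (0-based). The cut lies between the nucleotides paired
    # with positions 11 and 10, so the 3' fragment starts at len(amirna) - 10.
    return start + len(amirna) - 10


amirna = "UACGAUCGAUCGGAUACCGUA"            # hypothetical 21-nt amiRNA
transcript = "G" * 40 + reverse_complement(amirna) + "G" * 40
cut = expected_cleavage_index(transcript, amirna)
print(cut)   # index at which a 5' RACE clone is expected to begin
```

Real target sites usually contain a few mismatches, so in practice the site would be located from the WMD alignment rather than by exact string search; the offset arithmetic stays the same.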
3.4.4. Genetic Complementation
Since target sites of amiRNAs are small and distinct, it is possible to engineer silent mutations in this region of the target gene, such that the transcript is no longer susceptible to amiRNA-mediated regulation. Introducing this transgene under its endogenous or a stronger promoter should suppress the amiRNA-induced phenotypes. This approach has successfully been used to bypass regulation by endogenous miRNAs (14), and it can provide powerful evidence that the observed phenotypes are caused only by downregulation of the intended target and not by other genes. Silent mutations are typically introduced in as many positions as possible within the amiRNA binding site by PCR-based site-directed mutagenesis, in a similar way as aMIRNAs are produced (3.2.3).
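The design of an amiRNA-resistant version of the target can be sketched in a few lines. This is an illustrative example only, under several assumptions: the input is an in-frame DNA coding sequence, the standard genetic code applies, codon usage is ignored, and `make_resistant`, the coordinates and the toy sequence are hypothetical.

```python
# Sketch: introduce as many silent mutations as possible into a target site.
# Standard genetic code; codon usage bias is deliberately ignored here.
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate an in-frame DNA coding sequence (A, C, G, T only)."""
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


def make_resistant(cds, site_start, site_end):
    """Replace each codon overlapping [site_start, site_end) (0-based) with the
    synonymous codon that changes the most bases inside the site."""
    cds = cds.upper()
    out = list(cds)
    first_codon = site_start - site_start % 3
    for i in range(first_codon, min(site_end, len(cds) - 2), 3):
        codon = cds[i:i + 3]
        synonyms = [c for c, aa in CODON_TABLE.items() if aa == CODON_TABLE[codon]]

        def changes(candidate):
            # Count only substitutions that fall inside the binding site.
            return sum(candidate[k] != codon[k] and site_start <= i + k < site_end
                       for k in range(3))

        # Prefer the original codon when no synonym changes the site.
        best = max(synonyms, key=lambda c: (changes(c), c == codon))
        out[i:i + 3] = best
    return "".join(out)


cds = "ATGGCTTCTAGAGGACTTAAATGGTAA"          # hypothetical toy coding sequence
mutant = make_resistant(cds, 3, 24)           # hypothetical site coordinates
print(mutant, translate(mutant) == translate(cds))
```

The mutated site would then be introduced by PCR-based site-directed mutagenesis as described in the text; rare codons chosen by this naive approach may need to be reviewed by hand.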
4. Notes
1. When the plant species of interest is not yet included in WMD, but significant sequence information is publicly available, please contact [email protected] to have the species added to the tool. Obviously, the specificity calculations can only take the available set of sequences into account, so there is always the possibility that amiRNAs affect additional genes
that are not annotated or only partially annotated in the current sequence release of the respective species.
2. The WMD-Target Search application rapidly identifies target genes for miRNAs and other small RNAs in a given transcript collection/genome annotation. It uses a sequence matching algorithm based on enhanced suffix arrays (http://vmatch.de) (15), and enables the identification of all genes in the collection with a defined number of mismatches to the search sequence. In addition, the WMD-Target Search applies the empirically determined parameters of miRNA target selection (5) to filter for putative target genes. The output includes an alignment of the small RNA (reverse complement) to putative targets as illustrated in Fig. 2. With default settings, WMD-Target Search output lists only one splice form per gene. All splice forms are displayed when the splice form filter is disabled (“Show only one isoform” in Advanced Search Options). This option should be used to examine whether all splice forms are targeted.
3. Failure of WMD to produce suitable aMIRNAs can have several reasons:
(a) The input sequence might have been too short to contain suitable target sites.
(b) The WMD-Designer is not able to compute a specific amiRNA against a target gene of interest if its nucleotide sequence is very similar to one or several other genes at all potential target sites. The similar gene(s) can be easily identified using WMD-BLAST, and one or several targets will have to be silenced together by adding them as additional targets or as “accepted off-targets.” It might still be possible to conduct conclusive experiments by choosing off-target(s) that do not interfere with the experimental design, or by evaluating the effects of several amiRNA constructs in planta with different off-targets.
(c) Some transcript collections contain redundant ESTs, and multiple ESTs might span the locus of interest. Here, all genes/ESTs that are highly related to the gene of interest should be identified with WMD-BLAST and included in the WMD-Designer input as “accepted off-targets.”
(d) WMD can only compute a multi-gene amiRNA that targets several genes if they share regions of high nucleotide sequence similarity. Simultaneous silencing of multiple related genes might fail if the genes are not similar enough, or one or more is/are not different enough from other genes (see Note 2). Try to reduce the minimal number of included target genes or silence them individually.
4. A. thaliana MIRNA precursors have successfully been used for amiRNA production in other plants (tomato, tobacco, and
Physcomitrella, (8, 26)), but precursor functionality across species has not yet been systematically investigated. Therefore, adapting the cloning protocol to MIRNA precursors endogenous to the respective plant species of interest might be the optimal approach. MIRNA precursors have been identified and characterized in several different plants (see miRBase, http://microrna.sanger.ac.uk/) (16), often by homology to known miRNAs. As backbones for amiRNA production, we recommend either using a precursor that has been shown to be expressed and processed, e.g., by northern blot, or, when this information is not available, using a highly conserved precursor, e.g., MIR164 or MIR319. Oligonucleotides I through IV will need to be adapted to reconstruct the proper hairpin structure, such that bulges remain at their respective positions.
5. Cloning with the Gateway® technology seems to not interfere with amiRNA production. In the following, we list possible cloning strategies for using the Gateway® system:
(a) The MIRNA precursor fragment can be excised from the sequencing plasmid (see Subheading 3.3.1) with restriction enzymes and ligated into a Gateway® entry plasmid.
(b) The fusion product (d) in Subheading 3.2.3 can be ligated into a Gateway® entry plasmid as it is.
(c) Alternatively, the fusion PCR of the fragments (a), (b), and (c) to (d) in Subheading 3.2.3 can be carried out with oligonucleotides that already contain AttB sites at their 5′ ends. These primers do not necessarily need to bind to the primer binding sites A and B. Primers that bind to sequences in the Multiple Cloning Sites have successfully been used to obtain a short insert and to eliminate undesired restriction sites. The resulting PCR product with AttB sites at both ends can then be ligated into a vector of choice (e.g., pGEM-T Easy), which then serves as the entry plasmid for a subsequent recombination reaction.
6. Missing phenotypic changes in transgenic plants that (over)express amiRNAs can have several reasons:
(a) The phenotypes might not be detectable in the growth conditions tested.
(b) The loss-of-function phenotype of the gene of interest might be masked by redundancy. A search for related genes with similar expression patterns (see, e.g., the AtGenExpress platform for A. thaliana; http://jsp.weigelworld.org/expviz/expviz.jsp) (17) might help to identify potentially redundant genes to be used as additional targets.
(c) The target gene might not be sufficiently downregulated to detect phenotypic changes. It is critical to achieve
high amiRNA expression in the tissue(s) of target gene expression, but even promoters such as the CaMV 35S promoter are not entirely ubiquitous. A fraction of A. thaliana amiRNAs generated to date (20–25%) does not silence the intended target gene(s), but the reasons are yet to be determined. It is possible that their target sites are not accessible due to extensive local secondary structures, similar to what has been observed for siRNAs in animal systems (19). Ongoing studies address this question, and we are planning to account for this effect by integrating novel tools such as “RNAup” (20) into WMD. When calculating RNA-RNA binding, “RNAup” also considers the folding of the respective RNA molecules onto themselves, and can therefore be used to predict the accessibility of the target sites in the target mRNA. At present, we recommend constructing at least two amiRNAs per target gene/group of target genes, with target sites located in different regions of the target transcript(s).
(d) If even very potent amiRNAs cause only small effects on the transcript levels, the target genes might be under negative feedback regulation. Those genes may be silenced effectively on a transcriptional level (e.g., by promoter methylation (18)), but not by post-transcriptional gene silencing.
7. Typical (a)miRNA-mediated gene silencing includes the cleavage of target transcripts followed by degradation of the cleavage products, leading to a reduction in transcript abundance, which can be measured by RT-PCR. However, for some endogenous miRNAs, e.g., ath-miR172, translational inhibition is at least as important as miRNA-guided cleavage (21, 22, 27). Thus, phenotypic changes can be present even though mRNA levels might not have appreciably changed. When antibodies are available, translational effects can be monitored on the protein level by western blotting with target-specific antibodies. In many published cases, transcripts that were regulated on the translational level were still cleaved by the miRNAs (5, 21–23), and cleavage products were detected by RACE-PCR (see Subheading 3.4.3).
Acknowledgments
We thank Markus Riester for his contributions to earlier versions of WMD, Alexis Maizel, Javier Palatnik, Heike Wollmann, and Wolfgang Busch for discussion and sharing technical expertise, and Peter Bommert for comments on the manuscript. Work on small RNAs in the Weigel laboratory is supported by European Community FP6 IP SIROCCO (contract LSHG-CT-2006-037900) and by the Max Planck Society. R.S. is supported by an EMBO Long-term fellowship.
References
1. Chapman EJ, Carrington JC (2007) Specialization and evolution of endogenous small RNA pathways. Nat Rev Genet 8:884–896
2. Tang G, Galili G, Zhuang X (2007) RNAi and microRNA: breakthrough technologies for the improvement of plant nutritional value and metabolic engineering. Metabolomics 3:357–369
3. Allen E, Xie Z, Gustafson AM, Carrington JC (2005) MicroRNA-directed phasing during trans-acting siRNA biogenesis in plants. Cell 121:207–221
4. Schwab R, Ossowski S, Riester M, Warthmann N, Weigel D (2006) Highly specific gene silencing by artificial microRNAs in Arabidopsis. Plant Cell 18:1121–1133
5. Schwab R, Palatnik JF, Riester M, Schommer C, Schmid M, Weigel D (2005) Specific effects of microRNAs on the plant transcriptome. Dev Cell 8:517–527
6. Ossowski S, Schwab R, Weigel D (2008) Gene silencing in plants using artificial microRNAs and other small RNAs. Plant J 53(4):674–690
7. Vaucheret H (2005) MicroRNA-dependent trans-acting siRNA production. Sci STKE 2005:pe43
8. Alvarez JP, Pekker I, Goldshmidt A, Blum E, Amsellem Z, Eshed Y (2006) Endogenous and synthetic microRNAs stimulate simultaneous, efficient, and localized regulation of multiple targets in diverse species. Plant Cell 18:1134–1151
9. Niu QW, Lin SS, Reyes JL et al (2006) Expression of artificial microRNAs in transgenic Arabidopsis thaliana confers virus resistance. Nat Biotechnol 24:1420–1428
10. Parizotto EA, Dunoyer P, Rahm N, Himber C, Voinnet O (2004) In vivo investigation of the transcription, processing, endonucleolytic activity, and functional relevance of the spatial distribution of a plant miRNA. Genes Dev 18:2237–2242
11. Qu J, Ye J, Fang R (2007) Artificial miRNA-mediated virus resistance in plants. J Virol 81(12):6690–6699
12. Warthmann N, Chen H, Ossowski S, Weigel D, Hervé P. Highly specific gene silencing by artificial miRNAs in rice. Submitted
13. Michniewicz M, Zago MK, Abas L et al (2007) Antagonistic regulation of PIN phosphorylation by PP2A and PINOID directs auxin flux. Cell 130:1044–1056
14. Palatnik JF, Allen E, Wu X et al (2003) Control of leaf morphogenesis by microRNAs. Nature 425:257–263
15. Abouelhoda MI, Kurtz S, Ohlebusch E (2004) Replacing suffix trees with enhanced suffix arrays. J Discrete Algorithms 2:53–86
16. Griffiths-Jones S (2004) The microRNA Registry. Nucleic Acids Res 32:D109–D111
17. Schmid M, Davison TS, Henz SR et al (2005) A gene expression map of Arabidopsis thaliana development. Nat Genet 37:501–506
18. Matzke M, Kanno T, Huettel B, Daxinger L, Matzke AJ (2006) RNA-directed DNA methylation and pol IVb in Arabidopsis. Cold Spring Harb Symp Quant Biol 71:449–459
19. Ameres SL, Martinez J, Schroeder R (2007) Molecular basis for target RNA recognition and cleavage by human RISC. Cell 130:101–112
20. Mückstein U, Tafer H, Hackermüller J, Bernhart SH, Stadler PF, Hofacker IL (2006) Thermodynamics of RNA-RNA binding. Bioinformatics 22:1177–1182
21. Aukerman MJ, Sakai H (2003) Regulation of flowering time and floral organ identity by a microRNA and its APETALA2-like target genes. Plant Cell 15:2730–2741
22. Chen X (2004) A microRNA as a translational repressor of APETALA2 in Arabidopsis flower development. Science 303:2022–2025
23. Gandikota M, Birkenbihl RP, Hohmann S, Cardon GH, Saedler H, Huijser P (2007) The miRNA156/157 recognition element in the 3′ UTR of the Arabidopsis SBP box gene SPL3 prevents early flowering by translational inhibition in seedlings. Plant J 49:683–693
24. Zhao T, Wang W, Bai X, Qi Y (2009) Gene silencing by artificial microRNAs in Chlamydomonas. Plant J
25. Molnar A, Bassett A, Thuenemann E, Schwach F, Karkare S, Ossowski S, Weigel D, Baulcombe D (2009) Highly specific gene silencing by artificial microRNAs in the unicellular alga Chlamydomonas reinhardtii. Plant J
26. Khraiwesh B, Ossowski S, Weigel D, Reski R, Frank W (2008) Specific gene silencing by artificial microRNAs in Physcomitrella patens: an alternative to targeted gene knockouts. Plant J
27. Brodersen P, Sakvarelidze-Achard L, Bruun-Rasmussen M, Dunoyer P, Yamamoto YY, Sieburth L, Voinnet O (2008) Widespread translational inhibition by plant miRNAs and siRNAs. Science 320(5880):1185–1190
Chapter 7
Bioinformatics Analysis of Small RNAs in Plants Using Next Generation Sequencing Technologies
Kan Nobuta, Kevin McCormick, Mayumi Nakano, and Blake C. Meyers
Abstract
Next-generation sequencing technologies have a substantial impact on a broad range of biological applications. Like many other groups, we use these new technologies, especially SBS (Sequencing-By-Synthesis), for deep profiling of small RNA molecules in plants. Small RNAs are 21–24 nucleotides in length and are known to play a major role in the regulation of mRNAs and genomic DNAs. We have generated numerous SBS small RNA libraries; each can consist of more than three million signatures of more than 33 nucleotides in length. Here, we describe the challenges and our strategies to handle the very large quantity of small RNA data generated by these next-generation sequencing technologies.
Key words: SBS, MPSS, 454, Gene expression, Small RNA, miRNA, siRNA, Signature
1. Introduction
1.1. Small RNA Molecules
The discovery of RNA-based silencing systems has changed our understanding of mechanisms of transcription, translation, and the regulation of gene expression. There are two major classes of these RNA molecules in plants: microRNAs (miRNAs) and small interfering RNAs (siRNAs), which are approximately 21–24 nucleotides in length. In plants, miRNAs are known to regulate mRNAs primarily at the posttranscriptional level by directing mRNA cleavage, while endogenous siRNAs can trigger DNA methylation and histone modifications, leading to gene silencing. Small RNAs have been identified in nearly all eukaryotes, an indication of both the importance and the ancient origin of these regulatory molecules. The pool of small RNAs in plants is highly complex, consisting primarily of a diverse set of low-abundance siRNAs (1). Extensive
B.C. Meyers and P.J. Green (eds.), Plant MicroRNAs, Methods in Molecular Biology, vol. 592 DOI 10.1007/978-1-60327-005-2_7, © Humana Press, a part of Springer Science + Business Media, LLC 2010
work on the relatively small number of Arabidopsis miRNAs has demonstrated that these RNAs regulate the expression of many genes that play key roles in development and stress responses (2).
1.2. Next Generation Sequencing Technologies
The technology to sequence a large number of DNA molecules in a short period of time and in a cost-effective manner has developed rapidly in the last few years. Massively Parallel Signature Sequencing (MPSS) was the first methodology of this type (although the “cost-effective manner” of that method is debatable), and we have used this technology for both mRNA and small RNA transcriptome profiling (1, 3). Although MPSS generated hundreds of thousands to millions of signatures per library, the sequence length was limited to 17–20 bases (3). Since the length of small RNA molecules is primarily 21–24 bases in plants, MPSS signatures were not able to capture the information of full-length miRNA and siRNA sequences. Unlike MPSS, the sequencing length of the pyrosequencing-based technology from 454 Life Sciences can exceed 100 bases, which is more than long enough to obtain the full-length sequence of small RNA molecules (http://www.454.com). However, the sampling depth of this technology, which recently measured in the hundreds of thousands of reads, is not enough to capture the full complexity of small RNA molecules, particularly in comparison to other, newer technologies. Illumina (http://www.illumina.com) developed a new sequencing technology often referred to by the generic name, sequencing-by-synthesis, or “SBS.” Illumina’s approach uses reversible nucleotide terminators to sequence short DNA fragments (signatures). Although the sequence length of SBS is currently ~33–35 bases and hence not as long as that of 454, it is long enough to capture the full-length sequence of the small RNA molecules. More importantly, SBS can sequence millions of signatures in parallel. This capacity is deep enough to capture almost all the small RNA molecules in a given sample. 
There are a number of potentially competing technologies emerging, with Applied Biosystems leading the pack with their “SOLiD” sequencing device; however, these machines are only now being put to routine use in the laboratory. In any case, the methods that we describe here are easily adapted for any high-throughput short-read DNA sequencing technology.
1.3. SBS Small RNA Expression Database
We have generated numerous small RNA libraries from various organisms, with the libraries sequenced using MPSS, 454, and SBS. Here, as an example, we mainly focus on the databases we created for Arabidopsis and rice small RNA libraries sequenced with Illumina’s SBS technology. The SBS signatures were handled as we describe below and stored in a relational database.
1.3.1. Trimming and Mapping
In order to extract the biologically relevant data from the raw SBS output, it is important to understand how the libraries are constructed.
The details of the small RNA library construction method are described in Chapter 8 of this volume. Briefly, RNA molecules in the approximate size range of 20–25 nucleotides are recovered from a sample of total RNA; after ligating 5′ and 3′ RNA adapters and making cDNA from this, the product is subjected to SBS sequencing. The sequencing reaction reads not only the small RNA molecule, which is flanked by the two adapters (5′ and 3′), but also the adapter sequences. Therefore, all the potential adapter sequences have to be removed from the SBS signatures. We look for both 5′ and 3′ adapter sequences with up to two mismatches to allow for sequencing errors and reduce the number of untrimmed sequences. We have used the known microRNA sequences, obtained from the Sanger registry (4), as a training set, and we optimized the minimum and maximum number of bases in the adapter sequences to consider for trimming. As an example, Fig. 1 shows a typical trimming result from an SBS library. As expected, the majority of the signatures had a 3′ adapter sequence, and more than 90% of the sequences had a relatively long insert (e.g., >18 bp), which was kept for genome matching (Fig. 1). In contrast, a small number of signatures had a 5′ adapter sequence, and almost all of them were not usable (Fig. 1). After trimming, we map all the SBS signatures to a genomic sequence. Since both Arabidopsis (Col-0) and rice (Oryza sativa ssp. japonica cv. Nipponbare) genomic sequences have very high quality, and materials derived from either of these inbred genotypes are usually highly homozygous, we consider only perfect matches.
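The 3′ trimming step can be sketched as follows. This is a minimal illustration and not the trimming script used for the libraries described here: the adapter string is a placeholder, and the window sizes (`min_adapter`, `max_adapter`) and the 18-nt insert cutoff are assumed values standing in for the optimized parameters mentioned above.

```python
# Sketch of 3' adapter trimming with a mismatch tolerance.
# Parameter values and the adapter string are illustrative assumptions.

ADAPTER_3P = "TCGTATGCCGTCTTCTGCTTG"   # placeholder; use your library's adapter


def mismatches(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def trim_3prime(read, adapter=ADAPTER_3P, min_adapter=8, max_adapter=12,
                max_mismatch=2, min_insert=18):
    """Return the insert preceding the 3' adapter, or None if no adapter is
    found or the insert is shorter than min_insert."""
    read = read.upper()
    for pos in range(len(read) - min_adapter + 1):
        # Compare at most max_adapter bases; near the read end only a
        # truncated adapter prefix is available.
        k = min(max_adapter, len(read) - pos, len(adapter))
        if k < min_adapter:
            break
        if mismatches(read[pos:pos + k], adapter[:k]) <= max_mismatch:
            insert = read[:pos]
            return insert if len(insert) >= min_insert else None
    return None


insert = "TGACAGAAGAGAGTGAGCAC"               # example 20-nt insert
read = insert + ADAPTER_3P[:13]               # simulated 33-nt signature
print(trim_3prime(read))
```

Scanning from the 5′ end returns the leftmost acceptable adapter match; a short comparison window combined with two allowed mismatches can produce false positives, which is one reason to tune the window sizes against known miRNA sequences.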
[Figure 1: flow chart. SBS library example: 3,170,342 signatures, comprising 2,888,138 with a 3′ adaptor, 248,222 with a 5′ adaptor, 27,164 with no adaptor, and 6,818 errors. Trimming (inserts < 18 bp discarded): 2,734,695 3′ adaptor trimmed and 2,192 5′ adaptor trimmed. Mapping: 2,030,543 match to genome and 706,344 orphan signatures.]
Fig. 1. An example of trimming and mapping results with an SBS small RNA library: This specific SBS library had 3,170,342 distinct signatures. As expected, our trimming script identified a high proportion of signatures with 3′ adaptor sequences and fewer than 10% of the signatures with 5′ adaptor sequence. A very small portion of the signatures had no adaptor sequence or contained non-GATC characters (errors). Among the signatures with a 3′ adaptor sequence, a large proportion had a long insert (>18 bp), while fewer than 1% of those with a 5′ adaptor sequence passed this filter. More than two million trimmed signatures were mapped successfully to the rice genome.
The script scans through the chromosomal sequence from the beginning to the end, performing a simple string search. With the same rice example above, we were able to map more than two million SBS signatures (18–33 bp) to the genome (Fig. 1). The remaining signatures (~700,000 “orphans”) could result from mismatches, RNA editing, or sequences spanning intron/exon junctions, or could be derived from gaps in the rice genomic sequence. The details of the trimming and mapping algorithms are described below (see Subheadings 3.1 and 3.3).
1.3.2. Associations with Annotated Genes and Other Genomic Sequences
Plant small RNA libraries represent a complex mixture of several types of molecules, including the following: highly abundant miRNAs that are few in number; weakly expressed siRNAs that number in the millions in terms of sequence diversity; a large number of “contaminating” tRNAs, rRNAs, snRNAs and snoRNAs; and other, smaller classes of small RNAs, such as trans-acting siRNAs or natural-antisense transcript siRNAs, that we do not describe here. The tRNAs, rRNAs, snRNAs, and snoRNAs are all major components of small RNA libraries. Since small RNAs derived from these RNAs share characteristics with miRNAs, like high abundances and representation in most libraries, it is important to flag them for removal from later analyses. In order to associate signatures with annotated genes, we first map them to the genome and store all coordinates (chromosome, position, and strand) in a table. Then, we run a gene mapping script that, for each gene, identifies all signatures between its starting and ending coordinates. If a signature is within a “contaminating” sequence, we mark it with a flag, which will be pertinent later on.
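The two steps, perfect-match mapping and association with annotated features, can be sketched together. This is a simplified illustration under stated assumptions rather than the scripts used for the database: the genome is held in memory as a dictionary, gene records are plain dictionaries with hypothetical keys (`name`, `chrom`, `start`, `end`, `type`), and a signature counts as inside a gene only when it lies entirely between the gene's coordinates.

```python
# Sketch: exact-match mapping on both strands, then gene association and
# flagging of t/r/sn/snoRNA-derived signatures. Data structures are assumed.

CONTAMINANT_TYPES = {"tRNA", "rRNA", "snRNA", "snoRNA"}


def revcomp(dna):
    """Reverse complement of a DNA string."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def map_signatures(signatures, genome):
    """Return (signature, chromosome, 1-based position, strand) for every
    perfect match; genome maps chromosome names to uppercase sequences."""
    hits = []
    for sig in signatures:
        for strand, query in (("+", sig.upper()), ("-", revcomp(sig))):
            for chrom, seq in genome.items():
                start = seq.find(query)
                while start != -1:
                    hits.append((sig, chrom, start + 1, strand))
                    start = seq.find(query, start + 1)   # allow overlaps
    return hits


def associate(hits, genes):
    """For each gene, list the signatures lying between its start and end,
    with a flag marking contaminating RNA classes."""
    rows = []
    for gene in genes:
        flagged = gene["type"] in CONTAMINANT_TYPES
        for sig, chrom, pos, strand in hits:
            end = pos + len(sig) - 1
            if chrom == gene["chrom"] and gene["start"] <= pos and end <= gene["end"]:
                rows.append((sig, gene["name"], flagged))
    return rows


sig1 = "TGACAGAAGAGAGTGAGCAC"
sig2 = "TTGGACTGAAGGGAGCTCCC"
genome = {"chr1": "C" * 10 + sig1 + "C" * 10 + revcomp(sig2) + "C" * 10}
genes = [
    {"name": "tRNA-X", "chrom": "chr1", "start": 1, "end": 35, "type": "tRNA"},
    {"name": "GeneB", "chrom": "chr1", "start": 38, "end": 65, "type": "protein_coding"},
]
hits = map_signatures([sig1, sig2], genome)
rows = associate(hits, genes)
print(hits)
print(rows)
```

The nested loops are quadratic and only suitable for illustration; with millions of signatures, an index over the signatures or sorted coordinates would be needed.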
1.3.3. Normalization and Summary Table
The total number of signatures sequenced by SBS differs from library to library. In order to compare the expression level of a particular signature across the libraries, the abundance values must be normalized. This calculation is performed for each signature and for each library, based on the total abundance of the libraries. We select a round number that is close to the total number of sequences as the basis for normalization. However, since not all the signatures match to the genomic sequence (presumably, primarily due to errors or gaps in the sequence), we exclude these signatures from the calculation of the total used for normalization. In addition, we exclude the “flagged” signatures described above (t/r/sn/snoRNAs). We then subtract the abundance of these signatures (flagged and non-matching) from the total raw abundance to determine the adjusted total raw abundance. In order to obtain the normalized value for each small RNA, each raw value is divided by the adjusted total raw abundance and multiplied by a normalization factor that is close to the adjusted total raw abundance but is a round number that is typically a multiple of a million or half a million. These normalized values allow us to accurately compare
the expression levels of signatures across libraries. It is possible to further normalize the expression levels by dividing the normalized value by the total number of locations to which a tag matches; we refer to this as the “hits-normalized abundance” (HNA).
1.4. Genomic Database and Web Interface
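Before turning to the genomic database, the normalization described in Subheading 1.3.3 can be summarized as: normalized value = raw abundance / adjusted total raw abundance × normalization base. The sketch below illustrates that arithmetic; the function names, the rounding step of half a million, and the toy numbers are assumptions for illustration and not the authors' code.

```python
# Sketch of library normalization and hits-normalized abundance (HNA).
# Names, the rounding rule, and the example values are assumptions.

def round_base(adjusted_total, step=500_000):
    """Pick a round normalization base (a multiple of `step`) close to the
    adjusted total raw abundance."""
    return max(step, round(adjusted_total / step) * step)


def normalize(raw, excluded, base=None):
    """raw: signature -> raw abundance in one library.
    excluded: flagged (t/r/sn/snoRNA) and non-matching signatures, which are
    left out of the adjusted total."""
    adjusted_total = sum(n for sig, n in raw.items() if sig not in excluded)
    if adjusted_total == 0:
        raise ValueError("no usable signatures in this library")
    if base is None:
        base = round_base(adjusted_total)
    return {sig: n * base / adjusted_total for sig, n in raw.items()}


def hits_normalized(norm, hits):
    """Divide each normalized abundance by the number of genomic locations
    to which the signature matches."""
    return {sig: value / hits[sig] for sig, value in norm.items()}


raw = {"sigA": 600_000, "sigB": 300_000, "tRNA_frag": 100_000, "orphan": 50_000}
excluded = {"tRNA_frag", "orphan"}
norm = normalize(raw, excluded)
hna = hits_normalized({"sigA": norm["sigA"], "sigB": norm["sigB"]},
                      {"sigA": 3, "sigB": 1})
print(round_base(900_000), norm["sigA"], hna["sigA"])
```

Because every library is scaled to a round base, values from libraries of different sequencing depth can be compared directly.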
Many small RNA analyses are based on comparisons between annotated genes and features of the genome. We use the most recent annotations of Arabidopsis (TAIR7.0) or rice (TIGR5.0) genomes in our analyses, but any sequenced plant genome could be used. We run an XML parser, which is written in Java with SAX, and extract all the necessary information (e.g., gene coordinates, genomic sequence, etc.) from these files. These sets of information are necessary as described above when we associate signatures to annotated genes. After extracting the genomic sequence from the files, we run the programs RepeatMasker (5) and Einverted/Etandem (6) to identify the repetitive regions in the genome. These data are necessary to construct the small RNA database, flag specific types of small RNAs (mentioned above), and to display SBS data through our web interface (http://mpss.udel.edu). We have adapted for SBS data the interface that we developed previously to display our MPSS data (7). However, we made minor adjustments to accommodate the large data size of SBS. For example, the number of signatures displayed in a certain window is restricted to a human-readable size instead of displaying all of them at once. The details of our interface and the available tools can be found elsewhere (7).
2. Materials
Our servers use Linux 2.6 as an OS and have at least 8 GB of memory with multiple Intel Xeon processors with more than 2 GHz of CPU speed. We took advantage of these configurations and developed various scripts (Perl and Java), which require large memory space but run in a relatively short period. The scripts extract expression data from the sequencing results and store them in a MySQL server as a “Small RNA Expression Database.” Similarly, we extract genomic information from the files provided by various institutes and store them in a MySQL server as a “Genomic Database.” The web interface, mainly written in PHP, extracts the data requested by the user from these two databases and displays the query results in graphical and analytical outputs.
2.1. Small RNA Expression Database
1. As described in Chapter 8, small RNAs can be cloned and sequenced using high-throughput parallel methods like SBS, and we have generated numerous small RNA libraries from diverse plant species. A typical library consists of nonredundant signature sequences and corresponding abundance values.
These raw data, as well as the data with normalized values, are downloadable from our website for the users who want to perform their own analyses.
2. In addition to the small RNA libraries generated in our lab, we have downloaded libraries that are publicly available from GEO (Gene Expression Omnibus). At the time of the preparation of this manuscript, our Arabidopsis and rice small RNA databases have numerous libraries that were sequenced by MPSS, 454, or SBS. We developed a package of scripts to parse these data, extract the necessary information, and build MySQL small RNA databases.
3. In addition to the signature sequences and their expression levels, the details of each library are stored in this database. Users can find information such as the library developer, the developmental stage and the condition of the plants that were used for small RNA extraction, etc. Furthermore, we record the number of signatures that match to the given genome as well as to tRNA, rRNA, snRNA, and snoRNA genes. These sets of information are useful to determine the quality of the libraries.
2.2. Genomic Database and Web Interface
1. We use genomic assemblies (pseudochromosomes) provided by public genome projects, as well as annotation files provided by such groups. The Arabidopsis and rice annotations can be downloaded in TIGR XML format from TAIR (The Arabidopsis Information Resource: http://arabidopsis.org/) and Michigan State University (MSU, http://rice.plantbiology.msu.edu/), respectively. Our Java XML parser extracts the necessary information and stores it in a relational database such as MySQL.
2. Although the XML files from TAIR and TIGR contain almost all the information necessary for our analyses, we download additional genomic information from other providers, and those data come in a variety of file formats. For example, we obtain miRNA information in GFF file format from the Sanger registry (http://microrna.sanger.ac.uk/sequences/). We customize our scripts to parse different file formats and add the information to the genomic database.
3. In order to complete our genomic database, we perform additional analyses with the data extracted from TAIR, TIGR, and other institutes. For example, we run publicly available programs, such as RepeatMasker, Einverted, and Etandem, to identify repetitive sequences in the given genome and store the information in MySQL tables. We associate this information with expression data and distinguish miRNA from siRNA signatures. In addition, we run a miRNA target prediction program developed in our lab using previously described rules (8) to identify potential targets of all known and novel miRNAs.
Bioinformatics Analysis of Small RNAs in Plants
4. Based on the analyses above, users can access from our website a brief description of the small RNA MPSS data, links to known microRNAs, inverted repeats, tandem repeats, pericentromeric regions, weakly predicted transposons, and trans-acting siRNAs. Users can simply click the examples to get an idea of our visualization and data access tools. A schema of a database for storing and handling SBS small RNA data is shown in Fig. 2.
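As a rough illustration of how the small RNA tables of such a database fit together, the following sketch builds three of them in an in-memory SQLite database. SQLite is used here only so that the example is self-contained; the system described in this chapter uses MySQL. Table and column names follow the "Small RNA data" panel of Fig. 2, whereas the column types, the helper name build_small_rna_db, and the sample rows are assumptions made for illustration.

```python
import sqlite3


def build_small_rna_db():
    """Create toy versions of three 'Small RNA data' tables (hypothetical helper)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE library_info (
            lib_id   INTEGER PRIMARY KEY,
            name     TEXT,
            organism TEXT
        );
        CREATE TABLE run_master (          -- one row per signature per library
            tag       TEXT,
            raw_value INTEGER,
            lib_id    INTEGER REFERENCES library_info(lib_id)
        );
        CREATE TABLE tag_position (        -- one row per genomic match
            tag      TEXT,
            chr_id   TEXT,
            strand   TEXT CHECK (strand IN ('w', 'c')),
            position INTEGER,
            gene     TEXT
        );
    """)
    return con


con = build_small_rna_db()
# Invented sample data, for illustration only.
con.execute("INSERT INTO library_info VALUES (1, 'demo_lib', 'rice')")
con.executemany(
    "INSERT INTO run_master VALUES (?, ?, ?)",
    [("ACGTACGTACGTACGTACGTA", 120, 1), ("TTTTACGTACGTACGTACGTA", 3, 1)],
)
con.execute(
    "INSERT INTO tag_position VALUES ('ACGTACGTACGTACGTACGTA', '1', 'w', 1500, 'geneA')"
)

# Signatures of library 1 that have at least one genomic position.
rows = con.execute("""
    SELECT r.tag, r.raw_value, p.chr_id, p.position
    FROM run_master r
    JOIN tag_position p ON p.tag = r.tag
    WHERE r.lib_id = 1
""").fetchall()
print(rows)
```

The signature sequence itself serves as the join key between the expression and position tables, which mirrors the "tag" key drawn between the small RNA tables in the schema.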
3. Methods
In this section, we focus on three major steps in the handling of SBS small RNA libraries and the identification of miRNAs from these data: trimming and mapping of the data, updating the database, and identification of miRNAs. We use a rice small RNA data set as an example to describe these steps.
3.1. Trimming of SBS Small RNA Signatures
We start by scanning each raw sequence for both 5′ and 3′ adapter sequences. The script allows up to two mismatches to accommodate sequencing errors and reduce the number of untrimmed sequences. After the adapters are removed, identical sequences are collapsed into a single file of nonredundant sequences with an associated abundance. The trimming steps are described below.
1. The script takes several required and optional arguments, so that users can run it under their preferred conditions. For example, one of the optional arguments is the number of mismatches. By default, the script looks for a 5′ and/or 3′ adapter sequence with up to two mismatches in a given SBS signature. The script does not allow more than two mismatches, but users can change the setting to either zero or one mismatch.
2. The SBS sequencing result is one of the required fields. Many institutes now have an Illumina SBS machine and provide sequencing services. Depending on the institute, the sequencing results are delivered in a range of file formats. Our script assumes that the file consists of nonredundant sequences with an associated abundance in a particular format. If the given file is not in one of our “standard” formats, it has to be reformatted to a standard format, such as the signature sequence and the associated abundance value delimited by tabs.
3. The adapter sequences are divided into three parts (start, middle, end) and are used as seed sequences. The length of the seed sequences is based on the minimum sequence that can be recognized as an adapter sequence. The script goes through the input file and creates three separate files: (1) signatures with potential 5′
Fig. 2. Schema of a database for storing and handling SBS small RNA data: The database is designed with three major sets of data. The lines connecting tables indicate one-to-one (simple lines) or one-to-many (branched lines) relationships. The field name above or below the lines indicates the key that connects the tables. Note that not all the fields and the tables are listed in this figure.
[Schema diagram; tables shown, by panel:
mRNA data: run_master, run_info, expression_raw, Summary, library_info, library_master, library_detail, tag_master.
Genomic data: tag_class, tag_position, tag_hits, chr_inverted, chr_tandem, chr_repeats, chromosome_master, gene_master, gene_position, pep_blast.
Small RNA data: run_master (tag char, raw_value int, lib_id tinyint); tag_position (tag char, chr_id char, strand enum('w','c'), position int, gene char); library_details (lib_id int, flag tinyint, raw_sum int, distinct_tags_cnt int); library_info (lib_id int, name char, organism char, variety char, RNA_extract_method char, sample_source char, description text); Summary (tag char, reliable int, significant int, lib_norm int, hits tinyint); run_info (lib_id int, raw_sum int).]
adapter sequences, (2) signatures with potential 3′ adapter sequences, and (3) signatures with no potential adapter sequence.
Pseudocode:
• For each SBS signature in sequencing result file:
  – Compare the three seed sequences of the 5′ adapter sequence.
    (a) If it exists, record the following to a new file (file1): signature sequence, associated abundance, and which seed sequence matched to the query sequence.
    (b) Else, compare the seed sequences of the 3′ adapter sequence.
      * If it exists, record the same information as above to a separate file (file2).
      * Else, record the rest to another file (file3).
4. We next examine file1 to determine if the 5′ adapter sequence is contained in the SBS signature. Depending on whether the 5′ adapter sequence is remaining in the SBS signature, we do one of the following with the signature sequence and associated abundance count: (1) keep these data as well as the length of contaminated adapter sequence in a data structure, (2) record them in a file, or (3) send them to an error file.
Pseudocode:
• For each SBS signature with 5′ “start” seed sequence, examine the downstream sequence of this seed sequence:
  – If the downstream sequence matches upstream of the 5′ “mid” seed sequence (with or without mismatches), send to error file.
  – Else, record the signature and the count to a new file (file4).
• For each SBS signature with 5′ “end” seed sequence, examine the upstream sequence of this seed sequence:
  – If the upstream sequence matches downstream of the 5′ “mid” seed sequence (with or without mismatches), examine the remaining length.
    (a) If the length is longer than 18 (by default), record the information in hash-of-array (comboHoA): SBS signature → (length of adapter, count).
  – Else, record the signature and the count to a new file (file4).
• For each SBS signature with 5′ “mid” seed sequence, examine the sequences both upstream and downstream of this seed sequence:
  – If the upstream and downstream sequences match to the 5′ “start” and “end” seed sequence (with or without mismatches), respectively, examine the remaining length and either record the information in the hash-of-array or send to the error file, as above.
  – Else, record the signature and the count to a new file (file4).
5. Examine the signatures in file3 and look for potential 5′ adapter seed sequences with mismatches. Treat the signature sequences and the associated counts in the same way as above (step 4) based on the location of the identified 5′ adapter sequence in the SBS signature.
6. Examine file2 and determine the 3′ adapter sequence contaminated in the SBS signature. A similar algorithm as step 4 was applied here, except the “start” and “end” seed sequences were treated in an opposite way. For example, in this step, signatures with 3′ adapter “end” and part of “mid” seed sequences were sent to the error file, while in step 4, signatures with 5′ adapter “start” and part of “mid” seed sequences were sent to the error file. The reason behind this is that, most likely, the signatures with the end of the 3′ adapter and the start of 5′ adapter sequence have no or very short inserts between the two adapters.
7. Another difference between step 4 (5′ adapter trimming) and step 6 (3′ adapter trimming) is that the signatures that satisfy the conditions in step 6 are trimmed and kept in the final hash table (allH) and not in comboHoA.
Pseudocode:
• For each SBS signature with 3′ adapter sequence with up to two mismatches:
  – If the length of remaining sequence is longer than 18 (by default), trim the adapter sequence off from the signature and record the information in allH.
    (a) If the trimmed signature exists in allH, add the new count to the old one.
    (b) Else, add trimmed signature as a key and the count as a value to allH.
8. The signatures in file4 that were generated during step 4 have no identifiable 5′ adapter sequence. Each signature is subjected to the 3′ adapter trimming (steps 6 and 7), and all the resulting data are stored either in allH or the error file. Similarly, the signatures in the comboHoA are subjected to the same steps.
If the remaining sequence, after trimming 5′ adapter sequence and 3′ adapter sequence (if they exist), is longer than 18 bases, the sequence is kept in allH.
9. In the final step, the script parses allH and, after final adjustment, it prints out the trimmed results with associated abundance to a final file.
Pseudocode:
• For each key (trimmed signature) and the value (summed abundance value):
  – If the signature has more than 75% A’s, send to the error file (as SBS produces A-rich sequences that are noise and not real), otherwise:
    (a) If the signature is full-length (no identifiable 5′ or 3′ adapter),
      * Examine the last three bases of the signature, and if it ends with one of TCG, TC, or T, trim it off and print out to the final file. Otherwise, simply print out the untrimmed signature and the abundance value.
    (b) Else, print out the key and the value to the final file.
3.2. Preparing a Genome Database for the Small RNA Data
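The 3′ adapter trimming, collapsing, and final clean-up of Subheading 3.1 (steps 6, 7, and 9) can be sketched in Python as follows. This is a simplified illustration and not the script described in the chapter: it replaces the start/mid/end seed search with a plain mismatch-tolerant scan, it ignores 5′ adapter handling, and the adapter string, the function names, the min_overlap parameter, and the example reads are all assumptions. Whether an 18-base insert is kept (as coded here) or discarded is also an assumption.

```python
from collections import defaultdict

# Example adapter string chosen for illustration only (assumption).
ADAPTER_3P = "TCGTATGCCGTCTTCTGCTTG"


def find_3p_adapter(read, adapter, max_mm=2, min_overlap=8):
    """Leftmost index where the adapter (or its prefix, at the read end)
    aligns with at most max_mm mismatches; None if there is no such index."""
    for start in range(len(read) - min_overlap + 1):
        window = read[start:start + len(adapter)]
        mismatches = sum(a != b for a, b in zip(window, adapter))
        if mismatches <= max_mm:
            return start
    return None


def trim_and_collapse(records, adapter, read_len, min_len=18, max_a_frac=0.75):
    """records: iterable of (signature, abundance). Returns (final, errors)."""
    all_h = defaultdict(int)          # trimmed signature -> summed abundance
    errors = []
    for seq, count in records:
        pos = find_3p_adapter(seq, adapter)
        insert = seq if pos is None else seq[:pos]
        if len(insert) < min_len:     # no insert or a very short insert
            errors.append((seq, count))
        else:
            all_h[insert] += count

    final = defaultdict(int)
    for sig, count in all_h.items():
        if sig.count("A") / len(sig) > max_a_frac:   # A-rich noise
            errors.append((sig, count))
            continue
        if len(sig) == read_len:      # full length: no adapter was identified
            for tail in ("TCG", "TC", "T"):
                if sig.endswith(tail):
                    sig = sig[:-len(tail)]
                    break
        final[sig] += count
    return dict(final), errors


insert = "TGACAGAAGAGAGTGAGCACA"                 # invented 21-nt insert
r1 = insert + "TCGTATGCCGTC"                     # exact adapter prefix
r2 = insert + "TCGAATGCCGTC"                     # adapter with one mismatch
r3 = "A" * 30 + "TCG"                            # A-rich read
r4 = "AG" * 15 + "ATC"                           # full-length read ending in TC
r5 = "ACGTACGTAC" + ADAPTER_3P + "AA"            # insert shorter than min_len
records = [(r1, 100), (r2, 5), (r3, 4), (r4, 7), (r5, 2)]

final, errors = trim_and_collapse(records, ADAPTER_3P, read_len=33)
print(final)
print(errors)
```

Because the scan returns the leftmost acceptable alignment, a short min_overlap combined with two allowed mismatches can trim genuine insert sequence; a stricter overlap or a mismatch budget that scales with the overlap length would be a reasonable refinement.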
We make our data visible via a customized web interface. This visualization tool makes the data more accessible to biologists without computational or bioinformatics skills and improves the interpretability of these data. The details of the construction of the genomic database can be found elsewhere (7). Here, we focus on the genomic information in the database that is specific to small RNA data analyses.
1. The majority of the information is derived from XML files provided by TIGR and TAIR. One of the most important sets of information is the coordinates of the annotated genes. This information allows us to draw the gene structure and the associated small RNA signatures on our web interface.
2. Another important piece of information is the type of annotated genes. Currently, we categorize the genes into eight classes: protein-coding gene, pseudogene, TE (transposable element), miRNA, tRNA, rRNA, snRNA, and snoRNA. All the classes, especially the last four (t/r/sn/snoRNA genes), are important for small RNA analyses. Although almost all the information on these gene classes can be found in the XML files, some of it is not included. For example, we obtained Indica rice snoRNA information (9) and utilized it after mapping the snoRNA sequences to the Nipponbare genome. Similarly, we regularly obtain the most recent miRNA information from the Sanger registry and manually add it to our existing genome database.
3. The t/r/sn/snoRNA genes encode structural RNAs that are constitutively expressed at high levels in many organisms, and thousands of signatures from each library correspond to these genes. Since these signatures are most likely degradation products of t/r/sn/snoRNAs and not regulatory small RNA molecules such as miRNAs and siRNAs, we “flag” the t/r/sn/snoRNA-matching signatures so that we can easily exclude them from small RNA analyses.
4. Another critical piece of information for the genomic database is the repetitive sequence information. Since repetitive sequences are known to be sources of siRNAs, this information helps us to distinguish miRNA signatures from siRNA signatures. Although TAIR and TIGR provide annotated TE information, we use freely available computer programs, such as RepeatMasker (http://repeatmasker.org) (5), Etandem, and Einverted (6), and extract the coordinates of different kinds of repetitive sequences as well as those with low cut-off values (i.e., poorly predicted or poorly annotated repeats) from the genome sequence. The coordinates of the repeat regions and the type of repeats are stored in the MySQL tables and are displayed on our web interface.
3.3. Mapping of Small RNA Sequences to a Genome and Association to Annotated Genes
To facilitate analyses of the data for plant species for which a genome is available, the small RNA sequences are mapped to the genome. We use and record only perfectly matching sequences to avoid complications of small RNAs matching to multiple closely related sequences. There are two important considerations in this step: (1) sequences may match to more than one location in the genome, and may have originated from a subset of those locations; (2) sequences with errors will be ignored in future analyses. We do not consider point (2) to be important, since the depth of these libraries is already substantial.
1. The mapping script performs a simple string search from the start to the end of the given nucleotide (genomic) sequence, which is converted to an array. Since the length of the strings (small RNA signatures) varies from 18 bases to roughly 33 bases, the script splits each signature into a ‘head’ (the first 18 bases) and a ‘tail’ (the remainder), and then stores the information in a hash-of-array with the head sequence as the key and the tail sequences in an array. The script slides a ~33 base (full-length) window through the genomic sequence array, and if the head part of the window exists in the hash-of-array, the script then compares the rest of the sequence. If the entire (head + tail) sequence matches the genomic sequence, the script records the coordinate (index + 1).
Pseudocode:
• For each ~33 base (full-length) sequence in the given nucleotide sequence:
  – If the head part of this sequence exists in hash-of-array, get the ‘tail’ sequences in an array:
    (a) For each ‘tail’ sequence in the array, construct the ‘head + tail’ sequence.
      * If the ‘full-length’ starts with this sequence, record the index of genomic sequence array.
2. Since all the signatures are stored in a hash-of-array, the script requires a relatively large amount of memory. In addition, the array holding the genomic sequence requires a large amount of memory as well. However, depending on the available memory, users can split the genomic sequence into pieces and identify the coordinates of the signatures piece by piece. After identifying the coordinates on the reading strand, the script goes through similar steps with the reverse complement strand.
3. After mapping the signatures, the coordinates are compared to the information stored in the genome database. This comparison allows us to determine the location of the signatures relative to the annotated genes. With these data, we can identify the origin of the signatures. For example, the information on the signatures that originated from t/r/sn/snoRNA genes is important for our normalization procedure, which is described below.
3.4. Updating the Database with New Libraries
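The head/tail lookup used for mapping in Subheading 3.3 can be illustrated with the short Python sketch below, in which a dictionary of lists stands in for the hash-of-array. The function names, the toy genome, and the two signatures are invented for the example; reporting complement-strand hits by their leftmost coordinate on the forward strand is a convention assumed here, not one stated in the text.

```python
from collections import defaultdict

HEAD_LEN = 18  # length of the 'head' used as the hash key


def build_index(signatures):
    """head (first 18 bases) -> list of tails (remaining bases, possibly empty)."""
    index = defaultdict(list)
    for sig in signatures:
        index[sig[:HEAD_LEN]].append(sig[HEAD_LEN:])
    return index


def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def map_strand(sequence, index):
    """Perfect matches on one strand: signature -> list of 1-based start positions."""
    hits = defaultdict(list)
    for i in range(len(sequence) - HEAD_LEN + 1):
        head = sequence[i:i + HEAD_LEN]
        for tail in index.get(head, ()):
            if sequence.startswith(tail, i + HEAD_LEN):
                hits[head + tail].append(i + 1)      # coordinate = index + 1
    return hits


def map_signatures(genome, signatures):
    """Map to the forward ('w') strand, then to the reverse complement ('c')."""
    index = build_index(signatures)
    result = defaultdict(list)
    for sig, positions in map_strand(genome, index).items():
        result[sig] += [(pos, "w") for pos in positions]
    for sig, positions in map_strand(reverse_complement(genome), index).items():
        # Convert to the leftmost forward-strand coordinate (assumed convention).
        result[sig] += [(len(genome) - (pos - 1) - len(sig) + 1, "c")
                        for pos in positions]
    return dict(result)


sig_a = "TGACAGAAGAGAGTGAGCACA"      # invented 21-nt signature
sig_b = "GGATCCGATTACAGATTACA"       # invented 20-nt signature
genome = "TTTT" + sig_a + "GG" + reverse_complement(sig_b) + "CC"

hits = map_signatures(genome, [sig_a, sig_b])
print(hits)
```

Keying the dictionary on a fixed-length head means each genomic position costs a single lookup, and only signatures that share that head are compared further, which is why the approach trades memory for speed.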
As the cost of SBS sequencing goes down, generating SBS libraries becomes more affordable, and the database is updated with new libraries more frequently. Since an average small RNA SBS library consists of millions of distinct signatures and associated abundance values, the updating step is time-consuming and requires a large amount of computational power. Multiple tables are affected by the arrival of each new library. Here, we focus on the update of the table that holds almost all the information necessary for small RNA analyses.
1. This table consists of SBS signatures as a primary key, the number of hits to the genome, a flag to indicate if the signature originated from a t/r/sn/snoRNA gene, and the normalized abundance level identified in each library.
Pseudocode:
• For each new library, alter the table and add a new column that stores the normalized abundance level of each signature.
  – If the signature exists in the table, normalize the raw abundance value and record.
  – Else, insert the new signature and the normalized abundance value. The rest of the libraries are assigned an abundance value of zero.
2. The basic normalization formula is described above. Also noted above is the important point that, since some small RNA signatures do not match the genome and a large proportion of genome-matching small RNA signatures are derived from t/r/sn/snoRNAs, which are not small RNA molecules, we subtract the abundance of these molecules from the sum of raw abundance before normalization.
3. Known microRNAs are identified in the data set based on their genomic coordinates, although we do not currently assign a special flag to these sequences.
4. At this point, our database is ready to be connected to our website.
3.5. Analyses of miRNAs from Small RNA Datasets
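The table update of Subheading 3.4 (steps 1 and 2) can be sketched with plain Python dictionaries standing in for the MySQL table. The sketch assumes that normalization is a simple rescaling of the raw abundance by an adjusted library total; the scale factor, the function and field names, and the example counts are hypothetical, and the chapter's own normalization formula is not reproduced here.

```python
def add_library(summary, libs, lib_name, raw_counts, annotations, scale=1_000_000):
    """Add one library 'column' to an in-memory stand-in for the summary table.

    summary:     signature -> {"hits": int, "structural": bool, "norm": {lib: value}}
    raw_counts:  signature -> raw abundance in the new library
    annotations: signature -> {"hits": int, "structural": bool}
    scale:       normalization base (assumed value, for illustration)
    """
    # Abundance of signatures that do not match the genome or that derive
    # from t/r/sn/snoRNA genes is subtracted from the library total.
    excluded = sum(count for sig, count in raw_counts.items()
                   if annotations[sig]["hits"] == 0 or annotations[sig]["structural"])
    denominator = sum(raw_counts.values()) - excluded
    if denominator <= 0:
        raise ValueError("no usable abundance left after subtraction")

    libs.append(lib_name)
    for row in summary.values():          # new column defaults to zero
        row["norm"][lib_name] = 0.0
    for sig, count in raw_counts.items():
        if sig not in summary:            # new signature: zero in older libraries
            summary[sig] = {**annotations[sig],
                            "norm": {lib: 0.0 for lib in libs}}
        summary[sig]["norm"][lib_name] = count * scale / denominator


# Invented annotations and counts, for illustration only.
annotations = {
    "sigA": {"hits": 1, "structural": False},
    "sigB": {"hits": 0, "structural": False},   # does not match the genome
    "sigC": {"hits": 2, "structural": False},
    "sigD": {"hits": 2, "structural": False},
    "sigR": {"hits": 3, "structural": True},    # t/r/sn/snoRNA-derived
}
summary, libs = {}, []
add_library(summary, libs, "lib1",
            {"sigA": 60, "sigB": 20, "sigC": 20, "sigR": 900}, annotations)
add_library(summary, libs, "lib2", {"sigA": 10, "sigD": 30}, annotations)
print(summary["sigA"]["norm"])
```

In this sketch the normalized values of excluded signatures are still stored, since the flag and hit count in the same row allow them to be filtered out at query time.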
3.5.1. Identification of Regulated miRNAs
One of the primary goals of producing a small RNA data set is to identify regulated small RNAs or to identify new small RNAs. In this section, we describe the approaches that we use for both of these types of analyses. In both Arabidopsis and rice, hundreds of miRNAs have been identified and registered at the Sanger registry (http://microrna.sanger.ac.uk/sequences/). Many of these miRNAs are known to regulate genes that have critical functions in various aspects of plant biology. We have generated libraries from various tissues, developmental stages, stress treatments, etc. to compare the abundance of miRNAs and to identify specific sequences (siRNA-matching regions) that have characteristics suggesting that the small RNAs are differentially regulated.
1. Most miRNAs are 21 nt in length and are consistently processed from a precursor to generate the same 21-mer. However, the processing is not always exact, and some small RNAs are derived from the precursor but shifted by one or two bases in the 5′ or 3′ direction from the annotated miRNA. Therefore, in order to estimate the expression level of miRNAs, we calculate the “sum of the abundance” of all the signatures that correspond to the precursor of a given miRNA, instead of using just the abundance of the exact, annotated miRNA.
2. All the necessary pieces of data are distributed across three tables on our MySQL server: gene_master, tag_position, and summary. The gene_master table stores the ID (gene name) of all the annotated miRNAs (in addition to annotated protein-coding and other genes). The tag_position table has all the signatures that map to the genome, including those that correspond to each gene, and the summary table has the abundance value of each signature. We join these three tables and calculate the expression level of each miRNA gene.
3. These values are compared among the libraries to identify miRNAs showing evidence of differential expression or regulation under a certain condition.
Unfortunately, at the time of preparation, we do not have enough biological replicates to use statistical approaches for the comparison. Therefore, we focus on the miRNAs that show distinct expression differences (>10 times) among the libraries and verify the findings with other biological methods such as RNA gel blots.
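The three-table join and the >10-fold comparison described above can be sketched as follows, again using an in-memory SQLite database in place of MySQL so that the example runs on its own. The table names come from the text; the gene_class column, the library column names lib_a and lib_b, the short tag labels (real keys would be signature sequences), and all values are assumptions made for illustration.

```python
import sqlite3


def mirna_expression(con, libs):
    """Sum the abundance of all signatures mapping to each miRNA gene.

    libs must be trusted column names, since they are interpolated into SQL.
    """
    cols = ", ".join(f"SUM(s.{lib})" for lib in libs)
    query = f"""
        SELECT g.gene, {cols}
        FROM gene_master g
        JOIN tag_position p ON p.gene = g.gene
        JOIN summary s ON s.tag = p.tag
        WHERE g.gene_class = 'miRNA'
        GROUP BY g.gene
        ORDER BY g.gene
    """
    return {row[0]: row[1:] for row in con.execute(query)}


def differs(a, b, fold=10):
    """True if the two abundances differ by more than `fold` times."""
    return max(a, b) > fold * min(a, b)


con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE gene_master (gene TEXT PRIMARY KEY, gene_class TEXT);
    CREATE TABLE tag_position (tag TEXT, gene TEXT);
    CREATE TABLE summary (tag TEXT PRIMARY KEY, lib_a INTEGER, lib_b INTEGER);
    INSERT INTO gene_master VALUES
        ('MIR_X', 'miRNA'), ('MIR_Y', 'miRNA'), ('GENE_Z', 'protein');
    INSERT INTO tag_position VALUES
        ('t1', 'MIR_X'), ('t2', 'MIR_X'), ('t3', 'MIR_Y'), ('t4', 'GENE_Z');
    INSERT INTO summary VALUES
        ('t1', 100, 5), ('t2', 20, 1), ('t3', 50, 40), ('t4', 900, 900);
""")

expression = mirna_expression(con, ["lib_a", "lib_b"])
regulated = [gene for gene, (a, b) in expression.items() if differs(a, b)]
print(expression)
print(regulated)
```

Here t1 and t2 play the role of the annotated miRNA and a shifted variant from the same precursor, so both contribute to the total for MIR_X. A signature that maps to the same precursor more than once would be counted repeatedly by this simple join, and a comparison against zero always passes the fold test, so a minimum abundance threshold may be worth adding.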
3.5.2. Identification of Novel miRNA Candidates
The aim of this analysis is to identify candidates for new miRNAs. Although many miRNAs have been identified by various laboratories using different approaches (experimental and predictive),
given the breadth of plant species, we are far from having discovered the full set of miRNAs for most plant species. In order to extract candidate miRNAs from deep sequencing small RNA data, we combine genomic approaches and a variety of filters. Known miRNAs are included in the analysis as a positive control, to determine which filters inadvertently catch and remove the known miRNAs. We have previously described the implementation of multiple data filters that capture most known Arabidopsis miRNAs (1). Here, we describe the filters that we apply to annotated genomes, such as rice and Arabidopsis, and an approach to identify candidate plant miRNAs from species for which there is no genomic sequence. A flowchart describing how these filters are implemented is shown in Fig. 3. The filters include metrics for each small RNA that are based on the number of libraries in which it is expressed, sparse clusters, an abundance filter, a paired filter, and a secondary structure filter. These filters are explained in the following steps: 1. The “number of expressed libraries” looks for the signatures that are expressed in the majority (more than half) of the libraries in a database. Although multiple libraries need to be available to apply this filter, we have found that this filter is one of the best filters to identify miRNAs as they are generally
[Fig. 3 flowchart, partially recovered: 307,064 unique small RNA tags; sequence in at least half of multiple libraries (4,194); sequence with abundance ≥ 100 TPQ (351); sequence hit to rice genome ≥ 20 (97); part of known miRNAs (80). Filter labels: 1. Consistent expression across libraries; 2. Abundance.]
[...] 2 million 25–30 nt sequence tags with high accuracy (23, 24). The library construction protocol described here should be applicable to all of these technologies. Because these technologies adopt different protocols for processing the purified PCR products before sequencing, specific RNA adaptors might be required for different sequencing platforms. Consult the sequencing company before designing the adaptors.
3.1.9. Data Analysis
First, the adaptor sequences have to be trimmed from raw sequence data. These sequences are then matched against the corresponding genome and assigned to each location at which a perfect match was found. The numbers of total sequences from each library are different, so sequences are normalized to facilitate comparisons among libraries. Normalization is necessary to ensure that comparisons across libraries accurately reflect biological differences and not merely differences in the total number of tags sequenced. The expression of a miRNA gene is measured by determining the frequency of signatures derived from the gene in a given library. For small RNA abundance determination, we merged the sequencing runs and calculated a single abundance normalized to “Transcripts Per Quarter million (TPQ)” after the removal of rRNA, tRNA, snoRNA, or snRNA signatures. The expression of known miRNAs can be compared across libraries with normalized abundance (16, 19). In theory, in-silico comparisons among libraries would detect all the miRNAs (known or unknown) with different expression patterns. However, unlike animals, the plant small RNA population is very complex and has a predominant class of small-interfering RNAs (siRNAs). In Arabidopsis, ~70% of small RNAs are endogenous siRNAs (16). This number is even higher in other plants that have a larger genome and more repetitive elements (80% in rice, 90% in maize) (17, 18). The large number of siRNAs makes it a major challenge to identify low-abundance miRNAs (20). Several computational filters are used to facilitate the identification of miRNAs. Although different labs developed these filters independently, all the designs were based on consensus properties
High-Throughput Approaches for miRNA Expression Analysis
of a known miRNA reference set (17, 19, 21). Some of the common filters include size, abundance, number of genome matches, cluster density, single-strand origin, folding structure, and detection of miRNA*. As of July 2009, the Sanger miRNA database contained nearly 1,700 plant miRNAs, most of which are 21 nt in length. By considering only 21-nt small RNAs, the vast majority of siRNAs can be excluded from the analysis. Most siRNAs are generated from high-density clusters and from both strands, and have very low abundance (<20 TPQ) but a large number of genome matches (>20). Therefore, small RNAs matching >20 genomic locations and originating from siRNA clusters can be eliminated, as can those with low expression. The small RNAs that pass these filters can be further screened by predicting stem-loop folding and confirming the presence of miRNA*. In our experience, only a limited number of small RNAs qualify as good miRNA candidates after this set of filters. Notably, some parameters of the filters can be adjusted to meet the desired level of stringency.
3.2. Small RNA Oligonucleotide Microarray
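The TPQ normalization and the simplest filters from the data analysis in Subheading 3.1.9 (size, abundance, and number of genome matches) can be sketched as below. The thresholds are those quoted in the text, exposed as adjustable parameters; the function names and example values are hypothetical, and the cluster density, strand origin, folding, and miRNA* filters are not covered.

```python
QUARTER_MILLION = 250_000


def to_tpq(raw_abundance, library_total):
    """Transcripts Per Quarter million.

    library_total is assumed to be the summed abundance of the library after
    rRNA, tRNA, snoRNA, and snRNA signatures have been removed.
    """
    return raw_abundance * QUARTER_MILLION / library_total


def passes_basic_filters(seq, tpq, genome_hits, size=21, min_tpq=20, max_hits=20):
    """Size, abundance, and genome-match filters with adjustable stringency."""
    return (len(seq) == size
            and tpq >= min_tpq
            and 1 <= genome_hits <= max_hits)


# Invented candidates: (sequence, raw abundance, number of genome matches).
library_total = 500_000
candidates = [
    ("TGACAGAAGAGAGTGAGCACA", 100, 2),       # 21 nt, 50 TPQ, few matches
    ("TGACAGAAGAGAGTGAGCACAGTC", 400, 1),    # 24 nt, fails the size filter
    ("ACGTACGTACGTACGTACGTA", 10, 3),        # 5 TPQ, fails the abundance filter
    ("GGGTTTAAACCCGGGTTTAAA", 200, 45),      # too many genome matches
]
kept = [seq for seq, raw, hits in candidates
        if passes_basic_filters(seq, to_tpq(raw, library_total), hits)]
print(kept)
```

Keeping the thresholds as keyword arguments reflects the point made above that the stringency of each filter can be tuned to the data set.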
In addition to the high-throughput sequencing technologies aimed at characterizing the small RNA population and identifying novel miRNA genes in plants as described previously, several microarray platforms have been developed to profile miRNA accumulation and to assess the differential expression in several biological tissues (25, 26). The method section below will highlight (1) the different array platforms that have been developed for detecting plant small RNAs (mainly miRNAs), (2) the various labeling strategies that are used to label these small RNA molecules, and (3) the microarray processing and data analysis. Finally, different methods suitable for high-throughput sequencing and microarray data validation will be discussed (e.g., PCR-based, ligation-based assays, etc.). Similar to other technologies involving nucleic acid liquid hybridization, assay specificity and sensitivity are important criteria to characterize, especially when monitoring expression of short nucleotide sequences.
3.2.1. Microarray Manufacturing and Platform Design
Several small RNA pilot microarray designs have been reported in the literature. Some early ones contained only a limited number of features corresponding to Arabidopsis and rice miRNA sequences on the array. More recently, one study spotted more than 100 Arabidopsis mature miRNA sequences (27). Arrays are manufactured by spotting selected probes in replicate onto 1” × 3” activated glass slides. Specific oligonucleotide probes designed to be complementary to the miRNA sequences are generally extended at their 5′ terminus by a linker sequence that serves as an anchor to the solid glass matrix and allows the Tm of each candidate probe to be similar.
3.2.1.1. Custom Arrays
Lu and Souret
By labeling and hybridizing a set of synthetic small RNAs at various concentrations, the detection sensitivity of the microarray can be reliably determined and the linear dynamic range of signals can be established (26, 28). Assay development for monitoring probe specificity targeting miRNA family members with one or more mismatches (e.g., let-7) could also be applied to plant miRNAs (e.g., the miR-172 gene family) (28, 29).
3.2.1.2. Commercial Arrays
Several companies are now offering plant miRNA microarrays with probes designed to be complementary to the full-length mature miRNAs based on the latest registered and annotated miRNA sequences in miRBase at The Wellcome Trust Sanger Institute (http://microrna.sanger.ac.uk/sequences/). General characteristics of plant miRNA microarray platforms commercially available are presented in Table 2.
3.2.1.3. Other Array Approaches
Other platforms have also been used to examine miRNA gene expression analysis, including Arabidopsis genome-wide tiling arrays, and an alternative technique validated for mammalian miRNA regulation analysis, RNA-primed, array-based Klenow enzyme (RAKE). High-density tiling DNA microarrays contain several million short oligonucleotide probes (25–60 bp) that cover the entire genome or contigs of the genome in an unbiased fashion (30, 31). This platform, developed to monitor transcriptional activity in all regions of the genome, offers the potential to identify novel noncoding transcripts, including miRNA genes. Several platforms of tiling DNA microarrays have been successfully implemented for hybridization profiling of small RNAs in Arabidopsis (32, 33). The RAKE assay uses on-slide enzymatic reactions to monitor miRNA accumulation from a complex biological sample. miRNAs are detected with a streptavidin-conjugated fluorophore after the following steps: miRNA hybridization to their complementary DNA probes spotted on the slide, Exonuclease I treatment (to degrade single-stranded DNA oligonucleotides linked to the slide), and template (spotted probe)-dependent 3′ end extension of the hybridized miRNAs with biotin-conjugated dATP (catalyzed by Klenow fragment of DNA polymerase I) (34). Its application to monitor plant miRNA expression profiling has yet to be reported.
3.2.1.4. Control Probes
To monitor hybridization efficiency and to normalize signal intensity of the small RNA microarray, additional controls and references are generally added to the design (see some examples in (13, 28, 29)). Negative control probes are commonly included on the array to estimate fluorescence background and background variance. These features are designed to have minimal cross-hybridization with the experimental samples (e.g., Escherichia coli sequences, sense sequences of the miRNAs spotted, and random oligonucleotides).
Mismatch probes can also be added to monitor hybridization efficiency (35). Positive control features such as “spike-in” DNA oligonucleotides and other selected probes complementary to noncoding RNAs (e.g., tRNA, snoRNA, and snRNA) can be spotted to monitor labeling efficiency and hybridization, and to assist in data normalization.
3.2.2. Plant Small RNA Labeling
Efficient labeling of plant miRNAs for microarray applications can be a challenge, since they represent only a small fraction of the mass of a total RNA sample (~0.01%), and because of their characteristics: small size, lack of a polyadenylated tail, and 2′-O-methylation at their 3′ ends (36). Therefore, the traditional oligo(dT) priming-dependent, reverse transcription-based method used for mRNA labeling, and other approaches suitable for mammalian miRNA processing (based on miRNA sequence extension using poly(A) polymerase, e.g., the Ambion mirVana™ Labeling method and the Invitrogen NCode™ miRNA Labeling System), may not be appropriate for labeling plant small RNAs. Nevertheless, several assays, with or without amplification, have been developed and validated for plant miRNA labeling that combine labeling efficiency and streamlined protocols (see below; Tables 1 and 2). When relevant, we have also referenced several approaches used to label mature animal miRNAs that could be used with plant samples.
3.2.2.1. Target Preparation
Preparation of high-quality RNA (total and/or small RNA enriched samples) is critical for successful miRNA profiling and microarray experiments. Protocols for extracting total RNA are discussed in great detail in Chap. 3 (see Note 6).
3.2.2.2. Starting Material
The amount of total RNA and/or small RNA-enriched fraction required per assay will depend on the labeling strategy and the sample to be analyzed, as the relative mass of miRNAs can vary considerably between sample types. Therefore, it may be necessary to test a broad range of input RNA in pilot miRNA profiling experiments to avoid hybridization signal saturation or weakness. Enrichment and/or isolation of mature miRNAs from samples as discussed in Subheadings 3.1.2 and 3.1.3 may also be beneficial to exclude array signals from the miRNA precursors.
3.2.2.3. Labeling Methods: Direct vs. Amplification-Based Approaches
Small RNA direct labeling methods that minimize sample manipulation and eliminate the potential bias associated with amplification-based strategies are most likely to measure miRNA populations accurately. Several commercial kits and “home-made” procedures are available and can be used to reliably label plant miRNAs prior to array hybridization. The Kreatech ULS™ Small RNA Labeling Kit and the Mirus Label IT® miRNA Labeling Kit have been used to label plant miRNAs from 1 µg of small RNA-enriched sample. These labeling systems
generate Cy-labeled small RNAs by binding fluorescent labels to the N7 position of guanine residues in a nonenzymatic reaction. In addition, biotin labeling of miRNAs can be achieved with the Mirus Label IT® miRNA Labeling Kit. The ULS and Label IT® labeling assays are compatible with several types of miRNA microarray platforms, although the manufacturers’ guidelines should be consulted for array platform compatibility. The major drawback of these approaches may be the absence of label on mature miRNAs lacking G residues.
Alternatively, high-efficiency direct labeling of human miRNAs using T4 RNA ligase to attach a single fluorophore-labeled nucleotide (e.g., Cyanine 3-pCp) to the 3′ end of mature miRNAs has been implemented in sample preparation for Agilent miRNA microarrays (37). It would be worthwhile to assess this strategy for labeling plant miRNAs.
Direct labeling of small RNAs has also been achieved by taking advantage of the 3′ hydroxyl group characteristic of mature miRNAs that results from Dicer-catalyzed processing of precursor miRNAs (1). Using T4 RNA ligase, short Cy-labeled RNA adaptors with a 5′ phosphate modification can be ligated to the 3′ hydroxyl group of mature miRNAs in the presence of ATP (27, 38, 39). Optimized ligation reaction conditions, including addition of polyethylene glycol (PEG), use of excess amounts of RNA adaptors, and alternative buffer compositions, have been reported (39) and may be a starting point for ligase-based direct labeling of plant miRNAs.
Another enzyme-based method for target preparation uses random 8-mer biotinylated primers and reverse transcriptase to initiate first-strand cDNA synthesis without further amplification (27). Direct detection of the biotin-labeled targets hybridized to the microarray is then accomplished using Streptavidin-Alexa647 conjugates.
Although small RNA enrichment was not performed in this study, it would most likely improve the labeling efficiency, as the vast majority of the RNA labeled from a total RNA sample is ribosomal RNA. Nonenzymatic biotinylation of miRNAs followed by detection with quantum dots (QDs) has been successfully applied to profile the expression of 11 rice (Oryza sativa L.) miRNAs on microarrays (26). Direct addition of a biotin group at the 3′ end of miRNAs was achieved by periodate oxidation and reaction with biotin-X-hydrazide. Once hybridized to the array probes, the biotinylated miRNAs were detected using QD-streptavidin conjugates. As an alternative detection approach, Liang et al. (26) effectively tested streptavidin-conjugated gold nanoparticles coupled with the silver enhancement method to reliably monitor miRNA accumulation in rice seedlings. Amplification-based labeling of plant miRNAs during sample preparation prior to array hybridization has been exploited by Axtell and Bartel (25). In this study, Arabidopsis and other plant small
RNAs were first fractionated, and then sequentially ligated to 5′ and 3′ adaptors. The ligation products were then reverse-transcribed and PCR-amplified using a 5′-end Cy-labeled forward primer and a significantly longer unlabeled reverse primer. Using this elegant approach, asymmetric PCR products could be generated, and the shorter Cy-labeled single strands could then be purified on a denaturing polyacrylamide gel before hybridization to the array (13, 25). By matching the 5′ Cy3-labeled plant miRNA sample to a Cy3-labeled oligonucleotide reference library prior to hybridization, this approach allowed semi-quantitative, sensitive, and highly reproducible plant and vertebrate miRNA expression profiling (13, 25). An alternative approach to human small RNA labeling, based on adaptor ligation and PCR amplification followed by labeled cRNA synthesis using T7 RNA polymerase, has also been reported (35). This allows the labeled pools to be hybridized to sense array probes. This method could most likely be adapted to label plant miRNAs with little optimization.
3.2.3. Microarray Hybridization and Post-Hybridization Wash Processing
It is highly recommended to run array experiments in triplicate. Hybridization and post-hybridization wash conditions should be optimized empirically for best microarray performance, including array specificity, sensitivity, and reproducibility (40). Microarray hybridization and washing conditions associated with commercial and “custom” plant miRNA microarrays are presented in Tables 1 and 2 as examples of validated processing.
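One simple way to look at reproducibility across triplicate arrays is the coefficient of variation (CV) of each probe. The sketch below is only an illustration under stated assumptions: the probe names, the intensity values, and the 0.2 CV cutoff are hypothetical and are not recommendations from the cited studies.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return SD / mean for one probe measured on replicate arrays."""
    if len(values) < 2:
        raise ValueError("need at least two replicate measurements")
    return stdev(values) / mean(values)


def irreproducible_probes(replicates, max_cv=0.2):
    """List the probes whose CV across replicates exceeds max_cv (assumed cutoff)."""
    return [probe for probe, values in replicates.items()
            if coefficient_of_variation(values) > max_cv]


# Hypothetical normalized intensities from three replicate arrays.
replicates = {
    "probe_A": [90.0, 100.0, 110.0],   # CV = 0.1
    "probe_B": [50.0, 100.0, 150.0],   # CV = 0.5
}

print(irreproducible_probes(replicates))
```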
3.2.4. Image Acquisition and Data Processing
Following post-hybridization washes, arrays are dried and then scanned. Scanned images should be carefully inspected before raw data are extracted using Feature Extraction Software (Agilent Technologies), LuxScan (CapitalBio), GenePix (Axon Instruments), or similar software. Various methods for microarray data processing have been developed that generally involve removal of low-intensity signals and signal normalization, followed by determination of normalized log ratios. Clustering of log-transformed values and creation of dendrograms and expression heat maps can be carried out using R (http://CRAN.R-project.org/) or Cluster, and visualized using TreeView. Additional microarray analysis software such as GeneSpring (Agilent Technologies) or Quantarray (PerkinElmer) may be useful for further data analysis and visualization. Original microarray data can be deposited at the Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo/) following MIAME (Minimum Information About a Microarray Experiment) guidelines (41).
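The general processing steps named above (low-intensity removal, normalization, log ratio determination) can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of any particular software package: median scaling is assumed as the normalization method, and the function names, the intensity cutoff, and the example values are hypothetical.

```python
import math
from statistics import median


def median_normalize(intensities):
    """Scale one channel so that its median intensity equals 1.0."""
    m = median(intensities.values())
    return {probe: value / m for probe, value in intensities.items()}


def log2_ratios(sample, reference, min_signal):
    """Filter weak probes, normalize each channel, return log2(sample/reference)."""
    # 1. Low-intensity removal: a probe must pass the cutoff in both channels.
    kept = [p for p in sample
            if p in reference
            and sample[p] >= min_signal and reference[p] >= min_signal]
    if not kept:
        return {}
    # 2. Normalization (median scaling is an assumption of this sketch).
    s = median_normalize({p: sample[p] for p in kept})
    r = median_normalize({p: reference[p] for p in kept})
    # 3. Normalized log ratio for each remaining probe.
    return {p: math.log2(s[p] / r[p]) for p in kept}


# Hypothetical raw intensities for two channels.
sample = {"a": 200.0, "b": 400.0, "c": 800.0, "d": 10.0}
reference = {"a": 100.0, "b": 100.0, "c": 100.0, "d": 50.0}

print(log2_ratios(sample, reference, min_signal=50.0))
```

The resulting log ratios could then be passed to a clustering tool; other normalization schemes may be more appropriate depending on the array design and the controls available.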
3.3. Data Validation
All available high-throughput sequencing methods are still expensive, so it is cost-prohibitive to analyze multiple biological and technical replicates. This makes data validation very important and
Table 1 Custom array characteristics and study references
Reference: (25); (26); (27)
Platform: Amersham Biosciences CodeLink activated glass slides; Sigma GOPTS-activated slides; aldehyde-modified glass slides
Label type: Cy dye; biotin-X-hydrazide; Cy dye
Labeling method: pCU-Cy ligation; direct labeling; adaptor ligation followed by amplification
Mass input per array: 1.5 µg (as low as 0.2 µg)
Sensitivity: ~0.4 fmol biotinylated miRNAs; DR: 0.16–20 nM
Hybridization buffer and conditions: 3× SSC, 0.2% SDS, 15% formamide, overnight at 42°C; formamide-based buffer, overnight at 37°C; 3.5× SSC, 1% BSA, 0.1% SDS, 93 µg/ml salmon testes DNA, 187 µg/ml E. coli tRNA, 37 µg/ml polyadenine, 16 h at 57°C
Post-hybridization wash conditions: 2× SSC, 0.2% SDS for 5 min at 42°C, then 0.2× SSC for 5 min at RT; 1× SSC, 0.5% SDS for 10 min at 37°C; 2× SSC, 0.1% SDS for 5 min at 50°C, then 0.1× SSC, 0.1% SDS for 10 min at RT, and 0.1× SSC for 1 min at RT (3 times)
GOPTS: glycidyloxypropyltrimethoxysilane; DR: dynamic range
Table 2 Commercial arrays available for plant miRNA expression profiling
Array manufacturer: CapitalBio Corp (http://www.capitalbio.com); Combimatrix (http://www.combimatrix.com); Exiqon miRCURY™ LNA microRNA arrays (http://www.exiqon.com); GenoSensor GenoExplorer™ (http://www.genosensorcorp.com); LC Sciences (http://www.lcsciences.com)
Plant species: Plant; Arabidopsis; Arabidopsis, rice, maize, black cottonwood, soybean; Arabidopsis; Arabidopsis, rice, maize, Viridiplantae
Number of features: 158, 254, 44, 1014; 187 plus controls (include mature and precursor miRNAs); 33; 203, 277, 98, 234; 971 plus 32 controls; 426 plus controls (include mature and precursor miRNAs)
Platform characteristics: µParaflo™ Microfluidic Chip Technology, detection limit 3.5 logs; 1″ × 3″ slide, amine-modified oligonucleotide probes, multiple sub-arrays on each slide, detection limit 3 logs; 1″ × 3″ slide, LNA capture probes, 8 sub-arrays in 4 replicates on each slide, feature size 90 µm, detection limit 4 logs; 1″ × 3″ slide, antisense oligonucleotide probes, 4 × 2K array
Labeling procedure: service (Cy3® and/or Cy5® labeling); 5′-end labeling with biotin label; enzymatic fluorescent labeling (Hy3™ and Hy5™); biotin or fluorochrome labeling (Cy3®, Cy5®, or AlexaFluor® 555 and 647 fluorescent dyes)
Starting material: 5 µg total RNA (then miRNA sample enrichment); 5–10 µg total RNA (no enrichment required); 0.5–2 µg of a labeled target sample per array sector required; from 30 ng total RNA (no enrichment required)
Comments (convenient reagents, service): N/A; GenoExplorer™ microRNA array labeling kit, GenoExplorer™ microRNA array chip, GenoExplorer™ microRNA array probe set, hybridization and blocking buffers, washing and staining solutions; miRCURY™ LNA microRNA Labeling kit, miRCURY™ LNA Array Spike-in miRNA kit; service
challenging, particularly for some treatments with a high level of background noise. Also, many investigators unfortunately report microarray data without confirming their results. Considering the amount of data generated by high-throughput sequencing methods and the information emerging from microarray-based expression profiling of miRNAs, it is becoming critical to confirm the results with other, traditional gene expression techniques. Northern blotting is still the gold standard for assessing miRNA expression, so it is preferable to have RNA gel blot confirmation for any sequencing data. In our first high-throughput sequencing study (comparing Arabidopsis flowers and seedlings), we saw a good correlation between gel blots and MPSS data, particularly for highly regulated miRNAs (ratio >10). However, in our stress-treatment experiments, only ~20% of the highly regulated candidates predicted by sequencing data could be validated by northern blots. This could be partially explained by cross-hybridization of the oligonucleotide probes used for the blots with nearly identical small RNAs, or by sequencing errors in some of the data. More importantly, this suggests that miRNA expression fluctuates strongly when plants are under stress conditions. Adding biological replicates should greatly reduce the “noise” level. As sequencing costs drop and sequencing power rises, biological and technical replicates are likely to become feasible in the near future.
Other methods, requiring smaller amounts of starting material, have also been developed to confirm plant miRNA expression profiling data; these involve PCR-based or splinted-ligation techniques. TaqMan probes from Applied Biosystems Inc. (ABI) are now available to detect and monitor the expression of nearly 70 Arabidopsis miRNAs. A method for direct labeling and isotopic detection of plant small RNAs has also been developed based on splinted-ligation technology.
Using this approach, the accumulation of Arabidopsis and rice endogenous small RNAs was monitored from as little as 250 ng of total RNA (17, 42). Although cumbersome and time-consuming, the ribonuclease protection assay (RPA) has also been used to examine the accumulation of endogenous small RNAs.
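To make the comparison between sequencing-derived candidates and blot confirmation concrete, the following Python sketch flags miRNAs whose normalized abundance differs more than tenfold between two libraries and then computes the fraction of candidates confirmed by an independent method. It rests on several assumptions that are not taken from the text: abundances are normalized to transcripts per million (TPM), a pseudocount of 1 TPM is added to avoid division by zero, and all names and read counts are hypothetical.

```python
def to_tpm(counts):
    """Normalize raw read counts of one library to transcripts per million."""
    total = sum(counts.values())
    return {name: 1e6 * n / total for name, n in counts.items()}


def highly_regulated(counts_a, counts_b, min_ratio=10.0, pseudocount=1.0):
    """Return {miRNA: fold change} for sequences with a ratio above min_ratio."""
    tpm_a, tpm_b = to_tpm(counts_a), to_tpm(counts_b)
    hits = {}
    for name in set(tpm_a) | set(tpm_b):
        a = tpm_a.get(name, 0.0) + pseudocount
        b = tpm_b.get(name, 0.0) + pseudocount
        ratio = max(a, b) / min(a, b)   # direction-independent fold change
        if ratio > min_ratio:
            hits[name] = ratio
    return hits


def validation_rate(candidates, confirmed):
    """Fraction of sequencing candidates confirmed by an independent method."""
    candidates = list(candidates)
    if not candidates:
        raise ValueError("no candidates to validate")
    return sum(1 for c in candidates if c in confirmed) / len(candidates)


# Hypothetical read counts for two libraries.
library_1 = {"miR_x": 900, "miR_y": 60, "miR_z": 40}
library_2 = {"miR_x": 50, "miR_y": 450, "miR_z": 500}

hits = highly_regulated(library_1, library_2)
print(sorted(hits))
print(validation_rate(hits, confirmed={"miR_x"}))
```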
4. Notes
1. The exact sequence of the adaptors can be changed based on specific needs. Both adaptors were PAGE-purified by Dharmacon.
2. The pellet from this step can be dissolved in DEPC-treated water and used for regular northern blots.
3. When run next to HMW RNA, the most prominent band in the LMW RNA will be the tRNA at about 75 nt.
4. To obtain enough cDNA for sequencing while maintaining quantitative information, fewer than 20 PCR cycles are usually used for amplification. In our experience, starting from 100 µg of total RNA, 18 PCR cycles can generate ~100 ng of purified product. If a larger amount of PCR product is required, the volume of the PCR reaction can be scaled up accordingly.
5. A shorter product of about 50 bp may also be seen in the gel. This band is likely generated from adaptor ligation product without small RNA inserts. Because most PCR purification kits have poor recovery efficiency for small DNA fragments, gel purification is recommended.
6. Although this step is optional in the general miRNA microarray procedure, total RNA and small RNA fractions should be characterized to evaluate the quality and integrity of the extracted RNA sample prior to labeling and hybridization. In general, these factors can be assessed using the Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA), by running a small aliquot on a denaturing gel (e.g., urea-polyacrylamide gel), or by performing real-time RT-PCR reactions on selected targets.
Acknowledgments
Research by the authors was supported, in part, by NSF grant MCB#0445638 and USDA grant #2007-01991 to P.J. Green and NSF grant MCB#0548569 to P.J. Green and B.C. Meyers. We thank Sharon Bancroft for editorial assistance.
References
1. Chen X (2005) MicroRNA biogenesis and function in plants. FEBS Lett 579(26):5923–5931
2. Jones-Rhoades MW, Bartel DP, Bartel B (2006) MicroRNAs and their regulatory roles in plants. Annu Rev Plant Biol 57:19–53
3. Aukerman MJ, Sakai H (2003) Regulation of flowering time and floral organ identity by a MicroRNA and its APETALA2-like target genes. Plant Cell 15(11):2730–2741
4. Juarez MT, Kui JS, Thomas J, Heller BA, Timmermans MC (2004) microRNA-mediated repression of rolled leaf1 specifies maize leaf polarity. Nature 428(6978):84–88
5. Palatnik JF et al (2003) Control of leaf morphogenesis by microRNAs. Nature 425(6955):257–263
6. Borsani O, Zhu J, Verslues PE, Sunkar R, Zhu JK (2005) Endogenous siRNAs derived from a pair of natural cis-antisense transcripts regulate salt tolerance in Arabidopsis. Cell 123(7):1279–1291
7. Jones-Rhoades MW, Bartel DP (2004) Computational identification of plant microRNAs and their targets, including a stress-induced miRNA. Mol Cell 14(6):787–799
8. Sunkar R, Chinnusamy V, Zhu J, Zhu JK (2007) Small RNAs as big players in plant abiotic stress responses and nutrient deprivation. Trends Plant Sci 12(7):301–309
9. Sunkar R, Kapoor A, Zhu JK (2006) Posttranscriptional induction of two Cu/Zn superoxide dismutase genes in Arabidopsis is mediated by downregulation of miR398 and important for oxidative stress tolerance. Plant Cell 18(8):2051–2065
10. Sunkar R, Zhu JK (2004) Novel and stress-regulated microRNAs and other small RNAs from Arabidopsis. Plant Cell 16(8):2001–2019
11. Valoczi A et al (2004) Sensitive and specific detection of microRNAs by northern blot analysis using LNA-modified oligonucleotide probes. Nucleic Acids Res 32(22):e175
12. Varallyay E, Burgyan J, Havelda Z (2007) Detection of microRNAs by Northern blot analyses using LNA probes. Methods 43(2):140–145
13. Baskerville S, Bartel DP (2005) Microarray profiling of microRNAs reveals frequent coexpression with neighboring miRNAs and host genes. RNA 11(3):241–247
14. Lim LP et al (2005) Microarray analysis shows that some microRNAs downregulate large numbers of target mRNAs. Nature 433(7027):769–773
15. Thomson JM, Parker J, Perou CM, Hammond SM (2004) A custom microarray platform for analysis of microRNA gene expression. Nat Methods 1(1):47–53
16. Lu C et al (2005) Elucidation of the small RNA component of the transcriptome. Science 309(5740):1567–1569
17. Lu C et al (2008) Genome-wide analysis for discovery of rice microRNAs reveals natural antisense microRNAs (nat-miRNAs). Proc Natl Acad Sci U S A 105(12):4951–4956
18. Nobuta K et al (2007) An expression atlas of rice mRNAs and small RNAs. Nat Biotechnol 25(4):473–477
19. Fahlgren N et al (2007) High-throughput sequencing of Arabidopsis microRNAs: Evidence for frequent birth and death of MIRNA genes. PLoS ONE 2:e219
20. Lu C et al (2006) MicroRNAs and other small RNAs enriched in the Arabidopsis RNA-dependent RNA polymerase-2 mutant. Genome Res 16(10):1276–1288
21. Rajagopalan R, Vaucheret H, Trejo J, Bartel DP (2006) A diverse and evolutionarily fluid set of microRNAs in Arabidopsis thaliana. Genes Dev 20(24):3407–3425
22. Shendure J, Mitra RD, Varma C, Church GM (2004) Advanced sequencing technologies: methods and goals. Nat Rev Genet 5(5):335–344
23. Cokus SJ et al (2008) Shotgun bisulphite sequencing of the Arabidopsis genome reveals DNA methylation patterning. Nature 452(7184):215–219
24. Mi S et al (2008) Sorting of small RNAs into Arabidopsis argonaute complexes is directed by the 5′ terminal nucleotide. Cell 133(1):116–127
25. Axtell MJ, Bartel DP (2005) Antiquity of microRNAs and their targets in land plants. Plant Cell 17(6):1658–1673
26. Liang RQ et al (2005) An oligonucleotide microarray for microRNA expression analysis based on labeling RNA with quantum dot and nanogold probe. Nucleic Acids Res 33(2):e17
27. Liu CG, Spizzo R, Calin GA, Croce CM (2008) Expression profiling of microRNA using oligo DNA arrays. Methods 44(1):22–30
28. Wang H, Ach RA, Curry B (2007) Direct and sensitive miRNA profiling from low-input total RNA. RNA 13(1):151–159
29. Sun Y et al (2004) Development of a microarray to detect human and mouse microRNAs and characterization of expression in human organs. Nucleic Acids Res 32(22):e188
30. Stolc V et al (2005) A pilot study of transcription unit analysis in rice using oligonucleotide tiling-path microarray. Plant Mol Biol 59(1):137–149
31. Yamada K et al (2003) Empirical analysis of transcriptional activity in the Arabidopsis genome. Science 302(5646):842–846
32. Boccara M et al (2007) New approaches for the analysis of Arabidopsis thaliana small RNAs. Biochimie 89(10):1252–1256
33. Stolc V et al (2005) Identification of transcribed sequences in Arabidopsis thaliana by using high-resolution genome tiling arrays. Proc Natl Acad Sci U S A 102(12):4453–4458
34. Nelson PT et al (2004) Microarray-based, high-throughput gene expression profiling of microRNAs. Nat Methods 1(2):155–161
35. Barad O et al (2004) MicroRNA expression detected by oligonucleotide microarrays: system establishment and expression profiling in human tissues. Genome Res 14(12):2486–2494
36. Yu B et al (2005) Methylation as a crucial step in plant microRNA biogenesis. Science 307(5711):932–935
37. Wang X, Wang X (2006) Systematic identification of microRNA functions by combining target prediction and expression profiling. Nucleic Acids Res 34(5):1646–1652
38. Castoldi M, Benes V, Hentze MW, Muckenthaler MU (2007) miChip: a microarray platform for expression profiling of microRNAs based on locked nucleic acid (LNA) oligonucleotide capture probes. Methods 43(2):146–152
39. Yin JQ, Zhao RC (2007) Identifying expression of new small RNAs by microarrays. Methods 43(2):123–130
40. Miska EA et al (2004) Microarray analysis of microRNA expression in the developing mammalian brain. Genome Biol 5(9):R68
41. Edgar R, Barrett T (2006) NCBI GEO standards and services for microarray data. Nat Biotechnol 24(12):1471–1472
42. Maroney PA, Chamnongpol S, Souret F, Nilsen TW (2007) A rapid, quantitative assay for direct detection of microRNAs and other small RNAs using splinted ligation. RNA 13(6):930–936
Chapter 9
In Situ Detection of miRNAs Using LNA Probes
Zoltán Havelda
Abstract
Spatial and temporal analysis of miRNA accumulation by in situ methods is a prerequisite for understanding the precise biological functions of miRNAs. Since miRNAs are very short molecules, their in situ analysis is technically demanding. Here, we describe a protocol for miRNA in situ detection in plants based on LNA-modified oligonucleotide probes. LNA modification significantly enhances the sensitivity and specificity of miRNA-detecting probes and makes in situ miRNA detection relatively easy.
Key words: miRNA, Plant, LNA, In situ hybridization
1. Introduction
Understanding the precise role of miRNAs in biological processes requires the spatial and temporal investigation of mature miRNA accumulation by in situ hybridization. The technical problem associated with this technology derives from the short length of the target RNAs (21–25 nt), which hinders their reliable and sensitive detection. Locked nucleic acid (LNA) modified oligonucleotide-based probes have been introduced to enhance both the sensitivity and the specificity of miRNA detection by northern blotting and in situ hybridization (1–3). LNA modifications in DNA oligonucleotides bring about a dramatically higher target affinity and specificity compared to a traditional DNA oligonucleotide (4). Using this technology, miRNAs can be detected relatively easily in both plants and animals (Fig. 1). Here we describe a detailed protocol for in situ hybridization of plant tissue sections using LNA-modified oligonucleotides as probes.
B.C. Meyers and P.J. Green (eds.), Plant MicroRNAs, Methods in Molecular Biology, vol. 592 DOI 10.1007/978-1-60327-005-2_9, © Humana Press, a part of Springer Science + Business Media, LLC 2010
Fig. 1. Differential accumulation of miR160 and miR167 in A. thaliana flowers. Longitudinal, near-consecutive sections of A. thaliana flowers were hybridized with 5′ and 3′ double-labeled LNA-modified oligonucleotides detecting miR160, miR167, and miR124 (a mouse miRNA used as a negative control). Hybridization was carried out at 60°C overnight. Arrows show the accumulation of miR167.
2. Materials
2.1. Equipment
1. Vacuum chamber and incubation chamber.
2. Rotary microtome (e.g., HM335E from Microm, Germany).
3. Hotplate.
4. Hybridization oven.
5. Standard light microscope with camera.
2.2. Chemicals
1. Ethanol (100%).
2. Wax (e.g., Paraplast plus, Sigma-Aldrich).
3. Eosin Y disodium salt (Fluka).
4. Histoclear (National Diagnostics, Atlanta, Georgia) or RotiClear (Roth, Germany).
5. Wax: Paraplast X-tra (Sigma).
6. Commercially prepared poly-L-lysine slides (e.g., Poly-Prep Slides, Sigma), coverslips.
7. Protease from Streptomyces griseus (Pronase, Sigma), 40 mg/mL in water and self-digested at 37°C for 2 h to remove contaminating nuclease activities. Store in aliquots at −20°C.
8. Acetic anhydride (Sigma).
9. Triethanolamine (Sigma).
10. RNaseA (Sigma).
11. LNA-modified oligo (see Note 1).
12. DIG Oligonucleotide 3′-End Labeling Kit (Roche).
13. Anti-Digoxigenin-AP, Fab fragments (Roche).
14. Ribonucleic acid, transfer, from E. coli strain W.
15. Blocking Reagent (for nucleic acid hybridization, Roche).
16. BSA Fraction V (Sigma).
17. NBT (nitro blue tetrazolium): 50 mg/mL; BCIP (5-bromo-4-chloro-3-indolyl-phosphate): 50 mg/mL (Promega).
18. Alcian Blue solution in 3% acetic acid.
19. DPX Mountant for histology (Fluka).
2.3. Buffers and Solutions
1. Fixative solution: Paraformaldehyde (4%, w/v, Sigma) and 0.1% Triton X-100 in phosphate-buffered saline (PBS): 0.13 M NaCl, 7 mM Na2HPO4, 3 mM NaH2PO4, pH 6.7 (see Note 2).
2. 10× saline: 8.5%, w/v NaCl in water.
3. SB buffer: 0.1 M Tris–HCl, 0.1 M NaCl, 0.05 M MgCl2, pH 9.5. Prepare a 10× stock (1 M Tris, 1 M NaCl, pH 9.5) and add MgCl2 to the 1× buffer just before use.
4. 10× TBS buffer: 1 M Tris–HCl, 1.5 M NaCl, pH 7.2.
5. Hybridization solution: 0.3 M NaCl, 10 mM Tris–HCl pH 6.8, 10 mM NaHPO4 pH 6.8, 5 mM EDTA, 50% formamide, 10% dextran sulfate, 1× Denhardt’s solution, 1 mg/mL tRNA (Sigma) (see Note 3).
6. Washing solution: 0.2× SSC.
7. 10× NTE buffer: 5 M NaCl, 100 mM Tris–HCl pH 7.5, 10 mM EDTA.
8. Blocking solution 1: Dissolve 0.5% Blocking Reagent (Roche) in 1× TBS buffer at 60°C and cool down.
9. Blocking solution 2: Dissolve 1% BSA in 1× TBS and add 0.3% Triton X-100.
10. 10× salts buffer: 3 M NaCl, 0.1 M Tris–HCl pH 6.8, 0.1 M NaHPO4 pH 6.8, 50 mM EDTA.
11. 20× pronase buffer: 1 M Tris–HCl pH 7.5, 0.1 M EDTA.
12. 10× PBS buffer: 1.3 M NaCl, 0.07 M Na2HPO4, 0.03 M NaH2PO4 (pH 6.5–7).
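Several of the buffers above are prepared as concentrated stocks (10× or 20×) and diluted to 1× before use. The following Python sketch applies the standard dilution relation C1 × V1 = C2 × V2 to compute how much stock and water to combine. The function name and the 100 mL final volume are assumptions chosen for illustration.

```python
def stock_volume(stock_conc, final_conc, final_volume):
    """Return the stock volume v that satisfies stock_conc * v = final_conc * final_volume.

    Concentrations must share one unit (e.g., "x-fold" or M); the result has
    the same unit as final_volume.
    """
    if stock_conc <= 0 or final_conc < 0 or final_conc > stock_conc:
        raise ValueError("final concentration must lie between 0 and the stock concentration")
    return final_conc * final_volume / stock_conc


final_ml = 100.0  # assumed working volume

# 1x PBS from the 10x PBS stock.
pbs_stock_ml = stock_volume(10, 1, final_ml)
print(pbs_stock_ml, "mL 10x PBS +", final_ml - pbs_stock_ml, "mL water")

# 1x pronase buffer from the 20x pronase buffer stock.
pronase_stock_ml = stock_volume(20, 1, final_ml)
print(pronase_stock_ml, "mL 20x pronase buffer +", final_ml - pronase_stock_ml, "mL water")
```

The same function works for molar concentrations, for example when checking that a 10× stock containing 1 M Tris gives 0.1 M Tris at 1×.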
3. Methods
In situ hybridization of miRNAs depends upon the detection of RNA; therefore, it is very important to avoid contamination with RNases. The working environment should be clean, and nuclease-free tubes, bottles, etc. should be used. The water and all aqueous solutions should be autoclaved and preferably aliquoted for single use. Because of its hazardous nature, we do not favor the use of diethyl pyrocarbonate (DEPC) treatment.
3.1. Embedding and Sectioning
1. Embedding of tissue parts in wax is a long and slow process that takes several days. Once the samples are removed from the plant, they should be placed in ice-cold fixative solution immediately. After transferring the material into the fixative, apply vacuum until bubbles form and hold the vacuum for 5–10 min. Release the vacuum very slowly. Formaldehyde in the fixative is toxic, so work in a fume hood when possible. Keep the tubes on ice during the procedure.
2. Repeat the vacuum treatment and change the fixative solution.
3. Repeat the whole procedure until all the samples sink to the bottom of the tube. This indicates complete infiltration of the fixative solution into the tissue samples.
4. Exchange the fixative solution when all the samples have sunk, and fix the samples at 4°C for 4 h to overnight, with gentle shaking (see Note 4).
5. Wash the fixative solution away with ice-cold 1× saline solution.
6. Dehydrate the samples on ice by passing them through a graded ethanol series with gentle rotation. Depending on the size of the samples, every step takes 1–3 h. Use the following concentrations for the series: 30% EtOH/1× saline, 40% EtOH/1× saline, 50% EtOH/1× saline, 60% EtOH/1× saline, 70% EtOH/1× saline, 85% EtOH/1× saline, 95% EtOH/H2O. You can interrupt the protocol and leave the samples at 4°C at any step after 50% EtOH/1× saline.
7. The following steps are performed at room temperature with gentle shaking. Replace the 95% EtOH/H2O with 100% EtOH containing 0.1% Eosin Y (Sigma) for 1–3 h. Replace the staining solution with 100% EtOH and incubate for 1–3 h. Repeat once.
8. Pass the samples through a graded Roti-Histol (Roth, Karlsruhe, Germany) series with gentle rotation. It is advisable to carry out the Roti-Histol steps in glass vials or good-quality reaction tubes. Depending on the size of the samples, every step takes 1–3 h. Use the following concentrations for the series: 25% Roti-Histol/EtOH, 50% Roti-Histol/EtOH, 75% Roti-Histol/EtOH, 100% Roti-Histol. Repeat the 100% Roti-Histol step twice.
9. Add one Paraplast X-tra chip (Sigma) to about 1 mL of Roti-Histol and leave to dissolve at room temperature. Once the first chip has dissolved, add another one, and so on, until no more chips dissolve. At this stage, transfer the samples to 42°C and continue adding Paraplast X-tra chips until the Roti-Histol is saturated.
10. Put the samples at 60°C and exchange half of the volume for freshly melted Paraplast X-tra (avoid heating Paraplast X-tra over 60°C). Try to form a cap of melted wax on top of the solution to protect the samples from heat stress. Repeat once and leave at 60°C overnight.
11. Change the wax, replacing it with freshly melted wax, twice a day for 3 days. Decant the old Paraplast X-tra gently and pour in freshly melted Paraplast X-tra. Leave the tubes open to allow the evaporation of Roti-Histol.
12. Prior to sectioning, the tissue samples must be solidified in a wax block. Put the plastic or metal molds on a hotplate at 60°C, shake the samples, and pour them out into the mold. Ideally, about 10–12 samples should be transferred into one mold. Arrange the samples in rows, leaving sufficient space between them to allow single blocks containing one or a few samples to be cut out. Move the samples onto ice or to 4°C and allow them to harden. Mount the tissue block on a holder compatible with the microtome.
13. Make tissue sections (8–20 µm) using a retracting rotary microtome. Trim the block so that the upper and lower faces are parallel. Repeated sectioning will form a ribbon of sections. Trimming the block to a trapezoid shape helps to identify single sections in the ribbon.
14. Mount the sections onto commercially prepared poly-L-lysine-coated slides. Wax sections need to be stretched before adhesion to the glass slide. Place the sections onto a layer of degassed water on a slide held on a warmed hotplate (40–42°C). Once the section has stretched, drain away the excess water and leave the slide to dry overnight at 40–42°C.
3.2. LNA Probe Preparation
LNA-modified oligonucleotide-based probes can be purchased from Exiqon (Denmark; www.exiqon.com), and the level of their labeling depends on the required sensitivity. Arabidopsis thaliana tissue sections respond less efficiently in in situ experiments than other plant tissues (for example, Nicotiana benthamiana). While 3′ end-labeled probes (using the DIG Oligonucleotide 3′-End Labeling Kit, Roche) work well in N. benthamiana, they do not
provide reliable signals in A. thaliana. To achieve good signals in A. thaliana in situ experiments, it is recommended to order a 5′ chemically DIG-labeled LNA-modified probe and also label this probe at the 3′ end, producing a double-labeled probe. Alternatively, double-labeled probes can be ordered. It is very important to use a similarly labeled negative control (for example, an animal-specific miRNA) during the experiments to test the background hybridization.
1. Label 50 pmol of LNA-modified oligonucleotide or 5′ DIG-labeled LNA-modified oligonucleotide in a 10 µL volume using the DIG Oligonucleotide 3′-End Labeling Kit (Roche) according to the manufacturer’s instructions. It is not necessary to purify the probe after labeling.
2. Remove 0.2 µL (for probe checking) from the 10 µL reaction and add 10 µL deionized formamide.
3. To check the quality of the probe, spot the 0.2 µL aliquot (and the labeled control oligonucleotide provided with the kit) on a piece of membrane and UV cross-link.
4. Put the membrane in 1× TBS buffer for 2 min and incubate the membrane in TBS buffer containing 5% powdered milk for 10 min. Add anti-DIG-alkaline phosphatase Fab fragments (1:2,000) and incubate with gentle shaking for 30 min.
5. Wash at least three times in 1× TBS buffer for 5 min each and transfer to 1× SB buffer for 2 min.
6. Develop the color reaction by adding SB buffer containing NBT and BCIP (add 30 µL NBT and 30 µL BCIP solution to 10 mL SB buffer). Stop the reaction by rinsing the membrane with water, and dry the membrane.
3.3. Slide Preparation
1. Transfer the slides into Roti-Histol at room temperature and incubate for 10 min. (For the first treatment, the Roti-Histol can be reused from the second treatment of a previous experiment.) Repeat using fresh Roti-Histol.
2. Transfer slides into 100% EtOH (can be reused from a previous experiment) for 2 min. Repeat once using fresh 100% EtOH.
3. Hydrate the samples by passing them through a graded ethanol series for 2 min at each step. Use the following concentrations for the series: 95% EtOH/H2O, 85% EtOH/1× saline, 70% EtOH/1× saline, 60% EtOH/1× saline, 50% EtOH/1× saline, 40% EtOH/1× saline, 30% EtOH/1× saline, 1× saline.
4. Equilibrate the slides in 1× pronase buffer for 2 min. Incubate the slides in pronase solution containing 0.125 µg/mL pronase for 10 min at room temperature. Incubate slides in 0.2% glycine in 1× PBS for 2 min, then wash in 1× PBS for 2 min. Postfix in 4% formaldehyde/PBS solution (in a fume hood) for 30 min. Wash the slides twice in 1× PBS for 2 min each.
5. To eliminate background reactions due to electrostatic binding of the hybridization probe, amino groups on the section should be acetylated using an acetic anhydride treatment (0.5 mL acetic anhydride in 100 mL 0.1 M triethanolamine–HCl, pH 8). Rinse the slides twice in 1× PBS for 2 min. To prepare the acetylation buffer, add 1.25 mL triethanolamine and 0.5 mL HCl to 98.25 mL water and stir. Add 0.5 mL acetic anhydride to the triethanolamine buffer and stir vigorously (work in a fume hood). Since acetic anhydride is very unstable in water, it has to be added just before use. Incubate the slides in buffered acetic anhydride for 10 min at room temperature. Wash the slides once in 1× PBS, then in 1× saline, for 2 min each.
6. Dip the slides in fresh saline solution for 2 min and dehydrate through a graded ethanol/saline series for 2 min at each step. Use the following concentrations for the series: 30% EtOH/1× saline, 40% EtOH/1× saline, 50% EtOH/1× saline, 60% EtOH/1× saline, 70% EtOH/1× saline, 85% EtOH/1× saline, 95% EtOH/H2O.
7. Transfer the slides to 100% EtOH. The slides are now ready for hybridization. You can stop here and keep the slides safely in EtOH for hours.
3.4. Hybridization and Washing
1. Prepare the hybridization solution, about 100–200 µL per slide, depending on the number and size of the sections. Prepare more than needed to account for losses (see Note 3).
2. Add 1–2 µL labeled LNA probe (in 50% formamide) per slide to the hybridization solution. Mix well, centrifuge to eliminate bubbles, and keep at the temperature of hybridization (50–60°C; see Note 5).
3. Put one slide on a hotplate at 50°C and allow it to dry. Apply the hybridization solution with probe as a band along the middle of the slide. Carefully cover with a coverslip, avoiding air bubbles. Put the slide in a closed environment saturated with 50% formamide/2× SSC prewarmed to the temperature of hybridization. (Prepare a plastic box with 50% formamide/2× SSC at the bottom. Place the slides on a horizontal support inside the plastic box.)
4. Repeat with every slide individually. Close the box, seal with clingfilm, and incubate the slides at the temperature of hybridization overnight. Prepare washing solution in excess (0.2× SSC) and place at the temperature of hybridization.
5. Perform the washes at 50–60°C (depending on the temperature of hybridization) in 0.2× SSC. Put the slides in washing solution and carefully remove the coverslips. Quickly rinse slides carrying different probes several times in washing solution to avoid cross-contamination of probes. Wash the slides twice at the temperature of hybridization for 1 h each.
6. Immerse the slides in 1× NTE buffer, prewarmed to 37°C. Repeat in fresh buffer. Incubate slides in NTE containing 20 µg/mL RNase A at 37°C for 30 min to remove background hybridization.
7. Rinse the slides in 1× NTE for 5 min and transfer the rack to washing solution (0.2× SSC) for 1 h at the temperature of hybridization. Dip the slides into 1× SSC for 2 min, then into 1× TBS twice for 5 min each time. Slides are now ready for the detection step.
3.5. Signal Detection
1. Incubate the slides in their rack in blocking solution 1 for 30 min. Transfer the slides into blocking solution 2 and incubate for 30 min.
2. Add anti-DIG-alkaline phosphatase Fab fragments (1:2,000, Roche) to the required amount of blocking solution 2 (0.5 mL per slide). Place the slides on a support and put them into a moist plastic or glass chamber on a tray. Apply the antibody solution onto the slides and incubate for 90 min at room temperature.
3. Stop the reaction by washing the slides (transferred to a rack) at least five times in excess 1× TBS for 5 min. Equilibrate the slides in 1× SB buffer for a few minutes.
4. Develop the color reaction by adding 1× SB buffer containing NBT and BCIP (add 30 µL NBT and 30 µL BCIP solution to 10 mL 1× SB buffer). To develop the color reactions, put the slides into a moisture chamber and cover them individually with about 1 mL of substrate solution. Remove slides one by one from the equilibrating 1× SB buffer and immediately apply the substrate solution, since after drying it can be difficult to spread the liquid.
5. Monitor the signal development for 2–24 h. Stop the reaction at the desired signal intensity by rinsing the slides in water.
6. Wash the slides in a graded EtOH series for 2–5 min at each step (depending on the intensity of the signal and background). Use the following concentrations for the series: 40% EtOH/H2O, 70% EtOH/H2O, 95% EtOH/H2O, 100% EtOH. Repeat the process in the reverse direction.
7. Counter-stain the sections by dipping the slides for 5–15 min in 0.25% Alcian blue in 3% acetic acid. The slides should be monitored for the intensity of staining (tissue having no hybridization signals should show a faint blue staining). Rinse the slides in water and air dry.
8. Cover the section with a coverslip using mounting solution (DPX), about 100–200 µL per slide. Leave the slides to dry for a few hours. The sections are now ready for data recording using a standard light microscope.
4. Notes
1. LNA-modified oligonucleotide probes detecting miRNAs can be ordered from Exiqon (http://www.exiqon.com), and a website for probe design is also available (http://lnatools.com).
2. The paraformaldehyde fixative should be prepared in a screw-top bottle (e.g., Duran type) in a fume hood. Take 50 mL of 1× PBS and, using a solution of 5 M KOH, adjust to pH >12. Heat the solution to 60°C on a hot plate and add 4 g paraformaldehyde while heating. Shake vigorously for about 30 s, and release the pressure every 5–10 s. The paraformaldehyde should dissolve completely, although very slight cloudiness is acceptable. Cool it on ice. Adjust the pH back down to 7 using H2SO4 (do not use HCl, as this releases a carcinogen). Then bring the volume up to 100 mL by adding 1× PBS. Add 0.1 mL of Triton X-100. You can prepare a larger volume of solution and store it at −20°C in aliquots. Once you have thawed an aliquot, do not re-freeze.
3. For 1 mL hybridization solution, add 100 µL 10× salts buffer, 500 µL deionized formamide, 200 µL 50% dextran sulfate, 10 µL tRNA (100 mg/mL), 10 µL 100× Denhardt's solution, and water to 1 mL. The volume of the probe usually does not alter the concentration of the hybridization solution significantly. Prepare a little more than the desired volume.
4. The fixation time strongly depends on the size and the tissue type of the samples. Larger and more compact tissue samples usually require longer fixation. However, over-fixation can reduce the hybridization signal.
5. The temperature of hybridization strongly depends on the nature of the particular probe. A good starting temperature is 55°C. If the probe tends to give background hybridization, increase the temperature of hybridization to 60°C and, in parallel, increase the temperature of washing. If no signal is detected, lower the temperature of hybridization and washing to 50°C.
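The recipe in Note 3 scales linearly with the number of slides, so the volumes can be tabulated before starting. The following is a minimal sketch, assuming that the component volumes in Note 3 are microliters per 1 mL of solution; the function name, the 150 µL per slide default, and the 10% excess are illustrative choices, not values prescribed by the protocol.

```python
# Per-1-mL recipe from Note 3; volumes are assumed to be in microliters.
RECIPE_UL_PER_ML = {
    "10x salts buffer": 100.0,
    "deionized formamide": 500.0,
    "50% dextran sulfate": 200.0,
    "tRNA stock": 10.0,
    "100x Denhardt's solution": 10.0,
}


def hybridization_mix(n_slides, ul_per_slide=150.0, excess=0.10):
    """Return (total_ul, volumes) for n_slides, including an excess.

    ul_per_slide and excess are hypothetical defaults; the protocol only
    gives a range of 100-200 uL per slide and asks for "more than needed".
    """
    if n_slides < 1:
        raise ValueError("n_slides must be at least 1")
    total_ul = n_slides * ul_per_slide * (1.0 + excess)
    scale = total_ul / 1000.0  # the recipe is defined per 1000 uL
    mix = {name: vol * scale for name, vol in RECIPE_UL_PER_ML.items()}
    # Water makes up the volume that the listed components leave open.
    mix["water"] = total_ul - sum(mix.values())
    return total_ul, mix


if __name__ == "__main__":
    total, mix = hybridization_mix(n_slides=8)
    print(f"Total volume: {total:.0f} uL")
    for name, vol in mix.items():
        print(f"  {name}: {vol:.1f} uL")
```

The probe (1–2 µL per slide) is left out of the table because, as Note 3 states, its volume does not change the concentrations appreciably.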
Acknowledgments
This work was supported by a grant from the Hungarian Scientific Research Fund (OTKA; K61461). ZH is a recipient of the Bolyai Janos Fellowship.
References
1. Valoczi A, Hornyik C, Varga N, Burgyan J, Kauppinen S, Havelda Z (2004) Sensitive and specific detection of microRNAs by northern blot analysis using LNA-modified oligonucleotide probes. Nucleic Acids Res 32(22):e175
2. Valoczi A, Varallyay E, Kauppinen S, Burgyan J, Havelda Z (2006) Spatio-temporal accumulation of microRNAs is highly coordinated in developing plant tissues. Plant J 47(1):140–151
3. Kloosterman WP, Wienholds E, de Bruijn E, Kauppinen S, Plasterk RH (2006) In situ detection of miRNAs in animal embryos using LNA-modified oligonucleotide probes. Nat Methods 3(1):27–29
4. Kauppinen S, Vester B, Wengel J (2006) Locked nucleic acid: high-affinity targeting of complementary RNA for RNomics. Handb Exp Pharmacol 173:405–422
Chapter 10
Analysis of miRNA Modifications
Bin Yu and Xuemei Chen
Abstract
After transcription, a large number of cellular RNAs employ modifications to increase their diversity and functional potential. Modifications can occur on the base, ribose, or both, and are important steps in the maturation of many RNAs. Our lab recently showed that plant microRNAs (miRNAs) possess a 2′-O-methyl group on the ribose of the 3′ terminal nucleotide, and that this methyl group is added after miRNA/miRNA* formation. One function of this modification is to protect miRNAs from 3′ terminal uridylation by an unknown enzymatic activity. It is possible that uridylation of miRNAs triggers their degradation. Here we describe a protocol to purify a specific miRNA in order to determine its molecular mass so that the presence of a modification can be inferred, an in vivo method to detect 3′ terminal modification of miRNAs, and an [α-32P] dATP incorporation assay to study 3′ terminal uridylation of miRNAs.
Key words: miRNA, Methylation, Uridylation, β elimination
1. Introduction
MicroRNAs (miRNAs) are short noncoding RNAs that recognize partially or completely complementary sequences inside target mRNAs and guide cleavage or translational inhibition of target mRNAs (1). This ability has made miRNAs important regulators of gene expression in both animals and plants (1). miRNAs are generated from long stem-loop precursor transcripts known as pri-miRNAs (1). In animals, the RNase III enzyme Drosha processes pri-miRNAs into pre-miRNAs, which are processed by another RNase III enzyme, Dicer, to generate transient 20–24 nucleotide (nt) miRNA/miRNA* duplexes (2–5). In plants, the RNase III enzyme DICER-LIKE1 (DCL1) processes pri-miRNAs to pre-miRNAs and pre-miRNAs to miRNA/miRNA* duplexes (6, 7) with the aid of HYL1 and SERRATE (8–11). miRNA/miRNA* duplexes show typical features of RNase III products: a 5′ P, a 3′ OH, and a 2 nt overhang on each strand (4, 12).
B.C. Meyers and P.J. Green (eds.), Plant MicroRNAs, Methods in Molecular Biology, vol. 592 DOI 10.1007/978-1-60327-005-2_10, © Humana Press, a part of Springer Science + Business Media, LLC 2010
Recently, our lab showed that Arabidopsis miRNA/miRNA* duplexes have an additional feature, 2′-O-methylation on the 3′ terminal ribose (13, 14), and that an enzyme named HUA ENHANCER1 (HEN1; 15) catalyzes the methylation reaction (13). We revealed the presence of a methyl group on miR173 via mass spectrometry analysis of miR173 purified from Arabidopsis total RNAs. We also demonstrated the presence of methylation at the 3′ termini of miRNAs by treating miRNAs with sodium periodate followed by β elimination (13). Complete or partial loss-of-function mutations in HEN1, such as hen1-1 or hen1-2, result in reduced accumulation and size heterogeneity of miRNAs and pleiotropic developmental defects (6, 15, 16). With the cloning of particular miRNAs and with an [α-32P] dATP incorporation assay, we showed that the size heterogeneity of miRNAs in hen1 mutants comes from 3′ terminal uridylation, suggesting that unmethylated miRNAs are modified by an unknown polymerase activity in plants (17). In this chapter, we describe a procedure to purify a specific miRNA for mass spectrometry analysis and a protocol to perform β elimination to detect modifications on the 3′ terminal ribose of miRNAs. We also describe an [α-32P] dATP incorporation assay to detect 3′ terminal uridylation.
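The reasoning behind the mass spectrometry approach can be illustrated with a short calculation: a single 2′-O-methyl group raises the mass of an RNA by about 14 Da, so comparing the measured mass with the mass expected from the sequence indicates whether a modification is present. The sketch below is only an approximation under stated assumptions: it uses average residue masses, assumes a 5′ monophosphate and a 3′ hydroxyl, and ignores counter-ions and the charge state observed in a real spectrum. The function and constant names are hypothetical.

```python
# Average masses (Da) of nucleotide residues within an RNA chain
# (nucleoside 5'-monophosphate minus one water).
RESIDUE_MASS = {"A": 329.21, "C": 305.18, "G": 345.21, "U": 306.17}
WATER = 18.015         # added once per chain for the terminal H and OH
METHYL_SHIFT = 14.027  # replacing the 2'-OH hydrogen with CH3 adds CH2


def rna_average_mass(seq, three_prime_2_o_methyl=False):
    """Approximate average mass of a 5'-phosphate, 3'-hydroxyl RNA."""
    mass = sum(RESIDUE_MASS[nt] for nt in seq.upper()) + WATER
    if three_prime_2_o_methyl:
        mass += METHYL_SHIFT
    return mass


# miR173 sequence as given in the Materials section of this chapter.
MIR173 = "UUCGCUUGCAGAGAGAAAUCAC"

if __name__ == "__main__":
    plain = rna_average_mass(MIR173)
    methylated = rna_average_mass(MIR173, three_prime_2_o_methyl=True)
    print(f"unmodified: {plain:.2f} Da")
    print(f"2'-O-methylated: {methylated:.2f} Da")
    print(f"difference: {methylated - plain:.3f} Da")
```

Because the expected shift is small relative to the mass of the miRNA, the purified sample needs to be free of species of similar mass, which is the concern addressed in Note 1.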
2. Materials
2.1. Purifying miR173 with a Complementary Oligonucleotide Probe Coupled to Biotin
2.1.1. Extraction of RNAs
1. Tri-reagent (Molecular Research Center, Inc., cat# TR 118).
2. Diethyl pyrocarbonate (DEPC)-treated water. Add 1 ml DEPC to 1 l deionized water, stir overnight, and autoclave the next day.
3. Chloroform (VWR, cat# EM-CX1055-14).
4. Isopropanol (VWR, cat# EM-PX1838-1).
5. 100% ethanol and 70% ethanol. For 70% ethanol, mix 70 ml 100% ethanol with 30 ml DEPC-treated water.
2.1.2. Annealing of Probe
1. Biotinylated probe. 5′ biotin-aagtgatttctctctgcaagcgaa 3′ (see Note 1).
2. 20× SSC.
3. RNasin Plus RNase Inhibitor (Promega, cat# N2615).
2.1.3. Preparation of Streptavidin Magnetic Particles
1. Streptavidin magnetic particles (Roche, cat# 11641778001).
2. 0.5× SSC.
3. Magnetic stand (Promega).
2.1.4. Capturing of Annealed Biotinylated-Oligonucleotide/miRNA Hybrids and Elution of miRNA
1. Exonuclease I (GE Healthcare, cat# E70073Z).
2.1.5. Quantification of Purified miRNA by RNA Filter Hybridization
1. 5× TBE and 0.5× TBE.
2. 2× RNA loading buffer. Mix 8 ml formamide, 2 ml 5× TBE, 10 mg bromophenol blue, and 10 mg xylene cyanol.
3. 15% polyacrylamide gel containing 42% urea. Dissolve 42 g urea in 4 ml 5× TBE and 15 ml 40% acrylamide (acrylamide:bisacrylamide, 29:1) and add water to 40 ml. Add 320 µl 10% APS and 24 µl TEMED (see Note 2).
4. Zeta-probe GT membrane (BioRad, cat# 162-093).
5. Ultrahyb-Oligo hybridization buffer (Ambion, cat# AM8663).
6. T4 polynucleotide kinase (NEB, cat# M0201S).
7. [γ-32P] ATP (PerkinElmer).
8. DNA probe 5′ gtgatttctctctgcaagcgaa 3′ and synthesized miR173 5′ UUCGCUUGCAGAGAGAAAUCAC 3′ (a, t, g, and c are deoxyribonucleotides; A, U, G, and C are ribonucleotides).
9. 2× SSC with 0.5% SDS.
2.2. Monitoring 3′ Terminal Methylation by β Elimination
2.2.1. Extraction of RNAs (See Subheading 2.1.1)
2.2.2. Periodate Treatment and β Elimination
1. 0.06 M borax/boric acid buffer (pH 8.6; see Note 3).
2. 0.055 M borax/boric acid/NaOH (pH 9.5).
3. 200 mM sodium periodate (see Note 4).
4. Glycerol.
5. Glycogen (Fermentas, cat# R0551).
6. 3 M sodium acetate (pH 5.2, DEPC-treated).
7. G25 column (GE Healthcare).
2.3. Monitoring 3′ Uridylation by an [α-32P] dATP Incorporation Assay
2.3.1. Enrichment of Small RNAs
1. 50% polyethylene glycol 8000 (DEPC-treated).
2. 5 M NaCl (DEPC-treated).
2.3.2. Isolation of 18–30 nt Small RNAs by Electrophoresis
1. Decade™ markers (Ambion, cat# AM7778).
2. RNA elution buffer containing 20 mM Tris–HCl (pH 7.5), 0.5 M sodium acetate, 10 mM EDTA, and 1% SDS.
3. Glass wool.
4. Chloroform/phenol (1:1).
2.3.3. Ligation to 3′ Adaptor and Purification of Small RNAs After 3′ Adaptor Ligation
1. Alkaline Phosphatase, Calf Intestinal (CIP) (NEB, cat# M0290L).
2. RNA ligase (GE Healthcare, cat# E2050Y).
3. 3′ adaptor. 5′ P-UUUctgtaggcaccatcaat-iT 3′ (P is phosphate; a, t, g, and c are deoxyribonucleotides; U is a ribonucleotide; iT is inverted deoxythymidine).
2.3.4. Reverse Transcription and Amplification of miR167
1. microP2 primer. 5′ attgatggtgcctacagttt 3′.
2. miR167P1. 5′ tgaagctgccagcatga 3′ (see Note 5).
3. M-MuLV reverse transcriptase (NEB, cat# M0253S).
4. 10 mM dNTP.
5. GoTaq DNA polymerase (Promega, cat# M3005).
2.3.5. Purification of DNA by Electrophoresis
1. 6× DNA loading buffer. 40% sucrose, 1 mg/ml bromophenol blue, and 1 mg/ml xylene cyanol.
2. 12% native polyacrylamide gel.
3. TrackIt 10 bp DNA ladder (Invitrogen, cat# 10488-019).
4. 0.3 M sodium acetate (DEPC-treated).
2.3.6. [α-32P] dATP Incorporation Assay
1. [α-32P] dATP (PerkinElmer).
4. Methods
Outline of the methods described below:
1. Purification of miR173.
2. Detection of RNA 3′ terminal methylation by β elimination.
3. Detection of 3′ uridylation by an [α-32P] dATP incorporation assay.
4.1. Purification of miR173 with a Complementary Oligonucleotide Probe Coupled to Biotin
We describe an affinity procedure to purify miR173 from total RNAs (see Fig. 1). Briefly, a complementary oligonucleotide probe coupled to biotin will be annealed with miR173 in a high salt solution and the hybrids will then be captured by magnetic streptavidin particles. After washes, miR173 will be eluted with water.
Fig. 1. A schematic illustration of the purification of miR173 from Arabidopsis total RNAs. The purification is achieved in three steps. The first step is the annealing of a biotinylated antisense miR173 probe to miR173 in total RNAs. The second step is the magnetic capturing of the duplex. The third step is the elution of miR173 after washes. Small box indicates biotin; SMP, streptavidin magnetic particle.
4.1.1. Extraction of Total RNAs
1. Grind Arabidopsis tissue in liquid nitrogen to a fine powder with a mortar and pestle.
2. Transfer the powder to a centrifuge tube, add tri-reagent (10 ml per 1 g of fresh tissue), mix vigorously by vortexing, and incubate at room temperature (RT) for 5 min.
3. Add chloroform (1/5 volume), mix vigorously, and incubate at RT for 15 min.
4. Centrifuge at 12,000 × g for 15 min at 4°C.
5. Transfer the aqueous phase to a fresh centrifuge tube, add isopropanol (1/2 volume), mix, and incubate for 10 min at RT.
6. Centrifuge at 12,000 × g for 10 min at 4°C.
7. Remove the supernatant, wash with 70% ethanol (1 ml per 1 ml tri-reagent used), and air-dry the pellet for 5 min (see Note 6).
8. Dissolve the RNA in water by mixing through a pipette tip and incubating for 10–15 min at 60°C.
4.1.2. Annealing of Probe
1. Transfer 500 µl of total RNA (1–2 mg/ml) to an RNase-free tube and incubate for 15 min at 65°C (see Note 7).
2. Add 3 µl biotinylated oligonucleotide probe, 5 µl RNase inhibitor, and 13 µl 20× SSC to the RNA, and incubate at 50°C for 5–12 h.
4.1.3. Preparation of Streptavidin Magnetic Particles
1. Transfer 50 µl of streptavidin magnetic particles (SMPs) to an RNase-free tube. Capture the particles by placing the tube in the magnetic stand until the SMPs have collected on one side of the tube (approximately 30 s).
2. Carefully remove the supernatant. Do not centrifuge the particles.
3. Wash the SMPs by adding 250 µl of 0.5× SSC, capturing the SMPs using the magnetic stand, and carefully removing the supernatant. Repeat these steps two more times.
4.1.4. Capturing of Annealed Oligonucleotide-miRNA Hybrids and Elution of the miRNA
1. Transfer the annealing reaction to the tube containing the washed SMPs.
2. Incubate at RT for 20 min. Gently mix by inverting the tube every 1–2 min.
3. Capture the SMPs using the magnetic stand and carefully remove the supernatant without disturbing the SMP pellet (see Note 8).
4. Wash the particles four times with 0.5× SSC (200 µl per wash). After the final wash, remove as much of the supernatant as possible without disturbing the SMPs.
5. Elute the miRNA from the SMPs by adding 50 µl of H2O followed by incubation at 65°C for 5 min.
6. Add 2 µl of exonuclease I and incubate for 1 h to degrade any DNA oligonucleotide that is co-eluted with the miRNA.
4.1.5. Quantification of the Purified miRNA by RNA Filter Hybridization
The amount of purified miR173 can be estimated by northern blotting and comparing its signal intensity to that of a series of standards of known concentrations.
1. Prepare solutions of the synthesized miR173 standard at four different concentrations by adding 0.5 ng, 1 ng, 2.5 ng, and 5 ng miR173 to 5 µl H2O. Add 5 µl of RNA loading buffer to 5 µl of purified miR173 and to each of the four standards, incubate at 65°C for 5 min, and leave on ice.
2. Resolve RNAs on a 15% polyacrylamide gel containing 42% urea.
3. Transfer the RNAs to Zeta-probe GT membrane using a semi-dry transfer apparatus (see Note 9).
4. Fix RNA to the membrane by ultraviolet cross-linking for 1 min followed by baking at 80°C for 1 h.
5. Prehybridize in Ultrahyb-Oligo hybridization buffer for 1.5 h at 42°C.
6. Prepare the 5′ end-labeled probe by incubating a mixture of 34.5 µl H2O, 5 µl 10× T4 polynucleotide kinase (PNK) buffer (700 mM Tris–HCl, 100 mM MgCl2, and 50 mM dithiothreitol, pH 7.6), 5 µl PNK, 0.5 µl 100 µM DNA oligonucleotide, and 5 µl [γ-32P] ATP (6,000 Ci/mmol) at 37°C for 1 h.
7. Pass the labeling reaction through a G-25 column to eliminate the free ATP.
8. Add the probe to the prehybridization solution and incubate for 18 h in a hybridization oven.
9. Wash the membrane three times with 2× SSC/0.5% SDS at 42°C.
10. Visualize and quantify the radioactive signals with a PhosphorImager.
4.2. Detection of 3′ Terminal Methylation by β Elimination
The presence of a methyl group on the 3′ terminal ribose of miR173 was detected by filter hybridization of total RNAs that had been treated with sodium periodate followed by β elimination (13). As shown in Fig. 2a, periodate oxidizes the vicinal hydroxyl groups of the last nucleoside of miR173 to produce a dialdehyde when free hydroxyl groups are present at both the 2′ and 3′ positions on the ribose of the last nucleotide (18). The β elimination reaction then removes the last nucleoside to generate an RNA that is 1 nt shorter and that has a phosphate group at the 3′ terminus (see Fig. 2a). Thus, after the chemical treatments, miR173 with two free hydroxyl groups at the 3′ terminus will migrate approximately 2 nt faster than it does without treatment, which can be detected by RNA filter hybridization (see Fig. 2b, hen1-1). If methylation occurs on the 3′ terminal ribose of miR173, the methyl group will block the chemical reactions. Therefore, the chemical treatment will not change the mobility of methylated miR173 (see Fig. 2b, Ler).
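The interpretation of the gel in this assay follows a simple rule, which the sketch below encodes under the assumptions stated in the paragraph above: an RNA with free 2′ and 3′ hydroxyls loses its last nucleoside and gains a 3′ phosphate, whereas a 2′-O-methylated RNA is unchanged. The function name and the returned fields are hypothetical, and the mobility shift of about 2 nt is the approximate value quoted in the text rather than a computed quantity.

```python
def beta_elimination_product(length_nt, three_prime_methylated):
    """Predict the outcome of periodate treatment plus beta elimination.

    Returns (length_after_nt, three_prime_end, apparent_shift_nt), where
    apparent_shift_nt is the approximate gain in gel mobility described
    in the text: about 2 nt for a reactive RNA, 0 for a protected one.
    """
    if length_nt < 2:
        raise ValueError("RNA must be at least 2 nt long")
    if three_prime_methylated:
        # The 2'-O-methyl group leaves no vicinal diol for periodate to attack.
        return length_nt, "2'-O-methyl, 3'-OH", 0
    # The last nucleoside is removed; its phosphate stays on the new 3' end.
    return length_nt - 1, "3'-phosphate", 2


if __name__ == "__main__":
    for label, methylated in (("Ler (wild type)", True), ("hen1-1", False)):
        length, end, shift = beta_elimination_product(22, methylated)
        print(f"{label}: {length} nt, {end}, runs ~{shift} nt faster")
```

The extra half of the shift comes from the 3′ phosphate, which adds negative charge to an RNA that is only one nucleotide shorter.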
4.2.1. Preparation of RNAs from Ler and hen1-1 (See Subheading 4.1.1)
4.2.2. Periodate Treatment and β Elimination
1. Dissolve ~100 µg of RNA in 88 µl borax/boric acid buffer and add 12.5 µl of sodium periodate.
2. Incubate in the dark at RT for 1 h.
3. Add 10 µl of glycerol and incubate for another 30 min to stop the reaction.
4. Add 1 µl glycogen, 10 µl sodium acetate, and 300 µl ethanol to precipitate RNA.
5. Dissolve the precipitated RNA in 100 µl of borax/boric acid/NaOH and incubate for 90 min at 45°C.
6. Pass the reaction through a G25 column to remove salts (optional).
7. Precipitate RNA with ethanol.
Fig. 2. Detection of miRNA methylation by β elimination. (a) Diagram of periodate treatment followed by β elimination. The last two nucleotides of miR173 are shown. The vicinal hydroxyl groups of the 3′ terminal ribose react with periodate such that the last nucleoside is converted into a dialdehyde, which is subsequently removed by β elimination. The resulting miR173 is one nucleotide shorter and carries a 3′ P. (b) The methylation status of miR173 in Ler (wild type) and hen1-1. Total RNAs of Ler or hen1-1 were treated with sodium periodate followed by β elimination, resolved by gel electrophoresis, and hybridized to an antisense miR173 probe, and the hybridization signals were visualized using a PhosphorImager.
4.3. Probing miR173 by RNA Filter Hybridization (See Subheading 4.1.5)
4.4. Detection of 3′ Uridylation with an [α-32P] dATP Incorporation Assay
We employ an [α-32P] dATP incorporation assay to study the 3′ uridylation of miRNAs (17). After small RNAs are isolated, they are ligated to a 3′ adaptor and reverse transcribed with a primer complementary to the 3′ adaptor (see Fig. 3a). After this, miR167 is selectively amplified with an miRNA-specific primer that corresponds to the 5′ portion of miR167 and the 3′ adaptor primer. miR167 with U-tails in hen1-2 will generate a pool of PCR products with various numbers of T residues adjacent to the 3′ adaptor, whereas miR167 from the wild type will produce products in which no Ts are adjacent to the 3′ adaptor (see Fig. 3a). Taq DNA polymerase is then used to extend the RT-PCR products with a primer complementary to the 3′ adaptor, in the presence of only [α-32P] dATP (see Fig. 3a). In this primer extension, the templates from U-tailed miRNAs will generate a ladder of products with varying numbers of A residues in the hen1-2 sample (see Fig. 3b, hen1-2), whereas the products from the wild-type sample will rarely be extended beyond the adaptor (see Fig. 3b, Ler).
Fig. 3. [α-32P] dATP incorporation assay. (a) A schematic diagram of an [α-32P] dATP incorporation assay (adapted from Ref. 17). (b) [α-32P] dATP incorporation assay performed on miR167 from Ler and hen1-2.
4.4.1. Enrichment of Small RNAs
Our lab uses a polyethylene glycol/NaCl (PEG/NaCl) method to separate low molecular weight RNAs from high molecular weight RNAs.
1. Dissolve ~1 mg total RNA pellet from Ler or hen1-2 in 400 µl of H2O, add 50 µl of PEG (50%) and 50 µl of NaCl (5 M), mix, and leave on ice for at least 1 h.
2. Centrifuge at 13,000 × g for 10 min. Transfer the supernatant to a new tube.
3. Add 1 µl of glycogen, 50 µl of sodium acetate, and 3 volumes of 100% ethanol. Incubate at −20°C for at least 2 h.
4. Centrifuge at maximum speed for 20 min at 4°C. Wash the pellet with 70% ethanol.
5. Air-dry the pellet for 5 min and dissolve in DEPC-treated water.
4.4.2. Isolation of 18–30 nt Small RNAs by Electrophoresis
1. Resolve small RNAs and 32P-labelled RNA size markers on a 15% polyacrylamide gel containing 42% urea.
2. Excise 20–30 nt small RNAs from the gel (sizes are estimated from the RNA Decade markers).
3. Elute small RNAs by incubating the gel slice in RNA elution buffer at 65°C for 4 h. Pass the solution through glass wool, extract twice with an equal volume of chloroform/phenol, and precipitate RNAs with three volumes of 100% ethanol.
4. Air-dry the pellet for 5 min and dissolve in 25 µl of DEPC-treated water.
4.4.3. Ligation to 3′ Adaptor and Purification of Small RNAs Ligated to the 3′ Adaptor
1. Dephosphorylate small RNAs by adding 3 µl of 10× NEB Buffer 3 (500 mM Tris–HCl, 1,000 mM NaCl, 10 mM MgCl2, and 10 mM dithiothreitol, pH 7.9) and 2 µl of CIP. Incubate at 37°C for 1 h.
2. Add 70 µl of water, extract with 100 µl of chloroform/phenol, and precipitate with ethanol.
3. Dissolve RNAs in 10 µl of water and add 3 µl 10× ligation buffer (500 mM Tris–HCl, 100 mM MgCl2, 10 mM ATP, and 100 mM dithiothreitol, pH 7.8), 3 µl BSA, 13 µl adaptor, and 1 µl T4 RNA ligase. Incubate for 16 h at 8°C.
4. Purify small RNAs ligated to the 3′ adaptor by electrophoresis (see Subheading 4.4.2).
4.4.4. Reverse Transcription and PCR Amplification (RT-PCR)
1. Mix 13.5 µl of small RNAs ligated to the adaptor and 2 µl of microP2 primer, incubate at 65°C for 5 min, and leave on ice.
2. Add 2 µl 10× RT buffer (500 mM Tris–HCl, 750 mM KCl, 30 mM MgCl2, and 100 mM dithiothreitol, pH 8.0), 1 µl dNTP (10 mM), 0.5 µl RNase inhibitor, and 1 µl MuLV reverse transcriptase. Incubate at 42°C for 1 h.
3. Perform PCR in a solution containing 38.5 µl H2O, 4 µl RT products, 5 µl 10× PCR buffer (2,000 mM Tris–HCl, 500 mM KCl, and 15 mM MgCl2, pH 8.4), 1 µl dNTP (10 mM), 1 µl miR167P1, 1 µl microP2, and 0.5 µl Taq DNA polymerase.
4.4.5. Purification of DNA by Electrophoresis
1. Resolve PCR products and DNA size markers on a 12% native polyacrylamide gel and visualize DNA by ethidium bromide staining.
2. Excise the DNA band from the gel and cut the gel slice into many small pieces.
3. Add 500 µl of 300 mM sodium acetate (pH 5.2), and shake at 37°C for 1 h.
4. Pass the solution through glass wool, extract twice with an equal volume of chloroform/phenol, and precipitate with two volumes of 100% ethanol.
5. Dissolve the DNA pellet in 50 µl of water.
4.4.6. [α-32P] dATP Incorporation Assay
1. Mix 12.2 µl H2O, 1 µl DNA (see Subheading 4.4.5), 1.5 µl 10× PCR buffer (2,000 mM Tris–HCl, 500 mM KCl, and 15 mM MgCl2, pH 8.4), 0.2 µl [α-32P] dATP, 0.4 µl microP2 (10 µM), and 0.2 µl Taq DNA polymerase.
2. Perform one cycle of PCR (94°C for 90 s, 55°C for 30 s, and 72°C for 10 s).
3. Add 15 µl of 2× loading buffer and resolve 5 µl of the reaction in a 15% polyacrylamide gel containing 42% urea.
4. Visualize the radioactive signals with a PhosphorImager.
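The expected banding pattern of this assay can be reasoned through in a few lines. The sketch below is a simplified model, assuming that the adaptor primer ends directly next to the 3′ terminus of the small RNA and that extension with dATP alone proceeds only while the template base is T (U in the original RNA). The function names are hypothetical, and the miR167 sequence is supplied for illustration only; it is not taken from this chapter.

```python
from collections import Counter


def a_residues_incorporated(small_rna):
    """Count A residues added when only dATP is supplied.

    Extension from the adaptor primer copies the small RNA starting at
    its 3' end, so it continues only through consecutive 3'-terminal U
    residues and stops at the first other base.
    """
    rna = small_rna.upper().replace("T", "U")
    count = 0
    for base in reversed(rna):
        if base != "U":
            break
        count += 1
    return count


def extension_ladder(small_rnas):
    """Tally molecules by the number of A residues incorporated."""
    counts = Counter(a_residues_incorporated(rna) for rna in small_rnas)
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    mir167 = "UGAAGCUGCCAGCAUGAUCUA"  # illustrative sequence (assumption)
    wild_type = [mir167] * 4
    hen1_like = [mir167, mir167 + "U", mir167 + "UU", mir167 + "UUUU"]
    print("wild type:", extension_ladder(wild_type))
    print("hen1-like:", extension_ladder(hen1_like))
```

In this model a wild-type pool gives a single class with no incorporation, while a U-tailed pool gives one class per tail length, which corresponds to the ladder described for the hen1-2 sample.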
5. Notes
1. The molecular weight of the biotinylated probe should differ substantially from that of the miRNA to be isolated. This prevents the biotinylated probe, which will inevitably be eluted together with the miRNA in the purification process, from interfering with the mass spectrometry analysis of the miRNA.
2. It is convenient to make a 1 l stock without the addition of APS and TEMED. The stock can be stored at 4°C in the dark.
3. To make borax/boric acid buffer (0.06 M, pH 8.6), make 0.06 M borax and 0.06 M boric acid. Use the borax to adjust the pH of the boric acid to 8.6.
4. Sodium periodate needs to be kept in the dark, as it is sensitive to light.
5. As this experiment is to study the 3′ terminus of the miRNA, the miRNA-specific primer should correspond to the 5′ portion of the miRNA.
6. Do not completely dry the RNA pellet, as this will greatly decrease its solubility.
7. To obtain enough miRNA for mass spectrometry analysis, the starting amount of total RNAs should be scaled up based on the amount described here.
8. Save the supernatant from step 3 until you are certain that satisfactory binding and elution of the miRNA have occurred.
9. The current for the transfer is 2 mA per cm² of membrane, but this needs to be experimentally determined for other transfer apparatus.
References
1. Bartel DP (2004) MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 116:281–297
2. Grishok A, Pasquinelli AE, Conte D, Li N, Parrish S, Ha I, Baillie DL, Fire A, Ruvkun G, Mello CC (2001) Genes and mechanisms related to RNA interference regulate expression of the small temporal RNAs that control C. elegans developmental timing. Cell 106:23–34
3. Hutvágner G, McLachlan J, Pasquinelli AE, Balint É, Tuschl T, Zamore PD (2001) A cellular function for the RNA-interference enzyme Dicer in the maturation of the let-7 small temporal RNA. Science 293:834–838
4. Lee Y, Ahn C, Han J, Choi H, Kim J, Yim J, Lee J, Provost P, Radmark O, Kim S, Kim VN (2003) The nuclear RNase III Drosha initiates microRNA processing. Nature 425:415–419
5. Ketting RF, Fischer SE, Bernstein E, Sijen T, Hannon GJ, Plasterk RH (2001) Dicer functions in RNA interference and in synthesis of small RNA involved in developmental timing in C. elegans. Genes Dev 15:2654–2659
6. Park W, Li J, Song R, Messing J, Chen X (2002) CARPEL FACTORY, a Dicer homolog, and HEN1, a novel protein, act in microRNA metabolism in Arabidopsis thaliana. Curr Biol 12:1484–1495
7. Reinhart BJ, Weinstein EG, Rhoades MW, Bartel B, Bartel DP (2002) MicroRNAs in plants. Genes Dev 16:1616–1626
8. Lobbes D, Rallapalli G, Schmidt DD, Martin C, Clarke J (2006) SERRATE: a new player on the plant microRNA scene. EMBO Rep 7:1052–1058
9. Yang L, Liu Z, Lu F, Dong A, Huang H (2006) SERRATE is a novel nuclear regulator in primary microRNA processing in Arabidopsis. Plant J 47:841–850
10. Fang Y, Spector DL (2007) Identification of nuclear dicing bodies containing proteins for microRNA biogenesis in living Arabidopsis plants. Curr Biol 17:818–823
11. Song L, Han MH, Lesicka J, Fedoroff N (2007) Arabidopsis primary microRNA processing proteins HYL1 and DCL1 define a nuclear body distinct from the Cajal body. Proc Natl Acad Sci U S A 104:5437–5442
12. Basyuk E, Suavet F, Doglio A, Bordonne R, Bertrand E (2003) Human let-7 stem-loop precursors harbor features of RNase III cleavage products. Nucleic Acids Res 31:6593–6597
13. Yu B, Yang Z, Li J, Minakhina S, Yang M, Padgett RW, Steward R, Chen X (2005) Methylation as a crucial step in plant microRNA biogenesis. Science 307:932–935
14. Yang Z, Ebright YW, Yu B, Chen X (2006) HEN1 recognizes 21–24 nt small RNA duplexes and deposits a methyl group onto the 2′ OH of the 3′ terminal nucleotide. Nucleic Acids Res 34:667–675
15. Chen X, Liu J, Cheng Y, Jia D (2002) HEN1 functions pleiotropically in Arabidopsis development and acts in C function in the flower. Development 129:1085–1094
16. Boutet S, Vazquez F, Liu J, Beclin C, Fagard M, Gratias A, Morel JB, Crete P, Chen X, Vaucheret H (2003) Arabidopsis HEN1: a genetic link between endogenous miRNA controlling development and siRNA controlling transgene silencing and virus resistance. Curr Biol 13:843–848
17. Li J, Yang Z, Yu B, Liu J, Chen X (2005) Methylation protects miRNAs and siRNAs from a 3′-end uridylation activity in Arabidopsis. Curr Biol 15:1501–1507
18. Alefelder S, Patel BK, Eckstein F (1998) Incorporation of terminal phosphorothioates into oligonucleotides. Nucleic Acids Res 26:4983–4988
Chapter 11
MicroRNA Promoter Analysis
Molly Megraw and Artemis G. Hatzigeorgiou

Abstract
In this chapter, we present a brief overview of current knowledge about the promoters of plant microRNAs (miRNAs), and provide a step-by-step guide for predicting plant miRNA promoter elements using known transcription factor binding motifs. The approach to promoter element prediction is based on a carefully constructed collection of Positional Weight Matrices (PWMs) for known transcription factors (TFs) in Arabidopsis. A key concept of the method is to use scoring thresholds for potential binding sites that are appropriate to each individual transcription factor. While the procedure can be applied to search for Transcription Factor Binding Sites (TFBSs) in any pol-II promoter region, it is particularly practical for the case of plant miRNA promoters where upstream sequence regions and binding sites are not readily available in existing databases. The majority of the material described in this chapter is available for download at http://microrna.gr.

Key words: MicroRNA, Transcription factors, Promoter, Sequence scanning, Position-specific weight matrices
1. Overview
Plant miRNAs are primarily encoded in intergenic regions and down-regulate the expression of a gene by guiding an Argonaute protein complex to slice a highly complementary target mRNA (1). They are located inside longer transcripts, which are transcribed by RNA polymerase II (pol-II). Although much effort has been focused on elucidating the mechanism of miRNA target gene regulation, relatively little has been published about the regulation of miRNA genes themselves. The nature of miRNA promoter elements remains one of the most interesting open problems in the study of miRNA biogenesis, since their identification would aid the understanding of regulatory networks in which miRNAs play a crucial role.
B.C. Meyers and P.J. Green (eds.), Plant MicroRNAs, Methods in Molecular Biology, vol. 592 DOI 10.1007/978-1-60327-005-2_11, © Humana Press, a part of Springer Science + Business Media, LLC 2009
While TFBS prediction in vertebrates is greatly facilitated by hundreds of transcription factor binding motifs characterized as Positional Weight Matrices (PWMs) in the database TRANSFAC (2), and several plant databases contain predicted sites within the regions upstream of protein-coding loci (3–5), we found that searching for TFBSs upstream of noncoding loci within plant genomes essentially meant starting from scratch, with few available PWMs and without standardized tools for sequence extraction or analysis of intergenic regions. In Megraw et al. (6), we describe how we constructed our own collection of 99 PWMs for known transcription factors in Arabidopsis and implemented a log-likelihood-based scoring function to identify putative binding sites for these factors. Furthermore, we found that using PWM-specific threshold values is superior to the common procedure of applying a single score threshold across all PWMs, and that this method was key to obtaining meaningful results in our analysis of Arabidopsis miRNA promoter regions.

The purpose of this chapter is to provide plant researchers with a step-by-step guide to TFBS prediction that takes advantage of this procedure. Specifically, we present a practical statistical method that yields a set of predicted Transcription Factor Binding Sites (TFBSs), each of which scores at least as well as the lowest-scoring site in a trusted collection of sites for the same TF. This collection may be experimentally determined (7) or include predicted sites (3–5). The central idea of this method is to use cutoff thresholds appropriate to each individual TF as well as to background sequence composition. The method is conceptually simple and easy to use. While the procedure can generally be applied to search for TFBSs in any pol-II promoter region, it is particularly practical for the case of plant miRNA promoters, where upstream regions and binding sites are not readily available in existing databases.
2. Materials
1. PC/Mac with Java installed (http://www.java.com/en/download/index.jsp), in order to use an existing computer program to perform all required calculations. Alternatively, necessary calculations can be implemented by the reader.
2. Source of transcription factor binding motifs for TFs of interest. Alternatively, begin with representations already constructed at http://www.diana.pcbi.upenn.edu/Tools/PlantTFBS/.
3. Source of miRNA promoter sequences in FASTA format. Alternatively, begin with experimentally supported miRNA promoter sequences available at http://www.diana.pcbi.upenn.edu/Tools/PlantTFBS/.
3. Methods
The methods described below outline (1) the construction of a Positional Weight Matrix (PWM) representation of a TF binding motif, (2) the construction of a background model, (3) the choice of a threshold for each TF, (4) the choice of promoter regions and methods of sequence extraction, and (5) scanning promoter regions for putative TFBSs.
3.1. Construction of PWMs for Each TF
A PWM is a standard way of representing TF binding motifs (8). A PWM represents each TF binding motif as a simple probability model that describes the chance of finding a particular nucleotide (A, C, G, or T) at a particular position of the motif. This model is encoded by a matrix built from the number of occurrences of each nucleotide in each binding position of the motif. Figure 1 illustrates this concept. In order to build a PWM for a particular TF of interest, simply follow these steps:
1. List the sequences of acceptable binding sites for the TF. Ideally, each sequence represents one experimentally supported binding observation at a particular site. If multiple binding observations occur at sites with the same sequence, each one should be listed separately. If some instances of the motif contain more positions than others, choose a common “core” of meaningful positions (see Note 1).
2. Using this list of sequences, count the number of A’s, C’s, G’s, and T’s at each sequence position. Record the counts in a table similar to the one shown in Fig. 1.
3. To ensure that no entry of the PWM will be exactly zero, add a small number of pseudocounts (e.g., 0.25 counts) to each entry of the count matrix (see Note 2).
4. For each column of the table (sequence position), add up the counts in the column.
5. For each entry (nucleotide) of each column, divide the entry by the column total computed in step 4. A frequency matrix similar to the one shown in Fig. 1 will result. As a check, the entries in each column should now sum to one.
We used this procedure to construct PWMs representing 99 TF binding motifs in Arabidopsis from the AGRIS (3) and AtProbe (7) databases. Sites obtained from AtProbe are experimentally supported observations that have been manually curated from the literature, whereas the AGRIS database includes sites that have not been experimentally validated.
A command-line Java tool to construct PWMs from sequences and the entire PWM collection constructed in our study (6) are freely available at http://www.diana.pcbi.upenn.edu/Tools/PlantTFBS/.
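For readers who prefer to implement the calculation themselves, the five steps above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' Java tool: the function name `build_pwm`, the dictionary layout of the matrix, and the four example sites are assumptions made here for demonstration, and the default pseudocount of 0.25 is simply the example value given in step 3.

```python
from collections import Counter

NUCLEOTIDES = "ACGT"


def build_pwm(sites, pseudocount=0.25):
    """Return a frequency matrix as {nucleotide: [frequency at each position]}.

    sites       -- equal-length binding-site sequences, one per observation (step 1)
    pseudocount -- value added to every entry of the count matrix (step 3)
    """
    sites = [s.upper() for s in sites]
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("trim all sites to a common core first (see Note 1)")

    pwm = {nt: [] for nt in NUCLEOTIDES}
    for pos in range(length):
        # Step 2: count each nucleotide at this position.
        counts = Counter(site[pos] for site in sites)
        # Step 3: add pseudocounts so that no entry is exactly zero.
        column = {nt: counts.get(nt, 0) + pseudocount for nt in NUCLEOTIDES}
        # Steps 4 and 5: divide each entry by the column total.
        total = sum(column.values())
        for nt in NUCLEOTIDES:
            pwm[nt].append(column[nt] / total)
    return pwm


# Hypothetical example: four observed sites for a 6-nt motif.
example_pwm = build_pwm(["CCAATG", "CCATTG", "CCATTG", "CCATTG"])
for nt in NUCLEOTIDES:
    print(nt, " ".join(f"{freq:.2f}" for freq in example_pwm[nt]))
```

Because of the pseudocounts, a nucleotide that never appears at a position receives a small nonzero frequency rather than zero, which keeps the logarithms used later in the scoring function finite.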
Acceptable binding sites for a TF

    C C A A T G
    C C A T T G
    C C A T T G
    C C A T T G

Positional weight matrix for a TF

         1     2     3     4     5     6
    A    0     0     1    .25    0     0
    C    1     1     0     0     0     0
    G    0     0     0     0     0     1
    T    0     0     0    .75    1     0
Fig. 1. Given a collection of sequences, which represent observed binding sites for a TF, a PWM “counts up” the number of A’s, C’s, G’s, and T’s in each position in order to describe the chance of finding each nucleotide in this position. The PWM can be visualized as a “logo” which describes how often the TF is expected to bind certain types of sites.
3.2. Construction of a Background Model
There is a noted compositional bias toward A-T enrichment in plant promoters (9), and we also observed such a bias in the Arabidopsis promoter sets that we examined in our study (6). Specifically, we observed that the TAIR6 genome build has a mean A-T content of 64.0%, while our miRNA and protein-coding gene promoter sets in the region from the TSS to 800 bp upstream have mean A-T contents of 68.9% and 67.8%, respectively. The similarity in base composition between the miRNA promoter and protein-coding gene promoter sets supports the idea that an A-T rich background is a biologically meaningful aspect of Arabidopsis pol-II promoter regions. These observations are of direct relevance to PWM-based TFBS prediction within plant miRNA promoters. In order to decide whether a subsequence within a promoter region is a good candidate binding site for a particular TF, intuitively we would want to know if it is more likely to observe this sequence under the PWM model than to observe it simply as a result of the underlying nucleotide distribution within promoter regions. This is exactly the question addressed by the likelihood scoring method described in the following section. In order to ask this question, we first need to estimate the underlying nucleotide distribution or background model. A straightforward practical model is to assume
Background frequency vector

    A   .34
    C   .16
    G   .16
    T   .34
Fig. 2. Example of a simple background model, computed as a background frequency vector. This example reflects the A-T rich nature of Arabidopsis promoter sequences.
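\[
\begin{aligned}
\mathrm{Score}
  &= \log \frac{P(\text{Sequence } S \text{ is observed under PWM model } M)}
               {P(\text{Sequence } S \text{ is observed under background model } B)} \\
  &= \log \frac{\prod_{i=1}^{L} P_M(s_i)}{\prod_{i=1}^{L} P_B(s_i)} \\
  &= \sum_{i=1}^{L} \bigl( \log P_M(s_i) - \log P_B(s_i) \bigr)
\end{aligned}
\]
where $S = s_1 s_2 s_3 \ldots s_L$ and $s_i \in \{\mathrm{A}, \mathrm{C}, \mathrm{G}, \mathrm{T}\}$; $P_M(s_i)$ denotes the probability of observing base $s_i$ in position $i$ of PWM model $M$; and $P_B(s_i)$ denotes the probability of observing base $s_i$ in position $i$ of background model $B$.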
Fig. 3. Log-likelihood score function for comparing subsequence S to PWM M.
that nucleotides are drawn at random according to the mean observed frequencies for each nucleotide within a large collection of promoter regions. Figure 2 gives an example of a background frequency vector computed for the case of Arabidopsis promoter regions. This frequency vector is available for use with the TFBS scoring and scanning programs at http://www.diana.pcbi.upenn.edu/Tools/PlantTFBS/. Construction of a new background frequency vector for another species is straightforward:
1. Obtain a set of promoter sequences for pol-II genes, ideally as large a set as possible with as much experimental support as possible.
2. Compute and record the observed frequency of each nucleotide in this set.
3.3. Threshold Choice for Each TF
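The scores used for thresholding depend on the background frequency vector of Subheading 3.2, so it can be useful to compute that vector directly. The sketch below is one possible way to do so in Python, under the assumption that the promoter set is supplied as FASTA-formatted lines; the function names `parse_fasta` and `background_frequencies` and the two toy sequences are illustrative only and are not part of the published tools. Characters other than A, C, G, and T (for example N) are ignored here, which is a design choice of this sketch.

```python
from collections import Counter


def parse_fasta(lines):
    """Yield one sequence string per FASTA record from an iterable of lines."""
    chunks = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if chunks:
                yield "".join(chunks)
            chunks = []
        elif line:
            chunks.append(line)
    if chunks:
        yield "".join(chunks)


def background_frequencies(promoter_seqs):
    """Return {nucleotide: observed frequency} pooled over all promoter sequences."""
    counts = Counter()
    for seq in promoter_seqs:
        # Count only unambiguous bases.
        counts.update(ch for ch in seq.upper() if ch in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no A, C, G, or T characters found")
    return {nt: counts[nt] / total for nt in "ACGT"}


# Toy input standing in for a large promoter collection.
toy_fasta = [">promoter_1", "AAT", "TAC", ">promoter_2", "TTAGAT"]
print(background_frequencies(parse_fasta(toy_fasta)))
```

In practice the input would be a large promoter set, for example by passing an open file handle to `parse_fasta`, since a handful of short sequences gives a poor estimate of base composition.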
In our promoter element search, we use a log-likelihood scoring function (Fig. 3), a standard PWM-based scoring approach (10), to scan upstream sequences for potential TF binding sites. Using the log-likelihood score equation in Fig. 3, the score for each binding sequence can be computed as follows. For each position i = 1…L of the sequence:
1. What is the nucleotide at position i (is it A, C, G, or T)? Call this nucleotide si.
2. Look at the table of frequencies given by the PWM: for position i, what is the frequency of nucleotide si? This is the value found in the ith column of the PWM at the row associated with nucleotide si. Call this frequency PM(si).
3. Look at the vector of background frequencies: what is the background frequency of nucleotide si? Call this frequency PB(si).
4. Compute the value of the term log(PM(si)) – log(PB(si)), where log refers to the natural logarithm of a number.
Add up the L computed terms to get the Score for this sequence. Intuitively, the log-likelihood score compares the probability of observing a particular subsequence according to our PWM model to the probability of observing that subsequence according to a background model. A high score is indicative of a good match to the TF binding motif, but how good must this score be for us to conclude that the subsequence is a binding site for that TF? The need for a meaningful threshold can be addressed with the following simple procedure for each TF of interest:
1. Choose a trusted collection of binding sites for the TF. This collection could include experimentally verified and/or putative sites, and/or sites which were not used to construct the PWM for this TF. The important point is to choose a collection which represents one’s best estimate of the full range of sites where the TF can bind.
2. List the possible binding sequences for this collection of sites. Even if many sites are associated with the same binding sequence, it is only necessary to list each binding sequence once.
3. Compute the log-likelihood score for each binding sequence in the list as described above (Fig. 3).
4. Choose the lowest Score from the list as the Threshold Score.
The consequence of this procedure is that, when scanning a particular set of upstream sequences, we will only “discover” a binding site occurrence of a particular TF if its PWM-based log-likelihood score is at least as high as that of the lowest-scoring trusted binding site for that TF.
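The scoring steps and the threshold procedure above can be sketched in Python as follows. This is a hedged illustration, not the authors' Java program: the function names, the three-position `TOY_PWM`, and the list of trusted sites are all hypothetical, while the background vector uses the example values of Fig. 2. The PWM is assumed to have been built with pseudocounts, so that every entry is greater than zero and the logarithm is defined.

```python
import math

# Hypothetical 3-position PWM (each column sums to 1) and the Fig. 2 background.
TOY_PWM = {
    "A": [0.7, 0.1, 0.1],
    "C": [0.1, 0.7, 0.1],
    "G": [0.1, 0.1, 0.7],
    "T": [0.1, 0.1, 0.1],
}
BACKGROUND = {"A": 0.34, "C": 0.16, "G": 0.16, "T": 0.34}


def log_likelihood_score(seq, pwm, background):
    """Sum over positions of log(PM(si)) - log(PB(si)), using natural logarithms."""
    seq = seq.upper()
    if len(seq) != len(pwm["A"]):
        raise ValueError("sequence length must equal the number of PWM positions")
    return sum(
        math.log(pwm[nt][i]) - math.log(background[nt])
        for i, nt in enumerate(seq)
    )


def threshold_score(trusted_sites, pwm, background):
    """Return the lowest score among the distinct trusted binding sequences."""
    unique_sites = {site.upper() for site in trusted_sites}
    return min(log_likelihood_score(s, pwm, background) for s in unique_sites)


# Hypothetical trusted collection; duplicates are scored only once.
trusted = ["ACG", "TCG", "ACG"]
print(threshold_score(trusted, TOY_PWM, BACKGROUND))
```

With these toy values the threshold is set by "TCG", the weakest trusted sequence, so any window scoring at least that well would later be reported as a putative site.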
If more stringent site selection is desired, putative sites can later be filtered according to their scores. A command-line Java program for performing this thresholding procedure, as well as the set of thresholds determined by this procedure for the 99 PWMs constructed in our Arabidopsis miRNA promoter study (6), are freely available at http://www.diana.pcbi.upenn.edu/Tools/PlantTFBS/.
3.4. Selection of Promoter Regions
Binding sites predicted on the basis of sequence are clearly much more likely to be functional in the promoter regions of genes than elsewhere. It is therefore important to estimate the true promoter region using as much evidence about TSS location as possible. Ideally, evidence from a 5′ cap detection method is available for the miRNA transcript of interest. Recent plant promoter prediction methods for pol-II genes report approximately 60–80%
accuracy for miRNA promoter prediction (11, 12), and predicted promoters for some miRNAs can be obtained online from these sources (see Note 3). Even when evidence for the TSS location is available, it is important to obtain accurate information on the genomic location of the TSS and to extract the sequence from the most recent plant genome build. The identification of 63 miRNA Transcription Start Sites (TSSs) in Arabidopsis (13) via 5′ RACE recently opened an exciting opportunity for computational analysis of miRNA promoter regions in plants. Data from these experimentally supported TSSs suggest that in plants, the location of the mature miRNA generally falls within about 300 nt of the downstream-most TSS. Therefore, in the absence of any additional information, the location of the mature miRNA itself can provide a first very rough approximation for an appropriate upstream search region. As functional TFBSs are unlikely to reside within the miRNA precursor, it is also helpful to obtain estimates of the miRNA precursor location from miRBase (14) or via RNA secondary structure prediction (see Note 5).

A second concern in selecting promoter regions for TFBS prediction is to decide how much upstream sequence to search. While enhancer elements for plant genes may be present several kilobases upstream of the TSS, we found that the vast majority of experimentally supported promoter elements reported in the literature for Arabidopsis protein-coding genes fall within 800 nt of the annotated TSS. While this may be due in part to the fact that many experiments focus on the region immediately proximal to the annotated TSS, a search region of approximately 1 kb upstream does provide a starting point for a segment where functional binding sites for many known TFs are likely to be found. As part of our study on Arabidopsis miRNA promoters, we performed the necessary sequence extraction steps for all experimentally supported miRNA promoters from Xie et al. (13).
We selected a length of 800 nt upstream of each TSS, and have made the genomic locations and sequences for these promoters available in FASTA format at http://www.diana.pcbi.upenn.edu/Tools/PlantTFBS/. Multiple alternative TSSs were indicated by the 5′ RACE experiments for about 20% of the miRNAs in this study, and in these cases, we selected the downstream-most TSS in order to provide the most comprehensive TFBS search region. We summarize the current options for obtaining upstream sequence here:
1. If the miRNA of interest has an experimentally supported TSS identified by the study (13), obtain its genomic location and/or download the promoter sequence directly from http://www.diana.pcbi.upenn.edu/Tools/PlantTFBS/.
2. If an experimentally supported TSS or a predicted TSS is available from another source, but the desired length of upstream
sequence is not directly available, use one of two options to obtain the sequence:
(a) If genomic location of the TSS is directly available, identify the genome build to which this location refers and download the desired length of upstream sequence (see Note 4). For Arabidopsis, this task can be performed at the TAIR Web site http://www.arabidopsis.org/.
(b) If genomic location for the TSS itself is not available but some portion of upstream or surrounding sequence is available, use a BLAST tool to find the best matching location for this TSS within the upstream vicinity of the miRNA. For Arabidopsis, TAIR BLAST is available at http://www.arabidopsis.org/Blast/index.jsp.
3. If no additional source for estimating TSS location is available for the miRNA of interest, locate the miRNA precursor and use the upstream-most endpoint of the hairpin foldback as the first approximation for the TSS. Genomic locations for miRNA precursors are available in miRBase for several plant species at http://microrna.sanger.ac.uk/sequences/ftp.shtml.
4. If the miRNA of interest is not contained in miRBase and no additional source for estimating TSS location is available, two options remain:
(a) Use the location of the mature miRNA sequence as a very rough approximation of the TSS.
(b) Predict the hairpin foldback for this miRNA using an RNA secondary structure prediction tool (see Note 5). Use the upstream-most endpoint of the predicted hairpin foldback as an approximation for the TSS.
5. As many TFs do not show a marked strand bias for binding, it is also useful to create a reverse complement set of sequences for TFBS prediction. After obtaining a forward-strand set of promoter sequences as described in steps 1–4, a reverse-complement set can be obtained using a sequence processing tool such as the one available at http://www.bioinformatics.org/sms2/rev_comp.html.
3.5. Scanning Promoter Regions for Putative TFBSs
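Because the scan is typically run on both strands, the reverse-complement set described in step 5 of Subheading 3.4 may be needed before scanning begins. As an alternative to a web tool, it can be generated locally; the following is a minimal Python sketch in which the function name `reverse_complement` and the promoter dictionary are illustrative assumptions. The ambiguity code N is mapped to itself and letter case is preserved; other IUPAC ambiguity codes are not handled.

```python
# Translation table for complementing unambiguous bases; N and n map to themselves.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


# Hypothetical forward-strand promoter set keyed by an illustrative name.
forward_set = {"example_promoter": "ACGGTACCTATTGACG"}
reverse_set = {name: reverse_complement(seq) for name, seq in forward_set.items()}
print(reverse_set)
```

Positions reported for a hit on the reverse-complement sequence count from the opposite end of the promoter, so they need to be converted back if forward-strand coordinates are required.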
The log-likelihood scoring function and associated TF-specific threshold score are used to scan miRNA promoter sequences for putative TF binding site occurrences, as illustrated in Fig. 4. Sites whose scores meet or exceed the PWM-specific threshold score for each TF are considered to be putative TFBSs. Use the following steps to perform the scan for each TF and each upstream sequence selected above. In these steps, suppose the PWM for a particular TF has L positions and the upstream region under consideration is of length N.
[Figure 4 plot: Scores, from Low to High, for each window as the PWM slides along the Promoter Sequence ACGGTACCTATTGACGAGCTCCAATGTGA; the Threshold is marked on the score axis, and the point above it is labeled "Observed Site".]
Fig. 4. An illustration to visualize scanning for binding sites: only those sites in a promoter sequence that exceed the PWM-specific threshold score are “observed” as putative binding sites.
1. For the subsequence or “window” of length L that starts at the first nucleotide of the upstream region, compute the log-likelihood Score for the TF as described in Subheading 3.3, Fig. 3.
2. If the Score is greater than or equal to the Threshold Score previously determined for this TF, record the location and Score for this window as a putative TFBS.
3. Repeat this procedure for each of the remaining windows in the upstream region (N – L + 1 windows in total).
As illustrated in Fig. 4, the idea of this routine is effectively to “slide” the PWM along the upstream sequence and ask at each window location “does this site match the PWM well enough to be considered a putative TFBS?” The procedure for computing the score at each position is exactly the same as the procedure for computing the score of each binding sequence when choosing a TF threshold. A command-line Java program for performing this sequence scanning procedure is freely available at http://www.diana.pcbi.upenn.edu/Tools/PlantTFBS/.
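The sliding-window scan can be sketched in Python as shown below. As before, this is an illustration under stated assumptions rather than the authors' Java program: `score_window`, `scan_promoter`, the three-position `TOY_PWM`, and the short promoter string are hypothetical, and the background vector uses the example values of Fig. 2. Windows containing characters other than A, C, G, or T are skipped, which is a choice made for this sketch.

```python
import math

# Hypothetical 3-position PWM (columns sum to 1) and the Fig. 2 background vector.
TOY_PWM = {
    "A": [0.7, 0.1, 0.1],
    "C": [0.1, 0.7, 0.1],
    "G": [0.1, 0.1, 0.7],
    "T": [0.1, 0.1, 0.1],
}
BACKGROUND = {"A": 0.34, "C": 0.16, "G": 0.16, "T": 0.34}


def score_window(window, pwm, background):
    """Log-likelihood score of one window (natural logarithms)."""
    return sum(
        math.log(pwm[nt][i]) - math.log(background[nt])
        for i, nt in enumerate(window)
    )


def scan_promoter(promoter, pwm, background, threshold):
    """Return (1-based start, window, score) for each window scoring >= threshold."""
    motif_len = len(pwm["A"])
    promoter = promoter.upper()
    hits = []
    # There are N - L + 1 windows in a region of length N for a PWM of length L.
    for start in range(len(promoter) - motif_len + 1):
        window = promoter[start:start + motif_len]
        if any(nt not in background for nt in window):
            continue  # skip windows containing ambiguous bases
        score = score_window(window, pwm, background)
        if score >= threshold:
            hits.append((start + 1, window, score))
    return hits


# Threshold taken from a hypothetical weakest trusted site, "TCG".
threshold = score_window("TCG", TOY_PWM, BACKGROUND)
for position, window, score in scan_promoter("TTACGTTTCGA", TOY_PWM, BACKGROUND, threshold):
    print(position, window, round(score, 3))
```

Using the same scoring function for thresholding and scanning matters here: a window identical to the weakest trusted site then scores exactly at the threshold and is retained by the greater-than-or-equal comparison in step 2.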
4. Discussion
In this chapter, we discuss the computational identification of TFBSs for miRNA promoters in plants. In recent work, we used the method described above to analyze the promoter regions of the 63 experimentally verified miRNA TSSs in Arabidopsis (13). This study remains the most comprehensive published source of experimentally supported miRNA TSSs to date, though it is anticipated that additional data will soon become
available through deep sequencing of small RNA cDNA libraries (15). Using this published source, we analyzed regions up to 800 nt upstream of the miRNA TSSs to search for known transcription factor binding elements. The goal of our investigation was to determine whether “miRNA-preferred” transcription factor binding elements exist in plants. In our analysis (6), we observed a predominance of TATA-containing miRNA promoters, and found that miRNA promoters in general have a similar AT-rich base composition to their protein-coding counterparts in plants. However, while many of the same transcription factor binding elements were found in Arabidopsis miRNA promoters as in protein-coding gene promoters, the distribution of these elements differed between the two promoter types. Binding elements for several factors (AtMYC2, ARF, SORLREP3, and LFY) were found significantly more frequently in miRNA promoters, and these “miRNA-preferred” factors were consistent with prominent roles for miRNAs in organism development and adaptation. This investigation also suggested that miRNAs may be involved in direct feedback loops with hormonal regulators in plants (Fig. 5). An anecdotal example of such a case was found and described in (6). In this study, we found that two miRNAs with putative ARF binding sites upstream, miR160 and miR167, have targets belonging to the ARF gene family. These targets were not only computationally predicted, but were also found to have experimental evidence for binding recorded in TarBase (16), a database for experimentally supported interactions between miRNAs and mRNAs. While our computational study yielded some initial insights into the nature of plant miRNA promoters, a vast amount of work still remains to be done in this new area of investigation. In many cases, the researcher will need to begin by predicting TFBSs within known or putative miRNA promoter regions. This collection
[Figure 5 diagram. Labels: Transcription Factor (TF); DNA; miRNA; TF mRNA (5′ to 3′); "miRNA is transcribed and processed into mature sequence"; "Mature miRNA binds and represses translation of TF mRNA"; "Reduced amount of TF available to promote miRNA transcription".]
Fig. 5. An illustration of a miRNA and a transcription factor in a negative feedback loop. A TF X is activating a miRNA Y. After miRNA Y is transcribed, it targets the messenger RNA of TF X. This results in a reduced amount of protein produced by TF X, in turn lowering the expression of the transcript hosting miRNA Y.
of sites can then be used as a starting point for the identification of candidate regions for ChIP-on-Chip experiments, for the incorporation of gene expression data in order to increase confidence in specific sites, for forming hypotheses about genetic pathways of interest, and for many other useful follow-on analyses.
5. Notes
1. In the framework of this method, PWMs are constructed from a contiguous set of aligned binding site positions. However, some instances of binding site motifs reported in databases or in the literature may have different lengths than others for the same TF. In this case, choose a contiguous set of positions which are common to these motifs, and discard positions for which there is missing data.
2. Even though a certain nucleotide may never appear in a particular position within the observed binding sequence data for a TF, it is understood in this situation that the observed data are unlikely to represent every possible binding site for that TF. Rather than represent the appearance of this nucleotide in such a position as an impossible event, a more realistic approach would be to represent it as a very rare event. The addition of pseudocounts is a well-established practical method for addressing this situation in the case where it is otherwise very difficult to estimate the frequency of such rare events. By adding a relatively small number (this number can also be fractional) of extra counts to each entry when summing up the total number of observations, “impossible events” are eliminated from the probability model and replaced in a proportional manner with rare events.
3. The first cited source provides a web-based interface that accepts regions of sequence for promoter prediction (http://softberry.com/berry.phtml?topic=tssp&group=programs&subgroup=promoter). The second cited source provides a set of predicted promoters for some miRNAs as supplementary material online (http://cic.cs.wustl.edu/microrna/ath_miRNA_promoter.fa).
4. If the miRNA of interest is on the “minus” strand of the chromosome, keep in mind that the desired start and end points of the upstream sequence segment will be annotated with numerically larger values than the genomic location of the TSS itself.
5. Several programs are available for secondary structure prediction.
Among the most well-known software packages are RNAfold (17) and MFOLD (18). It is important to bear in mind that installing, running, and selecting energy thresholds
for these programs can be a nontrivial task when undertaken for the first time. Several programs have online versions available and they are expedient when only a few sequences need to be folded; however, energy thresholds must still be appropriately chosen. Help from an informatics colleague with experience in running these programs is ideal for first-time users.
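The coordinate arithmetic described in Note 4 is a common source of off-by-one and strand errors, so a small sketch may help. The Python function below is hypothetical and rests on assumptions stated here rather than taken from the chapter: coordinates are 1-based and inclusive, the upstream segment excludes the TSS nucleotide itself, and the default length of 800 nt follows the region used in Subheading 3.4. Sequence extracted from the reference strand for a minus-strand miRNA would still need to be reverse complemented.

```python
def upstream_interval(tss, strand, length=800, chrom_length=None):
    """Return (start, end) genomic coordinates of the segment upstream of a TSS.

    Assumes 1-based, inclusive coordinates on the reference ("plus") strand.
    tss          -- genomic position of the transcription start site
    strand       -- "+" or "-"
    length       -- number of upstream nucleotides wanted
    chrom_length -- optional chromosome length used to clip minus-strand intervals
    """
    if strand == "+":
        # Upstream lies at numerically smaller coordinates.
        start, end = tss - length, tss - 1
        start = max(start, 1)  # clip at the chromosome start
    elif strand == "-":
        # Upstream lies at numerically larger coordinates (Note 4).
        start, end = tss + 1, tss + length
        if chrom_length is not None:
            end = min(end, chrom_length)
    else:
        raise ValueError("strand must be '+' or '-'")
    return start, end


print(upstream_interval(10000, "+"))  # plus strand: coordinates below the TSS
print(upstream_interval(10000, "-"))  # minus strand: coordinates above the TSS
```

Before relying on such a helper, it is worth checking which coordinate convention the genome browser or download tool in use expects, since some tools use 0-based or half-open intervals.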
Acknowledgments
The authors thank Shane Jensen, Vesselin Baev, Ventsislav Rusinov, and Kriton Kalantidis for their contributions to the study from which this work was derived. This work was supported by an NSF Career Award (DBI-0238295).
References
1. Jones-Rhoades MW, Bartel DP, Bartel B (2006) MicroRNAs and their regulatory roles in plants. Annu Rev Plant Biol 57:19–53
2. Wingender E, Chen X, Hehl R, Karas H, Liebich I, Matys V, Meinhardt T, Pruss M, Reuter I, Schacherer F (2000) TRANSFAC: an integrated system for gene expression regulation. Nucleic Acids Res 28:316–319
3. Davuluri RV, Sun H, Palaniswamy SK, Matthews N, Molina C, Kurtz M, Grotewold E (2003) AGRIS: Arabidopsis gene regulatory information server, an information resource of Arabidopsis cis-regulatory elements and transcription factors. BMC Bioinformatics 4:25
4. Steffens NO, Galuschka C, Schindler M, Bulow L, Hehl R (2005) AthaMap web tools for database-assisted identification of combinatorial cis-regulatory elements and the display of highly conserved transcription factor binding sites in Arabidopsis thaliana. Nucleic Acids Res 33:W397–W402
5. O’Connor TR, Dyreson C, Wyrick JJ (2005) Athena: a resource for rapid visualization and systematic analysis of Arabidopsis promoter sequences. Bioinformatics 21:4411–4413
6. Megraw M, Baev V, Rusinov V, Jensen ST, Kalantidis K, Hatzigeorgiou AG (2006) MicroRNA promoter element discovery in Arabidopsis. RNA 12:1612–1619
7. Hoffman M, Zhang MQ (2001) AtProbe: Arabidopsis thaliana promoter binding element database. http://exon.cshl.org/cgi-bin/atprobe/atprobe
8. Stormo GD (2000) DNA binding sites: representation and discovery. Bioinformatics 16:16–23
9. Pandey SP, Krishnamachari A (2006) Computational analysis of plant RNA Pol-II promoters. Biosystems 83:38–50
10. Durbin R, Eddy S, Krogh A, Mitchison G (1999) Biological sequence analysis: probabilistic models of proteins and nucleic acids. Cambridge University Press, Cambridge
11. Zhou X, Ruan J, Wang G, Zhang W (2007) Characterization and identification of microRNA core promoters in four model species. PLoS Comput Biol 3:e37
12. Solovyev VV, Shahmuradov IA (2003) PromH: promoters identification using orthologous genomic sequences. Nucleic Acids Res 31:3540–3545
13. Xie Z, Allen E, Fahlgren N, Calamar A, Givan SA, Carrington JC (2005) Expression of Arabidopsis MIRNA genes. Plant Physiol 138:2145–2154
14. Griffiths-Jones S, Grocock RJ, van Dongen S, Bateman A, Enright AJ (2006) miRBase: microRNA sequences, targets and gene nomenclature. Nucleic Acids Res 34:D140–D144
15. Lu C, Meyers BC, Green PJ (2007) Construction of small RNA cDNA libraries for deep sequencing. Methods 43:110–117
16. Sethupathy P, Corda B, Hatzigeorgiou AG (2006) TarBase: a comprehensive database of experimentally supported animal microRNA targets. RNA 12:192–197
17. Hofacker IL, Fontana W, Stadler PF, Bonhoeffer S, Tacker M, Schuster P (1994) Fast folding and comparison of RNA secondary structures. Monatsh Chem 125:167–188
18. Zuker M (2003) Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res 31:3406–3415
Chapter 12
Computational Methods for Comparative Analysis of Plant Small RNAs
Gayathri Mahalingam and Blake C. Meyers

Abstract
Small RNAs play an important role in plant development, stress responses, and epigenetic regulation, primarily through their role in transcriptional and post-transcriptional silencing of specific target genes and loci. Most if not all plants utilize these small RNA signaling networks. We have developed a deep-sequencing based dataset of plant small RNAs, based on the hypothesis that comparisons among the complex pool of small RNAs from diverse plants will identify novel types of conserved, regulated, or species-specific molecules. A database containing upward of hundreds of millions of plant small RNA sequences is being created for comparative analyses. This small RNA database will allow the experimental characterization of the majority of the biologically important small RNAs for a range of plant species. This database can be accessed from our website (http://smallrna.udel.edu/). A variety of web-based tools have been developed for analyses of these data. Here, we focus on these tools, and we describe how the users can implement these tools to analyze and interpret the small RNA data and how the users could use similar approaches for other sets of plant small RNAs from diverse species.

Key words: Small RNA; Comparative analysis
1. Introduction
1.1. Small RNAs
Two major types of small RNAs (21–24 nucleotides in size), known as small interfering RNAs (siRNAs) (1) and microRNAs (miRNAs) (2), are present in a wide variety of eukaryotic organisms (3, 4). miRNA molecules originate from distinct genomic loci predicted to form “hairpin” structures that often have an imperfect double-stranded characteristic (5). They are cleaved from the hairpin as a duplex by a DICER-LIKE protein (DCL1), and the miRNA strand of this duplex becomes associated with an AGO protein in a complex called RISC (6, 7). In plants, miRNAs
B.C. Meyers and P.J. Green (eds.), Plant MicroRNAs, Methods in Molecular Biology, vol. 592 DOI 10.1007/978-1-60327-005-2_12, © Humana Press, a part of Springer Science + Business Media, LLC 2009
almost always induce cleavage and accelerated degradation of their “target” mRNAs by forming base-pairing interactions (8, 9). They can also direct cleavage of non-coding RNAs to induce production of trans-acting siRNAs (ta-siRNAs) (10, 11). In addition, some miRNAs act by preventing mRNA translation and thereby limiting protein production (12, 13). In contrast to siRNAs, miRNAs usually do not match perfectly to their target mRNA molecules (14–16). The biological roles of miRNA are predominantly associated with development (4, 17, 18), but they also play a role in stress responses. Differential accumulation of miRNAs in different tissues is common, and many miRNAs target transcription factor mRNAs (19, 20). Moreover, developmental defects are associated with the miRNA metabolism mutants discussed above (10, 21–24). Although the association of small RNAs with viral resistance has been known for some time, more recent data demonstrate a role of miRNAs in plant disease resistance against bacterial or other pathogens (25, 26). Associations of miRNAs and natural antisense siRNAs with abiotic stresses are also well documented (27, 28). Functions of siRNAs include protection against viruses and mobile genetic elements and other repetitive sequences (29, 30). In addition, several genes are regulated at the level of chromatin modifications and DNA methylation, and small RNAs play a major role in the establishment and maintenance of these marks (31, 32).
1.2. Conservation of Small RNAs in the Plant Kingdom
Cloning of miRNAs from moss and other lower land plants indicates that miRNAs are conserved over more than 400 million years of evolution (33–35). While these data suggest that many miRNAs are evolutionarily conserved, some miRNAs may not be conserved across species, such as the Arabidopsis miRNAs 158, 161, 163, and 173, which are not present in rice (16). At least two of these miRNAs have evolved recently, suggesting that other species-specific miRNAs are likely to be found (36); indeed, our recent deep sequencing in an rdr2 background demonstrates that many nonconserved miRNAs are found at low expression levels and have eluded detection because of these low abundance levels. The discovery of new miRNAs with unique characteristics, as well as more detailed studies of known miRNAs, will likely lead to new computational tests to enable miRNA discovery; examples of this have been published (37, 38).
1.3. Novel Technologies for Deep Sequencing of Small RNAs
Several approaches have emerged for sequencing of small RNAs (39). Previously, the standard method for the isolation and identification of small RNAs involved gel-purification and cDNA production, followed by the cloning of single molecules or longer “concatamers” of these molecules that are subsequently sequenced (40–42). The latest generation of sequencing technologies includes a method called Sequencing by Synthesis (SBS), developed
Computational Methods for Comparative Analysis of Plant Small RNAs
by Illumina, Inc. SBS enables deep sequencing (>2,000,000 sequences per channel) at a low cost, while also sequencing each small RNA in its entirety. Several other technologies have been described or promised that report potential total numbers of reads that meet or exceed those of SBS. The methods that we describe in this chapter all assume this level of sequencing depth; methods for the construction of the libraries are described in Chapter 8 of this volume.
1.4. Deep Sequencing to Identify Small RNAs in Diverse Plant Species
To generate whole-genome data on the diversity and abundance of siRNAs and miRNAs in plants, this chapter assumes the availability of a set of libraries representing several, preferably diverse plant species. Because of the depth of sequencing, these data will define nearly the complete repertoire of endogenous small RNA molecules generated in these plant species. Our own project includes a broad selection of more than 30 plant species, but the methods and applications that we describe could be applied to a much smaller set of species.
1.5. Selection of Species from Across the Plant Kingdom
The set of small RNA libraries that we are developing is indicated in Fig. 1; this list includes nearly 100 libraries representing more than 33 plant species. To improve our ability to match small RNA sequences to their genomic sources, the selected species include most plants for which extensive genomic resources are being developed. The species were selected on the basis of the following criteria:
1. Diversity in the plant kingdom. We have chosen species that represent either important families or nodes. The Floral Genome Project has worked on many of these species for several years and has an extensive justification for their species selections, available on the web (http://fgp.bio.psu.edu/fgp/taxa/rationale.html). Their rationale includes the following: (a) phylogenetic position; (b) diversity of floral-organ structure; (c) direct relevance to crop or economic plants; (d) diploid with a small genome size; (e) availability of inbred lines, when possible; (f) desirable properties, such as large numbers of flowers per plant, transformability, and prior flower developmental studies; and (g) lack of prior genomic resources.
2. Economic importance. Because the focus of the NSF Plant Genome Research Program is on economically important crops, we have biased our selection in favor of these species. We have also selected two families, the Poaceae and Solanaceae, for focused analyses.
3. Research importance. While we have ongoing small RNA projects in many of the important model species (Arabidopsis, rice, Medicago truncatula), there are many other model species of growing importance.
Fig. 1. Phylogenetic tree of the selected plant species: Branch lengths are approximate. Gray text indicates species that will not be sampled (shown for reference only) or for which small RNA data are already available or underway. Asterisks indicate whole genome sequences are available or underway (only Capsella rubella and Triphysaria are not shown); Arabidopsis has two sequences, A. lyrata and A. thaliana. Gymno- and angiosperm tree modified from the FGP; lower plant phylogeny is modified from http://www.greenbac.org/. Gray boxes indicate the families on which we are focused.
Taxa shown in the tree: Cucurbita maxima (pumpkin), Fabaceae (Medicago, soybean), Malus (apple), Populus (poplar), Gossypium (cotton), Citrus sinensis (orange), Arabidopsis, Silene alba, Vitis vinifera (grape), Ribes (currant), Antirrhinum (snapdragon), petunia, tobacco, pepper, tomato, potato, Lactuca sativa (lettuce), Mimulus guttatus, Vaccinium (blueberry), Mesembryanthemum (ice plant), Beta (beet), Eschscholzia (California poppy), Persea americana (avocado), Liriodendron (yellow poplar), Saruma henryi, Halophila decipiens (paddle grass), Crocus sativus, Cocos nucifera (coconut palm), Musa acuminata (banana), rice, switchgrass, maize, sorghum, barley, wheat, Acorus (sweet flag), Illicium (star anise), Nuphar advena (water lily), Amborella trichopoda, Cryptomeria (Japanese cedar), Welwitschia, Gnetum, Ephedra (Mormon tea), Pinus radiata, Ginkgo biloba, Zamia fischeri (cycad), Marsilea quadrifolia, Ceratopteris, Angiopteris, Psilotum, Selaginella, Lycopodium, Physcomitrella (moss), Anthoceros (hornwort), Marchantia (liverwort), Chara, Coleochaete, Mesostigma, Acetabularia, Caulerpa, Chlamydomonas, Volvox.
Clade and family labels in the tree: eudicots, Solanaceae, magnoliids, monocots, Poaceae, basal angiosperms, gymnosperms, ferns, non-seed land plants, green algae.
4. Availability of genomic sequence data. For the most part, these resources have been developed for the species selected for reasons 1–3.
2. Materials
2.1. Project Data
1. The database tables are built with MySQL for the public web server, for which we use a Dell PowerEdge 2950. The web interface and the tools, written mainly in PHP, extract the data requested by the user from the MySQL tables and display the query results in tabulated form and as downloadable data sheets. See Note 1 for details on the implementation of the database.
2. The normalized and raw data are available from our website for those who want to perform their own analysis.
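For readers who download the raw data to perform their own analysis, the following is a minimal sketch of how raw abundances might be normalized and then compared between two libraries. It assumes that "TPM" (used in Subheading 3.4) means a simple scaling of each library to one million reads; the chapter does not state the normalization formula, so this is an assumption. The function names `normalize_to_tpm` and `fold_change_hits` and the signature names are hypothetical and are not part of our web tools.

```python
from typing import Dict, List


def normalize_to_tpm(raw_counts: Dict[str, int]) -> Dict[str, float]:
    """Scale the raw read counts of one library to a total of one million.

    Assumption: normalization is plain proportional scaling.
    """
    total = sum(raw_counts.values())
    if total == 0:
        return {signature: 0.0 for signature in raw_counts}
    return {
        signature: count * 1_000_000 / total
        for signature, count in raw_counts.items()
    }


def fold_change_hits(
    first: Dict[str, float],
    second: Dict[str, float],
    min_fold: float,
    bidirectional: bool = False,
) -> List[str]:
    """Return signatures at least `min_fold` higher in `first` than in `second`.

    With bidirectional=True, signatures that are at least `min_fold` higher
    in `second` than in `first` are reported as well. Signatures absent from
    (or zero in) either library are skipped, because the comparison
    presupposes presence in both libraries.
    """
    hits = []
    for signature in sorted(set(first) & set(second)):
        a, b = first[signature], second[signature]
        if a <= 0 or b <= 0:
            continue
        if a / b >= min_fold or (bidirectional and b / a >= min_fold):
            hits.append(signature)
    return hits


if __name__ == "__main__":
    # Hypothetical raw counts for two libraries.
    leaf = normalize_to_tpm({"sigA": 900, "sigB": 90, "sigC": 10})
    flower = normalize_to_tpm({"sigA": 50, "sigB": 450, "sigC": 500})
    print(fold_change_hits(leaf, flower, min_fold=10))
    print(fold_change_hits(leaf, flower, min_fold=10, bidirectional=True))
```

Skipping signatures with zero abundance avoids division by zero; a reader who wants to detect library-specific small RNAs would need a different rule, such as a pseudocount.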
2.2. Analysis Tools
1. The tools run on the web server. They are written in PHP and retrieve data from the tables that reside on our MySQL server.
2. The tools can be accessed from our public server and provide a variety of ways to access and analyze the small RNA data from the species described above.
3. Methods
In this section, we give a detailed description of the tools that we have developed for comparative analysis of the majority of biologically important small RNAs for a range of plant species. The tools that were developed are as follows:
1. A tool for small RNA comparisons to genomic DNA sequences, reporting only perfect matches.
2. A tool for comparisons of small RNAs and either genomic DNA or other small RNAs, allowing mismatches.
3. A library comparison tool to identify small RNAs conserved in multiple libraries or different plant species.
4. An miRNA target prediction tool.
5. A library comparison tool to identify small RNAs that demonstrate evidence of differential regulation in different tissues or species.
We encourage the reader to utilize our website for hands-on experience with these tools. These tools can be accessed from the
home page of our NSF-funded project entitled “Comparative Sequencing of Plant Small RNAs” (http://smallrna.udel.edu).
3.1. A Tool for Perfect Match Comparisons of Small RNA vs. Genomic DNA
This tool is a small RNA mapping tool that can be used to map sequences from one of our small RNA libraries onto a DNA sequence of interest to the user. The tool identifies perfect matches between the DNA sequence and the sequences from the library selected by the user. This could be used to find siRNAs derived from a repeat or other sequence within a genomic fragment, or could find an miRNA if it is encoded on that clone. 1. The input DNA sequence is submitted in FASTA format, and we have currently designated a limit of 50,000 characters. The interface also enables the user to either paste their DNA sequences in the text box provided, or to upload their own file containing the DNA sequences in FASTA format. 2. The web interface provides a set of DNA sequences to the user as an example for the user to test the tool. This example can be accessed by pressing the “Example” button displayed under the text box (see Fig. 2a) used for entering the DNA sequence.
Fig. 2. The input interface of small RNA mapping tool: (a) The text box and the file upload option to input the DNA sequence (http://smallrna.udel.edu/). (b) Drop down lists showing the list of species and the libraries for each species. (c) Text boxes to input the e-mail address and the subject line for the e-mail to keep track of analyses.
3. After entering the DNA sequences, the next step is to select the desired library of sequences (species and tissue) that will be mapped onto the DNA sequences. The library is selected by first choosing the species of interest from the drop down menu (shown in Fig. 2b). 4. Only once the species has been selected, the user can select the library from the second drop down menu (shown in Fig. 2b). This kind of implementation enables us to populate the drop down menu with only those libraries that are specific to the selected species. Please refer to Note 2 for the naming conventions for each library. The user also has the option of choosing all the libraries for a given species for the analysis of the DNA sequences. 5. Finally, the user must provide his/her email address for the results to be mailed to him/her once the analysis has been completed (see Fig. 2c); a link to a web page storing the results will be mailed to the user. At this point, all the information required for the tool has been entered and the user can submit his/her request. A link to the results of an example analysis is provided at the bottom of the page so that the user can see how the results will be displayed. 6. The results are displayed with a table for each input DNA sequence. The table contains the following fields: the small RNA sequences from the selected library with perfect matches in the input DNA sequence, and the strand, the coordinate, and the abundance value of each matching small RNA in the analyzed library. If the user entered multiple DNA sequences in the FASTA input file, then in the output he/she can jump to a particular table of results by clicking on the DNA sequence name listed at the top of the page. 7. The results page also provides links to a file containing the input DNA sequences, a link to download the results as an Excel worksheet, and a link to download the results as a text file. 
These links help the user to track his/her input DNA sequences and to save the analysis for future reference; the results of the analysis are stored in our database for only 48 hours from the time of request. The user can place any number of requests in one day.
3.2. A Tool for Small RNA vs. Genomic DNA Comparisons, for Mismatches
This tool, also known as the miRNA mismatch tool, is used to compare known miRNAs or small RNA sequences against a user-selected library of other small RNAs. The tool allows mismatches in the alignment of two small RNAs, but does not allow insertion/deletion events (indels). This tool could be used to find miRNAs that have slightly diverged in the species of interest compared with known, annotated miRNAs.
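The comparison described here, mismatches allowed but no indels, can be illustrated with a minimal sketch. It is not the PHP code behind the web tool. The function names, the `length_mode` labels, and the choice to slide the shorter sequence along the longer one when lengths differ are all assumptions made for illustration; the "within2" mode is likewise an assumed reading of the ±2 nucleotide option described in the steps of this section.

```python
from typing import Dict, List, Tuple


def count_mismatches(a: str, b: str) -> int:
    """Number of differing positions between two equal-length sequences."""
    return sum(1 for x, y in zip(a, b) if x != y)


def best_ungapped_alignment(query: str, signature: str) -> Tuple[int, int]:
    """Slide the shorter sequence along the longer one without gaps.

    Returns (offset, mismatches) for the placement with the fewest
    mismatches; offset is the 0-based start within the longer sequence.
    """
    shorter, longer = sorted((query, signature), key=len)
    best = None
    for offset in range(len(longer) - len(shorter) + 1):
        mm = count_mismatches(shorter, longer[offset:offset + len(shorter)])
        if best is None or mm < best[1]:
            best = (offset, mm)
    return best


def find_matches(
    query: str,
    library: Dict[str, int],
    max_mismatches: int = 2,
    length_mode: str = "same",
) -> List[Tuple[str, int, int, int]]:
    """Compare one input sequence against a library {signature: abundance}.

    length_mode: "same" (equal length only), "within2" (length difference
    of at most two nucleotides), or "any" (no length restriction).
    Returns (signature, offset, mismatches, abundance) tuples, best first.
    """
    if length_mode not in ("same", "within2", "any"):
        raise ValueError("unknown length_mode: " + length_mode)
    hits = []
    for signature, abundance in library.items():
        diff = abs(len(signature) - len(query))
        if length_mode == "same" and diff != 0:
            continue
        if length_mode == "within2" and diff > 2:
            continue
        offset, mm = best_ungapped_alignment(query, signature)
        if mm <= max_mismatches:
            hits.append((signature, offset, mm, abundance))
    # Fewest mismatches first, then highest abundance.
    return sorted(hits, key=lambda h: (h[2], -h[3]))
```

With `max_mismatches=0` and `length_mode="any"`, the same function reports a small RNA embedded within a longer one.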
1. The web interface of this tool enables the user to compare the miRNA or small RNA sequences of interest against one of the libraries selected from our database. The text box at the top of the page allows the user to input his/her miRNA sequences of interest in FASTA format with a limit of 50,000 characters. The tool also provides the option of using known miRNA sequences for certain plant species obtained from the Sanger miRNA registry (http://microrna.sanger.ac.uk). To use this option, the user selects a species by clicking the checkbox next to its name in the list of species and then presses the “Paste miRNAs” button (shown in Fig. 3a). This automatically pastes a set of annotated miRNA sequences for the selected species into the text box. The list of species with available known miRNAs currently includes Arabidopsis, rice, maize, and Medicago.
2. The library against which the comparison should be made can be selected from two drop down lists, one displaying the set of species available in our database and the other providing the list of libraries available for the species selected. The tool only enables the selection of a library after the selection of a species from the drop down list.
3. A subset of the signatures or small RNA sequences from the selected library for comparison against the input sequences can be chosen based on their lengths. This is useful to distinguish potential miRNAs from siRNAs. The tool provides three options for the selection of these sequences (shown in Fig. 3b):
(a) The small RNAs that are of exactly the same length as the input miRNA sequence. This option allows the user to match each miRNA sequence with signatures from the selected library having the same length as the miRNA sequence.
(b) The signatures that are ±2 nucleotides in length relative to the miRNA sequence. This option allows the user to match the miRNA sequences with those signatures that are shifted 5′ or 3′ by a difference of two nucleotides from the miRNA sequence against which they are matched.
(c) The signatures that are of any length. This option allows the user to match the miRNA sequences against the entire set of signatures in the selected library, regardless of their length. This will find a match of one small RNA “embedded” or contained within a longer small RNA.
4. Since the tool allows mismatches, the maximum number of allowed mismatches must be selected by the user from one of the options that we provide in a list, as shown in Fig. 3b. The number of mismatches ranges from zero to a maximum of five.
5. The results of the miRNA mismatch tool are tabulated for each small RNA input sequence. The results table lists the input miRNA sequence, the signature matched with the input sequence (taking the maximum number of mismatches allowed into account), the start and end positions of the library-derived small RNA compared to the input sequence, the number of mismatches, and the abundance values of the small RNA in the libraries selected. The mismatches are highlighted with a different color in the sequence to identify the positions where the mismatches occurred.
6. The results page includes links for downloading the input sequences and also the output results as text and/or as an Excel worksheet. When the output exceeds 1,000 entries for a given set of input sequences, the results can only be downloaded rather than viewed on the page. This prevents delays in loading a large html page.
Fig. 3. The input interface for analyses of small RNAs against DNA sequences, allowing mismatches: (a) The text box to input the DNA sequence and the example sequences given as options. (b) The drop down lists showing the species and the library, options for the length of the signatures, and the options showing the maximum number of mismatches allowed.
3.3. Library Comparison Tool to Identify Conserved Small RNAs
This tool can be used to perform an interlibrary comparison to identify the conserved small RNAs (sequences found in multiple libraries). The tool identifies those conserved small RNAs that occur in a single, primary library as well as all or a subset of the other libraries that the user has selected. Because miRNAs tend to be present in multiple libraries and are often conserved across species, this tool may be useful for distinguishing miRNAs from siRNAs. 1. The primary library can be selected from the two drop down lists at the top of the main page of this tool. As described for other tools, the species can be selected using the first drop down list, and then the library can be selected from the second drop down list (see Fig. 4a). The primary library is defined as a library in which the conserved small RNA must appear in addition to the other libraries that are analyzed. 2. The tool includes an option to allow the user to paste sequences as a primary library. This option, when enabled, provides the
Fig. 4. The input interface of the tool for identifying small RNAs conserved across multiple libraries: (a) The drop down lists for selecting the species and the library as the primary library and a list of checkboxes to select the secondary libraries. (b) The text box to input the small RNA sequences for the primary library.
user with a textbox for entry of the sequences. The user could also choose to paste known miRNAs, as described above (see Fig. 4b).
3. A series of checkboxes lists all the libraries that are available to the user. The selection of multiple libraries allows for comparison with the primary library to determine which small RNAs are found in the intersection. Once the user selects a primary library, it is automatically included in the list of libraries chosen for comparison.
4. The user then needs to specify the total number of other libraries in which a desired small RNA should be found, in addition to the primary library. For example, if the user chooses the library “CRE1-Control” as the primary library and then chooses four other libraries for comparison, the maximum number of libraries that the user can choose for comparison including the primary library is five. However, the user can choose any value less than the maximum. In this case, the comparison is made with different combinations of the chosen libraries, with all the combinations including the primary library.
5. A link to the results of the analysis is emailed to the user. The results page includes the following details about the analysis:
(a) The primary library that the user had selected for the analysis.
(b) The number of libraries (out of the maximum number of libraries) chosen for comparison with the primary library.
(c) A table listing the signatures from the primary library that also appeared in the other libraries, the length of each resulting small RNA, and the abundance values of these sequences in the primary library and in the other libraries chosen for comparison. The table is sorted based on the length of the signatures. The table is not displayed if there are more than 1,000 signatures retrieved from the comparison. Instead, the user can download the results as a worksheet in Microsoft Excel format, as mentioned above.
3.4. Library Comparison Tool to Identify Regulated Small RNAs
This tool is used to identify small RNAs demonstrating different expression levels in the comparison of two or more libraries. The tool identifies those signatures whose abundance in the chosen libraries is greater or lesser than the threshold chosen by the user. Since this analysis presupposes the presence of the small RNA in multiple libraries, it is best suited for identifying miRNAs that show regulation in different tissues or conditions. 1. The initial page of the tool displays the list of available libraries in a table (shown in Fig. 5) with three options to choose for each of the libraries. The first option allows the user to select the library for comparison and display in the final results. The
Fig. 5. The initial library selection page for the library comparison tool to identify differentially regulated small RNAs: The list of libraries available for selection or viewing is shown.
second option can be used to select libraries not used in the analysis, but displayed only for viewing purposes in the final result. The third option enables the user to ignore the library. Once the libraries have been selected for either viewing or for comparison, the other criteria for comparison among the chosen libraries can be selected by clicking on the “continue to criteria” button in the initial page. 2. The second page of the interface lets the user choose the other input required for the comparison. All the libraries chosen for comparison or viewing are tabulated with the table containing the following columns (shown in Fig. 6):
Fig. 6. The selection criteria for the tool to identify differentially regulated small RNAs: The figure shows the selection range of abundances for each selected library, the expression levels for two libraries chosen for comparison and the sorting options for the final result.
(a) The first column lists the library name as a link. Clicking on the library name would open a popup window that displays the information about that library. (b) The second column shows a drop down menu that enables the user to pick a range of values for the abundances of that library for comparison. The different options are “>1,000 TPM,” “101–1,000 TPM,” “11–100 TPM,” “3–10 TPM,” and “1 or 2 TPM.” (c) The third column consists of two text boxes that allow the user to define his/her own lower and upper values of the boundary ranges for the abundances of that library. The column is labeled as “user defined range.” (d) The last column gives the option of ignoring the library for comparison. This is the default option. The user can
choose only one option, the second, third, or the fourth column. The libraries chosen for viewing alone will not have the above options displayed in the table. 3. The user, if interested in comparing only two libraries, can choose two libraries from the two drop-down lists (shown in Fig. 6) after checking the checkbox that enables this option. The user can then specify the ratio of expression for signatures in the first library versus the second library. The ratio can be specified as a lower and upper range for the abundance in percentage, or by specifying the lower and upper threshold, or by specifying the exact value of the ratio. The comparison between these two libraries can be performed either as a percentage comparison or as a fold comparison. 4. The default direction of comparison is unidirectional; in other words, the abundance is measured by comparing the value in the first library with that of the second library, so that a query for 10-fold higher abundance will require that the first library have a 10-fold higher abundance than the second and not vice versa. However, the user can choose to perform a bidirectional comparison between the two libraries; in this case, the comparison is performed in both directions (library one vs. two and two vs. one) with the ratio specified by the user. The ratio of expression of the signatures is displayed as the user selects the two libraries for comparison and the range of abundances as percent values or fold values. See Fig. 6 for a graphic view of these options on our website. 5. The tool provides an option of sorting the results based on three libraries from the list of libraries chosen by the user either in ascending or descending order and also based on the length of the signatures. The libraries can be chosen from three individual drop-down lists that enable the user to sort the results in the order in which they are chosen. The table, as shown in Fig. 
6, also allows the results to be put in ascending or descending order with respect to each of the three libraries chosen for sorting. 6. The results are displayed in a table with the following fields: the sequences that satisfied the criteria that the user picked for the query; the length of the sequence; and the sequence abundances in each of the libraries chosen by the user, displayed in the sorted order specified by the user. If the user has not chosen any sorting option, then the column is displayed in the order in which the data are retrieved from the database. The libraries chosen for viewing alone are displayed in gray color. 7. The results page includes features that enable the user to view the entire table in separate web pages. By default, each page displays 1,000 rows of the table, and the last page displays
the rest of the rows in the result, or 1,000 rows, whichever is less. The user can change the number of rows to display from the drop-down menu provided at the top of every page. A maximum of 3,000 rows can be displayed in a page. Each page shows the range of rows displayed out of the entire set of data. The user can jump to a page to view the results in that page by either entering the page number in the textbox provided or by clicking the appropriate page number listed horizontally on the top of each page. These options allow the user to easily navigate through the pages. Finally, the tool also provides an option of downloading the results in an Excel worksheet format.
3.5. Small RNA Reverse Target Prediction Tool
This tool is based on more typical target prediction programs, which predict the targets of small RNA sequences such as miRNAs. However, because a complete genome may not be available, or because the user may have a genomic sequence that he/she would like to assess for targets of small RNAs, we developed a “reverse target prediction” algorithm. In this case, the user enters a genomic sequence, and then picks a set of small RNAs to compare against that sequence. In some cases, it may be possible to use this page with the output of another of our small RNA analysis pages. Because this is a computationally intensive search (due to the mismatches permitted for miRNA targets), we limit the size of the input sequence. The result is a set of small RNAs that pair with the input sequence in the manner characteristic of an miRNA and its target, as defined by an algorithm that generates a score based on mismatches and bulges. 1. The initial page of the tool allows the user to input his/her own genomic sequences to search for targets. The input sequence is expected to be in the FASTA format and is limited to 50,000 characters. An example sequence can be chosen by clicking the “Example” button provided below the text box (shown in Fig. 7a) in which the sequences need to be entered. The user can also upload a file containing the genomic sequences in the FASTA format. 2. Once the user inputs the genomic sequences, he/she can choose a species from the drop-down list (shown in Fig. 7a) from which to use the small RNA data. The user can also utilize the output of one of our other tools, and thus can choose his/her own set of small RNA sequences for reverse target prediction. To minimize the computational time, we also require that the user selects smaller molecules (18–22 nt) or larger molecules (>22 nt). Since miRNAs are usually 20–22 nt, we recommend using the smaller class. 3. 
Once the user chooses the species for which the small RNA data will be used, they next can adjust the settings for penalty scores for mismatches, bulges, and wobbles in the miRNA-target pair;
or the user can choose the default settings. The tool also provides two filter options: (1) the 10/11 nt positions can require perfect matches (as for known miRNAs, as this is the site of cleavage), and (2) the 2–9 nt positions may not allow more than one mismatch (as is also the case for known miRNAs); both of these are turned on in the default conditions. All of these options as displayed on the web site are shown in Fig. 7b. 4. The results of the analysis are emailed to the user and are visible on a web page, the link to which is sent in the email. The results page includes a table that displays the target site alignment of the genomic sequence, plus the start and end positions of the sequence, the score, the total numbers of mismatches and indels, and the strand of the genomic sequence on which the match was found. The table also includes the abundances of the matching small RNAs in the libraries of the species selected by the user for target prediction. The result
Fig. 7. The input interface for the reverse target prediction tool: This tool compares a library of small RNAs to a genomic sequence to identify small RNAs that could target the genomic sequence, using standard miRNA criteria. (a) The text box to enter the genomic sequence and the drop down list showing the species available, and the options provided for the length of the signatures. (b) The settings for the scoring system. The settings allow the user to alter the scoring penalties associated with nucleotide “wobbles” (G:U pairs), bulges, and mismatches.
of the analysis can be downloaded as an Excel sheet, a link to which is provided at the bottom of the results page.
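The reverse target search described in this section can be sketched as follows. This is an illustration under stated assumptions, not the algorithm running on our server. The penalty values (1.0 per mismatch, 0.5 per G:U wobble), the score cutoff, and all function names are hypothetical. Bulges and indels are not modeled at all, so only ungapped sites are scored. The sketch also assumes that a wobble does not count as a perfect match at positions 10 and 11, and that wobbles are not counted as mismatches at positions 2–9. Positions are numbered from the 5′ end of the small RNA.

```python
from typing import Dict, List, Optional, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")

# (small RNA base, reverse-complemented target base) pairs that correspond
# to G:U wobbles: small RNA G opposite target U, small RNA U opposite target G.
WOBBLE = {("G", "A"), ("T", "C")}


def reverse_complement(dna: str) -> str:
    return dna.translate(COMPLEMENT)[::-1]


def score_site(
    srna: str,
    window: str,
    mismatch_penalty: float = 1.0,   # hypothetical default
    wobble_penalty: float = 0.5,     # hypothetical default
    require_10_11: bool = True,
    limit_2_9: bool = True,
) -> Optional[float]:
    """Score one ungapped candidate site; None if a filter rejects it."""
    if len(srna) < 11 or len(srna) != len(window):
        raise ValueError("small RNA and window must be equal length, >= 11 nt")
    rc = reverse_complement(window)  # now aligned 5'->3' with the small RNA
    kinds = []
    for a, b in zip(srna, rc):
        if a == b:
            kinds.append("match")
        elif (a, b) in WOBBLE:
            kinds.append("wobble")
        else:
            kinds.append("mismatch")
    if require_10_11 and (kinds[9] != "match" or kinds[10] != "match"):
        return None
    if limit_2_9 and kinds[1:9].count("mismatch") > 1:
        return None
    return (kinds.count("mismatch") * mismatch_penalty
            + kinds.count("wobble") * wobble_penalty)


def reverse_targets(
    genome: str,
    small_rnas: Dict[str, int],
    max_score: float = 3.0,          # hypothetical cutoff
    **scoring,
) -> List[Tuple[str, str, int, int, float, int]]:
    """Scan both strands of `genome` for sites targeted by the small RNAs.

    Returns (small RNA, strand, start, end, score, abundance) tuples with
    1-based inclusive coordinates on the input sequence, best score first.
    """
    genome = genome.upper()
    n = len(genome)
    hits = []
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for srna, abundance in small_rnas.items():
            query = srna.upper().replace("U", "T")
            size = len(query)
            for s in range(n - size + 1):
                score = score_site(query, seq[s:s + size], **scoring)
                if score is None or score > max_score:
                    continue
                if strand == "+":
                    start, end = s + 1, s + size
                else:
                    start, end = n - s - size + 1, n - s
                hits.append((srna, strand, start, end, score, abundance))
    return sorted(hits, key=lambda h: (h[4], h[2]))
```

The scan is quadratic in the worst case (every small RNA against every window), which is consistent with the limit placed on the input sequence length; an index of the genomic sequence would be the obvious refinement.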
4. Notes
1. The following are the tables used in our database to retrieve the data requested by the user; this table description should be sufficient for a reader who wishes to implement his/her own version of this database.
(a) A species table that contains the list of species and their tissues. Each species is identified by a species code, and each tissue is identified by the library code as explained in Note 2.
(b) The small RNA data are stored in a table that includes the following columns: (i) the union of the distinct sequences generated for all of the species; (ii) the body part of each of these signatures; (iii) the length of each signature; and (iv) the raw abundance of each signature in each of the libraries, structured in the table with each library occupying a column. The abundance value of a signature for a library is set to zero if that signature is not present in the library. Thus, if there are nine libraries, then the table has ten such columns: the signature column plus nine columns listing the raw abundance of each sequence in each library. Normalized instead of raw abundances can be stored in a similarly structured table, and this may facilitate comparisons across libraries for differentially regulated small RNAs.
(c) Other than these tables, each tool has two other associated tables. One of the tables is used to store the pending requests that the user initiates in our system from the web page. A “cron job” (a program that runs on a regular schedule) runs in the background; it extracts data from the pending request table, identifying pending jobs and moving them to the processing stage. After processing, the results are stored in another table and can be retrieved from this table with the user’s email address and the time at which the request was placed.
2. Each library is represented by an identifier composed of the three-letter code of the species followed by the library number, followed by a hyphen and the name of the tissue. For example, the tissue “Leaves” of the species “Lactuca sativa” is represented as “LSA1-Leaves.” In general, library #1 is a leaf library, library #2 is an inflorescence library, and library #3 is a tissue that varies from species to species.
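For a reader who wishes to build a similar database, the table layout of Note 1(b) and the naming convention of Note 2 can be sketched as follows. The sketch uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table name `small_rna`, the function names, and the example library "LSA2-Inflorescence" are hypothetical. The final query illustrates how the one-column-per-library layout supports a conserved small RNA search of the kind described in Subheading 3.3; it is an assumption about how such a query could be written, not the query used by our tools.

```python
import re
import sqlite3
from typing import Dict, List, Tuple

# Three-letter species code, library number, hyphen, tissue name (Note 2).
LIBRARY_ID = re.compile(r"^([A-Z]{3})(\d+)-([A-Za-z0-9_]+)$")


def parse_library_id(library_id: str) -> Tuple[str, int, str]:
    """Split e.g. 'LSA1-Leaves' into ('LSA', 1, 'Leaves')."""
    match = LIBRARY_ID.match(library_id)
    if match is None:
        raise ValueError("malformed library identifier: " + library_id)
    code, number, tissue = match.groups()
    return code, int(number), tissue


def quoted(library_id: str) -> str:
    """Validate an identifier, then quote it for use as a column name."""
    parse_library_id(library_id)
    return '"' + library_id + '"'


def create_abundance_table(conn: sqlite3.Connection, libraries: List[str]) -> None:
    """One row per signature, one abundance column per library (default 0)."""
    columns = ", ".join(quoted(lib) + " INTEGER NOT NULL DEFAULT 0"
                        for lib in libraries)
    conn.execute("CREATE TABLE small_rna (signature TEXT PRIMARY KEY, "
                 "length INTEGER NOT NULL, " + columns + ")")


def insert_signature(conn: sqlite3.Connection, signature: str,
                     abundances: Dict[str, int]) -> None:
    """Store raw abundances; libraries not listed keep the default of zero."""
    if not abundances:
        raise ValueError("at least one library abundance is required")
    columns = ", ".join(quoted(lib) for lib in abundances)
    marks = ", ".join("?" for _ in abundances)
    conn.execute(
        "INSERT INTO small_rna (signature, length, " + columns + ") "
        "VALUES (?, ?, " + marks + ")",
        (signature, len(signature), *abundances.values()),
    )


def conserved_signatures(conn: sqlite3.Connection, primary: str,
                         others: List[str], min_others: int):
    """Signatures present in `primary` and in at least `min_others` others."""
    if not others:
        raise ValueError("at least one comparison library is required")
    # Each comparison evaluates to 0 or 1 in SQLite, so the sum counts
    # the libraries in which the signature is present.
    present = " + ".join("(" + quoted(lib) + " > 0)" for lib in others)
    sql = ("SELECT signature, length, " + quoted(primary) + " FROM small_rna "
           "WHERE " + quoted(primary) + " > 0 AND (" + present + ") >= ? "
           "ORDER BY length, signature")
    return conn.execute(sql, (min_others,)).fetchall()
```

Library identifiers are validated before being placed in SQL text because column names cannot be passed as bound parameters. A wide table of this kind is simple to query, at the cost of a schema change whenever a library is added.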
Mahalingam and Meyers
Acknowledgments
We are grateful to Prakash Janardhan, Mayumi Nakano, Vimal Kannan, Emanuele De Paoli, Pam Green, and other members of the Meyers and Green labs for assistance with the database, web pages, analytical tools, and data, and for discussions about all of these. Work on this project is supported by the NSF Plant Genome Research Program, Comparative Sequencing Project, award #0638525.
References
1. Hamilton AJ, Baulcombe DC (1999) A species of small antisense RNA in posttranscriptional gene silencing in plants. Science 286:950–952
2. Bartel B, Bartel DP (2003) MicroRNAs: at the root of plant development? Plant Physiol 132:709–717
3. Mallory AC, Vaucheret H (2004) MicroRNAs: something important between the genes. Curr Opin Plant Biol 7:120–125
4. Carrington JC, Ambros V (2003) Role of microRNAs in plant and animal development. Science 301:336–338
5. Bartel DP (2004) MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 116:281–297
6. Schwarz DS, Hutvagner G, Haley B, Zamore PD (2002) Evidence that siRNAs function as guides, not primers, in the Drosophila and human RNAi pathways. Mol Cell 10:537–548
7. Jones-Rhoades MW, Bartel DP, Bartel B (2006) MicroRNAs and their regulatory roles in plants. Annu Rev Plant Biol 57:19–53
8. Tang G, Reinhart BJ, Bartel DP, Zamore PD (2003) A biochemical framework for RNA silencing in plants. Genes Dev 17:49–63
9. Llave C, Xie Z, Kasschau KD, Carrington JC (2002) Cleavage of Scarecrow-like mRNA targets directed by a class of Arabidopsis miRNA. Science 297:2053–2056
10. Peragine A, Yoshikawa M, Wu G, Albrecht HL, Poethig RS (2004) SGS3 and SGS2/SDE1/RDR6 are required for juvenile development and the production of trans-acting siRNAs in Arabidopsis. Genes Dev 18:2368–2379
11. Vazquez F, Vaucheret H, Rajagopalan R, Lepers C, Gasciolli V, Mallory AC, Hilbert JL, Bartel DP, Crete P (2004) Endogenous trans-acting siRNAs regulate the accumulation of Arabidopsis mRNAs. Mol Cell 16:69–79
12. Aukerman MJ, Sakai H (2003) Regulation of flowering time and floral organ identity by a MicroRNA and its APETALA2-like target genes. Plant Cell 15:2730–2741
13. Chen X (2004) A microRNA as a translational repressor of APETALA2 in Arabidopsis flower development. Science 303:2022–2025
14. Kasschau KD, Xie Z, Allen E, Llave C, Chapman EJ, Krizan KA, Carrington JC (2003) P1/HC-Pro, a viral suppressor of RNA silencing, interferes with Arabidopsis development and miRNA function. Dev Cell 4:205–217
15. Reinhart BJ, Weinstein EG, Rhoades MW, Bartel B, Bartel DP (2002) MicroRNAs in plants. Genes Dev 16:1616–1626
16. Jones-Rhoades MW, Bartel DP (2004) Computational identification of plant microRNAs and their targets, including a stress-induced miRNA. Mol Cell 14:787–799
17. Palatnik JF, Allen E, Wu X, Schommer C, Schwab R, Carrington JC, Weigel D (2003) Control of leaf morphogenesis by microRNAs. Nature 425:257–263
18. Kidner CA, Martienssen RA (2005) The developmental role of microRNA in plants. Curr Opin Plant Biol 8:38–44
19. Rhoades MW, Reinhart BJ, Lim LP, Burge CB, Bartel B, Bartel DP (2002) Prediction of plant microRNA targets. Cell 110:513–520
20. Llave C, Kasschau KD, Rector MA, Carrington JC (2002) Endogenous and silencing-associated small RNAs in plants. Plant Cell 14:1605–1619
21. Vazquez F, Gasciolli V, Crete P, Vaucheret H (2004) The nuclear dsRNA binding protein HYL1 is required for microRNA accumulation and plant development, but not posttranscriptional transgene silencing. Curr Biol 14:346–351
Computational Methods for Comparative Analysis of Plant Small RNAs
22. Jacobsen SE, Running MP, Meyerowitz EM (1999) Disruption of an RNA helicase/RNAse III gene in Arabidopsis causes unregulated cell division in floral meristems. Development 126:5231–5243
23. Vaucheret H, Vazquez F, Crete P, Bartel DP (2004) The action of ARGONAUTE1 in the miRNA pathway and its regulation by the miRNA pathway are crucial for plant development. Genes Dev 18:1187–1197
24. Kidner CA, Martienssen RA (2004) Spatially restricted microRNA directs leaf polarity through ARGONAUTE1. Nature 428:81–84
25. Navarro L, Dunoyer P, Jay F, Arnold B, Dharmasiri N, Estelle M, Voinnet O, Jones JD (2006) A plant miRNA contributes to antibacterial resistance by repressing auxin signaling. Science 312:436–439
26. Lu C, Kulkarni K, Souret FF, Valliappan RM, Tej SS, Poethig RS, Henderson IR, Jacobsen SE, Wang W, Green PJ, Meyers BC (2006) microRNAs and other small RNAs enriched in the Arabidopsis RNA-dependent RNA polymerase-2 mutant. Genome Res (submitted)
27. Borsani O, Zhu J, Verslues PE, Sunkar R, Zhu JK (2005) Endogenous siRNAs derived from a pair of natural cis-antisense transcripts regulate salt tolerance in Arabidopsis. Cell 123:1279–1291
28. Sunkar R, Zhu JK (2004) Novel and stress-regulated microRNAs and other small RNAs from Arabidopsis. Plant Cell 16:2001–2019
29. Baulcombe D (2004) RNA silencing in plants. Nature 431:356–363
30. Lippman Z, Martienssen R (2004) The role of RNA interference in heterochromatic silencing. Nature 431:364–370
31. Chan SW, Zilberman D, Xie Z, Johansen LK, Carrington JC, Jacobsen SE (2004) RNA silencing genes control de novo DNA methylation. Science 303:1336
32. Kinoshita T, Miura A, Choi Y, Kinoshita Y, Cao X, Jacobsen SE, Fischer RL, Kakutani T (2004) One-way control of FWA imprinting in Arabidopsis endosperm by DNA methylation. Science 303:521–523
33. Floyd SK, Bowman JL (2004) Gene regulation: ancient microRNA target sequences in plants. Nature 428:485–486
34. Arazi T, Talmor-Neiman M, Stav R, Riese M, Huijser P, Baulcombe DC (2005) Cloning and characterization of micro-RNAs from moss. Plant J 43:837–848
35. Axtell MJ, Bartel DP (2005) Antiquity of microRNAs and their targets in land plants. Plant Cell 17:1658–1673
36. Allen E, Xie Z, Gustafson AM, Sung GH, Spatafora JW, Carrington JC (2004) Evolution of microRNA genes by inverted duplication of target gene sequences in Arabidopsis thaliana. Nat Genet 36:1282–1290
37. Bonnet E, Wuyts J, Rouze P, Van de Peer Y (2004) Evidence that microRNA precursors, unlike other non-coding RNAs, have lower folding free energies than random sequences. Bioinformatics 20:2911–2917
38. Ohler U, Yekta S, Lim LP, Bartel DP, Burge CB (2004) Patterns of flanking sequence conservation and a characteristic upstream motif for microRNA gene identification. RNA 10:1309–1322
39. Meyers BC, Souret FF, Lu C, Green PJ (2006) Sweating the small stuff: microRNA discovery in plants. Curr Opin Biotechnol 17:139–146
40. Reinhart J, Mertz LM, Catt KJ (1992) Molecular cloning and expression of cDNA encoding the murine gonadotropin-releasing hormone receptor. J Biol Chem 267:21281–21284
41. Aravin AA, Lagos-Quintana M, Yalcin A, Zavolan M, Marks D, Snyder B, Gaasterland T, Meyer J, Tuschl T (2003) The small RNA profile during Drosophila melanogaster development. Dev Cell 5:337–350
42. Lagos-Quintana M, Rauhut R, Lendeckel W, Tuschl T (2001) Identification of novel genes coding for small expressed RNAs. Science 294:853–858
Chapter 13
Biotic Stress-Associated microRNAs: Identification, Detection, Regulation, and Functional Analysis
Florence Jay, Jean-Pierre Renou, Olivier Voinnet, and Lionel Navarro
Abstract
The methods described herein first highlight the strategies that were used to discover a biotic stress-associated miRNA. This involved (1) the selection of transcripts that were more abundant in transgenic plants expressing viral-derived suppressors of RNA silencing and transcripts that were repressed in wild-type seedlings treated with a biotic stress, (2) a 5′ RACE-derived assay to map miRNA target sites, and (3) a bioinformatic analysis to retrieve specific miRNA loci from the Arabidopsis genome. We then describe methods used to monitor (1) the levels of primary miRNA transcripts (pri-miRNAs)/mature miRNAs and (2) the transcriptional activity of miRNAs in response to a biotic stress and bacterial challenge. Furthermore, we present a strategy to identify additional biotic stress-responsive miRNA genes and get insight into their regulation. This involves (1) a microarray approach that allows detection of pri-miRNAs, coupled with (2) a promoter analysis of co-regulated miRNA genes. Finally, we describe strategies that can be used to functionally characterize individual biotic stress-associated miRNAs, or the miRNA pathway, in disease resistance.
Key words: Biotic stress response, miRNA, Bioinformatics, Bacteria, Promoter analysis
1. Introduction
Although miRNAs were initially characterized in plant and animal development, recent studies revealed a role for miRNAs in controlling the innate immune response (1, 2, 3). For example, Arabidopsis miR393 is a stress-responsive miRNA that contributes to resistance against virulent Pseudomonas syringae pv. tomato strain DC3000 (Pto DC3000), presumably by repressing auxin signaling (1). In this chapter, we present the strategies that were used to identify and functionally characterize miR393 in antibacterial resistance. We also present methods to identify additional biotic stress-associated miRNAs and get insight into their regulation. These methods can be used to retrieve and functionally characterize biotic stress-associated miRNAs from model plants but also from agriculturally important crops whose genomes are partially sequenced.
B.C. Meyers and P.J. Green (eds.), Plant MicroRNAs, Methods in Molecular Biology, vol. 592 DOI 10.1007/978-1-60327-005-2_13, © Humana Press, a part of Springer Science + Business Media, LLC 2009
2. Materials
1. pGEMT easy® vector.
2. GeneRacer RNA adaptor® (Invitrogen).
3. Tri Reagent™ (Sigma).
4. Acrylamide: Acryl/Bis-acryl 19:1 (Eurobio).
5. Hybridization Buffer: Perfect-Hyb™ Plus hybridization buffer, 1× (Sigma).
6. Hybond™-NX (Amersham).
7. Whatman® (Schleicher and Schuell).
8. Cy3-dUTP/Cy5-dUTP (Perkin-Elmer-NEN Life Science Products).
9. UV cross-linker (Stratalinker™).
10. Hybridization oven: Hybridizer HB-1D (Techne).
11. Oligotex mRNA mini kit® (Qiagen).
12. RNeasy Plant Mini Kit® (Qiagen).
13. miR StarFire™ oligonucleotide labeling kit (Integrated DNA Technologies).
14. Klenow DNA polymerase.
15. Polynucleotide kinase (PNK).
16. Sephadex™ (G25 spin columns).
17. Superscript III® (Invitrogen).
18. Taq DNA polymerase (Qiagen).
19. Oligonucleotides.
20. 1× Murashige-Skoog (MS) liquid medium.
21. Flg22 and flg22A.tum peptides. Flg22A.tum is an inactive peptide derived from the N-terminal part of Agrobacterium tumefaciens flagellin.
22. Pseudomonas syringae pv. tomato strain DC3000 (Pto DC3000), Pto DC3000 hrcC-.
23. NYGA and NYGB medium.
24. Agarose and DNA sequencing gel equipment.
25. Overexpressor vectors (pK7WG2D, pROKII).
26. GFP reporter vector (pBIN61-based).
27. Arabidopsis miRNA- and siRNA-deficient mutants (dcl and rdr mutants).
28. Arabidopsis transgenic plants overexpressing P19, P1-Hc Pro or P15 virus-derived suppressors of RNA silencing under the strong Cauliflower Mosaic Virus 35S promoter.
3. Methods
3.1. Identification of Biotic Stress-Associated miRNA Loci
The identification of biotic stress-associated miRNA loci involves (1) the selection of putative biotic stress-associated miRNA target transcripts, (2) the mapping of miRNA-directed cleavage sites, and (3) a bioinformatic analysis to retrieve potential stress-associated miRNA precursors and mature miRNA sequences (see Note 1).
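Part (1) of this strategy reduces to intersecting gene lists: transcripts repressed by the biotic stress in wild-type seedlings are compared with transcripts that are more abundant in untreated lines expressing viral suppressors of RNA silencing. The sketch below is only an illustration under stated assumptions: the gene identifiers are made up, the function name is hypothetical, and whether a gene must be up in all suppressor lines or in any one of them is left as a parameter, since both readings are possible.

```python
def candidate_targets(repressed_by_stress, up_in_suppressor_lines, require_all=True):
    """Return stress-repressed genes that are also up in the suppressor lines.

    repressed_by_stress: gene IDs repressed in stress-treated wild-type seedlings.
    up_in_suppressor_lines: dict mapping a suppressor line name to the gene IDs
        that are more abundant in that untreated line than in wild type.
    require_all: if True, a gene must be up in every suppressor line;
        if False, being up in any single line is enough.
    """
    gene_sets = [set(genes) for genes in up_in_suppressor_lines.values()]
    if not gene_sets:
        return []
    combine = set.intersection if require_all else set.union
    up_genes = combine(*gene_sets)
    return sorted(set(repressed_by_stress) & up_genes)


# Made-up gene identifiers, for illustration only.
stress_down = {"geneA", "geneB", "geneC"}
suppressor_up = {
    "P19": {"geneA", "geneB", "geneD"},
    "P15": {"geneA", "geneD"},
    "P1-HcPro": {"geneA", "geneB"},
}
print(candidate_targets(stress_down, suppressor_up))
print(candidate_targets(stress_down, suppressor_up, require_all=False))
```

The stricter setting gives a shorter candidate list that is shared by all suppressor lines; the looser one is more permissive and will need more downstream validation.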
3.1.1. Identification of Putative Stress-Associated miRNA Targets Using Transcriptome Analysis
Transcripts accumulating to higher levels in Arabidopsis transgenic lines overexpressing either P19, P1-Hc Pro or P15 virus-derived suppressors of RNA silencing, when compared with wild-type plants, are likely to be regulated, directly or indirectly, by siRNAs and/or miRNAs (4, 5). Thus, comparing gene sets that are repressed in wild-type seedlings treated with the pathogen-associated molecular pattern (PAMP) flg22 (a flagellin-derived peptide) with gene sets that are more abundant in non-treated suppressor lines allows the identification of mRNAs that may be post-transcriptionally repressed by small RNAs in the flg22 response (Fig. 1a). For example, the auxin receptor Transport Inhibitor Response 1 (TIR1) fulfilled these criteria (Fig. 1a), suggesting that the corresponding transcript might be directly regulated by an endogenous small RNA (see Note 2).
1. Grow A. thaliana Col-0 and transgenic lines overexpressing viral-derived suppressors of RNA silencing for 10 days on plates containing 1× MS medium (Duchefa), 1% sucrose and 0.8% agar under a 12 h photoperiod at 22°C. Transfer the seedlings to MS liquid medium (two seedlings per 500 µL of medium in wells of 24-well plates).
2. After 2 days, challenge Col-0 plantlets with 1 µM of either flg22 (QRLSTGSRINSAKDDAAGLQIA) or flg22A.tum (ARVSSGLRVGDASDNAAYWSIA) synthetic peptides (Sigma Genosys). Collect plantlets in liquid nitrogen and store at −80°C. For Col-0 seedlings, collect samples at different time points after treatment (in the illustrated example, a 30 min time point was used).
3. Prepare total RNA using the RNeasy Plant Mini Kit® (Qiagen).
4. Perform cDNA synthesis, cRNA synthesis, biotin labeling, hybridization, and scanning as recommended by the manufacturer (Affymetrix®).
[Fig. 1, panel labels recovered from the figure. (a) Common genes: P1-HcPro/P19/P15 UP, flg22 DOWN; TIR1 mRNA; 5′ RACE-derived assay on TIR1 (lanes Wt, dcl1-9, dcl2-1, dcl3-1); mapping the cleavage site by sequencing; BLAST versus IGR regions and mfold analysis; AtmiR393a (chromosome 2); AtmiR393b (chromosome 3). Panels (b) and (c) are sequence alignments.]
Fig. 1. Identification of Arabidopsis miR393. (a) Schematic representation of the methods used to identify miR393 loci. (b) Alignment of TIR1, AFB1, AFB2, AFB3, and rice TIR1 cDNAs. (c) Alignment of miR393 precursor sequences from different plant species. From top to bottom, miR393 precursor sequences derived from: Rice_1, Oryza sativa chromosome 1 (AP002483); Rice_2, Oryza sativa chromosome 4 (OSJN00092); Populus, Populus alba x Populus tremula cDNA (CF231897);
5. Perform bioinformatic analysis: select transcripts that are repressed in flg22-treated Col-0 seedlings and more abundant in non-treated P19, P15, or P1-Hc Pro transgenic seedlings.
6. Validate microarray data by semi-quantitative RT-PCR analysis (Fig. 1a): reverse transcribe total RNA into cDNA using SuperScript III® reverse transcriptase (Invitrogen), then PCR amplify the full-length candidate transcripts.
3.1.2. Mapping the miRNA-Directed Cleavage Site
A modified rapid amplification of cDNA ends (RACE) assay is commonly used to detect 3′-cleavage products derived from plant miRNA targets. Using this method, we found that TIR1 mRNA is cleaved in Col-0, La-er, dcl2-1, and dcl3-1, as evidenced by detection of a PCR-amplified cleavage product (Fig. 1a). However, no PCR amplification was detected in the dcl1-9 mutant background, indicating that the TIR1 transcript is likely cleaved specifically by an endogenous miRNA rather than an siRNA (Fig. 1a).
1. Collect and store 12-day-old Col-0, La-er, dcl1-9, dcl2-1, dcl3-1, and dcl4-2 seedlings at −80°C.
2. Extract total RNA using Tri Reagent™ (Sigma).
3. Enrich PolyA RNAs from 100 to 150 µg of total RNA using an Oligotex® mRNA mini kit (Qiagen).
4. Add PolyA RNAs directly into tubes that contain the lyophilized GeneRacer RNA oligo.
5. Incubate at 65°C for 5 min and place on ice for 2 min.
6. After adding the T4 RNA ligase, incubate for 1 h at 37°C.
7. Centrifuge and place on ice.
8. Precipitate the RNA.
9. Incubate at 65°C for 5 min and chill on ice for 5 min.
10. Proceed with reverse transcription using Superscript III® reverse transcriptase (Invitrogen).
11. Perform a PCR amplification using the GeneRacer 5′ primer (5′-CGACTGGAGCACGAGGACACTGA-3′) and the GeneRacer 3′ primer (5′-GCTGTCAACGATACGCTACGTAACG-3′) to generate a pool of 5′ RACE products.
12. Perform a PCR amplification (35 cycles) using the 5′ Nested GeneRacer primer 5′- and a 3′ gene-specific primer (5′-TTATAATCCGTTAGTAGTAATGATTTG-3′ for TIR1). The pool of 5′ RACE products generated in step 11 is used as the template in this PCR amplification.
Fig. 1. (continued) Lotus, Lotus japonicus chromosome 5 (AP004970); Arabidopsis_1, Arabidopsis thaliana chromosome 2 (between At2g39880 and At2g39890); Arabidopsis_2, Arabidopsis thaliana chromosome 3 (between At3g55730 and At3g55735); Malus_x_domestica, Malus x domestica cDNA (CN904957); and Medicago, Medicago truncatula (AC147434).
13. Clean the PCR products using a QIAquick® PCR purification kit (Qiagen) and load them on a 1% agarose gel (see Note 3).
14. Clone the 3′ cleavage products into the pGEMT easy® vector.
15. Sequence the inserts.
3.1.3. Identification of miRNA Precursors and Mature miRNA Sequence
Alignment of the Arabidopsis TIR1 cDNA sequence with TIR1 paralogs and Oryza sativa TIR1 orthologs revealed a conserved 23-nucleotide sequence motif in this region (Fig. 1b). This sequence motif was BLASTed against Arabidopsis intergenic regions (IGRs) and led to the identification of one hit on chromosome 3 (IGR between At3g55730 and At3g55735) and another one on chromosome 2 (IGR between At2g39880 and At2g39890). Mfold analysis of each IGR surrounding the predicted miRNA sequence predicted stable stem loop structures of 174 nt (chromosome 3; ΔG = −66.9 kcal/mol) and 149 nt (chromosome 2; ΔG = −54.6 kcal/mol) (Fig. 1a). Several orthologous precursors were identified, and their alignment revealed a conserved 21-nucleotide sequence motif corresponding to the mature miR393 sequence (Fig. 1c). This miRNA sequence motif begins with a U, like the majority of plant miRNAs loaded in AGO1 (20, 21), and the predominant cleavage occurs at position 10 from the 5′ end of the miRNA. The Arabidopsis miR393 precursors were referred to as At-miR393a and At-miR393b, which are located on chromosomes 2 and 3, respectively (22).
1. Align the candidate cDNA sequence with paralogs and orthologs in order to find a conserved motif containing the miRNA target site (see Note 4).
2. BLAST this motif against Arabidopsis IGRs from the EMBL databases EST division and against other plant genomes from the plant division (PLN) (5).
3. Subject each IGR surrounding this motif to Mfold structure predictions (http://mfold.bioinfo.rpi.edu/cgi-bin/rna-form1.cgi).
4. Align the different precursor sequences.
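Step 2 can be illustrated with a deliberately simplified stand-in for BLAST: an exact-match scan of intergenic sequences on both strands. This is a sketch only; BLAST tolerates mismatches and gaps, which this code does not, and the motif, the IGR names, and the sequences below are invented for the example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_motif(motif, igrs):
    """Return (igr_name, strand, 0-based start) for every exact match.

    A '-' hit means the reverse complement of the motif was found at that
    position of the forward-strand sequence.
    """
    forward = motif.upper()
    reverse = reverse_complement(forward)
    patterns = [("+", forward)]
    if reverse != forward:  # avoid reporting palindromic motifs twice
        patterns.append(("-", reverse))
    hits = []
    for name, seq in igrs.items():
        seq = seq.upper()
        for strand, pattern in patterns:
            start = seq.find(pattern)
            while start != -1:
                hits.append((name, strand, start))
                start = seq.find(pattern, start + 1)
    return hits


# Invented sequences, for illustration only.
toy_igrs = {
    "IGR_1": "CCGATTACAGG",
    "IGR_2": "AATGTAATCTT",
    "IGR_3": "ACGTACGT",
}
print(find_motif("GATTACA", toy_igrs))
```

Each hit, together with its flanking sequence, would then be passed to a folding program such as Mfold (step 3) to check for a stable stem loop.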
3.2. Detection of Biotic Stress-Associated miRNAs
A method to monitor miRNA transcriptional activities in response to biotic stress is first presented. Methods to detect biotic stress-associated pri-miRNAs and mature miRNAs are subsequently described.
3.2.1. miR393 Transcriptional Activity in Response to flg22 Treatment
RT-qPCR analysis revealed an increase in eGFP mRNA levels in miR393a.p::eGFP transgenic plants treated with flg22 (Fig. 2b). This result indicates the presence of flg22-responsive elements in the miR393a promoter (e.g., the 1.5 kb region upstream of the miR393a stem loop contains ten copies of the stress-related W-box element (6); data not shown).
Fig. 2. miR393 is transcriptionally induced upon flg22 treatment. (a) Northern blot analysis of miR393 in response to flg22. Northern blot analysis of miR393 (left panels) and miR171 (right panels) upon flg22 (upper panels) and flg22A.tum (bottom panels) treatments. rRNA, ethidium bromide staining of ribosomal RNA. (b) Flg22-responsiveness of the miR393a promoter. T2 transgenic lines expressing either At-miR393a or At-miR393b promoters in fusion to the reporter gene eGFP. Three independent transgenic lines expressing either reporter construct are depicted. Real-time RT-PCRs were performed to assess the relative mRNA level of eGFP upon flg22 or flg22A.tum treatments. mRNA levels were normalized to that of Actin2 (At3g18780). Error bars represent standard deviation from four PCR results.
1. Clone the 1.5 kb DNA sequences that are upstream of the miRNA stem loop structures (see Note 5).
2. Fuse those DNA sequences to a reporter gene (e.g., the enhanced Green Fluorescent Protein (eGFP), using a pBIN61-based reporter vector).
3. Transform these constructs into Arabidopsis to generate stable transgenic lines.
4. Prepare cDNA samples as described in Subheading 3.2.2.
5. Perform qPCR analysis using a SYBR® Green qPCR kit (Eurogentec) with eGFP-specific primers (Forward 5′-ACGTAAACGGCCACAAGTTC-3′, Reverse 5′-AAGTCGTGCTGCTTCATGTG-3′). PCR is performed in 96-well optical reaction plates heated at 95°C for 10 min, followed by 45 cycles of denaturation at 95°C for 15 s, annealing at 60°C for 20 s, and elongation at 72°C for 40 s. A melting curve is performed at the end of the amplification in steps of 1°C (from 95 to 50°C). Transcript levels are normalized to those of Actin2 (At3g18780): Forward 5′-GCACCCTGTTCTTCTTACCG-3′ and Reverse 5′-AACCCTCGTAGATTGGCACA-3′, and of the TIP4-1-like gene (At4g34270): Forward 5′-GTGAAAACTGTTGGAGAGAAGCAA-3′ and Reverse 5′-TCAACTGGATACCCTTTCGCA-3′.
3.2.2. Detection of pri-miR393a/b Induction Upon flg22 Treatment
We found that pri-miR393a, but not pri-miR393b, was significantly induced in flg22-treated seedlings (data not shown).
1. Challenge 12-day-old Col-0 seedlings with either flg22 or flg22A.tum at a final concentration of 10 µM and store samples at −80°C (see Note 6).
2. Extract total RNA using the RNeasy Plant Mini Kit® (Qiagen).
3. Reverse transcribe RNA samples into cDNA using SuperScript III® reverse transcriptase (Invitrogen).
4. PCR amplify the primary miRNA transcript using Taq DNA polymerase (Qiagen). PCR conditions: 2 min, 94°C (first cycle); 30 s, 94°C; 30 s, 58°C; 1.5 min, 72°C (35–38 cycles); and 10 min, 72°C (last cycle). Primer sequences are as follows (see Note 7): pri-miR393a Forward 5′-GAGATAGAGAGTTGAACAAATTCTTC-3′, Reverse 5′-GTATCCATGATAGTTGAGAAATTTGC-3′; pri-miR393b Forward 5′-ACACCATTGCTCCCACCTTGAAAGA-3′, Reverse 5′-CGCCGTTGATGTCTCCGGTCATG-3′. To control for equal cDNA amounts in each reaction, perform a PCR with primers corresponding to Actin2 (At3g18780): Forward 5′-GCACCCTGTTCTTCTTACCG-3′ and Reverse 5′-AACCCTCGTAGATTGGCACA-3′, and to the TIP4-1-like gene (At4g34270): Forward 5′-GTGAAAACTGTTGGAGAGAAGCAA-3′ and Reverse 5′-TCAACTGGATACCCTTTCGCA-3′.
5. Separate PCR products on a 1.2% agarose gel and visualize bands after ethidium bromide staining.
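Both the RT-qPCR of Subheading 3.2.1 and the RT-PCR above rely on Actin2 as a reference transcript. For the qPCR data, one common way to turn Ct values into a relative expression level is the 2^-ΔΔCt calculation sketched below. The chapter does not state which calculation was used, so the method is an assumption here, as are the perfect amplification efficiency it implies and the made-up Ct values.

```python
from statistics import mean


def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Fold change of a target transcript (treated vs. control) by 2^-ddCt.

    Each argument is a list of replicate Ct values. The target is first
    normalized to the reference gene within each condition (dCt), then the
    treated condition is compared with the control (ddCt). Assumes that
    both primer pairs double the product at every cycle.
    """
    dct_treated = mean(ct_target_treated) - mean(ct_ref_treated)
    dct_control = mean(ct_target_control) - mean(ct_ref_control)
    return 2 ** -(dct_treated - dct_control)


# Invented Ct values: eGFP as target, Actin2 as reference.
fold = relative_expression(
    ct_target_treated=[22.0, 22.0],
    ct_ref_treated=[18.0, 18.0],
    ct_target_control=[24.0, 24.0],
    ct_ref_control=[18.0, 18.0],
)
print(fold)
```

With two reference genes, as in the protocol, the reference Ct is often replaced by the mean Ct of both genes; that variant is not shown.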
3.2.3. Detection of miR393 Induction by Northern Analysis Using Polyacrylamide Gels
Northern analysis showed a ~2-fold increase in miR393 accumulation after 20 min and 60 min of flg22 treatment, whereas levels of the unrelated miR171 remained unaltered (Fig. 2a) (see Note 8).
1. Challenge 12-day-old Col-0 seedlings with either flg22 or flg22A.tum at a final concentration of 10 µM (see Note 6) and store samples at −80°C.
2. Extract total RNA using Tri Reagent™ (see Note 9).
3. Resuspend total RNA in 50% formamide.
4. Heat 40 to 60 µg of total RNA at 95°C for 5 min and chill on ice for another 5 min. Add ¼ volume of loading buffer (50% glycerol, 50 mM Tris-HCl pH 7.7, 5 mM EDTA, 0.03% Bromophenol Blue).
5. Load total RNA on a 17.5% acrylamide gel (using Acryl/Bis-acryl 19:1, Eurobio). The gel is pre-run for 30–60 min in 0.5× TBE buffer before loading the samples.
6. Migrate for 4 h at constant voltage (80 V) in 0.5× TBE buffer.
7. Transfer RNA to a Hybond™-NX membrane using a Trans-Blot Cell (Bio-Rad). Transfer conditions: constant voltage of 80 V for 75 min in 0.5× TBE buffer.
8. Place the membrane on a Whatman® 3MM paper soaked with 2× SSC for 5 min.
9. Cross-link RNA using a UV cross-linker (Stratalinker™) (see Note 10).
10. Transfer the membrane into a rotating glass tube in a hybridization oven (Hybridizer HB-1D, Techne). Prehybridization is performed by adding 20 mL of Perfect-Hyb™ Plus (Sigma) at 42°C for at least an hour (23).
11. End label the DNA oligonucleotide (which is complementary to the miRNA sequence of interest) with γ-32P-ATP using T4 PNK (New England Biolabs, Beverly, MA) as described by the manufacturer (see Note 11). Add this probe to the Perfect-Hyb™ Plus buffer (Sigma) and perform the hybridization overnight at 42°C.
12. Wash the membrane twice with 20 mL of 2× SSC, 2% SDS for 20 min at 42°C (see Note 12).
13. Expose the membrane to X-ray film (at least 24 h exposure for miR393 detection).
3.3. Understanding the Regulation of Biotic Stress-Responsive miRNAs
Promoter analyses are used to provide additional insights into the regulation of stress-associated miRNAs. This involves methods to (1) retrieve known cis-regulatory elements within a biotic stress-responsive promoter sequence; and (2) profile biotic stress-associated miRNA transcripts, implement a clustering analysis of their expression profiles, and perform a promoter analysis on the miRNA genes that are co-regulated in response to the biotic stress.
3.3.1. Promoter Analysis of Known cis-Regulatory Sequences
A large number of known cis-regulatory element databases are available online and can be used for this purpose (e.g., PLACE database: http://www.dna.affrc.go.jp/PLACE/signalscan.html). Alternatively, specific cis-regulatory elements can be retrieved and highlighted simply by using Microsoft Word.
1. Generate a file containing candidate promoter sequences in FASTA format.
2. Use the >Edit and >Replace functions.
3. In the window that appears, enter the cis-regulatory element sequence of interest in the 'Find What' box and the exact same sequence in the 'Replace with' box, using the highlight function for the latter.
4. Use the 'Replace all' function.
5. Repeat the same analysis by entering the sequence of the cis-regulatory motif in the reverse complementary orientation to highlight cis-regulatory elements on the other DNA strand (see Note 13).
3.3.2. Understanding the Regulation of Stress-Responsive miRNAs Using a Microarray Approach Coupled to a Bioinformatic Analysis
Approaches to profile pri-miRNA transcripts coupled with clustering/promoter analyses can be implemented to identify putative over-represented cis-regulatory elements that may play a role in the regulation of biotic stress-associated loci (see Note 14 and Fig. 3).
1. Design of pri-miRNA probes: retrieve sequences located upstream of the miRNA stem loop structures (see Note 15).
2. Spot 60–70-mer oligonucleotides corresponding to the reverse complementary sequences onto a microarray slide (see Note 16).
3. Challenge samples with the biotic stress of interest (see Note 17) and extract total RNA using the RNeasy Plant Mini Kit® (Qiagen).
4. Perform in vitro transcription (Ambion), followed by reverse transcription in the presence of Cy3-dUTP or Cy5-dUTP (Perkin-Elmer-NEN Life Science Products).
Fig. 3. Methods to study the regulation of stress-responsive miRNAs. Schematic representation of a method that can be used to identify over-representation of cis-regulatory elements within promoters of co-regulated stress-responsive miRNA genes.
5. Hybridize the labelled samples to the slide and scan as described in Lurin et al. (7), and analyse the data as described in Gagnot et al.
6. Promoter analysis: to identify cis-regulatory elements that are over-represented within the promoters of co-regulated miRNA genes, use any of the following publicly available programs: AlignACE, DIALIGN, FootPrinter, MEME, and MotifSampler.
7. Repeat the same analysis as in step 6, but this time using a set of promoter sequences that are derived from biotic stress-insensitive miRNA genes (used as negative controls).
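The comparison in steps 6 and 7 can be illustrated for a single, already known motif: count how many promoters of co-regulated genes carry the motif on either strand and compare this with the negative control set. The sketch below does not replace the motif discovery programs listed in step 6, which search for unknown motifs; it only checks a motif supplied by the user. The promoter names and sequences are invented, and TTGAC is used as the W-box core.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def motif_frequency(motif, promoters):
    """Return (promoters containing the motif on either strand, total)."""
    forward = motif.upper()
    reverse = reverse_complement(forward)
    with_motif = sum(
        1
        for seq in promoters.values()
        if forward in seq.upper() or reverse in seq.upper()
    )
    return with_motif, len(promoters)


# Invented promoter fragments, for illustration only.
coregulated = {"p1": "AATTGACCA", "p2": "GGGTCAACC", "p3": "ACGTACGT"}
control = {"c1": "AAAAAAA", "c2": "CCTTGACTT"}

print(motif_frequency("TTGAC", coregulated))
print(motif_frequency("TTGAC", control))
```

With realistic set sizes, the two counts would normally be compared with a statistical test (for example, a hypergeometric or Fisher's exact test) before calling a motif over-represented.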
Fig. 4. miR393 contributes to resistance against virulent Pto DC3000. (a) pri-miR393a/b transcripts are up-regulated in response to Pto DC3000 hrcC- and this induction is suppressed by virulent Pto DC3000. Semi-quantitative RT-PCR analysis of several pri-miRNAs. (b) miR393a and b are transcriptionally induced by Pto DC3000 hrcC- and this induction is suppressed by virulent Pto DC3000. RT-qPCR analysis of the eGFP and FRK1 transcripts in miR393a-p::eGFP and miR393b-p::eGFP reporter lines challenged with Pto DC3000 and Pto DC3000 hrcC-. (c) Overexpression of AFB1 increases susceptibility to virulent Pto DC3000. Growth of Pto DC3000 on AFB1-myc overexpressing lines, Col-0 and tir1-1. (d) Overexpression of miR393 elevates resistance to virulent Pto DC3000. Growth of Pto DC3000 on three independent miR393 overexpressing lines.
3.4. Identification and Functional Analysis of miRNAs Implicated in Antibacterial Resistance
3.4.1. Identification of PAMP-Responsive miRNAs that May Play an Important Role in Antibacterial Resistance
The methods described hereafter allow (1) the identification of PAMP-responsive miRNAs that may play an important role in antibacterial resistance and (2) the functional characterization of individual miRNAs in disease resistance. The Pseudomonas syringae pv. tomato strain DC3000 hrcC- mutant (Pto DC3000 hrcC-) is a type III secretion-defective mutant that cannot deliver effector proteins into host cells (8). This bacterial mutant elicits, but cannot suppress, PAMP-triggered immunity and consequently triggers, like flg22, a potent basal defense response. Accordingly, the PAMP-responsive pri-miR393a/b and pri-miR396 are all induced in response to Pto DC3000 hrcC-, whereas levels of other PAMP-insensitive pri-miRNAs remain unaltered (Fig. 4a) (25). Several reports indicate that virulent Pto DC3000 can suppress transcriptional induction of PAMP-responsive protein-coding genes (some of these genes are required for basal resistance (9–11)). Similarly, induction of the PAMP-responsive pri-miR393a/b and pri-miR396b is significantly suppressed by virulent Pto DC3000 (Fig. 4a). This effect occurs, at least in part, at the transcriptional level because similar results were obtained using transgenic lines reporting miR393a/b transcriptional activities (Fig. 4b). Altogether, these results suggest a role for these PAMP-responsive miRNAs in antibacterial resistance. Therefore, to identify the whole set of PAMP-responsive miRNA genes that may play an important role in antibacterial resistance, a genome-wide analysis (described in Subheading 3.3.2) can be conducted.
1. Grow Pto DC3000 hrcC- and virulent Pto DC3000 strains overnight at 28°C in 10 mL of NYGB medium. This medium contains, per liter: Bacto Yeast Extract (3 g), Bacto Peptone (5 g), Glycerol (20 mL), 1% agar, supplemented with 100 mg/L Rifampicin. Spin down cells by centrifugation for 10 min at 2,500 × g at room temperature. Resuspend the bacteria in a 10 mM MgCl2 solution and spin down again for 8 min at 3,500 rpm. Finally, resuspend the cells in a solution of 10 mM MgCl2 and dilute to the appropriate working concentration.
2. Inoculate fully expanded leaves of 5-week-old Arabidopsis plants by syringe infiltration with a bacterial concentration of 2 × 10⁷ colony-forming units per mL (cfu/mL).
3. Collect samples at different time points post-inoculation (see Note 18).
4. Perform RNA extraction, cDNA preparation, and hybridization onto the microarray slide as described in Subheading 3.3.2.
5. Select pri-miRNAs that are up-regulated in response to Pto DC3000 hrcC- and no longer induced upon virulent Pto DC3000 treatment (see Note 19). We also recommend selecting pri-miRNAs that are repressed in response to Pto DC3000 hrcC- and no longer repressed in response to Pto DC3000. Such miRNAs may regulate positive regulators of the plant defense response, and their inactivation might increase antibacterial resistance. Primer sequences are as follows (see Note 7): pri-miR166a Forward 5′-GGGACGAACATAGAAAGAGAGAGA-3′, Reverse 5′-AATATGGAGTAAACAGGGAGCAAC-3′; pri-miR393a Forward 5′-GAGATAGAGAGTTGAACAAATTCTTC-3′, Reverse 5′-GTATCCATGATAGTTGAGAAATTTGC-3′; pri-miR396b Forward 5′-TTAATTAGTTTTCAGAAGAAGGAG-3′, Reverse 5′-CTTCAAATCAATATCTTTGGAAAGAA-3′; pri-miR393b Forward 5′-ACACCATTGCTCCCACCTTGAAAGA-3′, Reverse 5′-CGCCGTTGATGTCTCCGGTCATG-3′. To control for equal cDNA amounts in each reaction, perform a PCR with primers corresponding to Actin2 (At3g18780): Forward 5′-GCACCCTGTTCTTCTTACCG-3′ and Reverse 5′-AACCCTCGTAGATTGGCACA-3′.
3.4.2. Inhibition of miRNA Function
The function of an miRNA can be inhibited either by overexpressing an miRNA-resistant target or by knocking out the miRNA genes.
1. Overexpressing miRNA-resistant targets: In the case of miR393 functional analysis, we used Arabidopsis transgenic lines that overexpress a myc epitope-tagged version of AFB1, a TIR1 paralog that is naturally more resistant to miR393-mediated cleavage (1). Therefore, overexpression of AFB1 should have dominant-negative effects on a putative miR393-mediated defense response. When inoculated with virulent Pto DC3000, AFB1 transgenic lines had higher bacterial titers than nontransformed or tir1-1 plants at 4 days post inoculation (dpi) (Fig. 4c).
(a) Generate multiple synonymous mutations in the miRNA target site (see Notes 20 and 21). For AFB1-overexpressing lines, AFB1 cDNA (which is partially refractory to miR393-mediated cleavage) was introduced into the pROKII vector, which carries the strong 35S promoter (12).
(b) Generate Arabidopsis stable transgenic lines that overexpress the miRNA-resistant target (see Note 22).
(c) Inoculate virulent Pto DC3000 as described in Subheading 3.2.2, but using an inoculum of 10⁵ cfu/mL.
(d) Monitor bacterial growth at 2 and 4 dpi.
2. Knock-out of miRNA using insertion lines: This is performed by selecting homozygous T-DNA/transposon insertion lines in the corresponding miRNA locus (e.g., SALK T-DNA insertion lines available at http://signal.salk.edu/cgi-bin/tdnaexpress) (see Note 23).
3.4.3. Overexpression of Stress-Responsive miRNAs
Upon inoculation with virulent Pto DC3000, miR393-overexpressing lines, but not the empty vector transformants, displayed lower bacterial titers at 4 dpi (Fig. 4d), indicating that miR393 contributes to resistance against Pto DC3000. Furthermore, no difference in bacterial growth was observed in transgenic lines overexpressing an artificial miRNA directed against GFP mRNA (13), indicating an miR393-specific effect.
1. PCR-amplify an miRNA precursor from Arabidopsis Col-0 genomic DNA using primers located 20 bp upstream and downstream of the miRNA stem loop of interest.
2. Introduce this PCR product into a GATEWAY® TOPO Entry vector (Invitrogen) according to the manufacturer’s recommendations.
3. Recombine the insert into a GATEWAY binary destination vector carrying the strong 35S promoter cassette (pK7WG2D) (see Note 22).
4. Transform Arabidopsis plants with this construct.
5. Characterize the transgenic lines and inoculate these plants with virulent Pto DC3000 as described in Subheading 3.2.2, using an inoculum of 10⁵ cfu/mL.
6. Monitor bacterial growth at 4 dpi.
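The primer placement in step 1 can be sketched computationally. The following Python snippet is only a minimal illustration, assuming that the genomic sequence and the stem-loop coordinates are already known. It reads "primers located 20 bp upstream and downstream" as an amplicon that extends 20 bp beyond each end of the stem loop; the function names, the toy sequence, the coordinates, and the 22-nt primer length are hypothetical.

```python
# Sketch: derive a precursor amplicon and a primer pair from stem-loop coordinates.
# All names, coordinates, and the primer length are illustrative assumptions.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def precursor_amplicon(genome, stem_start, stem_end, flank=20, primer_len=22):
    """Return (amplicon, forward_primer, reverse_primer).

    stem_start and stem_end are 0-based, end-exclusive coordinates of the
    miRNA stem loop on the top strand of `genome`.
    """
    amp_start = stem_start - flank
    amp_end = stem_end + flank
    if amp_start < 0 or amp_end > len(genome):
        raise ValueError("flanking region extends beyond the supplied sequence")
    amplicon = genome[amp_start:amp_end]
    forward = amplicon[:primer_len]                        # top-strand primer
    reverse = reverse_complement(amplicon[-primer_len:])   # bottom-strand primer
    return amplicon, forward, reverse


# Toy 200-nt sequence standing in for Col-0 genomic DNA (hypothetical).
toy_genome = "ACGTTGCAAGCTTAGGCTAA" * 10
amplicon, fwd, rev = precursor_amplicon(toy_genome, stem_start=60, stem_end=140)
print(len(amplicon), fwd, rev)
```

Primers obtained this way would still need to be checked for melting temperature and specificity before ordering.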
3.5. Functional Analysis of RNA Silencing Pathways in Plant Basal Defense
To test the role of small RNA pathways in disease resistance, a set of Arabidopsis mutants that are defective in the accumulation of endogenous siRNAs and/or miRNAs is inoculated with the non-virulent bacterium Pto DC3000 hrcC- (see Note 24). This bacterial mutant multiplies poorly on leaves of wild-type Col-0 and La-er plants (Fig. 5a). However, growth as well as disease symptoms of Pto DC3000 hrcC- are significantly enhanced in the miRNA-deficient dcl1-9 and hen1-1 mutants (Fig. 5a and data not shown, see Note 25). Similar effects are observed when dcl1-9 and hen1-1 plants are challenged with the non-host Pseudomonas syringae pv. phaseolicola (Psp) or the non-pathogenic Pseudomonas fluorescens Pf-5 and E. coli W3110 strains (Fig. 5b–d). Collectively, these results indicate that the miRNA pathway plays a preponderant role in basal defense.
Fig. 5. The miRNA pathway is required for antibacterial basal resistance. (a) Pto DC3000 hrcC- growth is specifically enhanced in miRNA-deficient mutants. (b) as (a) but with Pseudomonas syringae pv. phaseolicola (Psp). (c) as (a) but with Pseudomonas fluorescens Pf-5. (d) as (a) but with E. coli W3110.
1. Grow plants at 21–22°C with an 8 h photoperiod. dcl1-9 seedlings are first grown on plates containing 1× MS medium (Duchefa), 1% sucrose, and 0.8% agar with kanamycin selection. Homozygous seedlings are selected based on their developmental phenotype and then transferred to soil at 10 days post germination (dpg). hen1-1 and the corresponding La-er control are similarly grown on in vitro plates containing solid medium (without selection) and transferred to soil at 10 dpg. The rest of the plants used in this assay are grown directly on soil. Six-week-old La-er, dcl1-9, and hen1-1 plants that were inoculated with E. coli W3110 were then grown for 4 days at 28°C with an 8 h photoperiod before bacterial counting.
2. Perform the inoculation as described in Subheading 3.4.2 with a concentration of 10⁶ cfu/mL for Pto DC3000 hrcC- and Psp or with a concentration of 10⁸ cfu/mL for P. fluorescens Pf-5 and E. coli W3110.
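Preparing the inocula above comes down to a simple dilution calculation (C1 × V1 = C2 × V2). The Python sketch below illustrates that arithmetic only; the stock titer of 10⁹ cfu/mL and the 10 mL final volume are hypothetical values that would have to be replaced by the titer actually measured for each bacterial suspension.

```python
# Sketch: volumes needed to dilute a bacterial suspension to a working titer.
# The stock titer and final volume used below are hypothetical placeholders.


def dilution_volumes(stock_cfu_per_ml, target_cfu_per_ml, final_volume_ml):
    """Return (stock_ml, diluent_ml) such that C1 * V1 = C2 * V2."""
    if min(stock_cfu_per_ml, target_cfu_per_ml, final_volume_ml) <= 0:
        raise ValueError("all inputs must be positive")
    if target_cfu_per_ml > stock_cfu_per_ml:
        raise ValueError("target titer exceeds stock titer; concentrate the cells first")
    stock_ml = target_cfu_per_ml * final_volume_ml / stock_cfu_per_ml
    return stock_ml, final_volume_ml - stock_ml


ASSUMED_STOCK = 1e9  # cfu/mL, hypothetical measured titer of the resuspended cells
WORKING_TITERS = {
    "Pto DC3000 hrcC- or Psp": 1e6,                # cfu/mL, from step 2
    "P. fluorescens Pf-5 or E. coli W3110": 1e8,   # cfu/mL, from step 2
}

for strain, titer in WORKING_TITERS.items():
    stock_ml, diluent_ml = dilution_volumes(ASSUMED_STOCK, titer, final_volume_ml=10.0)
    print(f"{strain}: {stock_ml:.3f} mL stock + {diluent_ml:.3f} mL 10 mM MgCl2")
```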
4. Notes
1. Several other methods that are not described herein allow the identification of a large number of stress-associated miRNAs (e.g., production of small RNA libraries coupled with deep sequencing technologies, as described in Chaps. 8 and 14).
2. Because miRNA-mediated translational inhibition is a key mechanism by which miRNAs silence their targets (14), we anticipate that a similar comparative analysis at the proteome level will be even more informative. For this purpose, we recommend using proteomic approaches that are sensitive enough to detect small changes in protein levels (e.g., 2D DIGE, two-dimensional difference gel electrophoresis).
3. This is an important step to avoid the cloning of non-specific PCR products.
4. This approach is particularly useful to identify evolutionarily conserved miRNAs, such as miR393. However, in the case of rapidly evolving miRNAs ("young miRNAs"), we recommend performing a BLAST analysis against closely related genomes (for instance, in the case of a young Arabidopsis thaliana miRNA, we recommend a BLAST search against the Arabidopsis halleri and/or Capsella rubella genomes).
5. Instead of cloning the upstream regions of the miR393a and miR393b stem-loop structures, we recommend first mapping the start of the pri-miR393a and pri-miR393b transcripts by 5′ RACE, and subsequently fusing the corresponding 1.5 kb upstream regions to a reporter gene. Xie et al. already mapped the 5′ start of several other Arabidopsis pri-miRNA transcripts, and this available information can be used for this purpose (15). It is important to note that DNA regions 1.5 kb long were arbitrarily used in the present study. However, we cannot rule out that important cis-regulatory elements are located further upstream of these DNA sequences.
6. A flg22 concentration of 1 mM is high enough to detect miR393 and pri-miR393a induction in seedlings.
7. To monitor pri-miRNA transcripts by semi-quantitative RT-PCR analysis, we use primers that are designed from each side of the miRNA stem-loop structures.
8. Alternatively, plant mature miRNAs can be detected and their abundance quantified using specific quantitative RT-PCR assays. For instance, the TaqMan® MicroRNA Assay (Applied Biosystems) uses a two-step protocol: reverse transcription with a miRNA-specific looped primer, followed by real-time PCR with a TaqMan probe. This method is highly specific (quantification of only mature miRNAs, with single-base discrimination) and sensitive.
9. Do not use columns from RNA extraction kits that cannot retain small RNAs.
10. For low-abundance miRNAs, we recommend using chemical cross-linking, which enhances the detection sensitivity of small RNAs. This method was previously described (23).
11. To enhance the detection sensitivity of miRNAs, we recommend using the Starfire™ polymerase extension labeling reaction. Briefly, a 3′ hexamer extension is added to the target-specific oligonucleotide. In a second step, a template oligonucleotide that carries a complementary hexamer extension together with an oligo-dT10 sequence is annealed to the target-specific oligonucleotide. Annealed duplexes are then labeled with γ-32P-dATP in the presence of the Klenow fragment of DNA polymerase. The latter step allows the addition of ten radiolabeled deoxynucleotides per molecule. To increase the detection sensitivity of miRNAs, we also recommend using locked nucleic acid (LNA) oligonucleotides.
12. When LNA probes are used, we perform the hybridization and washing at 68°C instead of 42°C.
13. The same analysis can be used to calculate the frequency of known stress-responsive cis-regulatory elements in a set of miRNA promoters derived from co-regulated pri-miRNAs. Besides highlighting the cis-regulatory elements within the promoter region of interest, the "Replace all" function reports the number of replacements made across the promoter sequences of interest, that is, the number of occurrences of the element. For this type of analysis, a file containing promoter sequences derived from miRNAs that are not responsive to biotic stress should be used as a negative control.
14. To gain insight into the regulation of stress-responsive miRNAs, we recommend the use of genome-wide approaches that allow detection of pri-miRNA transcripts rather than mature miRNAs. The main reason is that mature miRNAs are often produced by multiple pri-miRNAs, which are not always co-regulated (for instance, flg22 activates the transcription of miR393a but not miR393b in seedlings (Fig. 2b)). Therefore, identifying the subset of stress-responsive pri-miRNAs within the same miRNA subfamily will be essential for subsequent promoter analyses.
15. We have evidence that oligonucleotide probes 60–70 nt long work well for this purpose (L. Navarro, O. Voinnet, J.-P. Renou, unpublished data).
16. The microarray slides should preferentially contain probes corresponding to protein-coding gene transcripts, some of which are miRNA targets.
17. We recommend performing an extensive time-course experiment. By doing so, the resolution of the subsequent clustering analysis, which displays the expression patterns of co-regulated pri-miRNAs, will be significantly improved.
18. We recommend the collection of samples between 6 h and 12 h post inoculation.
19. We also recommend selecting pri-miRNAs that are repressed in response to Pto DC3000 hrcC- and no longer repressed in response to Pto DC3000. Such miRNAs may regulate positive regulators of the plant defense response, and their inactivation might increase antibacterial resistance.
20. Mutations located opposite positions 10–11 of the miRNA are essential to abolish miRNA-guided slicing. Nevertheless, we recommend the generation of as many synonymous mutations as possible along the miRNA target site.
21. Alternatively, a strategy involving a non-coding RNA that sequesters an miRNA of interest could be used. This principle occurs in nature to negatively regulate the phosphate starvation-induced miRNA miR399 (16). In this particular case, the non-coding RNA Induced by Phosphate Starvation 1 (IPS1), which is refractory to miR399-directed cleavage due to mismatches opposite positions 10–11 of the miRNA, sequesters miR399. Franco-Zorrilla et al. (16) demonstrated that engineering IPS1 to mimic target sites of miR156 or miR319 also resulted in efficient inhibition of these miRNA activities. Therefore, we recommend the use of the same strategy to knock down the activity of biotic stress-associated miRNAs such as miR393.
22. Because the overexpression of miRNAs or miRNA-resistant targets can significantly alter the normal development and physiology of plants, we also recommend the generation of transgenic lines that conditionally express these entities (e.g., under a dexamethasone- or estradiol-inducible promoter).
23. This approach is feasible in the case of miR393, where only two loci are present in the Arabidopsis genome. However, this strategy would not be possible in the case of miRNAs that are produced by a large number of miRNA loci (e.g., Arabidopsis miR169, which is produced by 14 loci).
24. To assess the role of RNA silencing pathways in plant species other than Arabidopsis thaliana, we recommend the use of viral-derived suppressors of RNA silencing (VSRs). This could be achieved by constitutively or conditionally overexpressing different VSRs in the plant of interest. For this purpose, we recommend using the following VSR proteins: P19 from Tomato bushy stunt virus (TBSV), P1-HcPro from Turnip mosaic virus (TuMV), P15 from Peanut clump virus (PCV), or P25 from Potato virus X (PVX). P19, P1-HcPro, and P15 suppress siRNA and miRNA functions, whereas P25 specifically suppresses siRNA function (3, 4).
25. We cannot rule out that long siRNAs (lsiRNAs) or natural cis-acting siRNAs (nat-siRNAs), which are also DCL1-dependent (17, 18), additionally contribute to the observed disease phenotypes. However, we found that mutants altered in the accumulation of lsiRNAs and/or nat-siRNAs, but not of miRNAs (e.g., rdr6, which is fully impaired in nat-siRNA accumulation and partially impaired in AtlsiRNA-1 accumulation (17, 18)), did not rescue the growth of Pto DC3000 hrcC- (Fig. 5a, data not shown). This result suggests that lsiRNAs and nat-siRNAs do not contribute significantly to basal resistance in this assay. Similarly, we recommend the use of rdr6, ago7, nrpd1a, and nrpd1b mutants to distinguish the lsiRNA and/or nat-siRNA effects from the miRNA effects in various functional assays.
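The design principle of Note 20 (as many synonymous mutations as possible, including changes opposite miRNA positions 10–11) can be explored with a short script. The Python sketch below is a simplified illustration, not the procedure used by the authors. It assumes that the target site is supplied in frame, written 5′ to 3′, and pairs without gaps along a miRNA of the same length; it ignores codon usage; and the example sequence is hypothetical.

```python
# Sketch: recode an in-frame miRNA target site with synonymous codons.
# The example site, the function names, and the "maximize changes" rule are
# illustrative assumptions; codon usage bias is deliberately ignored.
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds):
    """Translate an in-frame DNA sequence whose length is a multiple of 3."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def recode_synonymous(cds):
    """Swap each codon for the synonymous codon that differs at most positions."""
    if len(cds) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    recoded = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        synonyms = [c for c, aa in CODON_TABLE.items() if aa == CODON_TABLE[codon]]
        recoded.append(max(synonyms, key=lambda c: hamming(c, codon)))
    return "".join(recoded)


def changes_opposite(original, recoded, mirna_positions=(10, 11)):
    """Count changes opposite the given miRNA positions (1-based from the miRNA 5' end).

    Assumes gapless pairing: miRNA position p pairs with 0-based index len - p
    of the 5'->3' target site.
    """
    n = len(original)
    return sum(original[n - p] != recoded[n - p] for p in mirna_positions)


target_site = "GAGACAATGCGATCCCTTTGG"  # hypothetical in-frame 21-nt target site
recoded_site = recode_synonymous(target_site)
print(recoded_site, hamming(target_site, recoded_site),
      changes_opposite(target_site, recoded_site))
```

If no change falls opposite positions 10–11, an alternative synonymous codon can be chosen by hand for the codon covering those positions, where the amino acid permits it.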
Acknowledgments
The authors thank P. Dunoyer, S. Dharmasiri, M. Estelle, and J.D.G. Jones for their discussions and contributions to this work. L.N. was supported by a long-term fellowship from the Federation of European Biochemical Societies (FEBS); O.V. and F.J. by a grant from the trilateral Génoplante-German Plant Genome Research Program-Spanish Ministry of Research; J.-P.R. by Génoplante.
References
1. Navarro L, Dunoyer P, Jay F, Arnold B, Dharmasiri N et al (2006) A plant miRNA contributes to antibacterial resistance by repressing auxin signaling. Science 312:436–439
2. Taganov KD, Boldin MP, Chang KJ, Baltimore D (2006) NF-kappaB-dependent induction of microRNA miR-146, an inhibitor targeted to signaling proteins of innate immune responses. Proc Natl Acad Sci U S A 103:12481–12486
3. Jagadeeswaran G, Saini A, Sunkar R (2009) Biotic and abiotic stress down-regulate miR398 expression in Arabidopsis. Planta 229:1009–1014
4. Chapman EJ, Prokhnevsky AI, Gopinath K, Dolja VV, Carrington JC (2004) Viral RNA silencing suppressors inhibit the microRNA pathway at an intermediate step. Genes Dev 18:1179–1186
5. Dunoyer P, Lecellier CH, Parizotto EA, Himber C, Voinnet O (2004) Probing the microRNA and small interfering RNA pathways with virus-encoded suppressors of RNA silencing. Plant Cell 16:1235–1250
6. Kanz C, Aldebert P, Althorpe N, Baker W, Baldwin A et al (2005) The EMBL nucleotide sequence database. Nucleic Acids Res 33:D29–D33
7. Eulgem T, Rushton PJ, Robatzek S, Somssich IE (2000) The WRKY superfamily of plant transcription factors. Trends Plant Sci 5:199–206
8. Lurin C, Andres C, Aubourg S, Bellaoui M, Bitton F et al (2004) Genome-wide analysis of Arabidopsis pentatricopeptide repeat proteins reveals their essential role in organelle biogenesis. Plant Cell 16:2089–2103
9. Yuan J, He SY (1996) The Pseudomonas syringae Hrp regulation and secretion system controls the production and secretion of multiple extracellular proteins. J Bacteriol 178:6399–6402
10. He P, Shan L, Lin NC, Martin GB, Kemmerling B et al (2006) Specific bacterial suppressors of MAMP signaling upstream of MAPKKK in Arabidopsis innate immunity. Cell 125:563–575
11. Li X, Lin H, Zhang W, Zou Y, Zhang J et al (2005) Flagellin induces innate immunity in nonhost interactions that is suppressed by Pseudomonas syringae effectors. Proc Natl Acad Sci U S A 102:12990–12995
12. Navarro L, Zipfel C, Rowland O, Keller I, Robatzek S et al (2004) The transcriptional innate immune response to flg22. Interplay and overlap with Avr gene-dependent defense responses and bacterial pathogenesis. Plant Physiol 135:1113–1128
13. Dharmasiri N, Dharmasiri S, Weijers D, Lechner E, Yamada M et al (2005) Plant development is regulated by a family of auxin receptor F box proteins. Dev Cell 9:109–119
14. Parizotto EA, Dunoyer P, Rahm N, Himber C, Voinnet O (2004) In vivo investigation of the transcription, processing, endonucleolytic activity, and functional relevance of the spatial distribution of a plant miRNA. Genes Dev 18:2237–2242
15. Brodersen P, Sakvarelidze-Achard L, Bruun-Rasmussen M, Dunoyer P, Yamamoto YY et al (2008) Widespread translational inhibition by plant miRNAs and siRNAs. Science 320:1185–1190
16. Xie Z, Allen E, Fahlgren N, Calamar A, Givan SA, Carrington JC (2005) Expression of Arabidopsis MIRNA genes. Plant Physiol 138:2145–2154
17. Franco-Zorrilla JM, Valli A, Todesco M, Mateos I, Puga MI et al (2007) Target mimicry provides a new mechanism for regulation of microRNA activity. Nat Genet 39:1033–1037
18. Katiyar-Agarwal S, Gao S, Vivian-Smith A, Jin H (2007) A novel class of bacteria-induced small RNAs in Arabidopsis. Genes Dev 21:3123–3134
19. Katiyar-Agarwal S, Morgan R, Dahlbeck D, Borsani O, Villegas A Jr et al (2006) A pathogen-inducible endogenous siRNA in plant immunity. Proc Natl Acad Sci U S A 103:18002–18007
20. Mi S, Cai T, Hu Y, Chen Y, Hodges E et al (2008) Sorting of small RNAs into Arabidopsis argonaute complexes is directed by the 5′ terminal nucleotide. Cell 133:116–127
21. Montgomery TA, Howell MD, Cuperus JT, Li D, Hansen JE et al (2008) Specificity of ARGONAUTE7-miR390 interaction and dual functionality in TAS3 trans-acting siRNA formation.
22. Jones-Rhoades MW, Bartel DP (2004) Computational identification of plant microRNAs and their targets, including a stress-induced miRNA.
23. Pall GS, Hamilton AJ (2008) Improved northern blot method for enhanced detection of small RNA. Nat Protoc 3:1077–1084
24. Gagnot S, Tamby JP, Martin-Magniette ML, Bitton F, Taconnat L et al (2008) CATdb: a public access to Arabidopsis transcriptome data from the URGV-CATMA platform. Nucleic Acids Res 36:D986–D990
25. Navarro L, Jay F, Nomura K, He SY, Voinnet O (2008) Suppression of the miRNA pathway by bacterial effector proteins. Science 321:964–967
Chapter 14
Abiotic Stress-Associated miRNAs: Detection and Functional Analysis
Dong-Hoon Jeong, Marcelo A. German, Linda A. Rymarquis, Shawn R. Thatcher, and Pamela J. Green
Abstract
MicroRNAs (miRNAs) are small regulatory noncoding RNAs varying in length between 20 and 24 nucleotides. They play a key role during plant development by negatively regulating gene expression at the posttranscriptional level. Moreover, recent studies reported several miRNAs associated with abiotic stress responses. Small RNA cloning and high-throughput deep sequencing methods provide expression profiles of not only known miRNAs, but also novel miRNAs. In this chapter, we describe the methods used to identify and characterize abiotic stress-associated miRNAs and their target genes.
Key words: MicroRNA, Abiotic stress, RLM 5′-RACE, PARE target library
B.C. Meyers and P.J. Green (eds.), Plant MicroRNAs, Methods in Molecular Biology, vol. 592, DOI 10.1007/978-1-60327-005-2_14, © Humana Press, a part of Springer Science+Business Media, LLC 2009
1. Introduction
In plants, small RNAs (20–24 nt), including microRNAs (miRNAs) and short interfering RNAs (siRNAs), are involved in gene regulation through translational inhibition, mRNA cleavage, or directing chromatin modifications (1–3). miRNAs originate from distinct genomic loci whose transcripts are predicted to form "hairpin" structures that are often imperfectly double-stranded. They are cleaved from the hairpin as a miRNA/miRNA* duplex by a DICER-LIKE protein (DCL1), and the miRNA strand of this duplex becomes associated with an AGO protein in a complex called miRNP (4, 5). In plants, complementarity between the miRNAs and their targets almost always directs the miRNPs to cleave the target mRNAs, accelerating their degradation (6).
Many functional studies have demonstrated that plant miRNAs play important roles in various developmental processes, such as organ boundary formation, organ polarity/radial patterning, and the development of root, stem, leaf, and floral organs (1). Also, recent reports indicate that miRNAs are associated with abiotic stress responses and nutrient homeostasis. For example, functional studies using transgenic plants characterized the function of miR399 and miR398, which play important roles in phosphate starvation and oxidative stress, respectively (7, 8). In addition, dcl1 and hen1 mutants are hypersensitive to abiotic stresses (9). HEN1 methylates miRNAs and siRNAs, which is thought to protect them against degradation [(4) and Chapter 10]. The phenotypes of these mutants imply that some miRNAs that depend on DCL1 and HEN1 have important roles in abiotic stress responses. Other studies report that miRNAs are regulated in response to cold, drought, salt, UV-B radiation, phosphate or sulfate starvation, oxidative stress, or mechanical strain (7, 8, 10–15). Some examples are shown in Table 1.
To identify and profile small RNA populations including miRNAs, several sequencing approaches have emerged. Earlier, conventional techniques were based on cloning single molecules or longer concatemers that were subsequently sequenced. Recently, high-throughput sequencing methods such as massively parallel signature sequencing (MPSS), 454 sequencing, and sequencing-by-synthesis (SBS) technology have made it possible to access the full complexity of small RNAs in plants (16–19). Additionally, deep sequencing provides quantitative expression information, since the cloning frequency of an individual small RNA generally reflects its relative abundance in the sample.
In this chapter, we describe methods and strategies to identify abiotic stress-regulated miRNAs by deep sequencing. Using examples from model systems, we include conditions and strategies for stress treatment. Additionally, to analyze the function of the miRNAs, we provide methods for validation and analysis of target genes using Northern blots, RNA ligase-mediated (RLM) 5′-RACE, and the genome-wide identification of miRNA-target RNA pairs using a recently developed approach called Parallel Analysis of RNA Ends (PARE). A summary of the overall experimental design is provided in Fig. 1.
Table 1
Examples of stress conditions known to regulate miRNAs in model systems
Stress | Species | Condition | Known responsive miRNAs | Inducible genes | References
Drought (desiccation) | Arabidopsis | Expose to air for 4 h | miR389, miR393, miR396, miR397, miR402 | Rd29A (At5g52310) | (13, 30)
Drought (desiccation) | Rice | Expose to air for 8 h | - | SalT (LOC_Os01g24710), OsLEA3 (LOC_Os05g46480) | (31, 32) and Fig. 2
Salt | Arabidopsis | 300 mM NaCl for 8 h | miR389, miR393, miR396, miR397, miR402 | Rd29A (At5g52310), COR15A (At2g42540) | (13, 30, 33)
Salt | Rice | 300 mM NaCl for 8 h | - | SalT (LOC_Os01g24710), OsLEA3 (LOC_Os05g46480) | (31, 32) and Fig. 2
Cold | Arabidopsis | 4°C under light for 8 h | miR165, miR172, miR169, miR319, miR389, miR393, miR396, miR397, miR402 | Rd29A (At5g52310), CBF3 (At4g25480) | (11, 13, 30, 34)
Cold | Rice | 4°C under light for 8 h | - | OsWRKY71 (LOC_Os02g08440), OsMAPK2 (LOC_Os03g17700) | (35, 36) and Fig. 2
Heat | Arabidopsis | 28°C for 24 h | - | HsfA2 (At2g26150) | (37)
Heat | Rice | 42°C for 8 h | - | SPL7 (LOC_Os05g45410) | (38)
Phosphate starvation | Arabidopsis | MS P-deficient media for 2 weeks | miR399 | RNS1 (At2g02990) | (14, 39)
Phosphate starvation | Rice | MS P-deficient media for 5 days | miR399 | OsRNS1 (LOC_Os07g43670), OsACP1 (LOC_Os01g52230) | (40–42) and Fig. 3
Sulfate starvation | Arabidopsis | MS S-deficient media for 2 weeks | miR395 | APR1 (At4g04610) | (10, 43)
Sulfate starvation | Rice | MS S-deficient media for 5 days | miR395 | OsASP1 (LOC_Os03g53230) | (10)
Fig. 1. Overall experimental design for analyzing the roles of miRNAs in abiotic stress responses. The workflow comprises four stages: (1) abiotic stress treatment (test different stress conditions, i.e., time and concentration; monitor known stress-responsive genes); (2) small RNA library construction and deep sequencing; (3) identification of regulated miRNAs (mining of inducible or repressible known and new candidate miRNAs by analysis of the sequencing data; validation of expression patterns by northern blot and analysis of biological replicates); (4) identification and characterization of target genes (validation by RNA ligase-mediated 5′-RACE; PARE target library construction and analysis; examination of target gene expression; transgenic approaches to over- and/or under-express miRNAs and target RNAs).
2. Materials
2.1. Abiotic Stress Treatment
1. Arabidopsis (Col-0) seeds and rice (Oryza sativa cv. Nipponbare) seeds.
2. 300 mM NaCl.
3. Murashige and Skoog (MS) medium: 20.6 mM NH4NO3, 18.8 mM KNO3, 0.08 mM H3BO3, 0.05 mM KI, 0.001 mM Na2MoO4·2H2O, 0.0001 mM CoCl2·6H2O, 3 mM CaCl2·2H2O, 1.5 mM MgSO4·7H2O, 0.1 mM MnSO4·H2O, 0.03 mM ZnSO4·7H2O, 0.0001 mM CuSO4·5H2O, 0.1 mM Na2EDTA, 0.1 mM FeSO4·7H2O, 1.25 mM KH2PO4, 4.1 µM nicotinic acid, 2.4 µM pyridoxine-HCl, 0.3 µM thiamine-HCl, 30 g/L sucrose, 0.1 g/L myo-inositol (pH 5.7).
4. MS P-deficient medium: KH2PO4 is omitted from the MS medium. 20.6 mM NH4NO3, 18.8 mM KNO3, 0.08 mM H3BO3, 0.05 mM KI, 0.001 mM Na2MoO4·2H2O, 0.0001 mM CoCl2·6H2O, 3 mM CaCl2·2H2O, 1.5 mM MgSO4·7H2O, 0.1 mM MnSO4·H2O, 0.03 mM ZnSO4·7H2O, 0.0001 mM CuSO4·5H2O, 0.1 mM Na2EDTA, 0.1 mM FeSO4·7H2O, 4.1 µM nicotinic acid, 2.4 µM pyridoxine-HCl, 0.3 µM thiamine-HCl, 30 g/L sucrose, 0.1 g/L myo-inositol (pH 5.7).
5. MS S-deficient medium: All the SO4 is substituted with Cl2 in the MS medium. 20.6 mM NH4NO3, 18.8 mM KNO3, 0.08 mM H3BO3, 0.05 mM KI, 0.001 mM Na2MoO4·2H2O, 0.0001 mM CoCl2·6H2O, 3 mM CaCl2·2H2O, 1.5 mM MgCl2·7H2O, 0.1 mM MnCl2·H2O, 0.03 mM ZnCl2·7H2O, 0.0001 mM CuCl2·5H2O, 0.1 mM Na2EDTA, 0.1 mM FeSO4·7H2O, 1.25 mM KH2PO4, 4.1 µM nicotinic acid, 2.4 µM pyridoxine-HCl, 0.3 µM thiamine-HCl, 30 g/L sucrose, 0.1 g/L myo-inositol (pH 5.7).
6. Gelling reagents: 8 g/L phytoagar (for Arabidopsis; RPI Corp, Mount Prospect, IL), 2 g/L phytagel (for rice; Sigma, St Louis, MO).
2.2. Splinted-Ligation-Based miRNA Detection
1. Splinted-ligation-based miRNA detection kit, such as the miRtect-IT™ miRNA Labeling and Detection Kit (USB, Cleveland, OH), or:
(a) Ligation oligonucleotide: 5′-CGCTTATGACATTC/dideoxyC/-3′ (Integrated DNA Technologies, Coralville, IA).
(b) OptiKinase (10 U/µL) (USB, Cleveland, OH).
(c) 10× OptiKinase reaction buffer (USB, Cleveland, OH).
(d) 10× capture buffer (100 mM Tris–HCl (pH 7.5), 750 mM KCl) (USB, Cleveland, OH).
(e) PrepEase sequencing dye clean-up kit (USB, Cleveland, OH).
(f) Ligate-IT rapid ligation kit (USB, Cleveland, OH).
(g) Shrimp alkaline phosphatase (1 U/µL) (Roche Diagnostics GmbH, Mannheim, Germany).
(h) 2× Formamide loading dye (95% formamide, 20 mM EDTA, 0.025% bromophenol blue, and 0.025% xylene cyanol).
2. Bridge oligonucleotide: see Subheading "Bridge oligonucleotide design" (Integrated DNA Technologies, Coralville, IA).
3. Low molecular weight markers, 10–100 nt (USB, Cleveland, OH).
4. [γ-32P]-ATP (6,000 Ci/mmol) (Perkin Elmer).
5. 40% Acrylamide/bis solution (29:1) (Ambion, Austin, TX).
6. Urea (USB, Cleveland, OH).
7. Glycerol tolerant gel buffer, 20× (USB, Cleveland, OH).
8. 10% Ammonium persulfate.
9. TEMED (Bio-Rad, Hercules, CA).
10. Millex-HA 0.45-µm filter (Millipore, Billerica, MA).
2.3. RNA Ligase-Mediated 5′-RACE
1. Nuclease-free (and sterile) water.
2. Nuclease-free (and sterile) tubes.
3. 5′ RACE kit, such as FirstChoice RLM-RACE kit (Ambion, Austin, TX) or GeneRacer kit (Invitrogen, Carlsbad, CA), or:
(a) RNA oligo of 40 nt or longer (Integrated DNA Technologies, Coralville, IA).
(b) T4 RNA ligase (Promega, Madison, WI).
(c) Two primers specific to the RNA oligo (Integrated DNA Technologies, Coralville, IA).
4. Oligo(dT)12–18, or gene-specific primer (Integrated DNA Technologies, Coralville, IA).
5. RNaseOUT (40 U/µL) (Invitrogen, Carlsbad, CA).
6. SuperScript II RT (200 U/µL) (Invitrogen, Carlsbad, CA).
7. dNTPs, 10 mM each.
8. One or two primers antisense to the putative miRNA target.
9. Choice-Taq DNA Polymerase (Denville Scientific, Inc., Metuchen, NJ).
10. Agarose.
11. 10× TBE buffer.
12. Ethidium bromide (10 mg/mL).
13. Gel extraction kit, such as NucleoSpin Extract II (Macherey-Nagel, Bethlehem, PA).
14. pGEM-T Easy Vector System II (Promega, Madison, WI), or TOPO TA cloning kit (Invitrogen, Carlsbad, CA).
15. Antibiotics.
16. LB (Luria Broth).
17. Plasmid DNA extraction kit, such as NucleoSpin Plasmid (Macherey-Nagel, Bethlehem, PA).
2.4. PARE Target Library
1. Nuclease-free sterile water.
2. Nuclease-free sterile tubes.
3. T4 RNA ligase (Ambion, Austin, TX).
4. 10× T4 RNA ligase buffer (Promega, Madison, WI).
5. RNaseOUT (40 U/µL) (Invitrogen, Carlsbad, CA).
6. SuperScript II RT (200 U/µL) (Invitrogen, Carlsbad, CA).
7. dNTPs.
8. 5′ RNA adapter: 5′-GUUCAGAGUUCUACAGUCCGAC-3′ (Dharmacon Inc., USA).
9. Phenol/chloroform/isoamyl alcohol (25:24:1).
10. Chloroform/isoamyl alcohol (24:1).
11. Glycoblue (Ambion, Austin, TX).
12. 3 M NaOAc.
13. Ethanol.
14. Oligotex kit (Qiagen, Gaithersburg, MD, USA).
15. 3′ oligo(dT) primer (Integrated DNA Technologies, Coralville, IA).
16. 0.1 M DTT.
17. 5× PCR buffer (Finnzymes, Espoo, Finland).
18. 5′ adapter primer: 5′-GTTCAGAGTTCTACAGTCCGAC-3′ (Integrated DNA Technologies, Coralville, IA).
19. 3′ adapter primer: 5′-CGAGCACAGAATTAATACGACT-3′ (Integrated DNA Technologies, Coralville, IA).
20. Phusion (2 U/µL) (Finnzymes, Espoo, Finland).
21. NEB #4 buffer (New England Biolabs, Ipswich, MA).
22. 10× SAM (500 µM) (New England Biolabs, Ipswich, MA).
23. MmeI (2 U/µL) (New England Biolabs, Ipswich, MA).
24. Shrimp alkaline phosphatase (1 U/µL) (Roche Diagnostics GmbH, Mannheim, Germany).
25. Acrylamide (Ambion, Austin, TX).
26. 10% Ammonium persulfate (APS) buffer.
27. TEMED (Bio-Rad, Hercules, CA).
28. 6× DNA loading buffer containing bromophenol blue and xylene cyanol.
29. 10-bp DNA ladder (Invitrogen, Carlsbad, CA).
30. 5 M NaCl.
31. Millex-HA 0.45-µm filter (Millipore, Billerica, MA).
32. Microcon columns (Millipore, Billerica, MA).
33. Rapid Ligation Kit (Roche Diagnostics GmbH, Mannheim, Germany).
34. Double-stranded DNA adapter: top, 5′-p-TCGTATGCCGTCTTCTGCTTG-3′ and bottom, 3′-NNAGCATACGGCAGAAGACGAAC-5′; p, phosphate group (Integrated DNA Technologies, Coralville, IA).
35. P5 primer: 5′-ATGATACGGCGACCACCGACAGGTTCAGAGTTCTACAGTCCGA-3′ (Integrated DNA Technologies, Coralville, IA).
36. P7 primer: 5′-CAAGCAGAAGACGGCATACGA-3′ (Integrated DNA Technologies, Coralville, IA).
37. Glycogen (Ambion, Austin, TX).
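Because several of the oligonucleotides listed above must match one another, it can be worth checking their sequences before ordering. The Python sketch below is one possible check, assuming that the spaces inside the printed sequences are typesetting artifacts and can be removed; the variable names and the particular set of checks are illustrative, not part of the protocol.

```python
# Sketch: consistency checks between the PARE adapters and primers listed above.
# Sequences are copied from items 8, 18, and 34-36 with internal spaces removed
# (an assumption); variable names are illustrative.

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA sequence."""
    return dna.translate(COMPLEMENT)[::-1]


RNA_ADAPTER_5P = "GUUCAGAGUUCUACAGUCCGAC"            # item 8
ADAPTER_PRIMER_5P = "GTTCAGAGTTCTACAGTCCGAC"         # item 18
DS_ADAPTER_TOP = "TCGTATGCCGTCTTCTGCTTG"             # item 34, written 5'->3'
DS_ADAPTER_BOTTOM_3TO5 = "NNAGCATACGGCAGAAGACGAAC"   # item 34, written 3'->5'
P5_PRIMER = "ATGATACGGCGACCACCGACAGGTTCAGAGTTCTACAGTCCGA"  # item 35
P7_PRIMER = "CAAGCAGAAGACGGCATACGA"                  # item 36

checks = {
    "5' adapter primer is the DNA version of the 5' RNA adapter":
        RNA_ADAPTER_5P.replace("U", "T") == ADAPTER_PRIMER_5P,
    "P5 primer ends with the first 21 nt of the 5' adapter":
        P5_PRIMER.endswith(ADAPTER_PRIMER_5P[:21]),
    "P7 primer is the reverse complement of the adapter top strand":
        reverse_complement(DS_ADAPTER_TOP) == P7_PRIMER,
    "adapter bottom strand pairs with the top strand (NN overhang ignored)":
        DS_ADAPTER_BOTTOM_3TO5[2:] == DS_ADAPTER_TOP.translate(COMPLEMENT),
}

for description, passed in checks.items():
    print(("OK   " if passed else "FAIL ") + description)
```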
3. Methods
3.1. Abiotic Stress Treatment
In order to avoid unintended biotic or abiotic stresses, plants for the experiments we describe are grown on a defined growth medium instead of soil. It is important to monitor the expression of a known stress-regulated gene (Fig. 2) to verify that the stress treatment is effective and to determine which specific conditions, such as duration of the treatment, are appropriate. Rather than describing all possible abiotic stress conditions, we describe a few examples from rice and Arabidopsis (Table 1). For rice, the stress treatments below are applied in parallel to plants 14 days after sowing on MS agar media and incubation in a growth chamber with 12 h light at 28°C/12 h dark at 25°C. For Arabidopsis, the stress is imposed on seedlings grown on MS agar plates for 2 weeks in an incubator set at 16 h light/8 h dark and 21°C. The entire harvest procedure is completed in less than 10 min. Control RNA samples should be generated from nonstressed plants, which are handled in exactly the same way but not exposed to stress conditions.
Fig. 2. Use of known stress-regulated genes to monitor and select stress conditions for library construction. Total RNAs were isolated from 2-week-old rice seedlings treated with drought, salt, or cold during a 24-h time course (0, 2, 4, 8, 12, and 24 h). Stress-inducible (SalT and OsWRKY71) or stress-repressible (RBCS) gene expression patterns were examined to choose optimal conditions for library construction: SalT and RBCS for drought and salt, OsWRKY71 for cold. In these examples, the 8-h treatment was selected for small RNA library construction.
3.1.1. Cold Stress
Transfer 2-week-old rice or Arabidopsis seedlings to 4°C in a cold room under continuous light for 8 h. Cold stress is more effective under light conditions.
3.1.2. Heat Stress
Transfer 2-week-old seedlings to an incubator set to 42°C for 8 h (rice) or 28°C for 24 h (Arabidopsis). Humidity and light/dark cycle should be identical to the control conditions.
3.1.3. Drought Stress
Desiccation can be used as a proxy for drought. Remove 2-week-old rice seedlings from media and expose to air in an incubator for 8 h. For Arabidopsis, gently pull 2-week-old seedlings from MS media and expose their roots for 4 h.
3.1.4. Salt Stress
Transfer 2-week-old rice seedlings to a 300 mM NaCl solution for 8 h. For Arabidopsis, pour a solution of concentrated NaCl onto plates containing 2-week-old seedlings and soak for 8 h.
3.1.5. Sulfate Starvation
Remove the endosperm from 9-day-old rice seedlings to avoid nutrient transport. Transfer the seedlings to MS S-deficient media (for sulfate starvation) or MS media (for control). After 5 days of treatment, separate shoots and roots for independent analysis. This helps distinguish the roles of miRNAs in primary uptake of sulfate by roots from their roles in internal translocation of sulfate by shoots under sulfate-deficient conditions. For Arabidopsis, plate seeds on MS media with varying concentrations of sulfate and grow under the stress for the full 2 weeks.
3.1.6. Phosphate Starvation
Remove the endosperm from 9-day-old rice seedlings to avoid nutrient transport. Transfer the seedlings to MS P-deficient media (for phosphate starvation) or MS media (for control). After 5 days of treatment, separate shoots and roots for independent analysis. For Arabidopsis, plate seeds onto MS media with varying concentrations of phosphate and grow for the full 2 weeks.
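The short-term treatments in Subheadings 3.1.1 to 3.1.4 can be kept in a small machine-readable schedule so that every stressed sample is paired with a control handled in the same way. The following is only a minimal sketch: the class name `Treatment` and the functions `lookup` and `sample_sheet` are hypothetical names introduced for illustration, and the values are transcribed from the text above (the Arabidopsis NaCl concentration is not specified there and is left undefined).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Treatment:
    species: str
    stress: str
    condition: str
    duration_h: int


# Values transcribed from Subheadings 3.1.1 to 3.1.4; nothing else is assumed.
TREATMENTS = [
    Treatment("rice", "cold", "4°C, continuous light", 8),
    Treatment("rice", "heat", "42°C", 8),
    Treatment("rice", "drought", "removed from media, exposed to air", 8),
    Treatment("rice", "salt", "300 mM NaCl", 8),
    Treatment("Arabidopsis", "cold", "4°C, continuous light", 8),
    Treatment("Arabidopsis", "heat", "28°C", 24),
    Treatment("Arabidopsis", "drought", "roots exposed to air", 4),
    Treatment("Arabidopsis", "salt", "concentrated NaCl poured onto plate", 8),
]


def lookup(species, stress):
    """Return the treatment for one species/stress pair."""
    for t in TREATMENTS:
        if t.species == species and t.stress == stress:
            return t
    raise KeyError((species, stress))


def sample_sheet(species):
    """List (label, condition, hours) rows, pairing each stress with a control."""
    rows = []
    for t in TREATMENTS:
        if t.species == species:
            rows.append((f"{t.stress}_treated", t.condition, t.duration_h))
            rows.append((f"{t.stress}_control",
                         "nonstressed, handled identically", t.duration_h))
    return rows


if __name__ == "__main__":
    for row in sample_sheet("rice"):
        print(row)
```

Listing the control next to each treatment is a design choice that mirrors the requirement that control RNA come from plants handled in exactly the same way but not exposed to stress.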
3.2. Small RNA Library Construction and Sequencing
Small RNA populations in plants are so vast and complex that small RNA library construction and deep sequencing are necessary to elucidate the identity, regulation, and function of miRNAs. Additionally, with deep sequencing methods such as MPSS, 454, or SBS, the number of times a small RNA is sequenced from a library provides a reliable indicator of the relative abundance of that small RNA. To identify abiotic stress-responsive miRNAs, small RNA libraries are constructed from stress-treated plant tissues and controls (see Notes 1 and 2). Construction of small RNA libraries is described in Chapter 8. Typically, 10 rice or 100 Arabidopsis seedlings are more than sufficient to yield enough total RNA (isolated as described in Chapter 3) for small RNA isolation and the preparation of one or more libraries.
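Because the number of times a sequence is read serves as its raw abundance, the first tabulation step is simply to collapse identical reads into counts. The sketch below illustrates this under stated assumptions: the function name `count_reads` is hypothetical, the reads are assumed to be already adapter-trimmed, and the example sequences are invented toy data rather than real small RNAs.

```python
from collections import Counter


def count_reads(reads):
    """Collapse trimmed reads into a {sequence: raw abundance} table.

    Whitespace is stripped and case is normalized so that identical
    sequences are counted together; empty lines are ignored.
    """
    return Counter(r.strip().upper() for r in reads if r.strip())


if __name__ == "__main__":
    # Invented toy reads, for illustration only.
    toy_reads = [
        "ACGTACGTACGTACGTACGTA",
        "ACGTACGTACGTACGTACGTA",
        "TTGACCGATTGACCGATTGAC",
    ]
    for seq, n in count_reads(toy_reads).most_common():
        print(seq, n)
```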
3.3. Computational Analysis and Validation of Abiotic Stress-Associated miRNAs
3.3.1. Computational Analysis of Abiotic Stress-Associated miRNAs
To find known or novel miRNAs associated with abiotic stress responses, deep sequencing data can be analyzed using a series of computational tools. First, to obtain the small RNA sequences, the adaptor sequences are removed with trimming scripts (see Chapter 7). Trimmed sequences are then matched against the genome to remove likely contaminants. Sequences that fail to match could derive from fungal, bacterial, or viral sources, especially in plants not grown under sterile conditions, but could also result from instrument sequencing errors or derive from unsequenced regions of the genome such as centromeres or ribosomal repeats. Next, to streamline the analysis, sequences matching noncoding RNAs (such as rRNAs, tRNAs, small nuclear RNAs, and small nucleolar RNAs) are typically removed, as are those matching chloroplast or mitochondrial genomes. It is thought that most of these sequences represent nonspecific decay products of the corresponding RNAs, though some may be biologically interesting and could be studied subsequently if a reason arises. The abundance of each sequence is expressed as a proportion of the filtered library, scaled to transcripts per million (TPM), rather than as a raw abundance, according to the following formula: normalized abundance (TPM) = raw abundance/(total genome matches − t/r/sn/snoRNA/chloroplast/mitochondria matches) × 1,000,000 (see Note 3). Candidate small RNAs regulated by abiotic stress can be identified based on the ratio of normalized abundance values between control and abiotic stress libraries. To minimize noise from technical bias, it is preferable to select for further functional analysis the sequences showing the greatest difference between libraries, ideally a difference of several fold or more. As with microarrays, replicates offer a means of accounting for biological and technical variation and, with the increasing prevalence of instrumentation, are now becoming more affordable. Replication may be particularly useful if the goal is to accurately characterize the population of small RNAs that change in abundance in response to an abiotic stress, but could also be helpful for identifying the best examples to examine with functional studies. The miRNAs that are found to be regulated in response to a given stress could be of three types: (1) known miRNAs that are already known to be regulated by the stress, (2) known miRNAs that are newly discovered to be regulated by the stress, and (3) new miRNAs that are regulated by the stress. MicroRNAs of the first type are good positive controls and could also reveal unknown aspects of the regulation, such as tissue or organ specificity. However, the second and third types are likely the most interesting. Those of the second type are easy to identify, and probably have known targets that can be investigated for biological function in stress responses as discussed below.
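The normalization and ratio-based selection described above can be sketched as follows. This is an illustrative sketch rather than the scripts used by the authors. It reads the normalization denominator as the number of genome-matched reads that remain after removing t/r/sn/snoRNA, chloroplast, and mitochondrial matches. The function names `normalize_tpm` and `select_candidates`, the 4-fold default threshold (one possible reading of "several fold"), and the pseudocount used to avoid division by zero are all assumptions introduced here.

```python
import math


def normalize_tpm(raw_counts, genome_matched_total, filtered_total,
                  scale=1_000_000):
    """Scale raw abundances to TPM.

    genome_matched_total: reads matching the genome.
    filtered_total: reads matching t/r/sn/snoRNA, chloroplast, or
    mitochondrial sequences, which are excluded from the denominator.
    """
    effective_total = genome_matched_total - filtered_total
    if effective_total <= 0:
        raise ValueError("effective library size must be positive")
    return {seq: n * scale / effective_total for seq, n in raw_counts.items()}


def select_candidates(control_tpm, stress_tpm, min_fold=4.0, pseudocount=1.0):
    """Return (sequence, stress/control ratio) pairs changing >= min_fold.

    The pseudocount (an assumption, not from the protocol) keeps the ratio
    defined when a sequence is absent from one library.
    """
    hits = []
    for seq in set(control_tpm) | set(stress_tpm):
        ratio = ((stress_tpm.get(seq, 0.0) + pseudocount)
                 / (control_tpm.get(seq, 0.0) + pseudocount))
        if ratio >= min_fold or ratio <= 1.0 / min_fold:
            hits.append((seq, ratio))
    # Largest changes first, whether induced or repressed.
    hits.sort(key=lambda item: abs(math.log2(item[1])), reverse=True)
    return hits


if __name__ == "__main__":
    # Invented toy counts and library sizes, for illustration only.
    control = normalize_tpm({"seqA": 20, "seqB": 300}, 2_500_000, 500_000)
    stress = normalize_tpm({"seqA": 200, "seqB": 310}, 2_400_000, 400_000)
    for seq, ratio in select_candidates(control, stress):
        print(seq, round(ratio, 2))
```

With replicate libraries, the same ratio could be computed per replicate pair before candidates are accepted. How replicates are combined is left open here.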
Accordingly, examination of the abundances of known miRNAs is useful to include as an early analysis of small RNA data sets. Particular attention should be paid to putative novel miRNA sequences that are regulated by a given stress. To minimize the number of candidates requiring manual and experimental analysis, we typically filter the sequences for those with high abundance (over 100 TPM), few hits on genome (