CHAPTER 1
DNA Sequencing
Hugh G. Griffin and Annette M. Griffin
From: Methods in Molecular Biology, Vol. 23: DNA Sequencing Protocols. Edited by H. and A. Griffin. Copyright ©1993 Humana Press Inc., Totowa, NJ

1. Introduction

Methods to determine the sequence of DNA were developed in the late 1970s (1,2) and have revolutionized the science of molecular genetics. The DNA sequences of many different genes from diverse sources have been determined, and the information is stored in international databanks such as EMBL, GenBank, and DDBJ. Many scientists now accept that sequence analysis will provide an increasingly useful approach to the characterization of biological systems. Projects are already underway to map and sequence the entire genomes of organisms such as Escherichia coli, Saccharomyces cerevisiae, Caenorhabditis elegans, and Homo sapiens. In the recent past, large-scale sequencing projects such as these were often dismissed as prohibitively expensive and of little short-term benefit, while DNA sequencing itself was seen as a repetitive and unintellectual pursuit. However, this view is now changing, and most scientists recognize the importance of DNA sequence data and perceive DNA sequencing as a valuable and often indispensable aspect of their work. Recent technological advances, especially in the area of automated sequencing, have removed much of the drudgery that used to be associated with the technique, and modern innovative computer software has greatly simplified the analysis and manipulation of sequence data. Large-scale sequencing
projects, such as the Human Genome Project, produce the DNA sequences of many unknown genes. Such data provide an impetus for molecular biologists to apply the techniques of reverse genetics to produce probes and antibodies that can be used to identify the gene product, its cellular location, and its time of appearance in the developing cell (3). A function can be assigned by mutant analysis or by comparison of the deduced amino acid sequence with proteins of known function. Therefore, DNA sequencing can act as a catalyst to stimulate future research into many diverse areas of science.

The two original methods of DNA sequencing described in 1977 (1,2) differ considerably in principle. The enzymatic (or dideoxy chain termination) method of Sanger (1) involves the synthesis of a DNA strand from a single-stranded template by a DNA polymerase. The Maxam and Gilbert (or chemical degradation) method (2) involves chemical degradation of the original DNA. Both methods produce populations of radioactively labeled polynucleotides that begin from a fixed point and terminate at points dependent on the location of a particular base in the original DNA strand. The polynucleotides are separated by polyacrylamide gel electrophoresis, and the order of nucleotides in the original DNA can be read directly from an autoradiograph of the gel (4). Although both techniques are still used today, there have been many changes and improvements to the original methods. While the chemical degradation method is still in use, the enzymatic chain termination method is by far the most popular and widely used technique for sequence determination. This process has been automated by utilizing fluorescent labeling instead of radioactive labeling (Chapters 33-37), and the concepts of polymerase chain reaction (PCR) technology have been harnessed to enable the sequencing reaction to be “cycled” (Chapters 21, 26, and 34).
Other recent innovations include multiplexing (Chapter 28), sequencing by chemiluminescence rather than radioactivity (Chapter 29), solid-phase sequencing (Chapter 25), and the use of robotic workstations to automate sample preparation and sequencing reactions (Chapter 38).

2. Maxam and Gilbert Method

In the original Maxam and Gilbert method (2), a fragment of DNA is radiolabeled at one end and then partially cleaved in four different chemical reactions, each of which is specific for a particular base or
type of base. This results in four populations of labeled polynucleotides. Each radiolabeled molecule extends from a fixed point (the radiolabeled end) to the site of chemical cleavage, which is determined by the DNA sequence of the original fragment. Since the cleavage is only partial, each population consists of a mixture of molecules, the lengths of which are determined by the base composition of the original DNA fragment. The four reactions are electrophoresed in adjacent lanes through a polyacrylamide gel. The DNA sequence can then be determined directly from an autoradiograph of the gel.

The original method has been improved over the years (5). Additional chemical cleavage reactions have been devised (6), new end-labeling techniques developed (7,8), and shorter, simplified protocols have been produced (Chapter 32). The main advantage of chemical degradation sequencing is that sequence is obtained from the original DNA molecule and not from an enzymatic copy. It is therefore possible to analyze DNA modifications such as methylation, and to study protein/DNA interactions. Chemical sequencing also enables the determination of the DNA sequence of synthetic oligonucleotides. However, the Sanger method is both quicker and easier to perform and remains the method of choice for most sequencing applications.

3. Sanger Method

The Sanger (or chain termination) method (1) involves the synthesis of a DNA strand from a single-stranded template by a DNA polymerase. The method depends on the fact that dideoxynucleotides (ddNTPs) are incorporated into the growing strand in the same way as the conventional deoxynucleotides (dNTPs). However, ddNTPs differ from dNTPs because they lack the 3’-OH group necessary for chain elongation. When a ddNTP is incorporated into the new strand, the absence of the hydroxyl group prevents formation of a phosphodiester bond with the succeeding dNTP, and chain elongation terminates at that position.
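The termination rule can be illustrated with a small sketch. It is an idealized model, not a protocol: it assumes every possible termination product is present, it models only the newly synthesized strand, and the function names are hypothetical. Each ddNTP reaction yields chains whose lengths mark the positions of that base, and sorting all chains by length recovers the sequence, as a gel would.

```python
from __future__ import annotations

BASES = "ACGT"


def termination_lengths(new_strand: str, dd_base: str) -> list[int]:
    """Chain lengths produced in the reaction that contains one ddNTP.

    A chain of length n appears when position n of the newly synthesized
    strand is `dd_base`: incorporating the dideoxy analog at that position
    prevents any further elongation.
    """
    return [i + 1 for i, base in enumerate(new_strand) if base == dd_base]


def run_four_reactions(new_strand: str) -> dict[str, list[int]]:
    """One reaction (one gel lane) per ddNTP."""
    return {base: termination_lengths(new_strand, base) for base in BASES}


def read_gel(lanes: dict[str, list[int]]) -> str:
    """Read the lanes from the shortest chain to the longest."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


if __name__ == "__main__":
    lanes = run_four_reactions("GATTACA")  # illustrative sequence
    print(lanes)
    print(read_gel(lanes))
```

In a real reaction the ratio of dNTP to ddNTP determines how the products are distributed across lengths; the sketch ignores this and simply lists every position at which termination can occur.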
By using the correct ratio of the four conventional dNTPs and one of the four ddNTPs in a reaction with DNA polymerase, a population of polynucleotide chains of varying lengths is produced. Synthesis is initiated at the position where an oligonucleotide primer anneals to the template, and each chain is terminated at a specific base (either A, C, G, or T depending on which ddNTP was used). By using the four different ddNTPs in four separate reactions, the complete sequence information can be obtained. One of the dNTPs is usually radioactively labeled so that the information gained by electrophoresing the four reactions in adjacent tracks of a polyacrylamide gel can be visualized on an autoradiograph.

The original method used the Klenow fragment of DNA polymerase I to synthesize the new strands in the sequencing reactions, and this enzyme is still used today (Chapter 12). Other enzymes such as Sequenase (Chapter 14), T7 polymerase (Chapter 13), and Taq polymerase (Chapter 15) are also widely used. Each enzyme has its own particular properties and qualities, and the choice of polymerase will depend on the type of template and the sequencing strategy employed.

4. Templates for DNA Sequencing
The polymerase reaction requires single-stranded template. This is usually achieved by utilizing M13 phage, which can produce large amounts of just one strand of DNA as part of its normal replicative cycle. Double-stranded (replicative form) M13 can also be isolated, and this is used to clone the DNA fragment to be sequenced. The quality of DNA sequence data achieved using M13 template is extremely good, and many researchers prefer to subclone into M13 prior to sequencing. Sequencing reactions can be performed directly on plasmid DNA, the double-stranded molecule being denatured prior to sequencing. Recent innovations in DNA purification techniques and the availability of improved polymerases have greatly enhanced the quality of data produced by plasmid sequencing methods (Chapters 14, 18, and 19). Sequence determination can also be performed directly on cosmid clones (Chapter 21), lambda clones (Chapter 20), and on PCR products (Chapters 23-25). In particular, the advent of cycle sequencing (Chapter 26) has vastly increased the range of templates that can be used.

5. Sequencing Strategies
Much of the sequencing performed is confirmatory, carried out to check the orientation or the structure of newly constructed plasmids, or to determine the sequence of mutants. This type of sequencing can be easily achieved by subcloning a restriction fragment into M13 and sequencing using the universal primer. Alternatively, a custom-designed oligonucleotide primer can be synthesized and sequencing performed without the need for any subcloning.
The determination of long tracts of unknown sequence, however, requires careful planning and the utilization of one of a variety of strategies, including the shotgun approach, directed sequencing strategies, and the gene walking technique.

A random, or shotgun, approach involves subcloning random fragments of the target DNA into an appropriate vector such as M13 (Chapter 7). Sequences from these recombinants are determined at random until the individual readings can be assembled into a contiguous sequence. This is achieved using a sequence assembly computer program (9,10). The disadvantage of this method is the redundancy in the sequence data obtained, each section of DNA being sequenced several times over. However, the strategy benefits from making no prior assumptions about the DNA to be sequenced, such as base composition or the presence of certain restriction sites.

Directed strategies usually involve the construction of a nested set of deletions of the fragment to be sequenced. Progressive deletions of the fragment are generated with a nuclease, each deletion being approximately 200-300 bp. Following deletion, the fragments are recloned into M13 or a plasmid vector adjacent to the universal primer site. The subclones are then sequenced in order of size, with the sequence of each clone overlapping slightly with the one before. In this way, a large tract of contiguous sequence is determined on one gel. The disadvantage is the labor and time involved in constructing the deletions. Several methods are available for deletion construction, including the use of exonuclease III (Chapter 8), T4 DNA polymerase (Chapter 9), and DNase I (Chapter 10). It is essential to sequence both strands of the DNA, and this usually entails generating two sets of deletions.

Perhaps the simplest method of sequencing is the gene walking technique (Chapter 11).
This involves the initial sequencing of approximately 200-400 bp of the end of a cloned fragment using the universal primer (the sequence of the other end can be obtained with the reverse primer). This sequence information is then used to design a new oligonucleotide primer, which will provide the sequence of the next 200-400 bp, and so on across the entire length of the insert. This method is the least labor intensive because no deletion construction or generation of random clones is necessary, and template DNA can be made in one batch since the template is the same for all sequencing reactions. However, the delay involved in synthesizing a new oligonucleotide primer before the next reaction can be performed may considerably prolong the time taken to sequence a long tract of DNA. The cost of oligonucleotide synthesis may also be prohibitive.

6. Automation in DNA Sequencing
One of the major advances in sequencing technology in recent years is the development of automated DNA sequencers. These are based on the chain termination method and utilize fluorescent rather than radioactive labels. The fluorescent dyes can be attached to the sequencing primer, to the dNTPs, or to the terminators, and are incorporated into the DNA chain during the strand synthesis reaction mediated by a DNA polymerase (e.g., Klenow fragment of DNA polymerase I, Sequenase, or Taq DNA polymerase). During the electrophoresis of the newly generated DNA fragments on a polyacrylamide gel, a laser beam excites the fluorescent dyes. The emitted fluorescence is collected by detectors and the information analyzed by computer. The data are automatically converted to nucleotide sequence. Several such instruments are now commercially available and are becoming increasingly popular (11; Chapters 33-37).

Other aspects of the sequencing procedure that are being automated include template preparation and purification, and the sequencing reactions themselves. Robotic workstations are currently being developed to perform these tasks (Chapter 38).

7. Cycle Sequencing

Cycle sequencing is a new and innovative approach to dideoxy sequencing. Its advantages over conventional sequencing techniques are that the reactions are simpler to set up, less template is required, the quality and purity of template are not as critical, and virtually any single- or double-stranded DNA can be sequenced (including lambda, cosmid, plasmid, phagemid, M13, and PCR product). In this method, a single primer is used to linearly amplify a region of template DNA using Taq polymerase in the presence of a mixture of dNTPs and a ddNTP. Either radioactive or fluorescent labels can be used, making cycle sequencing technology as relevant to automated processes as it is to manual methods (Chapters 26, 34-36).
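To make "linearly amplify" concrete, the following sketch compares idealized product counts for a single-primer reaction with those of a two-primer PCR. It assumes perfect efficiency in every cycle, and the cycle number and function names are illustrative choices rather than values from a protocol.

```python
def linear_products(template_copies: int, cycles: int) -> int:
    """Single primer: only the original template is copied in each cycle,
    so the number of new chains grows in proportion to the cycle number."""
    return template_copies * cycles


def exponential_products(template_copies: int, cycles: int) -> int:
    """Two-primer PCR, for comparison: products serve as templates in later
    cycles, so the ideal copy number doubles with every cycle."""
    return template_copies * 2 ** cycles


if __name__ == "__main__":
    cycles = 25  # illustrative assumption
    print(linear_products(1, cycles))       # new chains per template molecule
    print(exponential_products(1, cycles))  # copies per template molecule
```

Under these assumptions each template molecule yields one terminated chain per cycle, which is why cycling lets far less template give a readable signal than a single round of synthesis would.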
As in conventional dideoxy sequencing methods, cycle sequencing involves the generation of a new DNA strand from a single-stranded template, synthesis commencing at the site of an annealed primer and terminating on the incorporation of a ddNTP. The difference is that the reaction occurs not just once but 20-30 times under the control of a thermal cycler (or PCR machine). This results in more and better sequence data from less template. The separate step of denaturing a double-stranded molecule is eliminated, with denaturation occurring automatically in the thermal cycler. The development of cycle sequencing techniques has made a major contribution to DNA sequencing methodology, improving the reliability and efficiency of DNA sequence determination and eliminating time-consuming steps.

8. Aim of This Book

The purpose of this book is to provide detailed practical procedures for a number of DNA sequencing techniques. Although protocols for DNA sequencing methods are available elsewhere, there was a need for a book that comprehensively covered the vanguard techniques now being applied in this rapidly evolving field. Each contribution is written so that a competent scientist who is unfamiliar with the method can carry out the technique successfully at the first attempt by simply following the detailed practical procedures that have been described by each author. Even the simplest techniques occasionally go wrong, and for this reason a “Notes” section has been included in most chapters. These notes indicate any major problems or faults that can occur, their sources, and how they can be identified and overcome. Since the purpose of this book is to describe practical procedures and not to go into great depth regarding theory, a comprehensive reference section is included in most chapters, enabling the reader to refer to other publications for more detailed theoretical discussions on the various techniques.

References

1. Sanger, F., Nicklen, S., and Coulson, A. R. (1977) DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. USA 74, 5463-5467.
2. Maxam, A. M. and Gilbert, W. (1977) A new method for sequencing DNA. Proc. Natl. Acad. Sci. USA 74, 560-564.
3. Barrell, B. (1991) DNA sequencing: present limitations and prospects for the future. FASEB J. 5, 40-45.
4. Sambrook, J., Fritsch, E. F., and Maniatis, T. (1989) Molecular Cloning: A Laboratory Manual, 2nd ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
5. Maxam, A. M. and Gilbert, W. (1980) Sequencing end-labeled DNA with base-specific chemical cleavages. Meth. Enzymol. 65, 499-560.
6. Ambrose, B. J. B. and Pless, R. C. (1987) DNA sequencing: chemical methods. Meth. Enzymol. 152, 522-538.
7. Volckaert, G. (1987) A systematic approach to chemical DNA sequencing by subcloning in pGV451 and derived vectors. Meth. Enzymol. 155, 231-250.
8. Eckert, R. L. (1987) New vectors for rapid sequencing of DNA fragments by chemical degradation. Gene 51, 247-254.
9. Dolz, R. (1993) Fragment assembly programs, in DNA Sequencing: Computer Analysis of Sequence Data (Griffin, A. M. and Griffin, H. G., eds.), Humana Press, Totowa, NJ (Ch. 2).
10. Staden, R. (1992) Managing sequencing projects, in DNA Sequencing: Computer Analysis of Sequence Data (Griffin, A. M. and Griffin, H. G., eds.), Humana Press, Totowa, NJ (Ch. 17).
11. Hunkapiller, T., Kaiser, R. J., Koop, B. F., and Hood, L. (1991) Large-scale and automated DNA sequence determination. Science 254, 59-67.
CHAPTER 2
M13 Cloning Vehicles
Their Contribution to DNA Sequencing
Joachim Messing
1. Introduction

For studies in molecular biology, DNA purification has been essential, in particular for DNA sequencing, probing, and mutagenesis. The amplification of DNA in E. coli by cloning vehicles derived from M13mp or pUC made expensive physical separation techniques like ultracentrifugation unnecessary. Although today the polymerase chain reaction is a valuable alternative for the amplification of small DNA pieces (1), it cannot substitute for the construction of libraries of DNA fragments. Therefore, E. coli has served not only as a vehicle to amplify DNA, but also to separate many DNA molecules of similar length and the two DNA strands simultaneously. For this purpose, a bacteriophage like M13 can be used. The various viral cis- and trans-acting functions are critical not only for strand separation, but also to separate the single-stranded DNA from the E. coli cell by an active transport mechanism through the intact cell wall. Although it may have been somewhat surprising to some how many changes in its DNA sequence the phage tolerated, manipulations of this amplification and transport system have been extended today even to the viral coat proteins for the production of epitope libraries (2).

Much of the work is now more than a decade old, but experience has confirmed the usefulness of some simple biological paradigms. Techniques that were new and limiting fifteen years ago included automated oligonucleotide synthesis and the use of thermostable enzymes, which add a critical dimension to molecular biology today. Neither necessarily replaces the previous techniques, but they create greater flexibility, enormously accelerate scientific investigations, and even make certain analyses possible for the first time. However, DNA sequencing of larger contigs (several overlapping sequences that can be linked) has benefited from economizing on the synthesis of new oligonucleotides (3,4). Even in the absence of automated oligonucleotide synthesis in the earlier years, the concept of a universal primer could be developed by alternative techniques.

2. Development of DNA Sequencing Techniques: A Discussion
In 1974, at one of the first meetings on the use of restriction endonucleases in molecular biology, some of these ideas became clear. Work on the chemical synthesis of a tRNA gene was presented, and the initial work on sequencing phage φX174 using restriction fragments as primers for the plus-minus method was discussed. At that time, oligonucleotide synthesis required a major effort and could not easily be generally applied. Restriction fragments offered an alternative. They could be used as primers for DNA synthesis in vitro and for marker-rescue experiments to link genetic and physical maps of viruses like SV40, both forerunners of DNA sequencing and site-directed mutagenesis.

There were several reasons to use φX174 as the first model in developing DNA-sequencing techniques and determining the sequence of an entire autonomous genome. First, it was one of the smallest DNA viruses; it is even smaller than M13. Second, the mature virus consists of single-stranded DNA, eliminating the need to separate the two strands of DNA for template preparation; this is even more critical if one wishes to use double-stranded restriction fragments as primers. Third, a restriction map was superimposed on the genetic map by marker-rescue experiments (5). The latter feature still serves today as a precondition for other genomes. Restriction sites were critical as signposts along the thousands of nucleotides and provided the means to dissect the double-stranded replicative form (RF) of φX174 into small but ordered pieces that permitted the DNA sequencing effort
to proceed in a walking manner along the genome. Today the restriction map can be replaced by any DNA sequence (e.g., STS), since the synthesis of oligonucleotides is so rapid that researchers can use the DNA sequence that was just read from a sequencing gel to design and produce an oligonucleotide to extend the sequencing gel further in the 5’ direction. Therefore, the use of oligonucleotides instead of restriction fragments in such a primer walking method would have enormously accelerated the φX174 project.

At the Cleveland Conference on Macromolecules in 1981, after a talk that I had given, the replacement of shotgun sequencing by such a method was suggested by a colleague, who, as a pioneer in DNA synthesis and its automation, saw a perfect match of this emerging technology with DNA sequencing. Another expert in the chemical synthesis of DNA, Michael Smith, had more interest in the applications than in improving the method, and had switched from the diester method to the phosphoramidite method (6), as well as conceiving the idea of using oligonucleotides in marker rescue (7). These innovative researchers pioneered marker rescue and established the physical and genetic map of φX174 (5). It is clear that today’s protein engineering had its roots right there with the right people at the right time, because they also recognized that oligonucleotides could eliminate the need for strand separation for DNA sequencing (8), and developed this method until it reached greater maturity (4,9). Even double-stranded DNA sequencing with universal primers became easier with the development of the pUC plasmids (10).

Despite all the advantages of choosing φX174 as a model system, Sanger’s group nearly picked a different single-stranded DNA phage, fd. In principle, E. coli has two different types of single-stranded DNA phage, represented by φX174 and fd.
The first is packaged into an icosahedral head; it kills and lyses the host cell, but does not require F pili, which are receptor sites on the surface of the cell wall encoded by F factors. Its host range is restricted to E. coli C. Phage like fd can infect only male-specific E. coli, which produce pili at their surface; these phage are packaged in a filamentous coat and discharged from the cell without lysis, and infected cells can continue to divide. These differences are critical, but there was another reason for choosing fd originally. The major coat protein encoded by gene VIII of the phage, a very small but very abundant protein, had been sequenced by protein sequencing methods. Therefore it seemed obvious, particularly to those who had pioneered protein sequencing, to use the protein sequence to check the DNA sequence. The protein sequence allowed the design and synthesis of an oligonucleotide that would prime in vitro DNA synthesis within the coat protein gene. Furthermore, the derived DNA sequence had to match the protein sequence. Of course, the codon redundancy of many amino acids made it difficult to design a unique primer, and it might not have been too surprising that the approach did not lead to the correct DNA sequence (11). It turned out later that this was caused not by the choice of codons, but by a mistake in the protein sequence. Still, the paradigm of reverse genetics again has its roots right there.

3. Replication Systems and M13

In 1974, a research group at the Max Planck Institute of Biochemistry in Munich became interested in viruses, initially in E. coli more than in mammalian cells. The “Abteilung” was organized in subgroups, and one subgroup was interested in developing in vitro replication systems using both bacteriophage and small plasmids. This had a strong biochemical emphasis, and the researchers rapidly began to learn about single-stranded DNA replication. Eleven years earlier, Hofschneider had isolated a similar filamentous phage from the Munich sewers that he named after a series of phage with the initial M (12). Number 13 was the one that was studied most. In the search for a research topic other than plasmids or DNA replication, the possibility of combining M13 phage production with the in vitro DNA synthesis-based method of DNA sequencing became apparent. Although this might have been obvious to those familiar with phage replication, innovative methods were needed for adaptation to DNA cloning techniques.
The walking method for sequencing φX174 was the strategy used at that time, and some thought that it would be difficult to clone large fragments into M13 (although the author’s record was around 40 kb) and that a walking method might therefore have limited use. Logically, the only alternative to the walking method was the use of shotgun cloning and a universal primer. The replicative form of M13 could be used to clone DNA fragments of a size slightly larger than necessary for single sequencing reactions, and a universal sequence near the cloning site would be used as a primer.
This would shift the work from preparing primers to preparing templates, which still remains more economical (13). If templates were numerous, cloning was much faster than any biochemical technique, and with these thoughts in mind, a plan took shape to construct in vitro recombinants of phage M13 without using existing methods. One might recall that in vitro recombinants were usually based on drug-resistance markers. This led to the development of plasmid vectors with unique cloning sites that were scattered all over the plasmid genome (14).

4. Transposon Mutagenesis

Both Zinder’s and Schaller’s laboratories had in mind, and actually later used, transposons to develop f1 and fd transducing phage (15,16). However, one could predict that such a course of experiments, although useful for plasmid cloning vehicles, would be less useful for M13. It seemed plausible, and such an experiment could demonstrate, that in contrast to φX174, filamentous phage can accommodate additional DNA by extending the filamentous coat; infected cells can be treated like plasmid-containing cells. To some degree this had already been proved, since one group had already described mutants of more than unit length (17). Another advantage of transposon mutagenesis was that insertion mutants would be naturally selected. This was one of the biggest obstacles from the beginning. Although plasmids and bacteriophage λ were natural transducing elements, filamentous phage had never been shown to have this property, and it was not obvious whether insertion mutants would be viable. This was difficult because it was already known that amber mutants of most viral genes not only cause abortive infection, but also lead to killing of the host, something that does not happen when these mutations are suppressed. Therefore, insertion mutants that can be used as plasmids would kill the cell. It was thus predictable that insertion mutants had to be restricted to noncoding regions.
Therefore, a decision was made to use a restriction enzyme that recognized at least two different sites in the intergenic region of the RF. Rather than asking whether the intergenic region contains a target site for transposons, the restriction map showed that it was possible to get at least two different insertion mutants within the intergenic region. The only difficulty in such an experiment was
to find conditions where the restriction enzyme would cut RF only once at any of the possible target sites, so that a population of unit-length RF could be ligated to the appropriate marker DNA fragment.

However, there was another reason not to use drug-resistance markers. Infected cells still divide, but very slowly. Therefore, selection takes much longer than with plasmids, but it makes it very easy to distinguish infected from noninfected cells on a bacterial lawn. A single infection grown on a bacterial lawn forms a turbid plaque. If bacterial cells are then transfected by the calcium chloride technique of Mandel and Higa (18), a transformed cell can be recognized as a plaque. Hence, no selection technique is necessary, but the ability to distinguish between wild-type M13 and M13 insertion mutants would remain a problem. Although it was quite plausible to think of the histochemical screen used for bacteriophage λplac by Malamy et al. (19), the lacZ gene would have been a large insertion. However, it turned out that, rather than using entire genes as markers, one could clone only the portion encoding the amino terminus and the repressible control region, and provide the rest in trans by the host of the phage. This became clear when Landy et al. (20) wrote on the purification of an 800-bp HindII fragment from λplac capable of α-complementation in a cell-free transcription-translation system. An informal sequence of this fragment showed that it was 789 bp long and included the first 146 codons of the lacZ gene, but it was still necessary to assemble many components and purify several restriction endonucleases. Work began after some strains and purified lac repressor were traded; this allowed the purification of the 789-bp HindII fragment out of about fifty other restriction fragments by simply filtering the DNA/lac repressor complex through a nitrocellulose filter. After adding IPTG, it was possible to recover the DNA in solution.
Using DNA-binding proteins for purifying and cloning promoter regions is now a well established technique, but ligating restriction fragments via blunt ends was not established at the time when this experiment was ready. As described elsewhere (21), only two transformants were obtained-one of them was saved and named M 13mp 1. Electron microscopy proved that added DNA was packaged as a filamentous phage and produced as single-stranded DNA (22,23). Now the path took a more formal shape. Not only would the histochemical screenwork by detecting a blue among colorless plaques, but
Ml3 Cloning Vehicles 1 2 3 4 5 6 ATG ACC ATG ATT AC-
7 8 TCA CTG (+ GGmAT TC CT TA AG (-
15 9 10 WlJmpl GCC GTC (+ or Viral strand) or viral strand) or complementary strand)
12 3 4 5 6 7 8 9 10 M13mp2 A CTG GCC GTC (+ or Viral ATG ACC ATG ATT ACE AAT TC EcoRI
strand)
Fig. 1. Creation of an EcoRI sue by chemical mutagenesis. By screening the nucleotides of the first ten codons of the 1ucZ gene, we found that the sequence GGATTC could erther be converted into an EcoRI site GAATTC or a BamHI sue GGATCC by a single base change. Since there already was a BumHI site in gene III, but no EcoRI site m M13mp1, and I had an ample supply of EcoRI enzyme purified myself, we decided to select the GAATTC site that also changed codon GAT for asparttc acid to AAT for asparagine (24).
it could be reversed. One uncertainty was how to introduce new restriction sites in the right region. Such a site had to be unique for M13mp1 and positioned not somewhere in the viral genome, but in the lacZ region, so that insertion mutants would not give rise to α-complementation. Inspection of the sequence showed that there were not many sequences in the amino-terminal region that could be converted in a single step into a unique restriction site. Attempts to use EcoRI linkers that became available at the time to "marker rescue" them did not succeed. Without somebody to synthesize oligonucleotides that were more homologous to the lac region, it was hopeless. As an alternative, a chemical mutagen was tried. It was known that methylated G could mispair with uracil or thymine; therefore, by methylating the single-stranded M13 DNA with nitrosomethylurea, a mutation could be introduced into the minus strand and the subsequent RF molecules (Fig. 1). Unfortunately, there was no good genetic selection for this procedure. It would require brute-force methods of enriching EcoRI-sensitive RF from a transformed phage library by gel electrophoresis of linear versus circular molecules. Still, it was hard to believe when this author isolated RF that was not only sensitive to EcoRI, but had exactly the predicted base change in codon 5 of the lacZ gene (24).
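The inspection described above, looking for stretches of the amino-terminal lacZ sequence that a single base change would turn into a restriction site, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name is hypothetical, the input is the ten codons drawn in Fig. 1, and only the two recognition sequences named in the figure legend are checked.

```python
# First ten lacZ codons of M13mp1 as drawn in Fig. 1 (assumed input for this sketch).
LACZ_START = "ATGACCATGATTACGGATTCACTGGCCGTC"

# Recognition sequences named in the legend of Fig. 1.
SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC"}


def single_base_conversions(seq, sites):
    """Return (enzyme, 0-based position, old base, new base) for every window
    of seq that differs from a recognition site at exactly one position."""
    hits = []
    for name, site in sites.items():
        width = len(site)
        for start in range(len(seq) - width + 1):
            window = seq[start:start + width]
            diffs = [k for k in range(width) if window[k] != site[k]]
            if len(diffs) == 1:
                k = diffs[0]
                hits.append((name, start + k, window[k], site[k]))
    return hits


for enzyme, pos, old, new in single_base_conversions(LACZ_START, SITES):
    print(f"{enzyme}: change {old} to {new} at position {pos}")
```

For this input the scan reports one candidate per enzyme, both inside GGATTC. The G-to-A change that creates GAATTC is the kind of transition expected when a methylated G mispairs with thymine.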
Messing
The same mutagenesis led to two more EcoRI mutants and a mutant RF resistant to BamHI, whose site lies within gene III. At that time, there was still much concern that many mutations might not be tolerated because of changes in the protein sequence or the secondary structure of RNA. Therefore, changes in the lac DNA should probably occur at a higher frequency than in the viral DNA. Furthermore, the amino terminus of the lacZ gene appeared to be more flexible since it was demonstrated that fusion proteins retained β-galactosidase function. On the other hand, this author did not recognize the tremendous selection power for suppressor mutations. In other words, any mutation that was introduced could potentially be compensated for by another mutation somewhere else. The primary mutant might give a low titer, but because of the growth advantage a suppressor mutant would rapidly take over. An example of such a case is M13mp1. Later, Dotto and Zinder (25) showed that insertion mutants at the mp1 HaeIII site gave a low-titer phenotype. Since M13mp1 gave a normal titer, they searched for a suppressor mutation. Codon 40 in gene II of M13mp1 was indeed changed.
5. The Need for a Universal Primer
Researchers continue to use chemical mutagens to get rid of restriction sites. The reason for this becomes clear when they return to the use of M13 in DNA sequencing. The EcoRI site in M13mp2 allowed cloning by screening for colorless plaques. Now one could readily purify these plaques and prepare a template for sequencing the inserted DNA. Still, as with the φX174 project, a restriction fragment from the adjacent lac DNA needed to be purified as a primer, and such a fragment was subcloned into a plasmid for convenience (26). Such a primer fragment still needed to be denatured since it was double-stranded, and had to be cut off after the sequencing reaction to produce a shift of the 3' end in the sequencing gel. Although such a protocol could still be improved on, a more serious obstacle arose suddenly from the concern over the biological containment of M13 recombinant DNA. The NIH Recombinant DNA Advisory Committee (RAC) thought that the conjugation-proficient E. coli host strains could lead to the spread of M13 infection and pose a risk in using M13 as a cloning vehicle. On the other hand, using one of the traD or traI mutants reduces conjugation by a factor of 10^6, but they
could still be infected by M13. Since this F factor carrying the traD mutation was wild-type lac, a histochemical screen with the mp vectors would not be possible, and one would have to return to drug-resistance-type M13 vectors. The scientific reasoning of RAC is hard to understand. First, nobody argued against Agrobacterium tumefaciens as a plant transformation vector, although it was conjugation proficient and could easily spread in the environment. Second, F pili were never made under stress or anaerobic conditions, something that was already known as "phenocopies." Conjugation in the human gut was in any case nearly zero. Third, M13 infection per se reduces conjugation by a factor of 10^6. It was hard to argue with so many well-known scientists at that time, so a new series of E. coli strains (the JM series), all carrying the M15 deletion on the F' traD36 episome, was constructed. Since it was a concern of NIH, it was necessary to summarize the status of all the strains for potential users in the NIH Bulletin (27). Although sequencing DNA of eukaryotic origin by M13 cloning was now possible, preparation of the primer from the plasmid was still cumbersome. It was clear that an oligonucleotide to replace the restriction fragment as a universal primer was needed. Inquiries regarding the synthesis of a universal primer by a commercial supplier revealed that the cost put such an attempt out of the question. A further attempt and timely support produced success, not only with a universal primer but also with another application of oligonucleotides (3). In 1978 this author made another interesting observation, namely that in-frame insertions of linkers in the EcoRI site could still give a positive color reaction. One of these isolates, called M13mp5, could be used to clone both EcoRI and HindIII fragments at the same site and with the same primer for sequencing (27).
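The condition behind that observation, an insertion at the EcoRI site that leaves the lacZ reading frame intact, can be illustrated with a short Python sketch. Everything here is an assumption made for illustration: the function name and the three example inserts are hypothetical and are not the linkers actually used, and the check covers only insert length and in-frame stop codons. Whether the extra amino acids are tolerated in α-complementation is a separate, experimental question.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

# First ten lacZ codons of M13mp2 as drawn in Fig. 1 (contains the EcoRI site GAATTC).
M13MP2_START = "ATGACCATGATTACGAATTCACTGGCCGTC"

# EcoRI cuts G^AATTC, so the insertion point is one base into the site.
ECORI_CUT = M13MP2_START.index("GAATTC") + 1


def insertion_keeps_frame(orf, cut_index, insert):
    """True if the insert is a multiple of three bases long and the
    recombinant sequence has no stop codon in the original reading frame."""
    if len(insert) % 3 != 0:
        return False
    recombinant = orf[:cut_index] + insert + orf[cut_index:]
    codons = (recombinant[i:i + 3] for i in range(0, len(recombinant) - 2, 3))
    return all(codon not in STOP_CODONS for codon in codons)


# Hypothetical inserts: 12 bases in frame, 11 bases (frameshift),
# and 12 bases that introduce an in-frame TGA.
for insert in ("AATTCAAGCTTG", "AATTCAAGCTG", "AATTGATAGCTG"):
    print(insert, insertion_keeps_frame(M13MP2_START, ECORI_CUT, insert))
```

The first hypothetical insert happens to carry a HindIII site (AAGCTT) and regenerates GAATTC at both junctions, which is the kind of arrangement that lets two enzymes share one cloning position and one primer.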
The utility of creating cloning sites on top of each other was based on the universal primer concept, but in turn led to the development of multiple cloning sites (MCS) or polylinkers that are now found in all cloning vehicles and provide many additional uses (Fig. 2). Therefore, work began on the synthesis of an oligonucleotide that could be inserted into the EcoRI site and generate restriction sites recognized by six-base-pair cutters like BamHI, AccI, SmaI, or HindII, useful for cloning either blunt-ended fragments or fragments with sticky ends produced by four-base cutters like Sau3A, TaqI, and HpaII (Fig. 3) (consistent with a DNA
A  Kanr
   EcoRI-BamHI-SalI-PstI-SalI-BamHI-EcoRI
   (AccI and HincII at each SalI site)
B  EcoRI-SmaI-BamHI-SalI-PstI-HindIII (mp8)
   (XmaI at the SmaI site; AccI and HincII at the SalI site)
   HindIII-PstI-SalI-BamHI-SmaI-EcoRI (mp9)
   (XmaI at the SmaI site; AccI and HincII at the SalI site)
--->3 ’ ~xxxxxxxxxxxxxxxxxxx~