PROGRESS IN
Nucleic Acid Research and Molecular Biology
edited by
WALDO E. COHN
Biology Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee
KIVIE MOLDAVE
Department of Molecular Biology and Biochemistry, University of California, Irvine, Irvine, California
Volume 54
ACADEMIC PRESS
San Diego New York Boston London Sydney Tokyo Toronto
This book is printed on acid-free paper. ∞ Copyright © 1996 by ACADEMIC PRESS, INC. All Rights Reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher.
Academic Press, Inc.
A Division of Harcourt Brace & Company, 525 B Street, Suite 1900, San Diego, California 92101-4495
United Kingdom Edition published by Academic Press Limited, 24-28 Oval Road, London NW1 7DX
International Standard Serial Number: 0079-6603
International Standard Book Number: 0-12-540054-3
PRINTED IN THE UNITED STATES OF AMERICA
96 97 98 99 00 01 EB 9 8 7 6 5 4 3 2 1
Abbreviations and Symbols
All contributors to this Series are asked to use the terminology (abbreviations and symbols) recommended by the IUPAC-IUB Commission on Biochemical Nomenclature (CBN) and approved by IUPAC and IUB, and the Editors endeavor to assure conformity. These Recommendations have been published in many journals (1, 2) and compendia (3); they are therefore considered to be generally known. Those used in nucleic acid work, originally set out in section 5 of the first Recommendations (1) and subsequently revised and expanded (2, 3), are given in condensed form in the frontmatter of Volumes 9-33 of this series. A recent expansion of the one-letter system (5) follows.

SINGLE-LETTER CODE RECOMMENDATIONS (a) (5)

| Symbol | Meaning | Origin of symbol |
|---|---|---|
| G | G | Guanosine |
| A | A | Adenosine |
| T(U) | T(U) | (ribo)Thymidine (Uridine) |
| C | C | Cytidine |
| R | G or A | puRine |
| Y | T(U) or C | pYrimidine |
| M | A or C | aMino |
| K | G or T(U) | Keto |
| S | G or C | Strong interaction (3 H-bonds) |
| W (b) | A or T(U) | Weak interaction (2 H-bonds) |
| H | A or C or T(U) | not G; H follows G in the alphabet |
| B | G or T(U) or C | not A; B follows A |
| V | G or C or A | not T (not U); V follows U |
| D (c) | G or A or T(U) | not C; D follows C |
| N | G or A or T(U) or C | aNy nucleoside (i.e., unspecified) |
| Q | Q | Queuosine (nucleoside of queuine) |

(a) Modified from Proc. Natl. Acad. Sci. U.S.A. 83, 4 (1986).
(b) W has been used for wyosine, the nucleoside of "base Y" (wye).
(c) D has been used for dihydrouridine (hU or H2Urd).

Enzymes

In naming enzymes, the 1984 recommendations of the IUB Commission on Biochemical Nomenclature (4) are followed as far as possible. At first mention, each enzyme is described either by its systematic name, by the equation for the reaction catalyzed, or by the recommended trivial name, followed by its EC number in parentheses. Thereafter, a trivial name may be used. Enzyme names are not to be abbreviated except when the substrate has an approved abbreviation (e.g., ATPase, but not LDH, is acceptable).
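The degenerate single-letter codes in the table above map naturally onto sets of bases. As a hedged illustration, the Python sketch below encodes the table as a dictionary (treating T and U as the same base and leaving out the modified nucleoside Q) and uses it to find every position where a degenerate motif matches a concrete sequence. The names IUPAC_RNA, code_matches, and find_degenerate are hypothetical, and the sequences are made up for the example; GNRA is used only as an example pattern written with these codes.

```python
# Hypothetical helper: the single-letter codes from the table, written out as
# sets of RNA bases. T and U are treated as the same base; Q (queuosine) is a
# modified nucleoside and is deliberately left out.
IUPAC_RNA = {
    "G": "G", "A": "A", "C": "C", "U": "U", "T": "U",
    "R": "GA", "Y": "UC", "M": "AC", "K": "GU", "S": "GC", "W": "AU",
    "H": "ACU", "B": "GUC", "V": "GCA", "D": "GAU",
    "N": "GAUC",
}


def code_matches(code: str, base: str) -> bool:
    """True if a concrete base is one of the nucleosides the code denotes."""
    base = base.upper().replace("T", "U")
    return base in IUPAC_RNA[code.upper()]


def find_degenerate(motif: str, seq: str) -> list[int]:
    """Return 1-based start positions where a degenerate motif matches seq."""
    k = len(motif)
    return [
        start + 1
        for start in range(len(seq) - k + 1)
        if all(code_matches(c, b) for c, b in zip(motif, seq[start:start + k]))
    ]


if __name__ == "__main__":
    print(find_degenerate("GNRA", "CGAAAGUGCAAC"))  # [2, 8]
    print(code_matches("Y", "T"))                   # True
```

Because every code is just a set of allowed bases, a motif match reduces to a per-position membership test; a regular expression built from character classes would be an equivalent design.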
REFERENCES

1. JBC 241, 527 (1966); Bchem 5, 1445 (1966); BJ 101, 1 (1966); ABB 115, 1 (1966), 129, 1 (1969); and elsewhere. General.
2. EJB 15, 203 (1970); JBC 245, 5171 (1970); JMB 55, 299 (1971); and elsewhere.
3. "Handbook of Biochemistry" (G. Fasman, ed.), 3rd ed. Chemical Rubber Co., Cleveland, Ohio, 1970, 1975, Nucleic Acids, Vols. I and II, pp. 3-59. Nucleic acids.
4. "Enzyme Nomenclature" [Recommendations (1984) of the Nomenclature Committee of the IUB]. Academic Press, New York, 1984.
5. EJB 150, 1 (1985). Nucleic Acids (One-letter system).

Abbreviations of Journal Titles

| Journals | Abbreviations used |
|---|---|
| Annu. Rev. Biochem. | ARB |
| Annu. Rev. Genet. | ARGen |
| Arch. Biochem. Biophys. | ABB |
| Biochem. Biophys. Res. Commun. | BBRC |
| Biochemistry | Bchem |
| Biochem. J. | BJ |
| Biochim. Biophys. Acta | BBA |
| Cold Spring Harbor | CSH |
| Cold Spring Harbor Lab | CSHLab |
| Cold Spring Harbor Symp. Quant. Biol. | CSHSQB |
| Eur. J. Biochem. | EJB |
| Fed. Proc. | FP |
| Hoppe-Seyler's Z. Physiol. Chem. | ZpChem |
| J. Amer. Chem. Soc. | JACS |
| J. Bacteriol. | J. Bact. |
| J. Biol. Chem. | JBC |
| J. Chem. Soc. | JCS |
| J. Mol. Biol. | JMB |
| J. Nat. Cancer Inst. | JNCI |
| Mol. Cell. Biol. | MCBiol |
| Mol. Cell. Biochem. | MCBchem |
| Mol. Gen. Genet. | MGG |
| Nature, New Biology | Nature NB |
| Nucleic Acid Research | NARes |
| Proc. Natl. Acad. Sci. U.S.A. | PNAS |
| Proc. Soc. Exp. Biol. Med. | PSEBM |
| Progr. Nucl. Acid. Res. Mol. Biol. | This Series |
Some Articles Planned for Future Volumes

Minute Virus of Mice cis-acting Sequences Required for Genome Replication and the Role of the Trans-acting Viral Proteins
CAROLINE ASTELL, QINGQUAN LIU, COLIN E. HARRIS, JOHN BRUNSTEIN, HITESH K. JINDAL AND PAT TAM

Structure and Transcription Regulation of Nuclear Genes for the Mouse Mitochondrial Cytochrome c Oxidase
NARAYAN G. AVADHANI, A. BASU, C. SUCHAROV AND N. LENKA

The Large Ribosomal Subunit Stalk as a Regulatory Element of the Eukaryotic Translational Machinery
JUAN P. G. BALLESTA AND MIGUEL REMACHA

General Transcription Factors Controlling the Activity of Mammalian RNA Polymerase II
JOAN W. CONAWAY AND RONALD C. CONAWAY

The Internal Structure of the Ribosome
BARRY S. COOPERMAN

Function and Mechanism in Prokaryotic General Recombination Systems
MICHAEL COX

S1 Nuclease Sensitive DNA Structures Contribute to Transcriptional Regulation of the Human PDGF A-Chain
ZHAO-YI WANG AND THOMAS F. DEUEL

Eukaryotic Nuclear RNase P: Structures and Functions
JOEL R. CHAMBERLIN, ANTHONY J. TRANGUCH, EILEEN PAGÁN-RAMOS AND DAVID R. ENGELKE

Biochemistry and Molecular Biology of Cobalamin Biosynthesis
JORGE C. ESCALANTE-SEMERENA

Intron-encoded snRNAs
MAURILLE J. FOURNIER AND E. STUART MAXWELL

Mechanisms for the Selectivity of the Cell's Proteolytic Machinery
ALFRED GOLDBERG, MICHAEL SHERMAN AND OLIVIER COUX

Structure/Function Relationships of Phosphoribulokinase and Ribulosebisphosphate Carboxylase/Oxygenase
FRED C. HARTMAN AND HILLEL K. BRANDES

The Nature of DNA Replication Origins in Higher Eukaryotic Organisms
JOEL A. HUBERMAN AND WILLIAM C. BURHANS

Function and Regulatory Properties of the MEK Kinase Family
GARY L. JOHNSON et al.

Regulation and Function of Adenosine Deaminase in Mice
MICHAEL R. BLACKBURN AND RODNEY E. KELLEMS

Experimental Analysis of Global Gene Regulation in Escherichia coli
ROBERT M. BLUMENTHAL, DEBORAH W. BORST AND ROWENA G. MATTHEWS

DNA Helicases: Roles in DNA Metabolism
STEVEN W. MATSON AND DANIEL W. BEAN

Bacterial and Eukaryotic DNA Methyltransferases
NORBERT O. REICH

Self-glucosylating Initiator Proteins and Their Role in Glycogen Biosynthesis
PETER J. ROACH

DNA Repair
AZIZ SANCAR

Depletion of Nuclear Poly(ADP-ribose) Polymerase by Antisense RNA Expression: Influence on Genomic Stability, Chromatin Organization, and DNA Repair and DNA Replication
CYNTHIA M. G. SIMBULAN-ROSENTHAL, DEAN S. ROSENTHAL, RUCHUANG DING, JOANY JACKMAN AND MARK E. SMULSON

Chemical Synthesis and Structure of Small RNA Molecules
MATHIAS SPRINZL AND STEFAN LIMMER

Transcriptional Regulation of Small Nuclear RNA Genes
WILLIAM E. STUMPH

Bacillus subtilis as I Know It
NOBORU SUEOKA

Effects of the Ferritin Open Reading Frame on Translational Induction by Iron
ROBERT E. THACH et al.
Structure and Function of the Human Immunodeficiency Virus Leader RNA

BENJAMIN BERKHOUT
Department of Virology, Academic Medical Center, University of Amsterdam, 1105 AZ Amsterdam, The Netherlands
I. A Structure Model for HIV-1 and HIV-2 Leader RNA
II. The Trans-acting Responsive Hairpin
III. The Poly(A) Hairpin
IV. The Primer-binding Site
V. The RNA Dimerization Signal
VI. The RNA Packaging Signal
VII. Splicing and Translation Functions
VIII. Base Composition of HIV-SIV Leader RNAs
IX. Concluding Remarks
References
Progress in Nucleic Acid Research and Molecular Biology, Vol. 54
Copyright © 1996 by Academic Press, Inc. All rights of reproduction in any form reserved.

The retrovirus family encompasses a diverse group of viruses characterized by a replication step in which the viral RNA genome is copied into DNA by the virally encoded reverse transcriptase enzyme. Among retroviruses, the lentiviruses have the most complex genome structure and expression strategy. The primate lentiviruses include the human and simian immunodeficiency viruses. There are two types of human immunodeficiency virus, HIV-1 and HIV-2. Simian immunodeficiency viruses (SIVs) have been identified in a number of Old World primate species: the sooty mangabey (SIVsm), mandrill (SIVmnd), African green monkey (SIVagm), Sykes monkey (SIVsyk), macaque (SIVmac), and chimpanzee (SIVcpz). In general, the primate lentiviruses can be split into five subgroups that are equally distantly related to one another (1, 2). Interestingly, phylogenetic analysis of nucleic-acid or amino-acid sequences strongly suggests that both HIV-1 and HIV-2 result from relatively recent simian-to-human cross-species transmissions. The HIV-1 genome is closely related to that of SIVcpz, and HIV-2 is almost identical to the SIVsm and SIVmac isolates. The three additional groups are represented by the SIVmnd, SIVagm, and SIVsyk isolates.

The 5'-untranslated leader region of an HIV-SIV RNA genome encodes multiple sequences important for viral replication. These sequences do not code for proteins but are the cis-acting recognition sites for the proteins and RNAs that mediate several steps in the viral replication cycle. Reverse transcription of the retroviral genome, for example, is primed by a tRNA bound to an 18-nucleotide complementary region (the primer-binding site, PBS) near the 5' end of the genome. Other leader motifs, the dimerization and packaging signals (DIS and Ψ), are required for genome dimerization and selective encapsidation into assembling virions. Furthermore, processes such as mRNA splicing, polyadenylylation, and translation are controlled by sequence elements in the leader transcript. In addition, complex lentiviruses encode the transcriptional trans-activator protein Tat, which binds to the trans-acting responsive (TAR) hairpin in the nascent leader transcript to regulate viral transcription from the long terminal repeat (LTR) promoter.

This article deals with the structure and function of the leader transcript of HIV-1 and HIV-2. Most of the RNA signals encoded by the untranslated leader RNA have specific nucleotide sequences critical for recognition and function [e.g., AAUAAA in the poly(A) site], but there is accumulating evidence that their structural context can also be important (e.g., the TAR hairpin motif). There has been an intense effort to analyze the secondary structure of retroviral leader RNAs using a variety of methods (biochemical analysis, free-energy minimization, sequence comparison, mutant analysis). The phylogenetic approach can be extremely helpful, given the large number of sequenced HIV-1 and HIV-2 isolates and the growing number of more distantly related SIV sequences. In this article, which is not intended to be encyclopedic, we focus primarily on the relationships between the structure of specific leader RNA motifs and their function in the retroviral life cycle.
A secondary structure model for the HIV-1 and HIV-2 leader is presented in Section I, and the individual motifs and their regulatory roles in virus replication are discussed in Sections II-VIII.
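The primer-binding site introduced above works through Watson-Crick complementarity between 18 nucleotides of the genome and the 3' end of the primer tRNA. As a minimal sketch of that check, the Python below computes an RNA reverse complement and tests whether two 5'→3' sequences form a perfect antiparallel duplex. The two sequences are assumptions, not taken from this article: the HIV-1 PBS as it is commonly reported and the 3'-terminal 18 nucleotides of tRNALys,3, the primer usually cited for HIV-1. Base modifications and G·U wobble pairs are ignored.

```python
# Watson-Crick complements for RNA (G.U wobble pairs are not considered).
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence, both written 5'->3'."""
    return "".join(COMPLEMENT[b] for b in reversed(rna.upper()))


def forms_perfect_duplex(a: str, b: str) -> bool:
    """True if a and b (both 5'->3') pair antiparallel over their full,
    equal length using Watson-Crick pairs only."""
    return len(a) == len(b) and reverse_complement(a) == b.upper()


# Assumed sequences (not given in this article): the commonly reported HIV-1
# PBS and the 3'-terminal 18 nt of tRNA(Lys,3), ending in CCA.
# Any base modifications in the real tRNA are ignored here.
HIV1_PBS = "UGGCGCCCGAACAGGGAC"
TRNA_LYS3_3END = "GUCCCUGUUCGGGCGCCA"

if __name__ == "__main__":
    print(len(HIV1_PBS), forms_perfect_duplex(HIV1_PBS, TRNA_LYS3_3END))
    # 18 True
```

The duplex test compares one strand with the reverse complement of the other, which is equivalent to aligning the two strands antiparallel and checking every pair.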
I. A Structure Model for HIV-1 and HIV-2 Leader RNA

We and others have published RNA secondary structure models for several domains of the HIV-1 and HIV-2 leaders based on a variety of techniques (3-11). These models are generally similar, but they differ significantly in some regions of the leader RNA. The leader RNA structure models presented in Figs. 1 and 2 are based on published data (see later discussions dealing with the individual structure motifs). In cases of conflicting data, we performed an extensive phylogenetic analysis of the particular RNA region in all HIV-SIV virus groups in order to reveal structural similarities. We consider a double-helical element as definitely existing only when it is supported by sufficient comparative data. Specifically, a putative helix is considered to exist when (1) base-pairing covariance can be demonstrated (e.g., a G·C pair changed to an A·U pair) or (2) a similar structure can be folded for other HIV-SIV viruses. Although the comparative evidence is convincing for several hairpins [e.g., TAR, poly(A), DIS], these RNA structure models are by no means final.

Secondary structure models for the complete leader RNA of HIV-1 and HIV-2 are presented in Figs. 1 and 2, respectively. Each hairpin motif is identified by a name that refers to its (putative) function in viral replication. This region contains several important molecular signals. The 5' end folds into the characteristic TAR hairpin structure, with either one extended stem region (HIV-1) or a more complex, branched structure (HIV-2), and this motif forms the binding site for the Tat trans-activator protein. Further downstream is the poly(A) hairpin, which invariably presents the AAUAAA hexamer involved in polyadenylylation in its single-stranded loop. The initiation site of reverse transcription (the primer-binding site, PBS) is part of an extended structure that may be involved in the annealing of the primer tRNA molecule. The larger PBS structure is divided into three subdomains: the top part, consisting of a stem-loop; the relatively unstructured central domain containing the tRNA primer-binding site; and the bottom part, consisting of an extended stem region with several irregularities.
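Criterion (1) above can be made concrete. The Python sketch below takes a set of aligned sequences and a proposed helix, written as pairs of 1-based alignment columns, and reports for each pair whether it can form (Watson-Crick or G·U) in every sequence and whether it covaries, meaning that two sequences differ at both positions of the pair, as in a G·C to A·U change. The function name and the toy alignment are hypothetical; a real analysis would use a curated alignment of HIV-SIV isolates and would need to handle gaps explicitly.

```python
from itertools import combinations

# Pairs accepted in a helix: Watson-Crick plus the G.U wobble.
ALLOWED = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def helix_evidence(aligned: dict[str, str], helix: list[tuple[int, int]]):
    """For each proposed pair (i, j) of 1-based alignment columns, return
    (i, j, can_pair_in_all, covaries). A pair 'covaries' when it can form in
    every sequence and at least two sequences differ at BOTH positions
    (e.g., G.C in one isolate, A.U in another). A change at only one side,
    such as G.C to G.U, is consistent with the helix but is not covariation."""
    result = []
    for i, j in helix:
        observed = {(seq[i - 1], seq[j - 1]) for seq in aligned.values()}
        can_pair = observed.issubset(ALLOWED)
        covaries = can_pair and any(
            p[0] != q[0] and p[1] != q[1] for p, q in combinations(observed, 2)
        )
        result.append((i, j, can_pair, covaries))
    return result


if __name__ == "__main__":
    # Hypothetical toy alignment of a 3-bp hairpin in three isolates.
    toy = {
        "isolate1": "GGGAAACCC",
        "isolate2": "GAGAAACUC",
        "isolate3": "GGGAAAUCC",
    }
    for row in helix_evidence(toy, [(1, 9), (2, 8), (3, 7)]):
        print(row)
    # (1, 9, True, False)
    # (2, 8, True, True)
    # (3, 7, True, False)
```

In the toy data only the 2-8 pair changes on both sides (G·C to A·U), so only that pair counts as covariation; the 3-7 pair shows a single-sided G·C to G·U change.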
The DIS hairpin is critical for initiation of genome dimerization, but additional dimerization signals as well as the encapsidation signal (Ψ) are believed to be located downstream of the major splice donor (SD). [Throughout this article we refer to these RNA structure models and discuss the different motifs in order from the 5' (TAR) to the 3' (Ψ) end of the leader.] We realize that dealing with individual hairpins may be a gross oversimplification, because there may be structural or functional interactions between the different RNA modules. It is likely that the three-dimensional structure is not simply a collection of hairpins connected by single-stranded regions. For instance, the RNA stretches between the stem-loop structures may form long-distance interactions that contract the molecule into a more rigid structure; such tertiary interactions are not indicated simply because they have not yet been studied. Some stem regions are connected by few or no nucleotides [in particular, the TAR and poly(A) hairpins], raising the possibility of coaxial stacking of the neighboring stems, as in the structure of transfer RNA. Furthermore, some RNA domains may maintain a level of plasticity by existing in an equilibrium between two structures, and such RNA conformational transitions can provide unique regulatory possibilities.

[Figure 1 drawing: HIV-1 leader RNA secondary structure, with the TAR, DIS, and SD hairpins and the gag AUG labeled]

FIG. 1. Secondary RNA structure model for the HIV-1 leader RNA (LAI isolate). The 5' end of the viral RNA (position +1) carries a cap (m7G). The transcriptional start site at +1 marks the border between the upstream (untranscribed) U3 region and the R region of the LTR. The R region (positions 1-97) is the short repeat at each end of the genome (see Fig. 4). The U5 region (98-181) is encoded by the LTR but is unique to the 5' end of HIV-1 transcripts. The leader RNA ends at the AUG initiation codon of the gag open reading frame at position 336. All hairpin motifs have been named after their (putative) function in HIV-1 replication and/or after the sequence elements they encode. Several direct repeats in the HIV-1 leader sequence are discussed in the text. These include an 8-mer within the TAR region (CUCUCUGG, positions 4-11 and 36-43), a 10-mer in the TAR and PBS regions (GGAGCUCUCU, positions 32-41 and 223-232), and a 7-mer in the region encompassing the DIS and SD hairpins (GAGGCGA, positions 270-276 and 280-286). Several important sequence motifs are indicated by shaded boxes (the AAUAAA hexamer involved in polyadenylylation, the 18-nucleotide PBS, the GCGCGC palindrome in the loop of the DIS hairpin, and the gag AUG start codon). The cleavage site within the major splice donor is marked by an arrow.

FIG. 2. Secondary RNA structure model for the HIV-2 leader RNA (ROD isolate). The transcriptional start site at +1 marks the border between the upstream (untranscribed) U3 region and the R region of the LTR. The R region (positions 1-173) is repeated at the 3' end of all HIV-2 transcripts. The U5 region (174-302) is encoded by the LTR but is unique to the 5' end of HIV-2 transcripts. The leader RNA ends at the AUG initiation codon of the gag open reading frame at position 545. All hairpin motifs have been named after their (putative) function in HIV-2 replication and/or after the sequence elements they encode. Several important sequence motifs are indicated by boxes (the AAUAAA hexamer involved in polyadenylylation, the 18-nucleotide PBS, the GGUACC palindrome in the loop of the DIS hairpin, and the gag AUG start codon). The cleavage site within the major splice donor is marked by an arrow. [Part of the drawing (region 1-390) is after the model presented in Ref. 8, reproduced by permission of Oxford University Press.]

We have resisted the temptation to maximize the number of base pairs in our models. Many helices in Figs. 1 and 2 could be extended by a few base pairs on introduction of bulging bases and other destabilizing elements. In the absence of comparative evidence, we preferred to show these segments as single-stranded. In general, there are several indications that extremely stable RNA structures are avoided in the HIV leader region. First, the stability of some stem regions was found to be fine-tuned, with a clear restriction against folding into excessively stable structures [e.g., the poly(A) hairpin; see Section III]. Second, a notable feature of the complete HIV leader, and in particular of the TAR and poly(A) stems, is the frequent occurrence of unpaired, bulged single residues within helical regions. A preference for bulged A residues is observed, as has been reported for other RNA molecules (12). Although bulges can form specific recognition sites for proteins (e.g., Tat protein binds the 3-nucleotide TAR bulge; see Section II), the role of bulges may also be to preclude the formation of excessively stable stem regions that could interfere with replication functions of the viral RNA. In particular, stable hairpins can interfere with the 5'→3' scanning movement of ribosomes (13-15) or the 3'→5' movement of an elongating reverse transcriptase enzyme (16, 17).
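The DIS loops named in the captions of Figs. 1 and 2 carry the palindromes GCGCGC (HIV-1) and GGUACC (HIV-2). In RNA terms a palindrome of this kind is self-complementary: the sequence equals its own reverse complement, so two identical copies can in principle pair antiparallel using Watson-Crick pairs only. The Python sketch below checks that property and scans a sequence for self-complementary k-mers. The helper names and the scanned toy sequence are hypothetical, and the code says nothing about whether such a pairing actually forms or how stable it would be.

```python
# Watson-Crick complements for RNA.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence, both written 5'->3'."""
    return "".join(COMPLEMENT[b] for b in reversed(rna.upper()))


def is_self_complementary(rna: str) -> bool:
    """True if the sequence equals its own reverse complement. Odd-length
    sequences can never qualify, because the middle base would have to pair
    with itself."""
    return rna.upper() == reverse_complement(rna)


def self_complementary_sites(rna: str, k: int = 6) -> list[tuple[int, str]]:
    """1-based positions (and sequences) of every self-complementary k-mer."""
    return [
        (i + 1, rna[i:i + k])
        for i in range(len(rna) - k + 1)
        if is_self_complementary(rna[i:i + k])
    ]


if __name__ == "__main__":
    for loop in ("GCGCGC", "GGUACC", "AAUAAA"):
        print(loop, is_self_complementary(loop))
    # GCGCGC True
    # GGUACC True
    # AAUAAA False
    print(self_complementary_sites("AAGCGCGCAA"))  # [(3, 'GCGCGC')]
```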
II. The Trans-acting Responsive Hairpin

The role of TAR RNA in regulating HIV-1 gene expression has been extensively investigated by both in vitro transcription and transient transfection analyses. The role of TAR in Tat-mediated activation of viral transcription from the LTR promoter has been discussed in recent reviews (18-20), and we will therefore only summarize some important aspects of TAR structure and function. Several features within TAR are critical for function, including the stem region, the 3-nucleotide bulge, and the 6-nucleotide loop. The viral trans-activator protein Tat binds to the bulge domain of TAR RNA as part of nascent HIV-1 transcripts and activates the transcription machinery from this "RNA-enhancer" binding site (15, 21). Multiple observations suggest that cellular proteins that also interact with TAR are involved in Tat-mediated trans-activation. For instance, mutations in the loop sequences do not inhibit Tat binding to TAR RNA yet greatly reduce trans-activation, which suggests that cellular factors that recognize the TAR loop are important. Indeed, several cellular TAR RNA-binding proteins with binding specificities for either the loop, the stem, or the bulge have been cloned and/or purified (22-27), although the precise role of these factors during Tat-dependent transcription remains unclear. Other cellular proteins bind to the TAR DNA sequences as part of the LTR transcriptional promoter (28-31). More recently, the role of the TAR RNA motif was analyzed using mutant HIV-1 viruses in tissue culture infections; in particular, the analysis of spontaneous revertant viruses further defined the critical TAR sequences and structural features (31a-34).

A comparative analysis of TAR RNA structures in all human and simian immunodeficiency viruses reveals the conservation of certain structural features, despite significant divergence in both nucleotide sequence and length of the different TAR regions (Fig. 1 for HIV-1, Fig. 2 for HIV-2, and Fig. 3 for SIVagm, SIVmnd, and SIVsyk). In particular, we found a striking structural resemblance between the TAR elements of SIVmnd, SIVsyk, and HIV-2. Furthermore, the TAR structure of SIVagm is intermediate in complexity between the single-stem TAR structure of HIV-1 and the duplex TAR structure of HIV-2, SIVmnd, and SIVsyk. Clearly, sequence and structure elements are conserved in the upper parts of both the single and duplex hairpins. This domain consists of a helix with a 2- to 4-nucleotide U-rich bulge, a 6-nucleotide GGG- or GAG-containing loop, and 4 or 5 base pairs in between. The degree of structural variation in this region among the different HIV-SIV viruses may suggest that a common ancestral virus diverged a long time ago, but an uncertain factor in such estimates is the in vivo mutation rate of this group of viruses. In addition, we previously suggested an unusual RNA recombination event for this repeat (R) region of the HIV-1 genome.
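The conserved upper TAR domain described above (a helix interrupted by a bulge, a few base pairs, and a 6-nucleotide loop) is easiest to reason about in dot-bracket notation, where matching parentheses mark paired positions and dots mark unpaired ones. The Python sketch below, an illustration rather than any published tool, converts a dot-bracket string into a pair table and classifies each unpaired run as a hairpin loop, a bulge, an interior or multibranch loop, or an external segment. The toy structure is hypothetical: it has the same layout as the described domain (3-nucleotide bulge, 4 base pairs, 6-nucleotide loop) but is not the actual TAR fold of any isolate.

```python
def pair_table(dot_bracket: str) -> list[int]:
    """Partner index (0-based) for every position, or -1 if unpaired."""
    partner = [-1] * len(dot_bracket)
    stack = []
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i + 1}")
            j = stack.pop()
            partner[i], partner[j] = j, i
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1] + 1}")
    return partner


def unpaired_elements(dot_bracket: str) -> list[tuple[str, int, int]]:
    """Classify each run of unpaired positions as (kind, first, last), 1-based."""
    partner = pair_table(dot_bracket)
    n = len(dot_bracket)
    elements = []
    i = 0
    while i < n:
        if partner[i] != -1:
            i += 1
            continue
        start = i
        while i < n and partner[i] == -1:
            i += 1
        before, after = start - 1, i  # paired positions flanking the run
        if before < 0 or after >= n:
            kind = "external"
        elif partner[before] == after:
            kind = "hairpin loop"  # the run is closed by a single base pair
        elif partner[before] == partner[after] + 1:
            kind = "bulge"  # the opposite strand continues without a gap
        else:
            kind = "interior/multibranch loop"
        elements.append((kind, start + 1, i))
    return elements


if __name__ == "__main__":
    # Hypothetical toy: 4 bp, 3-nt bulge, 4 bp, 6-nt loop.
    toy = "(" * 4 + "." * 3 + "(" * 4 + "." * 6 + ")" * 8
    print(unpaired_elements(toy))
    # [('bulge', 5, 7), ('hairpin loop', 12, 17)]
```

A run counts as a bulge when the two base pairs flanking it are stacked directly on the opposite strand; this rule works for bulges on either side of a helix.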
An elongating reverse transcriptase enzyme can prematurely transfer from 5'- to 3'-TAR repeat sequences during minus-strand strong-stop cDNA translocation (17). According to this mechanism, a simple TAR structure can convert in a one-step reaction into a complex hairpin and vice versa.

Retroviral RNA genomes are terminally redundant, and both the TAR and poly(A) hairpin motifs are contained within each repeat region (Fig. 4). It is generally assumed that TAR is functional as a Tat-binding site in the 5'-R region, whereas the poly(A) signal is functional only in the 3' context. The latter hypothesis is reasonable, because synthesis of a full-length transcript requires bypass of the polyadenylylation signal in the leader but use of this site in the 3' R. The mechanisms proposed to govern this differential poly(A) site use and the role of RNA structure are discussed in Section III. As part of the 5'-R leader region, however, the poly(A) hairpin structure may perform a different function. Similarly, although most studies have focused on TAR in the context of a 5'-LTR promoter, integrated proviruses can activate transcription from the 3' TAR-LTR enhancer-promoter, thus expressing downstream cellular sequences (35).

[Figure 3 drawing: TAR hairpins of SIVagm, SIVmnd, and SIVsyk]

FIG. 3. TAR RNA secondary structure models for the SIVagm, SIVmnd, and SIVsyk virus groups. These hairpin structures form the extreme 5' end of the viral genomes, and the transcriptional start site is marked as +1. Representative TAR structures of the HIV-1 and HIV-2 groups are shown in Figs. 1 and 2, respectively. (Part of this figure is reproduced from Ref. 5 by permission of Oxford University Press.)

[Figure 4 drawing: 5' and 3' repeat regions of an HIV-1 transcript, with the TAR and poly(A) hairpins]

FIG. 4. RNA secondary structure model for the repeat region at the 5' and 3' ends of HIV-1 transcripts. Shown are both ends of a mature, polyadenylylated HIV-1 transcript. Nucleotide positions in the two repeats are numbered identically with respect to the transcriptional start site (+1) in the 5' R. The 5'-TAR and 5'-poly(A) hairpins are connected without a single intervening nucleotide between the two stems, raising the possibility of coaxial stacking (see Section I). Polyadenylylation in the 3' R occurs 19 nucleotides downstream of the 3' AAUAAA hexamer, between positions 97 and 98. This process truncates the poly(A) hairpin and allows the 3'-TAR hairpin to be extended by two base pairs (using the UG dinucleotide encoded by the upstream U3 region). This rearrangement results in two stems separated by eight single-stranded nucleotides.

Furthermore, there is no obvious reason why the 3'-TAR motif cannot play additional roles in the viral replication cycle. A simple possibility is that the TAR and poly(A) stems confer protection against cellular exonucleases. In that case, a hairpin-binding protein may also be required, because the presence of a hairpin near the 3' end of an RNA is not by itself sufficient for a longer lifetime of the RNA in vivo (36). The tendency of viral RNA stem-loop structures not to become excessively stable is discussed in Section I. This restriction may apply in particular to the sequences within the repeat region of the leader, the TAR and poly(A) hairpins. Stable base-pairing interactions in the 3' R can interfere with one of the initial steps in the reverse transcription process, that is, transfer of the minus-strand strong-stop cDNA from the 5'-R template to the 3'-R acceptor.
Reverse transcription may be aborted at this step when the 3'-R domain is occluded in stable base pairing (37). Alternatively, the R-region hairpins could be actively involved in the molecular strand-transfer mechanism. This seems unlikely for the 3'-R structures, because deletion of the 30-97 region, which removes half of the TAR sequences and the complete poly(A) hairpin, did not significantly reduce the production of infectious virus (37). These results do not rule out a role for the 5'-R hairpins during strand transfer. Although the 5'- and 3'-TAR elements are identical in sequence, they differ in their flanking sequences, and this may differentially affect their structure. In particular, unique base-pair interactions may exist between 3' TAR and the upstream U3 region and between 5' TAR and the downstream leader-gag sequences (Fig. 4). The single-stranded TAR domains (the 3-nucleotide bulge and the 6-nucleotide loop) may also be involved in tertiary base-pair interactions with flanking sequences. In fact, two studies have proposed an interaction between the G-rich loop of 5' TAR and downstream leader sequences (8, 38). The first study proposed an interaction between the second TAR loop in HIV-2 and complementary sequences between the poly(A) and PBS structures (Fig. 2; UGGG at positions 71-74 and a CCC sequence starting at position 189, respectively). Consistent with this proposal, both regions were surprisingly insensitive to various single-strand-specific reagents (8). This was particularly striking given the high reactivity of the first TAR loop, which contains very similar sequences. The second study identified a small hairpin motif within the HIV-1 gag gene with a 6-nucleotide loop (UCCCAG) that is the perfect complement of the TAR loop (CUGGGA) (38). Because both pseudoknot-like interactions of the TAR loop use sequences that are unique to the 5' end of HIV transcripts, they are expected to specifically influence the 5'-TAR structure.
The 5'- and 3'-TAR elements of HIV-1 were reported to be structurally similar based on RNase T1 accessibility of the G-rich loops, but the 5'-TAR sequence used in this study did not include the proposed gag interaction site (39). There is no additional evidence for these tertiary interactions from phylogenetic sequence analysis: covariations were not observed because the sequence elements involved are conserved among virus isolates (38, and data not shown). Clearly, more studies will be required to characterize both these tertiary interactions and, ultimately, their mechanism of action during virus replication.
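The proposed TAR loop interactions rest on perfect complementarity: the gag hairpin loop UCCCAG is the reverse complement of the TAR loop CUGGGA. A minimal sketch of a search for such partners is shown below; it looks for the reverse complement of a probe sequence within a target and reports 1-based start positions. The function name and the toy target are hypothetical. Only Watson-Crick pairs are considered, so partners containing G·U pairs would be missed, and finding a complementary site says nothing about whether the interaction actually forms in the folded RNA.

```python
# Watson-Crick complements for RNA.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence, both written 5'->3'."""
    return "".join(COMPLEMENT[b] for b in reversed(rna.upper()))


def complementary_sites(probe: str, target: str) -> list[int]:
    """1-based start positions in target where probe could form a perfect,
    antiparallel Watson-Crick duplex (i.e., where its reverse complement occurs)."""
    site = reverse_complement(probe)
    target = target.upper()
    return [
        i + 1
        for i in range(len(target) - len(site) + 1)
        if target[i:i + len(site)] == site
    ]


if __name__ == "__main__":
    tar_loop = "CUGGGA"                    # TAR loop, as given in the text
    print(reverse_complement(tar_loop))    # UCCCAG, the gag loop in the text
    toy_target = "GGAUCCCAGCC"             # hypothetical target sequence
    print(complementary_sites(tar_loop, toy_target))  # [4]
```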
III. The Poly(A) Hairpin

Like most eukaryotic mRNAs, the HIV-1 transcript is post-transcriptionally processed at its 3' end by cleavage and polyadenylylation (reviewed in 40, 41). A poly(A) signal is present in the leader transcript, but it apparently does not function in this context. This sequence is part of the repeat element that is reiterated at the 3' end of the viral transcript, where it functions as an efficient polyadenylylation signal (Fig. 4). Most characterized processing signals are composed of at least two elements: the AAUAAA hexamer, which resides 10-30 nucleotides upstream of the cleavage site, and an amorphous U- or GU-rich element, which is located immediately downstream of the cleavage site. Both the hexamer and an extended GU cluster are present in the genomes of HIV-SIV viruses. For instance, the prototype HIV-1 isolate LAI contains three GU-rich motifs immediately downstream of the cleavage site (Fig. 1; GUGUGUG at positions 102-108, UGUUGUGUG, and GUGUG at positions 162-166), of which the most upstream motif was demonstrated to facilitate efficient polyadenylylation (42). More recently, additional regulatory elements upstream of AAUAAA were identified in a variety of viral poly(A) sequences (SV40, adenovirus, cauliflower mosaic virus, hepatitis B virus; reviewed in 43). These enhancer elements function in an orientation- and position-dependent manner and are usually U-rich, but exhibit little sequence similarity. An upstream enhancer element has also been suggested for the HIV-1 poly(A) site, which is of considerable interest with respect to its putative role in the selective activation of the 3'-poly(A) site (see below). Differential regulation of polyadenylylation has been studied for the animal retroviruses (44, 45), HIV-1 (46-52), and human T cell leukemia virus type I (HTLV-I) (53, 54). The HTLV-I poly(A) signal is unique in that the AAUAAA hexamer is widely separated from the actual site of cleavage and poly(A) addition, but these two positions are juxtaposed by the folding of an extended RNA structure (53, 54). This example underscores the general notion that replicative functions of the leader may depend on its higher order RNA structure.
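The two-element description of processing signals above can be turned into a simple sequence scan. The Python sketch below, written under stated assumptions, finds every AAUAAA hexamer, reports the window 10-30 nucleotides downstream in which cleavage would be expected (counted from the last A of the hexamer, a convention chosen here for illustration), and lists GU-rich runs (five or more consecutive G or U residues, an arbitrary threshold) that start within 60 nucleotides of the hexamer. The function name, thresholds, and toy sequence are hypothetical. A sequence-only scan of this kind would flag the poly(A) signal in both the 5' and 3' repeats alike, so it cannot by itself explain why only the 3' site is used.

```python
import re


def polya_candidates(rna: str, min_dist: int = 10, max_dist: int = 30,
                     gu_min_len: int = 5, downstream: int = 60) -> list[dict]:
    """Scan an RNA sequence for AAUAAA hexamers (1-based coordinates).

    For each hexamer, report:
      'hexamer': position of its first A;
      'cleavage_window': (first, last) positions p such that cleavage after p
        lies min_dist..max_dist nt past the hexamer's last A (assumed convention);
      'gu_runs': runs of at least gu_min_len consecutive G/U residues that
        start within `downstream` nt after the hexamer.
    """
    seq = rna.upper()
    gu_pattern = re.compile(rf"[GU]{{{gu_min_len},}}")
    results = []
    # The lookahead lets overlapping hexamers (e.g., AAUAAAUAAA) be found.
    for m in re.finditer(r"(?=AAUAAA)", seq):
        first = m.start() + 1   # 1-based position of the first A
        last = first + 5        # 1-based position of the last A
        gu_runs = [
            (g.start() + 1, g.end())  # 1-based, inclusive
            for g in gu_pattern.finditer(seq)
            if g.start() + 1 in range(last + 1, last + downstream + 1)
        ]
        results.append({
            "hexamer": first,
            "cleavage_window": (last + min_dist, last + max_dist),
            "gu_runs": gu_runs,
        })
    return results


if __name__ == "__main__":
    toy = "CC" + "AAUAAA" + "C" * 20 + "GUGUGUG" + "CC"  # hypothetical sequence
    print(polya_candidates(toy))
    # [{'hexamer': 3, 'cleavage_window': (18, 38), 'gu_runs': [(29, 35)]}]
```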
Several models have been proposed to explain differential poly(A)-site usage of HIV RNA. First, the poly(A) signal may merely be inefficient, so that a certain percentage of transcripts read through. Clearly, this mechanism requires no specific regulation of polyadenylylation. According to the second model, the 5'-poly(A) site is sufficient for processing, but its close proximity to the mRNA start site (cap site) results in suppression (47, 50). Perhaps a leader RNA of sufficient length is required for the binding of proteins involved in polyadenylylation. Like the first model, suppression by cap-site proximity predicts that no regulatory sequence motifs play a role in differential polyadenylylation. The third model invokes an upstream activator sequence, present only in the 3'-poly(A) context, that specifically activates the downstream poly(A) site. Indeed, sequences that increase polyadenylylation efficiency have been identified in the U3 region upstream of the 3'-poly(A) site (46-49, 51, 52). The U3-enhancer model has been further modified to include the TAR RNA stem-loop structure, which spatially juxtaposes the upstream enhancer and the core poly(A) site (51). This molecular mechanism is reminiscent of the RNA structure that bisects the HTLV-I poly(A) site (53, 54). Experiments with artificial poly(A) constructs, in which the hairpin was inserted between the AAUAAA motif and the downstream GU-rich box, confirmed that the TAR stem can act as a spacer (55). Interestingly, we reported a severe replication defect for HIV-1 mutants with an opened lower part of the TAR stem region, although such TAR mutants are fully active in Tat-mediated trans-activation assays (32). Virus escape mutants did repair the TAR stem, but without restoring the actual base sequence. These observations are consistent with a spacer role for the full-length TAR stem in polyadenylylation.
As an alternative to the U3-enhancer model for activation of the 3'-poly(A) site, the 5'-poly(A) site could in theory be inactivated specifically by silencer elements located in leader sequences that are not in proximity to the 3'-poly(A) site. Silencing may occur either through the binding of proteins that interfere with recognition and/or use of the upstream poly(A) site, or through RNA structural rearrangements that inactivate the upstream site. For instance, van Gelder et al. (56) convincingly demonstrated that binding of two molecules of the U1A protein to the poly(A) region of its own pre-mRNA interferes with polyadenylylation, presumably by blocking the access of polyadenylylation factors. In fact, specific silencing of the 5'-poly(A) site in HIV-1 was proposed to be mediated by the viral Tat protein, which binds the flanking 5'-TAR hairpin as part of the nascent transcript (57). However, this Tat-induced shift from the upstream to the downstream poly(A) site is probably caused by an effect of Tat on the processivity of RNA polymerase, an effect that is particularly pronounced in transfection systems with replicating plasmids (58, 59).
Apart from the role of the TAR hairpin as a spacer between the AAUAAA motif and the upstream enhancer, is there any additional role for leader RNA structure in the regulated polyadenylylation of HIV-1 transcripts? We performed a comparative sequence analysis of this part of the RNA genome of different groups of immunodeficiency viruses, including human types 1 and 2 and the simian viruses of mandrills, African green monkeys, and Sykes' monkeys (11). This analysis revealed the conservation of a hairpin motif despite the divergence in sequence (Fig. 5). In all cases, the AAUAAA signal is flanked by nucleotide segments that can base pair, forming a hairpin structure with the poly(A) signal in the single-stranded loop. The thermodynamic stability of this "poly(A) hairpin" was also well conserved, suggesting a biological function for this structural motif. Consistent with this idea, it was shown that either stabilization or destabilization of the stem region severely inhibits the replication potential of HIV-1 (11).
FIG. 5. Phylogenetic comparison of poly(A)-RNA hairpin structures in different HIV-SIV isolates. The poly(A) signals are denoted by grey boxes. (This figure is adapted from Ref. 11.)
[Hairpin structures not reproduced. Isolates shown: HIV-1 LAI, HIV-1 ANT70, HIV-2 ROD, SIV-cpz, SIV-mnd, SIV-syk, SIV-agm, and SIV-agm isolates TYO, AB, 677, and 691.]
Fine-tuning of the stability of this hairpin may be essential either for polyadenylylation or for any other biological reaction in which the HIV-1 leader RNA participates. In general, the tendency not to form perfect stems may reflect the biological role of this RNA molecule as a template in reverse transcription and translation (see Section I). Recently, it was reported that HIV-1 polyadenylylation depends on the U3-enhancer because the AAUAAA motif is flanked by a suboptimal sequence context (43). Inspection of the nucleotide sequences of the different HIV-SIV poly(A) hairpins reveals a remarkable clustering of CUU triplets flanked by purines (Fig. 5). All viral species contain multiple copies of this RCUUR motif, which may form the binding site for repressor-like proteins. The inhibitory sequences that were identified both 5' and 3' of the hexamer are in fact the sequences that constitute the stem of the poly(A) hairpin, suggesting that the RNA structure itself may block access of the processing enzymes to the poly(A) site. Binding of certain proteins to this RNA structure may also shield the poly(A) sequences from recognition by the proteins involved in mRNA polyadenylylation. A role for RNA secondary structure in mRNA 3'-end processing was suggested previously for some other mRNA species (60, 61). However, a recent mutational analysis of the adenovirus-2 L4 poly(A) site could not confirm an effect of RNA structure on the efficiency of polyadenylylation (62). Of note, histone mRNAs, the only mRNAs that lack a poly(A) tail, are processed at their 3' end by enzymes that recognize a stem-loop structure (63). Although limited, the mutational data for HIV-1 strongly suggest that the RNA hairpin encompassing the poly(A) signal plays an important role in the HIV-1 replication cycle (11).
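The RCUUR clustering noted above is a sequence pattern that can be counted directly. The following is a minimal sketch. It assumes the IUPAC convention R = A or G and uses 1-based coordinates. The helper `find_rcuur` is hypothetical, and counting motifs says nothing about whether any protein actually binds them.

```python
import re

# R is the IUPAC symbol for a purine (A or G). The zero-width lookahead lets
# overlapping motifs, such as those in ACUUACUUA, be reported separately.
RCUUR = re.compile(r"(?=[AG]CUU[AG])")

def find_rcuur(rna):
    """Return 1-based start positions of every RCUUR motif in `rna`."""
    rna = rna.upper().replace("T", "U")
    return [m.start() + 1 for m in RCUUR.finditer(rna)]

print(find_rcuur("ACUUACUUA"))  # [1, 5] (synthetic sequence)
```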
Whether the hairpin is actually involved in the regulation of HIV-1 polyadenylylation, or is active at some other point in the retroviral life cycle, awaits further study. We note that others have proposed a rather different folding scheme for the region encompassing the poly(A) signal (7). In our model, palindromic sequence elements flanking the poly(A) site form a local stem-loop structure (11). The alternative model, however, proposes that this segment is involved in a long-distance base-pairing interaction with sequences approximately 165 nucleotides further downstream on the linear map (7). The phylogenetic analysis strongly supports the relatively simple hairpin model (8, 11), whereas the helices of the alternative model are not supported by covariations. Furthermore, the biochemical probing data presented in support of the latter model are not necessarily inconsistent with the hairpin model. For instance, high reactivity of the AAAG sequence overlapping the poly(A) signal was reported (7), which is fully compatible with the hairpin structure.
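The local stem-loop model raises a simple question: can the segments flanking the AAUAAA signal close a hairpin? One way to check is to extend base pairs outward from a chosen loop. In the sketch below, `stem_length` is a hypothetical helper, the test hairpin is synthetic, and G-U wobble pairs are accepted. The count stops at the first mismatch or bulge, and no free energy is computed. It is therefore only a crude proxy for the thermodynamic comparison of the viral hairpins.

```python
# Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}

def stem_length(rna, loop_start, loop_end):
    """Count contiguous base pairs closing the loop at positions loop_start..loop_end.

    Positions are 1-based and inclusive. Pairing is extended outward from the
    loop until a non-pairing position or a sequence end is reached; bulges and
    internal loops stop the count, so imperfect stems are underestimated.
    """
    rna = rna.upper().replace("T", "U")
    i, j = loop_start - 2, loop_end  # 0-based indices of the first closing pair
    n = 0
    while i >= 0 and j < len(rna) and (rna[i], rna[j]) in PAIRS:
        n += 1
        i -= 1
        j += 1
    return n

hairpin = "GCUUC" + "AAUAAA" + "GAAGC"  # synthetic, not a viral sequence
start = hairpin.find("AAUAAA") + 1      # 1-based start of the loop
print(stem_length(hairpin, start, start + 5))  # 5
```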
IV. The Primer-binding Site
Reverse transcription of the viral RNA genome is mediated by the virion-associated reverse transcriptase (RT) enzyme, which uses a cellular tRNA molecule as primer. The identity of this tRNA depends on the particular retroviral species. In each case, 16-19 nucleotides at the 3'-CCA end of the tRNA are the exact complement of the primer-binding site (PBS) located in the leader RNA of the corresponding virus genome. The HIV-SIV group of viruses uses tRNALys-3 as primer (64-66). A priori, incorporation of the proper tRNA primer into the virion can be mediated by specific interaction with a viral protein, most likely the RT enzyme, or by base pairing with the PBS on the viral RNA genome. Several in vitro studies reported a specific interaction of tRNALys-3 with the HIV-1 RT protein (67-69), although others have reported nonspecific binding (70-72). Details of this RNA-protein interaction are currently being analyzed. Chemical cross-linking data suggest that the anticodon region of tRNALys-3 is in close proximity to the protein (67), but this tRNA domain does not provide the determinants for binding specificity (73, 74). Further, a specific tRNA-binding site within the C-terminal portion of the p66 subunit of RT has been proposed (75). Studies with infectious virus provide accumulating evidence that the RT domain of the Gag-Pol precursor protein is both required and sufficient for tRNA encapsidation (76-79). Virion particles lacking the RT enzyme contain reduced levels of the tRNA primer, whereas normal tRNA levels are present in virions lacking a viral RNA genome. Consistent with the idea that the HIV-1 RT enzyme is dedicated to tRNALys-3 as the primer for reverse transcription, viruses mutated in the PBS such that other tRNA primers should be used are severely replication-defective (76, 78). We note that the situation may be different for the murine retroviruses.
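The complementarity between the tRNA 3' end and the PBS can be expressed as a count of contiguous base pairs along an antiparallel duplex. The sketch below assumes both sequences are written 5' to 3' and accepts only Watson-Crick pairs. The helper `pbs_match_length` is hypothetical, and the example sequences are synthetic rather than the actual tRNALys-3 or PBS sequences.

```python
# Watson-Crick partners only: the text describes the primer-PBS duplex
# as an exact complement, so G-U wobble pairs are not accepted here.
WC = {"A": "U", "U": "A", "G": "C", "C": "G"}

def pbs_match_length(trna, pbs):
    """Count contiguous Watson-Crick pairs between the tRNA 3' end and the PBS.

    Both sequences are given 5'->3'. The duplex is antiparallel, so the
    3'-terminal tRNA nucleotide (the A of CCA) faces the 5'-most PBS nucleotide.
    """
    trna = trna.upper().replace("T", "U")
    pbs = pbs.upper().replace("T", "U")
    n = 0
    for k in range(min(len(trna), len(pbs))):
        if WC.get(trna[-1 - k]) != pbs[k]:
            break
        n += 1
    return n

print(pbs_match_length("GGGCGUCCA", "UGGACG"))  # 6 (synthetic sequences)
```

For a natural primer and its cognate PBS, the count would be expected to fall in the 16-19 nucleotide range given above.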
Although the RT protein is also important for tRNA encapsidation in the murine retroviruses (80, 81), there is some evidence that these viruses can efficiently initiate reverse transcription with primers other than the natural tRNAPro molecule (82, 83).
One striking feature of the HIV RNA structure models is the predominantly single-stranded character of the PBS (Figs. 1 and 2). Only four (HIV-1) or three (HIV-2) base pairs of the PBS top stem need to open up for optimal base pairing with the 3' end of the tRNA primer. Intriguingly, several reports propose additional base-pairing contacts between the tRNA primer and the HIV RNA genome (8, 71, 84-86). For instance, Fig. 6 shows several potential base-pairing interactions between tRNALys-3 and the HIV-2 genomic RNA (8). As discussed, the PBS sequence is not the major player in selective tRNALys-3 encapsidation, and these additional base pairs are therefore not expected to be involved in selective tRNA packaging. Instead, the additional contacts may facilitate tRNA-PBS annealing through destabilization and melting of part of the tRNA cloverleaf structure. Alternatively, the RT protein may be actively involved in opening the tRNA stems (see below). The additional tRNA-vRNA interactions could also trigger the formation of a higher order RNA structure that is specifically recognized and primed by the RT protein, as originally proposed for Rous sarcoma virus (87-87c). Finally, it cannot be excluded that the multidomain vRNA-tRNA interactions play a role in the maturation of genomic vRNA dimers (see Section V). We performed an extensive comparative sequence analysis of the various proposed interactions. However, because the tRNALys-3 sequence is absolutely conserved and several of the viral RNA sequences involved are relatively invariant, these interactions are difficult to demonstrate by comparative evidence. Specific HIV mutants can now be designed to test the contribution of additional contacts with the tRNA primer.
In this respect, we note the presence of important DNA sequence motifs in this region of the HIV-1 genome that are involved in the integration of the reverse-transcribed retroviral genome into the chromosomal DNA. This step requires the interaction of the virus-encoded integrase protein (In) with sequences located at the ends of the viral DNA, the so-called att sites of the U3 region in the 5' LTR and the U5 region in the 3' LTR.
In particular, changes in the highly conserved C-A dinucleotide near the 3' end of U5 have a dramatic effect both on virus infectivity and on in vitro processing by the HIV-1 In protein (88-90). The conserved C-A sequence is, however, not sufficient for integration in vitro (90) or for virus replication in tissue culture (91). The latter study suggested that the U5 sequences critical for integration are situated within the terminal 16 nucleotides of U5 (91). Thus, the presence of overlapping DNA sequence motifs complicates the design of mutations in this leader region that have specific effects (e.g., on RNA structure).
Based on RNase footprinting assays, it has been proposed that binding of the RT protein to the tRNALys-3 reverse transcription primer results in an opening of the acceptor stem (68). Recently, we demonstrated that an oligonucleotide mimicking the PBS sequence anneals to tRNALys-3 in the RT-tRNA complex, but not to the free tRNA molecule (74). These results suggest that RT opens the acceptor stem to allow intermolecular base pairing with the PBS. Besides the RT protein, other viral factors may be involved in tRNA annealing and/or the initiation of reverse transcription. In particular, nucleocapsid (NC) protein promotes annealing of the primer tRNA to viral RNA (92, 92a), and this activity may largely be due to the ability of the NC protein to destabilize secondary structure (93). In fact, the latter study demonstrated a general property of NC to lower the kinetic barrier for double-strand to single-strand transitions of both DNA and RNA templates.
FIG. 6. Potential base-pairing interactions between the HIV-2 RNA genome and the tRNALys-3 primer. The 18-bp interaction of the 3' end of the tRNA primer with the PBS is indicated by circles. The putative additional interactions add 8 bp (squares) and 6 bp (triangles and stars). All these additional interactions are hypothetical and should be tested experimentally (by biochemical probing or the analysis of viruses with specific mutations in these domains). (This figure is reproduced from Ref. 8, by permission of Oxford University Press, with some modifications in the number of additional base-pairing contacts.) Based on biochemical probing experiments, a detailed model was recently proposed for the interaction of tRNALys-3 with the HIV-1 genome (85, 86).
[Structure drawing not reproduced: the tRNALys-3 cloverleaf base paired with the HIV-2 PBS region.]
Recent experiments indicate a potential role for the accessory HIV-1 proteins Vif (94) and Nef (95, 96) in the reverse transcription process. Both proteins act in the virus-producing cells to allow the generation of virions that are fully competent for efficient reverse transcription of the RNA genome on entering a host cell. However, the mechanisms of these effects are unknown, and the cellular targets of these viral proteins remain to be identified. These viral proteins could directly affect the reverse transcription process. Alternatively, they could be involved in the assembly of new virions or the processing of internalized virus (uncoating, incorporation of nucleotides, etc.), and such an effect of Vif on virion maturation has been reported (94, 97, 98).
Some peculiar features of the PBS region are apparent from inspection of the RNA structures (Figs. 1 and 2). First, we note a perfect repeat of a 10-nucleotide sequence in the TAR element and downstream of the PBS (Fig. 1; GGAGCUCUCU at positions 32-41 and 223-232). Second, A nucleotides are remarkably excluded from the left side of the base pairs that constitute the PBS bottom stem (HIV-1, positions 112-153; HIV-2, positions 197-226). A comparison of the two structure models in Figs. 1 and 2 shows clearly that the HIV-2 genome is more extended in this domain than the HIV-1 RNA, a situation very similar to that observed for the TAR and poly(A) structures (Sections II and III).
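Exact repeats, such as the 10-nucleotide sequence shared by TAR and the region downstream of the PBS, can be found by indexing every k-mer of a sequence. The following is a minimal sketch. The helper `repeated_kmers` is hypothetical, positions are 1-based to match the genome coordinates used here, and the test sequence is synthetic.

```python
from collections import defaultdict

def repeated_kmers(rna, k=10):
    """Map each k-mer that occurs more than once to its 1-based start positions."""
    rna = rna.upper().replace("T", "U")
    positions = defaultdict(list)
    for i in range(len(rna) - k + 1):
        positions[rna[i:i + k]].append(i + 1)
    return {kmer: starts for kmer, starts in positions.items() if len(starts) > 1}

# Synthetic sequence: one 10-mer placed twice, separated by a spacer.
print(repeated_kmers("GGAGCUCUCU" + "AAAAA" + "GGAGCUCUCU"))
# {'GGAGCUCUCU': [1, 16]}
```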
V. The RNA Dimerization Signal
The genome of all retroviruses consists of two identical full-length RNA transcripts noncovalently associated near their 5' ends in a region called the dimer linkage structure (DLS). A hairpin motif involved in the initiation of dimerization, the dimerization initiation signal (DIS), was recently described for HIV-1. Dimerization is generally considered to play an important role both in the preferential encapsidation of viral genomes within the budding virus particle and in the process of reverse transcription. In particular, the presence of a diploid genome has been suggested to enhance genetic recombination, which may increase the rate of retroviral evolution. Furthermore, a dimeric genome allows the viral RT enzyme to bypass occasional breaks in one of the RNA genomes (99). The mechanism of retroviral genome dimerization is currently unclear, but several models have been proposed based on in vitro studies with purified RNA segments (100), and several attempts have been made to map the HIV RNA region responsible for dimerization (10, 11a, 102-111). Some reports have described a crucial role for a trans-acting factor, the viral nucleocapsid protein NCp7, in the formation of RNA dimers (101, 103), but spontaneous RNA dimerization is possible in the absence of any viral or cellular protein. As discussed in Section IV, this effect of the NC protein is based on its ability to activate base-pair rearrangements (93). Furthermore, protein is not required to hold the two RNA molecules together, because genomic RNA can be phenol-extracted from mature virion particles as a dimer.
It was initially proposed that "purine quartets" in the 3' end of the HIV-1 leader RNA are involved in the dimerization process (102, 104, 107). This model was based on the presence of several consensus RGGARA tracts in the DLS region downstream of the major splice donor (SD) of all retroviruses, and the proposed mechanism resembles the dimerization of telomeric DNA through the formation of quadruple helical structures stabilized by guanine base tetrads (112). For instance, the HIV-2 leader encodes four such motifs (Fig. 2; motifs at positions 271-276 and 284-289, AGGAGA at 447-452, and GGGAGA at 541-546), with an additional motif in the 5' end of the gag open reading frame (GGGAAA at 573-578). However, we reported efficient dimerization of mutant HIV-2 leader transcripts from which all RGGARA motifs had been deleted (106). Similarly, several studies with HIV-1 RNA mutants reported the involvement of sequences outside the original DLS region (108-111). In particular, the DIS was identified upstream of the SD, in the 248-270 region of the HIV-1 genome (Fig. 1). The DIS motif consists of a palindromic sequence in the loop of a hairpin structure (Figs. 1 and 2). Dimerization is proposed to be initiated via a loop-loop interaction based solely on Watson-Crick base pairing (108-111). This "loop-loop kissing" of autocomplementary loop sequences is very similar to the RNA-RNA interactions proposed for the regulation of plasmid replication (113, 114).
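The RGGARA consensus on which the purine-quartet model was based can be searched with a regular expression. The helper below is hypothetical and assumes R = A or G. As the deletion experiments described above show, the presence of such motifs does not by itself establish a role in dimerization.

```python
import re

# R = purine (A or G). The capture group sits inside a zero-width lookahead,
# so overlapping motifs are all reported.
RGGARA = re.compile(r"(?=([AG]GGA[AG]A))")

def find_rggara(rna):
    """Return (1-based start, motif) for every RGGARA match in `rna`."""
    rna = rna.upper().replace("T", "U")
    return [(m.start() + 1, m.group(1)) for m in RGGARA.finditer(rna)]

print(find_rggara("CCAGGAGACCGGGAAAC"))  # [(3, 'AGGAGA'), (11, 'GGGAAA')]
```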
Studies with model RNA stem-loops (115) suggest that both the complementarity between a pair of single-stranded loops and the exact loop sequence (and structure) may help determine the stability of this RNA-RNA complex. Subsequent opening of both stem regions could further stabilize the structure by forming additional base-pairing interactions (108-111). Although there is convincing evidence for the critical role of this DIS hairpin in in vitro dimerization, infection experiments with mutant HIV-1 viruses are still needed. Such experiments should provide further evidence for the proposed mechanism and verify the role of potential accessory sequences that may activate dimerization. We performed a phylogenetic analysis of the corresponding region of the RNA genome of other HIV-SIV viruses. For most of the sequences analyzed, we were able to fold a similar hairpin motif with a 6-mer palindromic sequence in the single-stranded loop (Fig. 7). We did not recognize this motif in the SIVagm sequences.
FIG. 7. Phylogenetic comparison of DIS hairpin structures in the leader RNA of different HIV-SIV isolates. The palindromic motifs in the loop are denoted by grey boxes, and the restriction enzymes with the corresponding sequence specificity are listed on top of the hairpins. No restriction enzyme with UCUACA sequence specificity has been identified. No similar hairpin motif could be folded for the SIVagm isolates.
[Hairpin structures not reproduced. Isolates shown: HIV-1 LAI, ELI, Z2Z6, U455, MAL, ANT-70, and MVP-5180; SIVcpz; SIVsyk; SIVmnd; HIV-2 ROD and NIH-Z; SIVsm PBj. Restriction enzymes listed: BssHII, SnoI, KpnI, HpaI.]
Among the different DIS hairpins identified, there was considerable sequence heterogeneity in both the stem and loop domains. However, base changes on one side of the stem are compensated by base substitutions in the opposite strand ("base-pair covariation"). Remarkably, sequence variation in the loop shows covariation within the palindromes ("palindromic covariation"). For convenience, we listed the restriction endonucleases whose sequence specificity corresponds to the loop palindromes (e.g., the GCGCGC palindrome of HIV-1 corresponds to the BssHII restriction site). Given the variety of palindromes used by the different viruses, the exact base sequence is likely to be of relatively little importance. Based on these structure models, it is now straightforward to test the requirement for these structures and sequences in the context of the replicating virus.
Multiple palindromic sequences are present in other single-stranded regions of the leader transcript. For instance, the prototype HIV-1 LAI virus (Fig. 1) contains the palindrome AAGCUU (HindIII site) in the loop of the poly(A) hairpin and the octamer palindrome AAAAUUUU (positions 302-309) between the SD and ψ hairpins (see Section VI). Whether these additional palindromes contribute to the stability of the RNA dimer complex remains unclear. It is possible that the DIS interaction initiates dimerization, whereas other base-pairing contacts subsequently stabilize the complex. There is indeed some evidence that dimerization is a multistep process, because dimers have been observed to "mature," that is, to increase their stability during the assembly of virion particles (116). Phylogenetic analysis provides little evidence for these accessory palindromes. However, as observed for the DIS palindromes (Fig. 7), sequence conservation is perhaps not essential for these base-pairing motifs.
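A loop can pair with the identical loop of a second RNA molecule only if its sequence is self-complementary, that is, equal to its own reverse complement. The same property explains palindromic covariation: a substitution in one half of such a palindrome must be matched by a complementary change in the other half. The sketch below checks this property and scans a sequence for even-length palindromes. The helper names and the 6-10 nucleotide length range are assumptions, and only Watson-Crick pairs are considered.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")

def is_rna_palindrome(seq):
    """True if `seq` equals its own reverse complement (Watson-Crick pairs only)."""
    seq = seq.upper().replace("T", "U")
    return seq == seq.translate(COMPLEMENT)[::-1]

def find_palindromes(rna, min_len=6, max_len=10):
    """Return (1-based start, sequence) for every self-complementary segment.

    Only even lengths are tried, since with Watson-Crick pairing the middle
    nucleotide of an odd-length segment would have to pair with itself.
    """
    rna = rna.upper().replace("T", "U")
    hits = []
    for length in range(min_len + min_len % 2, max_len + 1, 2):
        for i in range(len(rna) - length + 1):
            segment = rna[i:i + length]
            if is_rna_palindrome(segment):
                hits.append((i + 1, segment))
    return hits

# The loop palindromes quoted above are all self-complementary.
print([is_rna_palindrome(s) for s in ("GCGCGC", "AAGCUU", "AAAAUUUU")])  # [True, True, True]
print(find_palindromes("CCAAGCUUCC"))  # [(3, 'AAGCUU')] (synthetic flanks)
```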
VI. The RNA Packaging Signal
The packaging of retroviral genomes involves specific interactions of the full-length RNA genome with Gag-derived proteins, in particular the Cys-His boxes of the NC domains (reviewed in 117). The sequences located between the major splice donor (SD, HIV-1 position 289) and the gag gene are present in full-length genomes and invariably absent from spliced mRNAs. This region has therefore received the greatest attention as the primary determinant for encapsidation. Indeed, several laboratories have presented evidence that sequences between the SD and the gag open reading frame play a role in genome packaging (118-121). In most cases, this identification was achieved by deletion mutagenesis, which produced an RNA encapsidation defect. More recently, evidence has been accumulating that other regions of the HIV genome are also involved. In particular, sequences in the U5 region (98) or the DIS region (122) of the leader, the 5' part of the gag open reading frame (123-125), and env sequences overlapping the Rev-responsive element (RRE) (126) have been reported to contribute to the packaging function. Despite this increased knowledge, the actual RNA-protein interactions involved in packaging are poorly understood (127-130). Efficient binding of the NC protein to a 110-nucleotide HIV-1 RNA domain containing the four stem-loop structures DIS, SD, ψ, and AUG has been reported (128). Others observed efficient binding with a three-hairpin fragment (SD, ψ, and AUG) (11a) or with the single SD hairpin (129), with an essential role for both the loop sequence and the structural integrity of the SD stem. These authors also reported an effect of this region on RNA dimerization. In fact, the sequence between the SD and ψ hairpins may be one of the accessory palindromes discussed in Section V. Eight intermolecular base pairs can form between two HIV-1 RNA molecules by means of the palindromic AAAAUUUU motif (positions 302-309), and this interaction is not expected to disturb the intramolecular base pairs in the two flanking hairpin structures.
Other retroviruses show some conservation of a sequence and/or structure motif in their packaging signals. Type C murine retroviruses have a hairpin motif with a conserved GACG loop sequence (131), and some type D retroviruses have a similar structure with a conserved GAPyC loop sequence (132). No similar sequence motifs can be found in the HIV-1 and HIV-2 leaders (Figs. 1 and 2), but we note the occurrence of purine-rich tetraloops in the HIV leader RNA. These structures resemble the tetraloops that account for the majority of hairpins in ribosomal RNA (12).
Three predominant tetraloop variants are present in ribosomal RNA (UUCG enclosed by a C-G base pair, GA/CAA with a G-C closing pair, and CUUG with a terminal R-Y pair), and NMR studies revealed their remarkable structural similarity (133, 134). This part of the HIV-2 genome is more extended than its HIV-1 counterpart, as observed for the upstream leader motifs (e.g., TAR; Section II). Additional hairpin structures are predicted for the HIV-2 RNA (e.g., ψ2), with loops that are purine-rich but do not consist of four nucleotides. HIV-2 RNA is predicted to fold two purine tetraloops in the PBS region (Fig. 2; GAAA at positions 273-276 and GAGA at positions 333-336). Thus, small hairpins with purine-rich loops may be involved in RNA packaging, but additional sequence and/or structure elements are likely to be required for the selective encapsidation of retroviral genomes. In particular, we note the presence of extended polypurine stretches in this region of the HIV-SIV genomes (e.g., HIV-1, AGAAGGAGAGAGA; HIV-2, GGGAGCAGAAGAGG).
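Purine tetraloops such as GAAA and GAGA, and longer polypurine stretches such as those quoted above, are simple sequence properties. In the sketch below, both helpers are hypothetical. The 10-nucleotide minimum for a "polypurine stretch" is an arbitrary threshold chosen for illustration, not a definition from the source.

```python
import re

def is_purine_tetraloop(loop):
    """True for a four-nucleotide loop made only of purines (e.g., GAAA, GAGA)."""
    loop = loop.upper()
    return len(loop) == 4 and set(loop) <= {"A", "G"}

def polypurine_runs(rna, min_len=10):
    """Return (1-based start, run) for uninterrupted A/G stretches of at least min_len nt."""
    rna = rna.upper().replace("T", "U")
    return [(m.start() + 1, m.group()) for m in re.finditer(f"[AG]{{{min_len},}}", rna)]

print(is_purine_tetraloop("GAAA"), is_purine_tetraloop("UUCG"))  # True False
print(polypurine_runs("CUAGAAGGAGAGAGAUC"))  # [(3, 'AGAAGGAGAGAGA')] (synthetic flanks)
```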
An intricate and subtle network of tertiary interactions is expected to be involved in RNA dimerization, primer tRNA annealing, and encapsidation of the HIV genome into assembling virions. The temporal relationship of these processes has not been characterized rigorously. A potential link between dimerization and encapsidation of HIV-1 genomic RNA has been proposed, and the RNA signals involved may overlap in the 3' part of the leader transcript. Recent studies with HIV-1 and other retroviruses suggest that their genomes are already joined into some dimeric structure at the time of virus assembly, which is consistent with the notion that a dimeric genome is specifically recognized during virion assembly (12). The RNA signals for both dimerization and packaging should be mapped in further detail. The position of a critical dimerization signal (the DIS hairpin) upstream of the SD does not necessarily disprove the overlap hypothesis. First, there may still be sequences downstream of the SD that stimulate dimerization or stabilize the dimer configuration (see Section V). Second, it cannot formally be excluded that some of the packaging signals are also positioned upstream of the SD. A deletion in the DIS region has previously been shown to reduce the amount of intact genomic RNA present per virion (122), which suggests that this region is indeed involved in packaging. An alternative interpretation is that encapsidation of dimeric genomes takes precedence over encapsidation of monomeric RNA. It is also possible that the mutant RNA was packaged but rapidly degraded because of the absence of stable dimers.
VII. Splicing and Translation Functions
Splicing of HIV-1 RNA is extremely complex because of the presence of multiple, alternatively used splice sites (reviewed in 135). In particular, numerous weak acceptor sites, located toward the center of the genomic RNA, compete as points of ligation for splicing. The leader encodes the major splice donor used to generate most subgenomic HIV transcripts (HIV-1, CUG↓GUG at positions 287-292; HIV-2, AAG↓GUA at positions 468-473), and in both viruses this sequence motif is present in one of the small hairpin motifs upstream of the gag gene. Mutation of the major SD in HIV-1 slowed the kinetics of RNA and protein synthesis and the kinetics of virus spread (135). Virus infectivity was not completely lost, because a cryptic SD site four nucleotides further downstream was activated in this mutant (UGA↓GUA at positions 291-296). The sequence of this cryptic SD site is strongly conserved among all HIV-1 isolates, suggesting some kind of selective pressure on this sequence motif, perhaps as part of a signal. Induction of a nearby cryptic splice site suggests that certain features of this leader region (sequences and/or structures) direct the splicing machinery to this part of the HIV genome.
An interesting splicing pattern was described for the leader transcript of the HIV-2 virus group (136). In addition to the major SD, a minor SD inside the TAR sequences (CAG↓GUA) was used in combination with a splice acceptor (SA) site in the 5' stem region of the PBS structure (UAG↓UCG). Use of this splice pair generates a unique transcript that lacks part of the TAR structure and the complete poly(A) hairpin. The biological significance of this alternative transcript is currently unclear. However, we would predict that such transcripts remain Tat-inducible, because the transcriptional function of TAR is completed before the splicing event (15, 137). Although TAR RNA clearly functions primarily in transcriptional activation from the LTR promoter, there is compelling evidence that this RNA motif also has separate roles in translational regulation via cis- and trans-acting mechanisms.
1. The TAR RNA structure blocks movement of the scanning ribosome, leading to cis-inactivation of HIV-1 mRNAs (14, 138). Mutations that disrupt the predicted secondary structure within the TAR hairpin relieve the inhibition and increase the accessibility of the 5'-cap structure of the mRNA to translation initiation factors. Other leader regions may also influence translation; the RNA secondary structure model predicts the gag AUG codon to be occluded in a local hairpin structure that may reduce the efficiency of translation initiation. Dimerization of retroviral RNAs also blocks translation in cell-free assay systems (100). Specific leader mutants should be tested in translation assays to further define such effects.
2. Recent biochemical data indicate that the human autoantigen La is involved in regulation of HIV-1 translation through binding of the TAR RNA (139, 140). La is an RNA-binding protein that elicits an autoimmune response in patients with systemic lupus erythematosus and Sjögren's syndrome. La binds to the 5' leader of poliovirus as well, and in vitro translation studies implicate this protein in poliovirus internal translation initiation (141). These results, combined with the observation that scanning ribosomes are inhibited by structure in the HIV-1 leader (see above), may indicate that HIV uses an internal ribosomal entry site (IRES), as originally proposed for poliovirus (142). Evidence for an internal ribosome entry mechanism was recently reported for another retroviral species, the murine leukemia virus (143). However, there is no direct evidence to support a nonscanning mechanism of translation for the HIV-SIV viruses. In fact, inspection of all HIV-SIV leader sequences indicates that upstream AUG triplets, which can potentially usurp the scanning ribosomes, are excluded from the leader region. This bias against upstream initiation codons is the expected condition for a scanning mechanism of translation. Furthermore, we have found that HIV-1 replication is severely inhibited by the introduction of an upstream AUG initiation codon (unpublished data).
3. Viral RNA can also regulate translation in trans. Several viruses affect the activity of the interferon-induced RNA-dependent protein kinase (PKR), which catalyzes the phosphorylation of protein synthesis initiation factor eIF-2 (144). For HIV-1, the TAR RNA hairpin activates two interferon-induced enzymes, PKR and (2-5)A-synthetase, the latter of which is able to activate a cellular RNase (138, 145). Like other viruses such as adenovirus, influenza virus, and vaccinia virus, HIV-1 has acquired a mechanism for evading the antiviral activity of PKR and (2-5)A-synthetase. The HIV-1 Tat protein inhibits TAR-mediated activation of PKR and (2-5)A-synthetase (145, 146), suggesting an escape mechanism for the virus. There has been some doubt that TAR could activate PKR because maximal PKR activation requires a stem region of about 65 to 85 base pairs (147, 148). TAR has only 23 base pairs, but we mentioned in Section I the possibility of coaxial stacking with the neighboring poly(A) stem, which would result in a duplex structure of 40 base pairs. However, most studies used only partial HIV-1 leader transcripts extending to position +82, where a convenient HindIII restriction site is located, and such RNA fragments contain only the TAR hairpin. Because the HIV leader transcript contains multiple hairpin structures of considerable stability (Figs. 1 and 2), studies of the cis- and trans-inhibitory effects on translation should be performed with full-length leader RNAs.
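The argument about upstream AUG triplets lends itself to a simple sequence check. Under the scanning model, any AUG lying 5' of the authentic initiation codon, in any reading frame, is a potential decoy for the ribosome. The Python sketch below lists such triplets; it assumes the leader is supplied as a plain sequence string and that the 0-based position of the gag AUG is already known. The function name `upstream_aug_positions` is hypothetical and is not taken from the original analysis.

```python
def upstream_aug_positions(leader: str, start_index: int) -> list[int]:
    """Return 0-based positions of AUG triplets that begin upstream of start_index.

    leader: RNA (or DNA) sequence covering the 5' leader and the start codon.
    start_index: 0-based position of the authentic initiation codon (e.g., gag AUG).
    AUGs in every reading frame are reported, because a scanning ribosome
    can initiate at the first AUG it meets regardless of frame.
    """
    seq = leader.upper().replace("T", "U")  # accept DNA input by reading T as U
    if seq[start_index:start_index + 3] != "AUG":
        raise ValueError("start_index does not point at an AUG codon")
    # An upstream AUG can never overlap the start codon itself, so a plain
    # scan of every position before start_index is sufficient.
    return [i for i in range(start_index) if seq[i:i + 3] == "AUG"]
```

For example, `upstream_aug_positions("GCAUGCCAUGAAA", 7)` returns `[2]`, flagging one upstream AUG. An empty list corresponds to the situation described above for the HIV-SIV leaders.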
VIII. Base Composition of HIV-SIV Leader RNAs

We analyzed the nucleotide bias and the compositional tendencies of short oligonucleotides in the leader in order to highlight possible sequence motifs that may play a role in any of the biological functions of this RNA molecule. Lentiviral genomes, including HIV-1 RNA, have an unusual base composition (149-153). In particular, the HIV-1 genome is A-rich (35.6%) and C-poor (17.9%). This points to one or more mutational or selective constraints not yet identified. The biased base composition is present in all open reading frames and dictates the typical codon usage of these viruses (154). Interestingly, when we compared the amino-acid composition of several HIV-1 and HIV-2 proteins to the homologous proteins of the human T-cell leukemia viruses HTLV-I and HTLV-II (which do not have an A-rich genome), we found significant differences in total amino-acid content that correlate with the preferential use of amino-acid residues encoded by A-rich codons in HIV (155). Furthermore, direct alignment of protein domains indicated that many conservative and some nonconservative amino-acid changes can be explained by strong "A-pressure" in HIV. These examples underscore the magnitude of "A-pressure" in the HIV-SIV RNA genomes.

We analyzed 17 complete HIV-SIV viral genomes and compared the base composition with that of the corresponding leader regions (7003 nucleotides in total). Surprisingly, we found that the nucleotide composition of the leader is more balanced (Table I), without a preference for the A nucleotide (24.4%) or bias against the C nucleotide (23.8%). Next, the complete genomes and leader regions were evaluated for extremes of dinucleotide relative abundance. A common assessment of dinucleotide bias in a sequence is the odds-ratio measure $\rho_{XY} = f_{XY}/(f_X f_Y)$, where $f_X$ and $f_Y$ denote the frequencies of mononucleotides X and Y, respectively, and $f_{XY}$ the frequency of dinucleotide X-Y in the sequence (156). As a conservative criterion, for $\rho_{XY} > 1.25$ (or $\rho_{XY} < 0.78$), the X-Y pair is regarded as being of high (or low) relative abundance compared with a random association of mononucleotides (156). Table II lists the abundances calculated for the complete genomes and the values obtained for the leader regions. Most strikingly, we observed a strong discrimination against C-G in HIV-SIV genomes ($\rho = 0.27$), although the discrimination is somewhat relieved in the leader region ($\rho = 0.59$). Furthermore, there is a leader-specific trend to cluster purines, but in an alternating manner (e.g., AGAG). Similarly, pyrimidine clusters are favored (e.g., CUCU). This holds in particular for the sequences A-G ($\rho = 1.47$ for the leader vs. 1.19 for the genome in total) and C-U (1.76 vs. 1.20), but also to some extent for G-A (1.16 vs. 0.96) and U-C (1.06 vs. 0.90).
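The base composition and the odds-ratio measure, together with the cut-offs quoted above, can be computed directly from a sequence. The following is a minimal Python sketch, assuming a single-stranded RNA sequence containing only A, C, G and U (T is accepted and read as U), with $f_X$ taken over all N positions and $f_{XY}$ over the N - 1 overlapping dinucleotides. Published implementations of this measure (e.g., in ref. 156) may differ in details such as strand symmetrization or the handling of sequence ends, so the numbers should be treated as illustrative. The helper names are hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "AGCU"  # row/column order used in Tables I and II


def _normalize(seq: str) -> str:
    """Upper-case the sequence, read T as U, and reject empty or non-ACGU input."""
    seq = seq.upper().replace("T", "U")
    if not seq:
        raise ValueError("empty sequence")
    bad = set(seq) - set(BASES)
    if bad:
        raise ValueError(f"unexpected symbols: {sorted(bad)}")
    return seq


def base_composition(seq: str) -> dict[str, float]:
    """Percentage of A, G, C and U in the sequence (the quantity in Table I)."""
    seq = _normalize(seq)
    counts = Counter(seq)
    return {b: 100.0 * counts[b] / len(seq) for b in BASES}


def relative_abundance(seq: str) -> dict[str, float]:
    """Odds ratio rho_XY = f_XY / (f_X * f_Y) for all 16 dinucleotides.

    Returns NaN for a pair whose mononucleotides are absent from the sequence.
    """
    seq = _normalize(seq)
    n = len(seq)
    if n < 2:
        raise ValueError("need at least two nucleotides")
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(n - 1))  # overlapping pairs
    rho = {}
    for x, y in product(BASES, repeat=2):
        expected = (mono[x] / n) * (mono[y] / n)
        observed = di[x + y] / (n - 1)
        rho[x + y] = observed / expected if expected > 0 else float("nan")
    return rho


def classify(rho: dict[str, float], high: float = 1.25, low: float = 0.78) -> dict[str, str]:
    """Label each pair 'high', 'low' or 'normal' (NaN values fall through to 'normal')."""
    labels = {}
    for pair, value in rho.items():
        if value > high:
            labels[pair] = "high"
        elif value < low:
            labels[pair] = "low"
        else:
            labels[pair] = "normal"
    return labels
```

For instance, `relative_abundance("AGCU" * 25)` gives $\rho_{AG} = 400/99 \approx 4.04$ for this artificial repeat, which `classify` labels "high". Running the same functions over a leader and its complete genome yields two 4 × 4 matrices laid out like Table II.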
TABLE I
BASE COMPOSITIONS OF COMPLETE HIV-SIV GENOMES AND LEADER REGIONS(a)

                      Average base composition (%)
Region                A      G      C      U
Complete genomes      35.0   24.3   18.6   22.0
Leader region         24.4   31.0   23.8   20.8

(a) Viral strains analyzed (1) belong to the HIV-1 group (isolates ANT-70, MVP5180, ELI, Z2Z6, LAI, MAL, U455, SIVcpz), the HIV-2 group (isolates NIHZ, ROD, SIVsmmpbja), the SIVagm group (isolates 155, 3, XX, AA), SIVmnd, and SIVsyk (comgnm).
TABLE II
RELATIVE ABUNDANCES OF DINUCLEOTIDES IN COMPLETE HIV-SIV GENOMES AND LEADER REGIONS

Values are relative abundances (rho XY); rows give the 5' nucleotide X, columns the 3' nucleotide Y.

HIV-SIV genomes
5'\3'    A      G      C      U
A        0.95   1.19   0.89   0.96
G        0.96   1.20   1.04   0.83
C        1.26   0.27   1.26   1.20
U        0.90   1.13   0.90   1.13

HIV-SIV leaders
5'\3'    A      G      C      U
A        1.14   1.47   0.86   0.38
G        1.16   0.92   1.03   0.88
C        0.81   0.59   1.02   1.76
U        0.88   1.05   1.06   0.98
The leader regions restrict the formation of homopolymeric stretches (G-G, 0.92 vs. 1.20; C-C, 1.02 vs. 1.26; U-U, 0.98 vs. 1.13). In this context, the 277GGGG280, 301AAAAA305, and ,,UUUU3, stretches present in the HIV-1 ψ-region are rather unusual and may constitute a signal for RNA dimerization or packaging (see Sections V and VI). Another remarkable difference between the leader region and the complete genome is the occurrence of the A-U dinucleotide. Whereas unbiased levels are present in the complete genomes ($\rho = 0.96$), the leader is decisively A-U suppressed ($\rho = 0.38$). Apart from the C-G and A-U deficiencies, there are no other pervasive, significant dinucleotide extremes in the HIV-SIV genomes, or in the leader region in particular. We will discuss A-U and C-G motifs in more detail below. Scarcity of the A-U dinucleotide may reflect a general requirement for protection of the viral RNA against ribonucleases, as suggested for cellular transcripts (157, reviewed in 158). This suggestion was based on the discovery of a stereotypic, repeating UUAUUUAU sequence in the 3'-untranslated region of certain genes (159) and the observation that this motif destabilizes the mRNA molecules that contain it (160). However, there is some experimental evidence that U-A and not A-U is the RNA dinucleotide most susceptible to RNase activity (157). Furthermore, protection against ribonucleases does not easily explain the selective rejection of A-U in the leader region only. From another perspective, in view of the prominent role of the "AAUAAA box" in mediating transcription termination (see Section III), occurrences of A-U might be minimized to avoid inappropriate binding of polyadenylylation factors. At the DNA level, the presence of the A-T-containing motif that regulates transcription initiation from the upstream LTR promoter (the "TATAA box") may further limit the usage of the A-T sequence in the flanking leader region. The C-G level of the lentiviral genome, including that of HIV-1, has been reported to be extremely low (152, 161, 162). In contrast, there is no evidence for selection against C-G in oncoretroviruses such as HTLV-I (162), suggesting that low C-G levels are of biological importance specifically for the HIV-like viruses. In vertebrate genomes, methylation of cytosines occurs in C-G dinucleotides, and this modification often correlates negatively with gene expression (reviewed in 163). HIV-1 LTR-directed gene expression is also susceptible to transcriptional inactivation by methylation (164). Of the 81 C-G dinucleotides present in the genome of the prototype HIV-1 LAI isolate, 21 are located in the 113-469 region, which forms the 3' part of the leader and the 5' end of the gag open reading frame. The reason for this C-G clustering is currently unclear. The dinucleotide analysis demonstrated a preference for R-R and Y-Y motifs without iterations of the same nucleotide (A-G, G-A, U-C, C-U). This trend results in the frequent occurrence of the typical sequence motifs $R_{n\geq 2}Y_{n\geq 2}$ and $Y_{n\geq 2}R_{n\geq 2}$. In the prototype HIV-1 LAI strain, we found 26 $R_{n\geq 2}Y_{n\geq 2}$ and 29 $Y_{n\geq 2}R_{n\geq 2}$ motifs, compared to only 8 motifs of the alternating type $(R\text{-}Y)_n$ and $(Y\text{-}R)_n$. The latter motifs are restricted not only in number but also in nucleotide composition; they frequently use G but not A, which is combined either with U to form GUGU motifs or with C to form GCGC sequences. All three G-U repeats are located in the U5 region downstream of the HIV poly(A) site and function as enhancers of polyadenylylation (see Section III).
The C-G repeats account for 6 of the 21 C-G motifs present in the leader region. The abundance of the R-Y-clustered motifs may suggest that they provide a signal as a sequence or structure element. The average length of the clustered motifs is R,,7Y3.5 and Y3.4R3.4 for the prototype HIV-1 LAI leader RNA, with the sequences at the R-Y borders conforming to the dinucleotide bias described in Table II. Two extended Y-R motifs in HIV-1 are repeated in the leader. An 8-nucleotide motif is present twice in the TAR domain (3uCUCUCU-GG,,/,,CUCUCU-GG,,), and a 10-nucleotide segment from TAR is copied in the PBS region (3,gGGAG-CUCUCU,1/zz,agaGGAG-CUCUCUc233). Although there is no evidence that any of these motifs is involved in one of the molecular functions of the leader RNA during viral replication, we included this analysis to highlight the idea that the leader RNA sequences are distributed in a nonrandom manner. Perhaps a given nucleotide content or pattern of distribution can cause a specific RNA structure, or certain Y-R arrangements may be part of a motif that is recognized by leader RNA-binding proteins.
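The R/Y clustering described above can be tallied by recoding the sequence into purines (R) and pyrimidines (Y) and inspecting adjacent runs. The exact counting rules behind the HIV-1 LAI numbers are not stated here, so this Python sketch uses an assumed definition: a motif is a pair of adjacent maximal runs, both at least two nucleotides long, and a run may be shared by the motifs on either side of it. The function names are hypothetical.

```python
import re

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CU")


def to_ry(seq: str) -> str:
    """Recode a sequence as purines (R) and pyrimidines (Y); other symbols become N."""
    seq = seq.upper().replace("T", "U")
    return "".join(
        "R" if b in PURINES else "Y" if b in PYRIMIDINES else "N" for b in seq
    )


def count_clustered_motifs(seq: str, min_run: int = 2) -> dict[str, int]:
    """Count R(n>=min_run)Y(n>=min_run) and Y(n>=min_run)R(n>=min_run) junctions.

    The recoded sequence is split into maximal runs of R, Y or N. Each pair of
    adjacent R and Y runs that are both at least min_run long counts as one
    motif, named by its 5'->3' order ("RY" or "YR").
    """
    runs = [m.group(0) for m in re.finditer(r"R+|Y+|N+", to_ry(seq))]
    counts = {"RY": 0, "YR": 0}
    for left, right in zip(runs, runs[1:]):
        if "N" in (left[0], right[0]):
            continue  # an ambiguous base breaks the motif
        if len(left) >= min_run and len(right) >= min_run:
            counts[left[0] + right[0]] += 1
    return counts
```

With this definition, the 10-nucleotide TAR segment GGAG-CUCUCU counts as one $R_{n\geq 2}Y_{n\geq 2}$ motif: `count_clustered_motifs("GGAGCUCUCU")` returns `{"RY": 1, "YR": 0}`. A stricter rule (for example, forbidding shared runs) would give lower totals.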
IX. Concluding Remarks

Understanding the three-dimensional structure formed by the HIV leader RNA molecule, both free and in the ribonucleoprotein complex of the virion, is crucial for analyzing its specific recognition by proteins and its interaction with other RNA molecules during virus replication. The number of HIV-SIV sequences now known is sufficiently large that comparative analysis can be used effectively to deduce some of the basic design principles underlying HIV RNA structure. Further, experimental studies on viral RNA seem about to enter a new phase, with the possibility to perform in vitro selection experiments to analyze details of RNA-protein interactions (165) and in vivo "forced evolution" systems that describe repair pathways for viruses mutated in specific RNA sequences/structures (32, 166). By understanding the molecular basis of the interactions that govern critical steps in the retroviral replication cycle, it may be possible to develop methods to intervene therapeutically in the process. For instance, gene therapy has been proposed for treatment of AIDS (167), for which there are currently no effective chemotherapeutic or vaccine therapies, and several molecular strategies have been designed and shown to inhibit replication of the HIV-1 retrovirus in tissue-culture systems. These anti-HIV approaches include RNA molecules such as antisense transcripts, ribozymes, and sense/decoy motifs that mimic important HIV-1 RNA structures (168-180). For example, a TAR RNA decoy transcribed from a retroviral vector is currently being tested in clinical trials (180a). Combinations of different leader RNA motifs can be used to further increase the efficiency or specificity of antiviral RNA molecules. Addition of the retroviral packaging signal ψ may result in colocalization of the inhibitor transcript and the target HIV-1 genomic RNA within viral particles (181).
Alternatively, it may be possible to inhibit HIV expression specifically in trans by leader-encoded functions like the dimerization signal (182).

ACKNOWLEDGMENTS

I thank Atze Das for critical reading of the manuscript and members of my laboratory for suggestions and helpful discussions. I am grateful to Jan van der Noordaa for support and encouragement. I thank Wim van Est for excellent artwork. The research of my group has been supported by grants from the Netherlands Organization for Scientific Research (NWO), the Dutch Cancer Society (KWF), and the Dutch AIDS Foundation.
REFERENCES

1. G. Myers, S. Wain-Hobson, G. N. Pavlakis, B. Korber and R. F. Smith, eds., in “Human Retroviruses and AIDS: A Compilation and Analysis of Nucleic Acid and Amino Acid Sequences.” Los Alamos National Laboratory, Los Alamos, NM, 1994.
2. P. M. Sharp, D. L. Robertson, F. Gao and B. H. Hahn, AIDS 8, S27 (1994).
3. M. A. Muesing, D. H. Smith and D. J. Capon, Cell 48, 691 (1987).
4. J.-I. Sakuragi, M. Fukasawa, R. Shibata, H. Sakai, M. Kawamura, H. Akari, T. Kiyomasu, A. Ishimoto, M. Hayami and A. Adachi, Virology 185, 455 (1991).
5. B. Berkhout, NARes 20, 27 (1992).
6. G. P. Harrison and A. M. L. Lever, J. Virol. 66, 4144 (1992).
7. F. Baudin, R. Marquet, C. Isel, J.-L. Darlix, B. Ehresmann and C. Ehresmann, JMB 229, 382 (1993).
8. B. Berkhout and I. Schoneveld, NARes 21, 1171 (1993).
9. T. Hayashi, Y. Ueno and T. Okamoto, FEBS Lett. 327, 213 (1993).
10. K. Sakaguchi, N. Zambrano, E. T. Baldwin, B. A. Shapiro, J. W. Erickson, J. G. Omichinski, G. M. Clore, A. M. Gronenborn and E. Appella, PNAS 90, 5219 (1993).
11. B. Berkhout, B. Klaver and A. T. Das, Virology 207, 276 (1995).
11a. J. Clever, C. Sassetti and T. G. Parslow, J. Virol. 69, 2101 (1995).
12. R. R. Gutell, N. Larsen and C. R. Woese, Microbiol. Rev. 58, 10 (1994).
13. M. Kozak, Cell 34, 971 (1983).
14. N. T. Parkin, E. A. Cohen, A. Darveau, C. Rosen, W. Haseltine and N. Sonenberg, EMBO J. 7, 2831 (1988).
15. B. Berkhout, R. Silverman and K.-T. Jeang, Cell 59, 273 (1989).
16. V. K. Pathak and H. M. Temin, J. Virol. 66, 3093 (1992).
17. B. Klaver and B. Berkhout, NARes 22, 137 (1994).
18. M. L. Hammarskjold, D. Rekosh, B. Berkhout, Y.-N. Chang and K.-T. Jeang, AIDS 5, S3 (1992).
19. J. A. Garcia and R. B. Gaynor, AIDS 8, S3 (1994).
20. K. A. Jones and B. M. Peterlin, ARB 63, 717 (1994).
21. C. Dingwall, I. Ernberg, M. J. Gait, S. M. Green, S. Heaphy, J. Karn, A. D. Lowe, M. Singh, M. A. Skinner and R. Valerio, PNAS 86, 6925 (1989).
22. R. A. Marciniak, M. A. Garcia-Blanco and P. A. Sharp, PNAS 87, 3624 (1990).
23. A. Gatignol, A. Buckler-White, B. Berkhout and K.-T. Jeang, Science 251, 1597 (1991).
24. C. T. Sheline, L. H. Milocco and K. A. Jones, Genes Dev. 5, 2508 (1991).
25. F. Wu, J. Garcia, D. Sigman and R. Gaynor, Genes Dev. 5, 2128 (1991).
26. M. P. Rounseville and A. Kumar, J. Virol. 66, 1688 (1992).
27. T. R. Reddy, M. Suhasini, J. Rappaport, D. J. Looney, G. Kraus and F. Wong-Staal, AIDS Res. Hum. Retroviruses 11, 663 (1995).
28. K. A. Jones, P. A. Luciw and N. Duchange, Genes Dev. 2, 1101 (1988).
29. F. K. Wu, J. A. Garcia, D. Harrich and R. B. Gaynor, EMBO J. 7, 2117 (1988).
30. J. A. Garcia, D. Harrich, E. Soultanakis, F. Wu, R. Mitsuyasu and R. B. Gaynor, EMBO J. 8, 765 (1989).
31. H. Kato, M. Horikoshi and R. G. Roeder, Science 251, 1476 (1992).
31a. B. Berkhout and B. Klaver, NARes 21, 5020 (1993).
32. B. Klaver and B. Berkhout, EMBO J. 13, 2650 (1994).
33. B. Berkhout and B. Klaver, J. Gen. Virol. 76, 845 (1995).
34. D. Harrich, G. Mavankal, A. Mette-Snider and R. B. Gaynor, J. Virol. 69, 4906 (1995).
35. B. Klaver and B. Berkhout, J. Virol. 68, 3830 (1994).
36. R. S. McLaren, S. F. Newbury, G. S. C. Dance, H. C. Causton and C. F. Higgins, JMB 221, 81 (1991).
37. B. Berkhout, J. L. B. van Wamel and B. Klaver, JMB 252, 59 (1995).
38. K.-Y. Chang and I. Tinoco, Jr., PNAS 91, 8705 (1994).
39. P. Wang, M.-C. Rouyez, S. Ducamp, S. Saragosti and M. Ventura, BBRC 195, 565 (1993).
40. E. Wahle and W. Keller, ARB 61, 419 (1992).
41. W. Keller, Cell 81, 829 (1995).
42. S. Bohnlein, J. Hauber and B. R. Cullen, J. Virol. 63, 421 (1989).
43. G. M. Gilmartin, E. S. Fleming, J. Oetjen and B. R. Graveley, Genes Dev. 9, 72 (1995).
44. J. M. Coffin, in “RNA Tumor Viruses” (R. Weiss et al., eds.), p. 261. Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 1984.
45. H. Varmus and R. Swanstrom, in “RNA Tumor Viruses” (R. Weiss et al., eds.), p. 369. Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 1984.
46. P. H. Brown, L. S. Tiley and B. R. Cullen, J. Virol. 65, 3340 (1991).
47. J. Cherrington and D. Ganem, EMBO J. 11, 1513 (1992).
48. J. D. Dezazzo, J. E. Kilpatrick and M. J. Imperiale, MCBiol 11, 1624 (1991).
49. A. Valsamakis, S. Zeichner, S. Carswell and J. C. Alwine, PNAS 88, 2108 (1991).
50. C. Weichs an der Glon, J. Monks and N. J. Proudfoot, Genes Dev. 5, 244 (1991).
51. G. M. Gilmartin, E. S. Fleming and J. Oetjen, EMBO J. 11, 4419 (1992).
52. A. Valsamakis, N. Schek and J. C. Alwine, MCBiol 12, 3699 (1992).
53. M. Seiki, S. Hattori, Y. Hirayama and M. Yoshida, PNAS 80, 3618 (1983).
54. Y. F. Ahmed, G. M. Gilmartin, S. M. Hanly, J. R. Nevins and W. C. Greene, Cell 64, 727 (1991).
55. P. H. Brown, L. S. Tiley and B. R. Cullen, Genes Dev. 5, 1277 (1991).
56. C. W. G. van Gelder, S. I. Gunderson, E. J. R. Jansen, W. C. Boelens, M. Polycarpou-Schwarz, I. W. Mattaj and W. J. van Venrooij, EMBO J. 12, 5191 (1993).
57. C. Weichs an der Glon, M. Ashe, J. Eggermont and N. J. Proudfoot, EMBO J. 12, 2119 (1993).
58. K.-T. Jeang, B. Berkhout and B. Dropulic, JBC 268, 24940 (1993).
59. P. Nahreini and M. B. Mathews, J. Virol. 69, 1296 (1995).
60. R. P. Woychik, R. H. Lyons, L. Post and F. M. Rottman, PNAS 81, 3944 (1984).
61. E. R. Gimmi, M. E. Reff and I. C. Deckmann, NARes 17, 6983 (1989).
62. A. Sittler, H. Gallinaro and M. Jacob, JMB 248, 525 (1995).
63. A. S. Williams and W. F. Marzluff, NARes 23, 654 (1995).
64. M. Jiang, J. Mak, M. A. Wainberg, M. A. Parniak, E. Cohen and L. Kleiman, BBRC 185, 1005 (1992).
65. M. Jiang, J. Mak, A. Ladha, E. Cohen, M. Klein, B. Rovinski and L. Kleiman, J. Virol. 67, 3246 (1993).
66. A. T. Das, S. E. C. Koken, B. B. Oude Essink, J. L. B. van Wamel and B. Berkhout, FEBS Lett. 341, 49 (1994).
67. C. Barat, V. Lullien, O. Schatz, G. Keith, M. T. Nugeyre, F. Gruninger-Leitch, F. Barre-Sinoussi, S. F. J. LeGrice and J.-L. Darlix, EMBO J. 8, 3279 (1989).
68. L. Sarih-Cottin, B. Bordier, K. Musier-Forsyth, M. Andreola, P. J. Barr and S. Litvak, JMB 226, 1 (1992).
69. S. Weiss, B. Konig, H. J. Muller, H. Seidel and R. S. Goody, Gene 111, 183 (1992).
70. R. W. Sobol, R. J. Suhadolnik, A. Kumar, B. J. Lee, D. L. Hatfield and S. H. Wilson, Bchem 30, 10623 (1991).
71. L. A. Kohlstaedt and T. A. Steitz, PNAS 89, 9652 (1992).
72. M. D. Delahunty, S. H. Wilson and R. L. Karpel, JMB 236, 469 (1994).
73. C. Barat, S. F. J. LeGrice and J.-L. Darlix, NARes 19, 751 (1991).
74. B. B. Oude Essink, A. T. Das and B. Berkhout, JBC 270, 23867 (1995).
75. Y. Mishima and J. A. Steitz, EMBO J. 14, 2679 (1995).
76. X. Li, J. Mak, E. J. Arts, Z. Gu, L. Kleiman, M. A. Wainberg and M. A. Parniak, J. Virol. 68, 6198 (1994).
77. J. Mak, M. Jiang, M. A. Wainberg, M. L. Hammarskjold, D. Rekosh and L. Kleiman, J. Virol. 68, 2065 (1994).
78. A. T. Das, B. Klaver and B. Berkhout, J. Virol. 69, 3090 (1995).
79. A. T. Das and B. Berkhout, NARes 23, 1319 (1995).
80. B. Gerwin and J. G. Levin, J. Virol. 24, 478 (1977).
81. J. G. Levin and J. G. Seidman, J. Virol. 29, 328 (1979).
82. J. Colicelli and S. P. Goff, J. Virol. 57, 37 (1986).
83. A. H. Lund, M. Duch, J. Lovmand, P. Jorgensen and F. S. Pedersen, J. Virol. 67, 7125 (1993).
84. A. Aiyar, D. Cobrinik, Z. Ge, H.-J. Kung and J. Leis, J. Virol. 66, 2464 (1992).
85. C. Isel, R. Marquet, G. Keith, C. Ehresmann and B. Ehresmann, JBC 268, 25269 (1993).
86. C. Isel, C. Ehresmann, G. Keith, B. Ehresmann and R. Marquet, JMB 247, 236 (1995).
87. D. Cobrinik, L. Soskey and J. Leis, J. Virol. 62, 3622 (1988).
87a. D. Cobrinik, A. Aiyar, Z. Ge, M. Katzman, H. Huang and J. Leis, J. Virol. 65, 3864 (1991).
87b. A. Aiyar, Z. Ge and J. Leis, J. Virol. 68, 611 (1994).
87c. E. J. Arts, X. Li, Z. Gu, L. Kleiman, M. A. Parniak and M. A. Wainberg, JBC 269, 14672 (1994).
88. A. M. Borman, C. Quillent, P. Charneau, C. Dauguet and F. Clavel, J. Virol. 69, 2058 (1995).
89. R. L. LaFemina, P. A. Callahan and M. G. Cordingley, J. Virol. 65, 5624 (1991).
90. C. Vink, D. C. van Gent, Y. Elgersma and R. H. A. Plasterk, J. Virol. 65, 4636 (1991).
91. A. D. Leavitt, R. B. Rose and H. E. Varmus, J. Virol. 66, 2359 (1992).
92. A. C. Prats, L. Sarih, C. Gabus, S. Litvak, G. Keith and J.-L. Darlix, EMBO J. 7, 1777 (1988).
92a. R. Khan and D. P. Giedroc, JBC 267, 6689 (1992).
93. Z. Tsuchihashi and P. O. Brown, J. Virol. 68, 5863 (1994).
94. U. von Schwedler, J. Song, C. Aiken and D. Trono, J. Virol. 67, 4945 (1993).
95. O. Schwartz, V. Marechal, O. Danos and J.-M. Heard, J. Virol. 69, 4053 (1995).
96. C. Aiken and D. Trono, J. Virol. 69, 5048 (1995).
97. D. H. Gabuzda, K. Lawrence, E. Langhoff, E. Terwilliger, T. Dorfman, W. A. Haseltine and J. Sodroski, J. Virol. 66, 6489 (1992).
98. E. Vicenzi, D. S. Dimitrov, A. Engelman, T.-S. Migone, D. F. J. Purcell, J. Leonard, G. Englund and M. A. Martin, J. Virol. 68, 7879 (1994).
99. W.-S. Hu and H. M. Temin, PNAS 87, 1556 (1990).
100. E. Bieth, C. Gabus and J.-L. Darlix, NARes 18, 119 (1990).
101. J.-L. Darlix, C. Gabus, M.-T. Nugeyre, F. Clavel and F. Barre-Sinoussi, JMB 216, 689 (1990).
102. R. Marquet, F. Baudin, C. Gabus, J.-L. Darlix, M. Mougel, C. Ehresmann and B. Ehresmann, NARes 18, 2349 (1991).
103. H. De Rocquigny, C. Gabus, A. Vincent, M.-C. Fournie-Zaluski, B. Roques and J.-L. Darlix, PNAS 89, 6472 (1992).
104. G. Awang and D. Sen, Bchem 32, 11453 (1993).
106. B. Berkhout, B. B. Oude Essink and I. Schoneveld, FASEB J. 7, 181 (1993).
107. W. Sundquist and S. Heaphy, PNAS 90, 3393 (1993).
108. M. Laughrea and L. Jetté, Bchem 33, 13464 (1994).
109. J.-C. Paillart, R. Marquet, E. Skripkin, B. Ehresmann and C. Ehresmann, JBC 269, 27486 (1994).
110. E. Skripkin, J.-C. Paillart, R. Marquet, B. Ehresmann and C. Ehresmann, PNAS 91, 4945 (1994).
111. D. Muriaux, P.-M. Girard, B. Bonnet-Mathonière and J. Paoletti, JBC 270, 8209 (1995).
112. J. R. Williamson, M. K. Raghuraman and T. R. Cech, Cell 59, 871 (1989).
113. Y. Eguchi, T. Itoh and J. I. Tomizawa, ARB 60, 631 (1991).
114. C. Persson, E. Gerhart, H. Wagner and N. Nordstrom, EMBO J. 9, 3767 (1990).
115. R. S. Gregorian, Jr. and D. M. Crothers, JMB 248, 968 (1995).
116. W. Fu, R. J. Gorelick and A. Rein, J. Virol. 68, 5013 (1994).
117. E. Hunter, Semin. Virol. 5, 71 (1994).
118. A. M. L. Lever, H. Gottlinger, W. Haseltine and J. Sodroski, J. Virol. 63, 4085 (1989).
119. A. Aldovini and R. A. Young, J. Virol. 64, 1920 (1990).
120. F. Clavel and J. M. Orenstein, J. Virol. 64, 5230 (1990).
121. T. Hayashi, T. Shioda, Y. Iwakura and H. Shibuta, Virology 188, 590 (1992).
122. H.-J. Kim and J. J. O'Rear, Virology 198, 336 (1994).
123. G. L. Buchschacher and A. T. Panganiban, J. Virol. 66, 2731 (1992).
124. J. Luban and S. P. Goff, J. Virol. 68, 3784 (1994).
125. C. Parolin, T. Dorfman, G. Palu, H. Gottlinger and J. Sodroski, J. Virol. 68, 3888 (1994).
126. J. H. Richardson, L. A. Child and A. M. L. Lever, J. Virol. 67, 3997 (1993).
127. R. D. Berkowitz, J. Luban and S. P. Goff, J. Virol. 67, 7190 (1993).
128. R. D. Berkowitz and S. P. Goff, Virology 202, 233 (1994).
129. K. Sakaguchi, N. Zambrano, E. T. Baldwin, B. A. Shapiro, J. W. Erickson, J. G. Omichinski, G. M. Clore, A. M. Gronenborn and E. Appella, PNAS 90, 5219 (1993).
130. J. Dannull, A. Surovoy, G. Jung and K. Moelling, EMBO J. 13, 1525 (1994).
131. D. A. Konings, M. A. Nash, J. V. Maizel and R. B. Arlinghaus, J. Virol. 66, 632 (1992).
132. G. P. Harrison, E. Hunter and A. M. L. Lever, J. Virol. 69, 2175 (1995).
133. H. A. Heus and A. Pardi, Science 253, 191 (1991).
134. G. Varani, C. Cheong and I. Tinoco, Jr., Bchem 30, 3280 (1991).
135. D. F. J. Purcell and M. A. Martin, J. Virol. 67, 6365 (1993).
136. G. A. Viglianti, P. L. Sharma and J. I. Mullins, J. Virol. 64, 4207 (1990).
137. B. Berkhout and K.-T. Jeang, in “Genetic Structure and Regulation of HIV” (W. A. Haseltine and F. Wong-Staal, eds.), p. 205. Raven Press, New York, 1991.
138. D. N. Sengupta, B. Berkhout, A. Gatignol, A. Zhou and R. H. Silverman, PNAS 87, 7492 (1990).
139. Y.-N. Chang, D. J. Kenan, J. D. Keene, A. Gatignol and K.-T. Jeang, J. Virol. 68, 7008 (1994).
140. Y. V. Svitkin, K. Meerovitch, H. S. Lee, J. N. Dholakia, D. J. Kenan, V. I. Agol and N. Sonenberg, J. Virol. 68, 1544 (1994).
141. Y. V. Svitkin, A. Pause and N. Sonenberg, J. Virol. 68, 7001 (1994).
142. J. Pelletier and N. Sonenberg, Nature 334, 320 (1988).
143. C. Berlioz and J.-L. Darlix, J. Virol. 69, 2214 (1995).
144. J. Galabru and A. G. Hovanessian, JBC 262, 15538 (1987).
145. H. C. Schroder, D. Ugarkovic, R. Wenger, P. Reuter, T. Okamoto and W. E. G. Muller, AIDS Res. Hum. Retroviruses 6, 659 (1990).
146. R. K. Maitra, N. A. J. McMillan, S. Desai, J. McSwiggen, A. G. Hovanessian, G. Sen, B. R. G. Williams and R. H. Silverman, Virology 204, 823 (1994).
147. M. A. Minks, D. K. West, S. Benvin and C. Baglioni, JBC 254, 10180 (1979).
148. L. Manche, S. R. Green, C. Schmedt and M. B. Mathews, MCBiol 12, 5238 (1992).
149. R. Grantham and P. Perrin, Nature 319, 727 (1986).
150. P. M. Sharp, Nature 324, 114 (1986).
151. J. Kypr and J. Mrázek, Nature 327, 20 (1987).
152. J. Kypr, J. Mrázek and J. Reich, BBA 1009, 280 (1989).
153. K.-C. Chou and C.-T. Zhang, AIDS Res. Hum. Retroviruses 8, 1967 (1992).
154. F. J. van Hemert and B. Berkhout, J. Mol. Evol. 41, 132 (1995).
155. B. Berkhout and F. J. van Hemert, NARes 22, 1705 (1994).
156. S. Karlin, W. Doerfler and L. R. Cardon, J. Virol. 68, 2889 (1994).
157. E. Beutler, T. Gelbart, J. Han, J. A. Koziol and B. Beutler, PNAS 86, 192 (1989).
158. C. J. Decker and R. Parker, Trends Biochem. Sci. 19, 336 (1994).
159. D. Caput, B. Beutler, K. Hartog, S. Brown-Shimer and A. Cerami, PNAS 83, 1670 (1986).
160. G. Shaw and R. Kamen, Cell 46, 659 (1986).
161. S. Ohno and T. Yomo, PNAS 87, 1218 (1990).
162. E. G. Shpaer and J. I. Mullins, NARes 18, 5793 (1990).
163. D. N. Cooper, Hum. Genet. 64, 315 (1983).
164. D. P. Bednarik, J. D. Mosca and N. B. K. Raj, J. Virol. 61, 1253 (1987).
165. C. Tuerk and L. Gold, Science 249, 505 (1990).
166. R. C. L. Olsthoorn, N. Licis and J. van Duin, EMBO J. 13, 2660 (1994).
167. D. Baltimore, Nature 335, 395 (1988).
168. G. J. Graham and J. J. Maio, PNAS 87, 5817 (1990).
169. A. Rhodes and W. James, J. Gen. Virol. 71, 1965 (1990).
170. N. Sarver, E. M. Cantin, P. S. Chang, J. A. Zaia, P. A. Ladne, D. A. Stephens and J. J. Rossi, Science 247, 1222 (1990).
171. G. Sczakiel, M. Pawlita and A. Kleinheinz, BBRC 169, 643 (1990).
172. B. A. Sullenger, H. F. Gallardo, G. E. Ungers and E. Gilboa, Cell 63, 601 (1990).
173. S. Joshi, A. van Brunschot, S. Asad, I. van der Elst, S. E. Read and A. Bernstein, J. Virol. 65, 5524 (1991).
174. K. Rittner and G. Sczakiel, NARes 19, 1421 (1991).
175. G. Sczakiel and M. Pawlita, J. Virol. 65, 468 (1991).
176. B. A. Sullenger, H. F. Gallardo, G. E. Ungers and E. Gilboa, J. Virol. 65, 6811 (1991).
177. M. Weerasinghe, S. E. Liem, S. Asad, S. E. Read and S. Joshi, J. Virol. 65, 5531 (1991).
178. B. Dropulic, N. H. Lin, M. A. Martin and K.-T. Jeang, J. Virol. 66, 1432 (1992).
179. F. Y. Tung and M. D. Daniel, Arch. Virol. 133, 407 (1993).
180. F. Lori, J. Lisziewicz, J. Smythe, A. Cara, T. A. Bunnag, D. Curiel and R. C. Gallo, Gene Ther. 1, 27 (1994).
180a. E. Gilboa and R. Smith, Trends Genet. 10, 139 (1994).
181. B. A. Sullenger and T. R. Cech, Science 262, 1566 (1993).
182. B. Berkhout and J. L. B. van Wamel, Antiviral Res. 26, 101 (1995).
High-Mobility-Group Chromosomal Proteins: Architectural Components That Facilitate Chromatin Function
MICHAEL BUSTIN* AND RAYMOND REEVES†
*Laboratory of Molecular Carcinogenesis
National Cancer Institute
National Institutes of Health
Bethesda, Maryland 20892
†Department of Biochemistry and Biophysics
Department of Genetics and Cell Biology
Washington State University
Pullman, Washington 99164
I. The HMG-1/-2 and HMG-1 Box Proteins
   A. Structure of the Proteins
   B. Interactions with DNA and Chromatin
   C. Cellular Functions
II. The HMG-I(Y) Family
   A. Structure of the Proteins
   B. Interactions with DNA and Chromatin
   C. Cellular Functions
III. The HMG-14/-17 Family
   A. Structure of the Proteins
   B. Interaction with DNA and Chromatin
   C. Cellular Function and Mechanism of Action
IV. Summary and Perspective
References
Precise interactions between proteins and DNA in chromatin facilitate the orderly progression of complex processes such as transcription, replication, recombination, and repair. Most of the studies on the structure and function of chromatin have focused on interactions occurring between histones and DNA (1-4). It is now clear that the chromatin fiber serves not only to package the DNA into the nucleus but also to provide a means to control the accessibility of specific sequences to regulatory factors and to potentiate interactions between distant regulatory elements (5, 6). Thus, most of the cellular processes involving DNA have to be considered in the context of chromatin.

Progress in Nucleic Acid Research and Molecular Biology, Vol. 54
Copyright © 1996 by Academic Press, Inc. All rights of reproduction in any form reserved.

From this point of view, nonhistone chromosomal proteins, which are either part of, or associated with, the chromatin fiber, provide an additional level of structural and functional complexity. The term "nonhistone" is applied to all the proteins that can be extracted from chromatin and are not histones. In the broadest sense this definition is problematic because it includes many molecular species, and it is difficult to ascertain which of the proteins are bona fide chromosomal components and which are nucleoplasmic or cytoplasmic contaminants. Traditionally this term is applied to "structural proteins" and does not include such proteins as modifying enzymes or regulatory factors that affect transcription or replication.

The high-mobility-group (HMG) proteins are among the largest and best characterized groups of nonhistone chromosomal proteins. Members of this protein group are found in all the cells of higher eukaryotes. HMG proteins are defined as nuclear proteins that can be extracted from nuclei or chromatin with 0.35 M NaCl, are soluble in 5% perchloric or trichloroacetic acid, have a high content of charged amino acids, and have a molecular mass lower than 30,000 Da (7-9). Currently, the HMG proteins are grouped into three families: the HMG-1/-2 family, the HMG-14/-17 family, and the HMG-I(Y) family. Although the structure of the proteins is well defined, their cellular function is not fully understood. Most of the data suggest that these proteins serve as "architectural" elements in chromatin: they bind to specific structures in DNA or in chromatin with little or no specificity for the target DNA sequence, and they facilitate, rather than perform, a specific function in chromatin. For example, HMG-1 facilitates the binding of the progesterone receptor by inducing a structural change in the target DNA (10).
HMG-14/-17 proteins facilitate transcription from chromatin templates but are not part of the transcription complex (11). HMG-I(Y) proteins modify the structure of the DNA to facilitate protein:protein interactions in the preinitiation complex formed in A-T-rich promoter/enhancer sequences of several genes (12-14). The purpose of this review is to summarize recent information on the function of the HMG proteins. Advances in this field were made primarily by elucidating the structure of the proteins and by understanding their mode of interaction with DNA and chromatin; therefore, we concentrate mainly on these aspects of HMG proteins. Renewed interest in these proteins is due to the finding that the DNA-binding domains of many regulatory proteins share common elements with the HMG-1/-2 chromosomal protein family. Likewise, recent results with chromatin assembly systems provided evidence that HMG-14/-17 may indeed enhance the transcriptional potential of a chromatin template and that HMG-I(Y) proteins facilitate protein interactions in certain transcription preinitiation complexes. The scope of this review is limited; for a comprehensive background on the isolation and chemistry of the proteins it is best to consult the book edited by Johns (8) as well as several other reviews (7, 9, 15-16a). Information pertaining to the expression of HMG proteins during the cell cycle and differentiation has been reviewed by Bustin et al. (15). Information pertaining to the HMG-1 domain proteins can be found in several reviews and articles (16-19). This review also presents information on the structure of the HMG-I(Y) gene and its alternative splicing. A full description of the genes coding for HMG-14/-17 proteins has already been presented elsewhere (7). For descriptions of the genes coding for the mammalian HMG-1/-2 proteins and their homologs in various species, it is necessary to consult the original references (20-36). The review covers information available up to May 1995. The limited scope of the review and the recent widespread activity in the field do not allow us to cite all the references in the HMG field. We do apologize to those whose work we may have inadvertently failed to mention.
I. The HMG-1/-2 and HMG-1 Box Proteins
Members of the HMG-1/-2 family are the largest (Mr ~25,000) and most abundant (~1 molecule per 10-15 nucleosomes) of the “high-mobility-group” of DNA-binding proteins. Proteins in this family are highly conserved. For example, the human HMG-1 (215 amino acids) and HMG-2 (209 amino acids) proteins are coded for by separate genes (21, 22), but nevertheless share >82% amino-acid sequence identity. Related family members have also been identified in, and their cDNAs and genes cloned from, various other vertebrates (20, 23, 24, 36), insects (25-27), plants (28, 29), protozoans (30, 31), and yeast (32, 33). Although many functions have been proposed for the HMG-1/-2 proteins, their actual biological roles remain elusive (reviewed in 7). Nevertheless, their relative abundance, conservation between species, and apparent lack of sequence specificity of DNA binding suggest that in vivo the HMG-1 and -2 proteins probably perform some general function(s) in the cell, for example, as structural components of chromatin and/or as ancillary transcription factors. Renewed interest in this group of nonhistone proteins also stems from the recent discovery of a large and highly diverse group of additional DNA-binding proteins, the so-called “HMG-1 box” family, related to the HMG-1 and -2 chromatin proteins by virtue of shared sequence homologies in their respective DNA-binding domains (reviewed in 16, 19, 37; see below).
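Sequence-identity figures such as the >82% quoted for human HMG-1 and HMG-2 depend on how identity is counted. The Python sketch below illustrates one common convention and is offered only as an example, not as the method used in the cited studies: it assumes the two sequences have already been aligned (gaps written as `-`) and counts identical residues over the columns where neither sequence has a gap. The function name `percent_identity` is hypothetical, and other conventions (for example, dividing by the length of the shorter sequence) give somewhat different values.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two pre-aligned protein sequences.

    ASSUMPTION: identity is computed only over alignment columns in which
    neither sequence has a gap character.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences have a residue.
    pairs = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
             if x != gap and y != gap]
    if not pairs:
        return 0.0
    same = sum(1 for x, y in pairs if x == y)
    return 100.0 * same / len(pairs)
```

For example, `percent_identity("MG-KD", "MGAKE")` compares four ungapped columns, three of which match, and returns 75.0.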
A. Structure of the Proteins
The HMG-1/-2 proteins have a tripartite structure (7, 38) originally defined by limited proteolysis under high-ionic-strength “structuring conditions” (39, 40). The evolutionarily conserved N-terminal A domain and the central B domain, each of ~80-90 residues, are internal repeats of similar amino-acid sequence (~43% homologous), are extremely basic (net charge ~+20), and constitute the nonspecific DNA-binding regions of the protein (41). The highly acidic C-terminal domain contains ~30 consecutive aspartate or glutamate residues and is involved in interactions with other proteins, particularly histones (40, 42-45), as well as functioning to regulate the DNA-binding affinity of the HMG-1/-2 proteins (46). Regions of ~70-80 amino-acid residues homologous to the A and B domains of HMG-1 [the so-called “HMG box” (47) or, more appropriately, the “HMG-1 box” motif (19)] have been observed in numerous other proteins, many of which are gene transcription factors (reviewed in 16, 19, 37). The HMG-1 box superfamily, with animal, plant, and yeast members, is of ancient evolutionary origin (dating back at least 10⁹ years) (48) and contains both sequence-specific DNA-binding proteins and proteins that bind to DNA without sequence specificity. Analysis of the alignments of a large number of proteins has defined the following distinct amino-acid sequence motif as a “signature” for the HMG-1 box DNA-binding domain (19):
(G,S,A)(Y,F)**(Y,F,W)*(G,S,A)**(W,Y,F).**-..(K,R,Q)*-(Y,F,W)**....*(K,R,Q)*(Y,F,H)*...**(Y,F,W)
The most noticeable characteristic of this motif (the parentheses enclose equivalent residues) is the conservation of the position and spacing of the hydrophobic aromatic residues tyrosine (Y), tryptophan (W), and phenylalanine (F). The asterisk indicates that spacing is not fixed. Phylogenetic analysis (48) distinguishes two subgroups of proteins containing the HMG-1 box motif. One subgroup, including the HMG-1/-2 proteins as well as the nucleolar RNA polymerase I transcription factor known as UBF (47) and the mitochondrial transcription factors mtTF (49) and ABF2 (33), contains two or more HMG-1 boxes. The other subgroup, as exemplified by the mammalian testis-determining factor SRY (50, 51), the lymphoid enhancer-binding factor LEF-1/TCF-1α (52, 53), the yeast nonhistone proteins NHP6A/B (54, 54a), a structure-specific recognition protein that binds cisplatin-modified DNA (55), a component of the V-(D)-J recombinase, T160 (56), as well as numerous other known or suspected transcription factors, many of which are involved in mating-type determination and sexual development (16, 19, 37), typically contains a single box embedded in a
larger protein. Outside of the signature motifs, these different HMG-1 box-containing proteins usually have little or no sequence homology.
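To show how a signature built from equivalence classes and unfixed spacing can be turned into a search pattern, the Python sketch below checks only that the ten residue classes listed in the motif above occur in order in a protein sequence. It is a deliberately loose simplification under a stated assumption: every gap between classes is treated as unconstrained, because the exact spacings of the published signature (19) cannot be recovered reliably from this text. `SIGNATURE_CLASSES` and `matches_signature` are hypothetical names, not part of any published tool.

```python
import re

# Residue classes of the HMG-1 box signature, in the order given in the text.
# ASSUMPTION: all gaps between classes are unconstrained; the real signature
# (ref. 19) fixes some spacings, which this sketch ignores.
SIGNATURE_CLASSES = [
    "GSA", "YF", "YFW", "GSA", "WYF",
    "KRQ", "YFW", "KRQ", "YFH", "YFW",
]


def signature_regex(classes):
    """Regex requiring each residue class to occur in order, with any
    number of residues (possibly none) between consecutive classes."""
    return re.compile(".*?".join(f"[{c}]" for c in classes))


def matches_signature(protein_seq):
    """Return the (start, end) span of the first in-order match, or None."""
    m = signature_regex(SIGNATURE_CLASSES).search(protein_seq.upper())
    return (m.start(), m.end()) if m else None
```

With unconstrained gaps, many long proteins will match, so this sketch only demonstrates how equivalence classes and variable spacing combine into a pattern; a realistic scan would have to encode the fixed spacings of the published signature.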
1. STRUCTURE OF THE HMG-1 BOX
The tertiary structure of one HMG-1 box domain, box B of mammalian HMG-1 proteins, has been determined independently by two different groups using 2D ¹H NMR and 3D ¹⁵N-¹H NMR solution spectroscopic techniques (57, 58). More recently, the solution NMR structures of the HMG-1 boxes from the Drosophila HMG-D protein (59) and the human testis-determining protein (hSRY-HMG) (60) have also been established. Although there are minor differences in detail, the structures of all of these HMG-1 boxes from both the mammalian and insect proteins are remarkably similar. Figure 1 shows a schematic representation of a coordinate-averaged three-dimensional structure of the rat HMG-1 box B (57). As illustrated, the HMG-1 box is composed of three α-helices and an extended N-terminal peptide segment that together form an unusual twisted L (57) or V (58) shape consisting of two arms, one shorter (~31 Å) than the other (~36 Å), with an angle of ~70-80° at the apex between the arms. The shorter arm of this boomerang-shaped structure consists of helices I and II, and the longer arm is composed of the extended N-terminal region packed against helix III. The relative positions of the two arms are maintained by a cluster of conserved hydrophobic amino-acid residues at the apex of the V. Thus, the apex of the fold contains the hydrophobic core around which the three helices are arranged. In addition, conserved hydrophobic residues stabilize the intersection of helices I and II. Helices II and III, together with the extended N-terminal region, lie approximately in a plane, forming a rather flat surface on one side of the domain, with helix I protruding from the opposite side. The first 12 residues of the HMG-1 box (employing the numbering system of ref. 57) are in an extended configuration lying antiparallel to helix III, such that the N-terminus of the box and the C-terminus of helix III lie close together; they are stabilized by interactions of hydrophobic residues on the inner amphipathic face of helix III with three proline residues of the extended N-terminal region. This stable structural element, composed of the extended segment and part of helix III, has been called the “terminal unit” (58) and forms the long arm of the L-shaped box. Apart from the extended N-terminal peptide region of the HMG-1 box, which has a remarkable sequence and structural similarity to the extended DNA-binding domain of the HMG-I(Y) proteins (61; see Section III,A), there is no discernible relationship between the HMG-1 box and previously described DNA-binding structural folds, such as those found in the helix-turn-helix proteins. In toto, the highly conserved HMG-1 box appears to be a novel DNA-binding motif (57, 58). Its three-dimensional configuration (Fig. 1) explains most of the sequence identities and homologies conserved among the various HMG-1 box proteins. For example, several of the highly conserved “signature” amino acids (19) are internal hydrophobic residues important for maintaining the integrity of the folds and arms of the box structure (57, 58, 62).
FIG. 1. Schematic representation of the three-dimensional structure of the B-domain box of rat HMG-1 as determined by solution NMR [redrawn with modifications from Weir et al. (57)]. The extended segment, with its highly conserved amino-acid core sequence P7-K8-R9-P10, is proposed to be the region of the box that binds to the minor groove of DNA (see Section I,A,1).
Furthermore, the conservation of basic and acidic residues in different HMG boxes (16, 19, 37) suggests that common surface features, such as asymmetric charge distributions, are functionally important. For example, most of the positively charged basic residues are on, or close to, the concave surface formed between the two arms of the box (Fig. 1), including both the extended N terminus and part of helix I, suggesting to early workers (57, 58) that this was the region involved in DNA binding. As shown in Fig. 2, this prediction has been confirmed (60) by the determination of the three-dimensional solution structure of the hSRY-HMG box-DNA cocomplex (see Section I,B). These and other observations have led to the notion that HMG-1 box structure is conserved to a greater extent than amino-acid sequence. The overall validity of this idea is also supported by recent homology model-building experiments in which a large number of HMG-1 box sequences were “threaded” through the solution-NMR structure of the rat HMG-1 B box (62). These model-building studies indicated
that whereas the HMG-1 box does not have rigid sequence requirements for its formation, its overall tertiary domain structure is highly conserved and can be used as a basis for establishing phylogenetic relationships between HMG box protein family members in the absence of statistically significant sequence similarities (62).
2. HMG-1 BOX BINDING AND SPECIFICITY
The selectivity and specificity of binding by different types of HMG-1 box proteins to linear B-form DNA vary considerably. Binding of the mammalian HMG-1 and -2 and the yeast NHP6A/B chromatin proteins, for example, seems for the most part to be indifferent to DNA sequence. On the other hand, the binding of other types of HMG-1 box proteins, such as the nucleolar transcription factor UBF and the mitochondrial transcription factors mtTF and ABF2, produces specific DNA footprints, but the protected sites do not have a recognizable consensus sequence (49, 63-65). In contrast, the class of HMG-1 box-containing “specific transcription factors,” such as SRY, LEF-1/TCF-1α, and others, as well as the T160 V-(D)-J recombinase, produce specific footprints on DNA spanning sequences with a recognizable consensus (reviewed in 16, 19). In general, all of the specific binding sites for the HMG-1 box-containing transcription factors are A-T-rich, and the same sequences are often recognized by several different proteins within a related group of factors. Compared to classical transcription factors, the sequence specificity of the HMG-1 box-containing transcription factors is fairly low (66). However, the mere fact that they can recognize and bind to specific DNA sequences is remarkable for several reasons. First, methylation-interference, base-substitution, diethyl-pyrocarbonate protection, and hydroxyl-radical cleavage experiments indicate that all HMG-1 boxes interact with DNA primarily through contacts with the minor groove on one side of the duplex (67-69).
Except for the well-known case of the TBP protein binding to the TATA element (70, 71), such a mode of interaction is unusual for sequence-specific DNA-binding proteins because the minor groove provides little opportunity for base-specific contacts: hydrogen bonding there cannot distinguish T from C residues (72) or A·T from T·A base pairs (73). These physical limitations on specific protein/DNA interactions in the minor groove are thus probably responsible for the modest sequence selectivity of the HMG-1 box transcription factors. Nevertheless, as will be seen below, hydrogen bonding in the minor groove is well suited for structure-directed recognition because the phosphates on either side of the groove are often spaced at favorable distances for selective interactions. A second reason that the sequence-recognizing ability of the HMG-1 box transcription factors comes somewhat as a surprise is that, as noted above,
the tertiary structures of the DNA-binding domains of all of the HMG-1 box proteins so far investigated are nearly identical (Figs. 1 and 2). Thus, the physical basis for this sequence selectivity must reside in subtleties of the domain structure itself and/or in differences in the particular amino-acid residues that interact with DNA. In this connection, the long arm (i.e., the terminal unit) of the HMG-1 box has been directly implicated in sequence-specific recognition. In a series of domain-swapping experiments, Crane-Robinson and colleagues (74) switched the long and short arms of the sequence-specific HMG box of TCF-1α into the equivalent positions in the non-sequence-specific B box of HMG-1, and demonstrated that only chimeric proteins that contained the long arm of the TCF-1α protein (i.e., the extended 12 amino-terminal residues and the last 25 C-terminal residues of helix III; Fig. 1) formed a sequence-specific complex with DNA. These experiments also clearly demonstrated that not all HMG-1 boxes are equivalent or interchangeable. The results of these domain-swapping experiments are also entirely consistent with earlier reports showing that certain of the highly conserved amino acids in the first 12 residues of the extended N-terminal region of the box (numbering system of ref. 57) are directly involved in DNA binding, because mutations of these residues in the HMG-1 boxes of SRY (69, 75) and LEF-1 (68) significantly reduce, or abolish, binding without obviously interfering with the structural folding interactions of the box. In particular, the three mutations V7L, R9G, and M11I in SRY that result in sex reversal, and the double mutant K8E/K9E in LEF-1, all fail to bind DNA. Furthermore, a clear distinction has now emerged between sequence-specific and nonspecific HMG boxes in the extended N-terminal segment at positions 7 and 12 (74).
The residue at position 7 is proline in all non-sequence-specific boxes, whereas in sequence-specific boxes a hydrophobic residue (valine or isoleucine) is common. A hydrophobic residue at position 7 could be involved in sequence recognition, whereas a conserved proline at this position would be expected to confer relaxed sequence dependence (61, 74). All presently known sequence-specific HMG boxes also have an asparagine residue at position 12 (Asn-12), whereas a serine at this position is typical of non-sequence-specific boxes. Because the hydrogen-bonding potential of asparagine residues for base recognition is well established (76), substitution of this amino acid at position 12 could reasonably be expected to contribute to altered HMG box sequence specificities. As illustrated in Fig. 2, and consistent with these predictions, Clore and colleagues (60), in their determination of the structure of the cocomplex of hSRY-HMG with DNA, identified seven different amino-acid residues (among them Asn-12, or, in their designation, N10) distributed along the entire binding surface of the box
that make direct contacts to particular bases and hence would be expected to mediate sequence specificity.
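The position-7 and position-12 observations above can be written down as a simple rule of thumb. The Python sketch below is hypothetical (`classify_box` is not a published tool) and rests on an explicit assumption: the input box is already aligned to the numbering of ref. 57, so that the first character is box position 1. It uses only the two conditions the text states as holding in all known cases, Pro-7 in non-sequence-specific boxes and Asn-12 in sequence-specific boxes, and reports "ambiguous" when neither or both apply.

```python
def classify_box(box_seq):
    """Hypothetical heuristic: guess whether an aligned HMG-1 box is
    sequence-specific from the residues at box positions 7 and 12.

    ASSUMPTION: box_seq is aligned to the numbering of ref. 57, so
    box_seq[0] is position 1, box_seq[6] is position 7, box_seq[11] is 12.
    """
    if len(box_seq) < 12:
        raise ValueError("need at least 12 aligned residues")
    pos7, pos12 = box_seq[6].upper(), box_seq[11].upper()
    # Asn-12 is reported in all known sequence-specific boxes.
    could_be_specific = pos12 == "N"
    # Pro-7 is reported in all non-sequence-specific boxes.
    could_be_nonspecific = pos7 == "P"
    if could_be_specific and not could_be_nonspecific:
        return "sequence-specific"
    if could_be_nonspecific and not could_be_specific:
        return "non-sequence-specific"
    return "ambiguous"  # both or neither condition satisfied
```

Because the hydrophobic residue at position 7 is described only as "common" in sequence-specific boxes, the sketch does not use it as a criterion; a hydrophobic residue there simply fails to support the non-sequence-specific call.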
B. Interactions with DNA and Chromatin
1. HMG BOX PROTEINS RECOGNIZE BENT AND DISTORTED DNAs
The HMG-1 and -2 proteins have long been known to bind nonspecifically to both double- and single-stranded DNAs, with a marked preference for the latter. Additionally, they can unwind and introduce supercoils in plasmid DNAs, can preferentially bind to cruciform structures as well as to B-Z DNA junctions, and also apparently possess the ability to distinguish between different conformations of single-stranded molecules (reviewed in 7). Recent results indicate that the HMG-1/-2 proteins (77, 78), the sequence-specific HMG box-containing SRY protein (66), and the HMG-1 boxes from a number of other proteins recognize the sharp angles present in synthetic four-way junction (4WJ) DNA molecules. In fact, it now appears likely that 4WJs are the universal target for all HMG box proteins (79-82). The physical basis for this specific structural recognition remains a matter of speculation because neither the actual structure of 4WJs nor the mode of interaction of HMG-1 boxes with these structures is currently known. Nevertheless, models have been proposed (58, 83) suggesting that the terminal unit (i.e., the long arm) of the HMG box interacts with the minor groove in the two acute angles of such structures. Indirect support for this model comes from recent hydroxyl-radical footprinting experiments showing that the bacterial HU protein preferentially and symmetrically binds to the two acute angles of 4WJ DNA (84). Because HU, a homolog of the bacterial IHF protein, is in many respects similar to HMG-1 in its ability to interact with and bend DNA and, in fact, can actually replace the HMG-1 protein in certain functional assays (79, 85), these footprinting results suggest that the HMG-1/-2 proteins may interact with 4WJ DNA in a similar fashion.
The inherent capacity of HMG-1/-2 proteins, and of the isolated HMG boxes, to bind to already bent or distorted DNA (78, 86) is also attested to by their ability to bind both the major 1,2-intrastrand d(GpG) and the minor 1,3-intrastrand d(GpTpG) DNA adducts of the antitumor drug cisplatin (87-89). These adducts are known to bend DNA by ~32-34° (90). Analogous to the situation with 4WJ DNAs, isolated HMG box domains can also preferentially bind to cisplatin-modified DNAs, and DNase I footprinting indicates that both strands of the DNA around the adduct are bound by the box peptides (91). In normal cells, both the major and minor cisplatin DNA adducts are thought to be repaired in vivo by the human excision nuclease system (92). The biological significance of HMG-1/-2, or HMG box
protein, binding to cisplatin adducts is not known, but in vitro HMG-1 binding to such adducts inhibits repair of the major intrastrand cross-linked products by the human excision nuclease system, suggesting that the types and levels of HMG-domain proteins in a tumor may influence the responsiveness of that cancer to chemotherapy (92). Alternatively, cisplatin adducts may cause cellular toxicity by nonspecifically trapping, or “hijacking,” essential HMG box-containing regulatory proteins, such as the ribosomal gene transcription factor hUBF (93).
2. HMG BOX PROTEINS BEND, LOOP, AND SUPERCOIL DNAs
In addition to recognizing bent DNA, both sequence-specific and non-sequence-specific HMG box proteins are capable of inducing bends in DNA (16, 60, 86). In the case of sequence-specific HMG box proteins this has generally been established from circular permutation assays, and bend angles of ~130° for LEF-1 (68, 94, 95) and ~85° for mouse and human SRY (66) have been reported. In the case of non-sequence-specific proteins such as the mammalian HMG-1/-2 proteins and the yeast NHP6A/B nonhistone proteins (96, 97), DNA bending to varying degrees has usually been demonstrated both by permutation assays (98) and by ring-closure, or circularization, assays (10, 96, 99), with the best bending results being obtained with reduced “native” proteins that have never been denatured or exposed to acids (10, 100, 101). The most definitive information so far available on the molecular mechanisms involved in HMG-1 box-induced DNA bending comes, however, from solution NMR studies of a complex of the human SRY-HMG box with its specific recognition sequence (60, 102, 103). As shown by several views of the hSRY-HMG-DNA cocomplex illustrated in Fig. 2, on binding to the minor groove of its recognition sequence the hSRY-HMG box induces a large conformational change in the duplex DNA, from a B type in the free state to a markedly bent and underwound form that follows the contours of the concave binding surface of the box perfectly. Hence, this protein-DNA interaction represents a classical example of induced fit. The DNA in the complex is bent by ~70-80° in the direction of the major groove, which is accomplished by induction of large positive local interbase-pair roll angles for six of the seven base steps present in the octamer substrate.
In addition, the DNA is also severely underwound (with an average interbase-pair helical twist of ~26°) and, as a result, the minor groove is shallow and significantly expanded, with a width of ~9.4 Å compared with ~4.0 Å in B-DNA. Concomitantly, the major groove is substantially compressed. As originally predicted (102), a principal factor in the bending of the DNA is the partial intercalation of an isoleucine residue (I13) between base pairs near the center of the DNA substrate. Widening of the
minor groove appears to be mediated by five amino acid residues that form a T-shaped wedge in direct contact with the central base pairs of the DNA octamer. The overall structure of the hSRY-HMG-DNA cocomplex with its widened minor groove and DNA bent toward the major groove is strongly reminiscent of the structure of another minor groove binding protein, TBP, the TATA box binding protein, in complex with its DNA substrate (70, 71). Although the molecular mechanisms involved in the formation of these two types of complexes are quite different, what is clear is that very different protein binding surfaces placed within a widened minor groove can bend and unwind DNA in a similar manner. In contrast to these examples, the means by which non-sequence-specific HMG-1 box proteins induce bends in DNA are unknown. Such bending may involve a combination of several of the above mechanisms and may also include others, such as asymmetric DNA charge neutralizations (104). In addition to their capacity to induce DNA bending, both HMG-1/-2 (7, 46, 105-107) and the non-sequence-specific HMG box proteins, such as the ribosomal gene transcription factor UBF (108-110), have the potential to induce (in the presence of topoisomerase I) supercoils in topologically closed domains of DNA. Furthermore, these proteins also can introduce loops in either linear DNAs or relaxed circular plasmids in the absence of other factors. The efficiency of HMG-1 protein-induced looping and supercoiling is modulated by its acidic C-terminal domain with a four- to fivefold reduction in both DNA binding affinity and supercoiling ability when the tail is present (46, 105). 
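Two of the quantities above can be made concrete with short calculations. The Python sketch below is illustrative only. The first function applies the empirical relation of Thompson and Landy, μM/μE = cos(α/2), where μM is the gel mobility of the complex with the bend near the middle of the fragment and μE with the bend near an end; we do not claim that the cited studies used exactly this calibration. The second computes the total twist deficit over a run of base steps relative to canonical B-DNA, assuming ~10.5 bp per turn (~34.3° per step). Both function names are hypothetical.

```python
import math


def bend_angle_from_mobility(mu_middle, mu_end):
    """Estimate a protein-induced bend angle (degrees) from a circular
    permutation assay using the empirical relation mu_M / mu_E = cos(alpha / 2)."""
    ratio = mu_middle / mu_end
    if not 0.0 < ratio <= 1.0:
        raise ValueError("expected 0 < mu_middle / mu_end <= 1")
    return math.degrees(2.0 * math.acos(ratio))


def twist_deficit(avg_twist_deg, n_steps, b_dna_bp_per_turn=10.5):
    """Total underwinding (degrees) over n_steps base steps relative to
    B-DNA. ASSUMPTION: B-DNA has ~10.5 bp per turn."""
    b_twist = 360.0 / b_dna_bp_per_turn  # ~34.3 degrees per step
    return n_steps * (b_twist - avg_twist_deg)


# A mobility ratio of cos(65 deg) ~ 0.42 corresponds to a ~130 deg bend.
print(round(bend_angle_from_mobility(math.cos(math.radians(65.0)), 1.0)))  # 130
# If the ~26 deg average twist applied to all seven steps of the octamer:
print(round(twist_deficit(26.0, 7)))  # 58
```

The first example recovers the magnitude reported for LEF-1; the second shows that, under the stated assumption, the hSRY-bound octamer would be underwound by roughly 58° in total relative to B-DNA.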
The ability of HMG-1 box proteins to bend and modulate the topological configuration of DNA substrates has led to the idea that the HMG box is an all-purpose “DNA bender/wrapper/looper” domain (81, 82) that in many ways acts like a eukaryotic equivalent of the bacterial IHF and HU proteins (which also have these capabilities) and has therefore been recruited by different proteins to facilitate a variety of DNA biological functions, including transcription, repair, and recombination (see Section I,C). In considering the probable biological validity of such a proposed in vivo function for the HMG-1 box, it should also be kept in mind that superimposed on these manipulative abilities is an even more fundamental ability of HMG-1 boxes: namely, their generalized capacity to recognize and bind tightly to altered DNA conformations, such as intrinsically bent or underwound structures, stem-loops, 4WJs, and cisplatin adducts, regardless of their nucleotide sequences. Importantly, as noted above, in most instances the HMG-1 box proteins actually possess considerably greater binding affinities for such distorted DNA structures than they do for normal B-form DNA (66, 79, 83, 111). For instance, the sequence-specific SRY protein has about the same
[Figure 3 panel labels: Sequence Dependent; Induced Bending]
FIG. 3. Diagram depicting the various functional capacities of an individual HMG-1 box with respect to DNA structure recognition and bending in vitro. As shown by the large arrows, HMG boxes have an inherent ability to bind nonspecifically, yet very tightly, to altered DNA structures, such as those that are intrinsically bent, underwound, or adducted by cisplatin (pathway 1), or structures formed by four-way junctions, cruciforms, or DNA cross-overs (pathway 2). Pathway 3 indicates that HMG boxes also have the ability to bind nonspecifically to B-form DNA and induce bends, but, as shown by the smaller arrow, such binding is of considerably lower affinity than that observed for binding to previously distorted structures. Pathway 4 indicates binding of sequence-specific HMG box transcription factors to B-form DNA, with subsequent introduction of a bend in the substrate. As with the binding shown in pathway 3, the smaller arrow in pathway 4 indicates that the affinity of binding of sequence-specific HMG boxes to linear B-form DNA is often less than that observed for binding of the same box to DNA that is already intrinsically bent or distorted. The solid boxes in pathway 4 indicate the defined sequence binding sites on the DNA.
nonspecific binding affinity for 4WJ DNAs as does the HMG-1 protein (with Kd values between 10⁻⁸ and 10⁻⁹ M), and this affinity is even greater than the affinity of the SRY protein for its normal recognition sequence in B-form DNA (66). Thus, as depicted in Fig. 3, the HMG-1 box proteins in vivo are likely to bind selectively to previously bent or altered DNA structures in preference to B-form DNA and therefore, by inference, to favor DNA structural recognition and/or stabilization over induction of DNA bending. As first suggested (66), such an in vivo situation for HMG-1
[Figure 4 labels: Sequence-Specific recognition; Competition for Non-specific Binding to Bent DNA; key: Sequence-specific DNA binding protein, Non-specific DNA binding protein]
FIG. 4. Competition between sequence-specific and conformation-specific DNA binding by an HMG-1 box-containing transcription factor [redrawn with modifications from Landsman and Bustin (19) and based on an original model by Ferrari et al. (66)]. A sequence-specific HMG-1 box protein can bind to linear B-form DNA containing its recognition sequence (filled box) and introduce a bend or conformational change in the target DNA. The same HMG box can also, and often with higher affinity, bind nonspecifically to previously bent or distorted DNA. Thus, when both types of DNA are present in a given reaction, there is a competition, based on their relative binding affinities, between the specific sequence-containing DNA and the non-sequence-specific DNA for binding by the sequence-recognizing HMG box protein. Nonspecific HMG box proteins such as HMG-1 and HMG-2 also recognize and bind to bent DNA with high affinity. The cellular concentrations of the latter proteins are orders of magnitude higher than those of the sequence-specific HMG box proteins. Therefore, the nonspecific HMG box proteins will inhibit the binding of the sequence-specific proteins to bent DNA and thus facilitate preferential binding of the latter to their recognition sequences on linear B-form molecules.
box-containing transcription factors (for example, the male sex-determining protein SRY) could potentially have disastrous biological and/or developmental consequences, because the effective cellular concentrations of essential sequence-specific proteins could be substantially reduced by their nonspecific trapping on bent or distorted DNA structures that transiently exist in cells for a variety of reasons. As illustrated in Fig. 4, it has been suggested (19, 66) that one, but
obviously not the only, possible function for the existence of nonspecific HMG box proteins in cells is to provide a biological solution to this differential substrate competition problem. Because the normal concentration of nonspecific DNA-binding HMG box proteins, such as HMG-1 and -2, in cells is orders of magnitude higher than that of sequence-specific HMG box transcription factors, these nonspecific proteins would be expected to preferentially saturate the multitude of nonspecific (bent or distorted) DNA-binding sites, thereby ensuring that the free concentrations of the sequence-specific proteins remain high enough for them to find their targets successfully.
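The competition argument of Fig. 4 can be made semi-quantitative with the textbook expression for two ligands competing for one class of sites. The Python sketch below is an illustration under explicit assumptions: a single class of bent sites, 1:1 equilibrium binding, and free protein concentrations equal to total concentrations (both proteins in excess over sites). The Kd of 1 nM for both proteins is taken from the 10⁻⁸ to 10⁻⁹ M range quoted above; the 10 nM and 10 µM concentrations are hypothetical values chosen only to reflect a thousandfold excess of HMG-1. The function name is likewise hypothetical.

```python
def bent_site_occupancy(conc_specific, kd_specific, conc_hmg, kd_hmg):
    """Fraction of bent-DNA sites occupied by a sequence-specific HMG box
    factor when an abundant nonspecific HMG box protein competes for them.

    ASSUMPTIONS: one protein per site, equilibrium, and free protein
    concentrations equal to total concentrations. Units: molar.
    """
    s = conc_specific / kd_specific  # affinity-weighted specific factor
    h = conc_hmg / kd_hmg            # affinity-weighted HMG-1 competitor
    return s / (1.0 + s + h)


# Hypothetical: 10 nM specific factor, Kd = 1 nM for both proteins.
alone = bent_site_occupancy(10e-9, 1e-9, 0.0, 1e-9)
with_hmg = bent_site_occupancy(10e-9, 1e-9, 10e-6, 1e-9)
print(f"no competitor: {alone:.3f}; with 10 uM HMG-1: {with_hmg:.5f}")
```

Under these assumptions the fraction of bent sites held by the specific factor, and hence the amount of it sequestered there, falls from about 0.91 to about 0.001, which is the effect the model in Fig. 4 relies on.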
3. INTERACTION OF HMG-1/-2 PROTEINS WITH CHROMATIN
Contradictory results have often clouded attempts to elucidate the role, if any, played by HMG-1/-2 proteins in the regulation of chromatin structure. For example, there are early conflicting reports (reviewed in 7) of both the mediation and the repression of in vitro nucleosome assembly by HMG-1/-2 proteins. However, early studies did clearly demonstrate that, in vivo, HMG-1/-2 proteins, like the lysine-rich histones H1 and H5, are bound preferentially to linker DNA between adjacent nucleosomes in the bulk of eukaryotic chromatin (reviewed in 1, 2). Nevertheless, mononucleosomes can be isolated from a subfraction of total chromatin that contains near-stoichiometric amounts of HMG-1/-2 proteins but lacks histone H1, suggesting that a major function of HMG-1 and -2 is to replace H1 in restricted linker regions so as to promote the accessibility of local chromatin domains (112, 113), presumably those involved in transcription. Recent investigations reveal that whereas H1 is a repressor of transcription (reviewed in 114), the HMG-1/-2 proteins appear to be general chromatin factors that can either stimulate (10, 115-119) or reversibly repress (120, 121) polymerase II transcription, depending on the experimental conditions (see Section I,C). The molecular mechanisms by which these two classes of highly basic nuclear proteins either repress or activate transcription in vivo are not known. Nevertheless, recent findings provide some novel support for the long-held suspicion that HMG-1/-2 proteins compete with histone H1 for binding to localized regions of chromatin, thereby potentially affecting their functional activity. There is general agreement that the linker histones H1 and H5 interact with DNA at the cross-over point where it enters and exits the nucleosome (1, 122).
Furthermore, like histone H1, HMG-1 induces a chromatosome stop in reconstituted chromatin digested with micrococcal nuclease (36), suggesting that both classes of proteins bind to similar regions on the front face of nucleosome particles. Both H1 and H5 bind to cross-overs of double-stranded DNA (123), as well as to synthetic four-way junctions (124, 125) that
HMG PROTEINS
structurally mimic cross-overs (86, 126), in preference to regions of linear double-stranded DNA. The same workers also demonstrated that HMG-1 can compete effectively with H1, but not with histone H5, for binding to 4WJs in vitro, suggesting that replacement of histone H1 by HMG-1 may play a part in the putative transcriptional activation of chromatin by HMG-1 (127). Although these in vitro observations are of considerable intrinsic interest, and of possible heuristic value, their biological relevance remains to be confirmed: there is still no direct evidence that either cross-over DNAs (86) or 4WJs (126) actually mimic the structure of DNA on the front face of nucleosomes, and, so far at least, all of the evidence relating the preferential association of HMG-1/-2 proteins with transcriptionally active regions of chromatin in vivo is correlative (reviewed in 128).
C. Cellular Functions
Although the cellular function of several HMG-1 box-containing transcription factors has been firmly established, the in vivo roles played by the HMG-1/-2 proteins are less clear owing to often conflicting in vitro experimental results (reviewed in 7). Nevertheless, numerous lines of evidence suggest that the HMG-1/-2 proteins participate in the regulation of chromatin structure and are involved, as either positive or negative factors, in various aspects of DNA replication, transcription, and repair. As previously noted, perhaps the most widely accepted function of the HMG-1/-2 proteins is their ability to bind preferentially to, as well as to induce, bent or distorted DNA structures and to facilitate the formation of supercoils and loops in topologically restricted DNA domains. This ability of HMG box proteins to recognize and modulate DNA structure, and to participate in specific protein-protein interactions, has led to their designation as "architectural transcription factors" (reviewed in 16), implying that they are involved in the formation of stereospecific nucleoprotein complexes required for gene transcriptional activation (16, 95, 129). This is not necessarily always the case, however, because the same capabilities can just as easily be employed to regulate other aspects of nuclear DNA structure and function (96, 97, 130). The uncertainties surrounding the biological role of the HMG-1/-2 proteins are well illustrated by the continuing controversy over the role played by these nonspecific DNA-binding proteins in regulating transcription. Early reports indicated that HMG-1/-2 proteins could significantly stimulate specific in vitro transcription from the adenovirus major-late promoter in HeLa cell lysates (116) and suggested that this effect is caused, in part, by an HMG-mediated increase in the rate of binding of the transcription factor MLTF (also called USF) to a 5′ upstream element of the viral promoter (115, 117).
MICHAEL BUSTIN AND RAYMOND REEVES
More recently, HMG-1/-2 proteins have likewise been reported to stimulate the in vitro transcription of a number of nonviral genes (131, 132), possibly by stabilizing an activated conformation of the TFIID-TFIIA initiation complex (133) on the promoters of such genes. HMG-1 and -2 also appear to stimulate, significantly and specifically, the binding of other transcription factors to their cognate promoter/enhancer sequences (10, 118, 134). For example, in vitro HMG-1 stimulates by more than 10-fold the sequence-specific binding of purified human progesterone receptor, complexed with progesterone, to oligonucleotides containing progesterone-response elements (PREs), most likely as a consequence of HMG-induced bending of the PRE-containing DNA substrates (10). In addition, HMG-2 specifically interacts in vitro with the POU domains of the octamer transcription factors Oct1 and Oct2, thereby increasing the sequence-specific DNA binding of these proteins (135). Perhaps more importantly, cell transfection experiments, in which an octamer-reporter gene construct was cotransfected with either an antisense HMG-2 expression vector or a vector expressing a VP16/HMG-2 chimeric protein, also strongly suggest that the Oct and HMG-2 proteins physically interact with each other in vivo and thereby stimulate octamer-dependent gene transcriptional activity (135). In contrast to these findings, purified HMG-1/-2 proteins repress transcription in vitro by RNA polymerase II (Pol-II) as a consequence of specifically interacting with components of the basal transcription initiation complex at two different steps in its formation.
At the initial stages of initiation complex formation, HMG-1 can interact with the TATA-binding protein (TBP) in the presence of a TATA-box-containing oligonucleotide to form a specific HMG-1·TBP·promoter complex (120). This complex prevents factor TFIIB from binding to TBP and, consequently, blocks both formation of the preinitiation complex and in vitro transcription from the substrate DNA. Furthermore, transcription factor TFIIA can, in a concentration-dependent manner, compete with HMG-1 for TBP binding and thus reverse the HMG-mediated in vitro repression of Pol-II basal transcription. In addition, purified HMG-2 inhibits basal transcription by binding later in the assembly process, after the assembly of the TBP·TFII·promoter complex but before formation of the fourth phosphodiester bond by Pol-II (121). Interestingly, this basal repression of transcription by HMG-2 can be counteracted in an ATP-dependent process that is mediated by a TFIIH-associated factor, possibly a helicase. In vivo experiments have also yielded apparently conflicting effects of the HMG-1/-2 proteins on transcription. For example, two types of cell transfection experiments indicate that HMG-1 proteins can stimulate transcription in vivo (119). In one type of experiment, HMG-1 protein introduced into COS-1 cells as a complex with an expression plasmid carrying the bacterial lacZ gene was found to enhance the level of reporter gene expression. In the second type of experiment, cells were cotransfected with an expression plasmid carrying the HMG-1 cDNA and the lacZ reporter plasmid and, again, the transcriptional activity from the reporter plasmid was enhanced. Significantly, in these cotransfection experiments the acidic C-terminal region of the HMG-1 protein was essential for the observed enhancement of reporter gene expression, suggesting that this region of the protein acts as a transcriptional activator (119). Furthermore, overexpression of HMG-1 (but not HMG-2) protein in cells stably transfected with bovine papilloma virus vectors expressing the corresponding cDNA leads to increased expression of reporter genes transfected into these cells, as well as a loosening or "relaxation" of the chromatin structure of the minichromosomes derived from the transfected reporter gene plasmids (136). Nevertheless, these in vivo results obtained with mammalian cells stand in marked contrast to the situation in yeast cells, where the C-terminal end of the mammalian HMG-1 protein has been shown not to act as a transcriptional activator (137), suggesting that the acidic C-terminal region of this protein probably functions differently in these highly divergent organisms.
II. The HMG-I(Y) Family
The mammalian HMG-I(Y) protein family consists of three members (Fig. 5): the isoform proteins HMG-I [also called α-protein (138, 140-143)] and HMG-Y (142, 144) and the closely related protein HMGI-C (145, 146). Complementary DNA clones have been isolated for the mouse (144) and human (142, 147) HMG-I and -Y proteins, as well as for mouse (145) and human (146) HMGI-C. The HMG-I (107 amino acids; ~11.9 kDa) and HMG-Y (96 amino acids; ~10.6 kDa) proteins are identical in sequence except for an 11-amino-acid internal deletion in the latter and are produced by alternative splicing (142, 144) of transcripts from a single gene (148) (Fig. 6). The HMGI-C protein (109 amino acids; 12 kDa) has high amino-acid-sequence homology (~50% overall) with the HMG-I and HMG-Y proteins and has the internal deletion of 11 amino acids characteristic of HMG-Y (Fig. 5), but it is the product of a separate gene (145, 146, 148). In vivo, members of the HMG-I(Y) family exhibit considerable additional heterogeneity as a result of secondary biochemical modifications (143, 149), certain of which (for example, reversible phosphorylations) (150-154) are cell cycle regulated (see Section II,B,3). The human HMG-I(Y) gene (Fig. 6) is located on the short arm of chromosome 6 (at 6p21), in a region involved in rearrangements, translocations,
hu HMG-I    1 (M)SESSSKSSQPLASKQEKDGTEKRGRGRPRKQPP     35 VSPGTALVGSQKEPSEVPTPKRPRGRPKGSKNKGAAKTRKTTT   78 TPGRKPRGRPKKLEKEEEEGISQESSEEEQ 107
hu HMG-Y*   1 (M)SESSSKSSQPLASKQEKDGTEKRGRGRPRKQPP     35 ...........KEPSEVPTPKRPRGRPKGSKNKGAAKTRKTTT   67 TPGRKPRGRPKKLEKEEEEGISQESSEEEQ 96
mu HMG-Y+   1 (M)SESGSKSSQPLASKQEKDGTEKRGRGRPRKQPP     35 ...........KEPSEVPTPKRPRGRPKGSKNKGAAKTRKVTT   67 APGRKPRGRPKKLEKEEEEGISQESSEEEQ 96
hu HMGI-C   1 (M)SARGEGAGQPSTSAQGQPAAPAPQKRGRGRPRKQQQ  38 EPTGEPSPKRPRGRPKGSKNKSPSKAAQKKAEA   71 TGEKRPRGRPRKWPQQVVQKKPAQEETEETSSQESAEED 109
A·T-DNA-binding domain consensus: TP-KRPRGRPKK (the A·T-hook motif)
FIG. 5. Comparison of the amino-acid sequences of members of the mammalian HMG-I(Y) family of nonhistone chromatin proteins. The human (*, 142, 148) and mouse (+, 144) HMG-I and HMG-Y are isoform proteins produced by alternative mRNA splicing from a single gene, whereas the closely related human HMGI-C protein (146) is the product of a separate gene. Both the HMG-Y and the HMGI-C proteins are missing an internal stretch of 11-12 amino acid residues (.....) that is present in the HMG-I protein. The DNA-binding domains (BD-I, -II, and -III), also called the A·T-hooks (61), of the HMG-I and HMG-Y proteins are indicated, as is the "consensus" amino-acid sequence for these motifs. The amino-acid sequences of the DNA-binding domains of the HMGI-C protein are quite similar to the corresponding regions of the HMG-I and HMG-Y proteins, but these proteins diverge considerably elsewhere in their sequences, hence the necessity of introducing blank "gaps" to facilitate comparisons of maximal amino-acid similarities. The diamonds (◆) indicate the sites of in vivo phosphorylation of the human HMG-I and HMG-Y proteins by cdc2 kinase (151, 152); the double circles (◎◎) indicate the sites of in vitro phosphorylation by casein kinase II.
and other abnormalities correlated with a number of human cancers (148). In the mouse the cognate gene, Hmgi, is located in the t-complex region of chromosome 17 in an area containing a number of genes that, when mutated,
[FIG. 6 diagram labels: exons I-VIII; untranslated cDNA; untranslated ORF; protein coding; HMG-Y; 33-mer; splicing.]
FIG. 6. Diagram of the human HMG-I(Y) gene showing patterns of transcription and alternative splicing [redrawn, by permission of Oxford University Press, from Friedmann et al. (148) with modifications]. The human gene is longer than 10 kb and contains eight exons (Roman numerals I-VIII) and seven introns (numbers 1-7). Curved arrows show the four different in vivo start sites (labeled 1A-IOA, 2B-7C, 6A, and 11D) for transcription, and the solid lines connecting the various exons indicate different alternative splicing patterns that result in the production of different mRNA species, including those coding for the HMG-I and HMG-Y isoform proteins. Note that the three independent DNA-binding domains of the HMG-I(Y) proteins (BD-I, -II, and -III) are located on different exons.
cause embryonic lethality, suggesting that Hmgi is a good candidate locus for embryonic lethal mutations (155). In contrast, studies of transgenic insertional mutations in mice have localized the HMGI-C gene to the pygmy (or "minimouse") locus on chromosome 10 (156-158). Because the pygmy phenotype does not result from a lack of growth hormone or its receptor, it seems likely that this growth defect is due to a reduced response to an embryonic growth factor such as IGF-1. This observation therefore suggests that HMGI-C may be involved in the regulation of genes activated by embryonic growth factors and/or be specifically responsive to such factors
(156-158). Of interest in this connection is the recent demonstration that stimulation of quiescent cultured mammalian cells by a variety of growth factors (e.g., PDGF, FGF, EGF, phorbol esters, or serum) leads, within a few (l4) hours, to the induced expression of a number of "delayed early response" (DER) genes (159), among them HMG-I, HMG-Y, and HMGI-C (148, 159, 160). Such growth-factor induction of gene expression can be quite specific. For example, of the four different promoters and mRNA transcription start sites present in the complex human HMG-I(Y) gene (148) (Fig. 6), only one site is specifically induced by phorbol ester stimulation of quiescent cells (160), whereas stimulation by EGF leads to induced transcription from only two of the four sites (161). These results indicate that the different promoter/enhancer sequences are individually and specifically stimulated in response to particular growth factors, a fact that may have biological significance not only for embryonic development but also for regulation of the HMG-I(Y) gene in normal somatic cells and in transformed cancerous cells (see below).
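The residue counts given at the start of this section (107, 96, and 109 amino acids) and the 11-residue internal deletion of HMG-Y can be checked directly against the sequences in Fig. 5. The sketch below assumes a transcription of the three human sequences from that figure (initiator Met counted as residue 1, matching the figure's numbering); the helper `single_internal_deletion` is a hypothetical name introduced here, not part of any library.

```python
# Human sequences as transcribed from Fig. 5 (an assumption; check against
# the original figure). The initiator Met is residue 1.
HMG_I_HUMAN = (
    "MSESSSKSSQPLASKQEKDGTEKRGRGRPRKQPP"            # 1-34
    "VSPGTALVGSQKEPSEVPTPKRPRGRPKGSKNKGAAKTRKTTT"   # 35-77
    "TPGRKPRGRPKKLEKEEEEGISQESSEEEQ"                # 78-107
)
HMG_Y_HUMAN = (
    "MSESSSKSSQPLASKQEKDGTEKRGRGRPRKQPP"            # 1-34
    "KEPSEVPTPKRPRGRPKGSKNKGAAKTRKTTT"              # 35-66
    "TPGRKPRGRPKKLEKEEEEGISQESSEEEQ"                # 67-96
)
HMGI_C_HUMAN = (
    "MSARGEGAGQPSTSAQGQPAAPAPQKRGRGRPRKQQQ"         # 1-37
    "EPTGEPSPKRPRGRPKGSKNKSPSKAAQKKAEA"             # 38-70
    "TGEKRPRGRPRKWPQQVVQKKPAQEETEETSSQESAEED"       # 71-109
)


def single_internal_deletion(longer: str, shorter: str) -> tuple[int, str]:
    """Return (1-based start, deleted residues) if `shorter` equals `longer`
    with exactly one contiguous block removed; raise ValueError otherwise."""
    gap = len(longer) - len(shorter)
    if gap <= 0:
        raise ValueError("first sequence must be the longer one")
    # Length of the shared N-terminal prefix.
    i = 0
    while i < len(shorter) and longer[i] == shorter[i]:
        i += 1
    # Splice out `gap` residues at the first mismatch and compare.
    if longer[:i] + longer[i + gap:] != shorter:
        raise ValueError("sequences do not differ by one internal deletion")
    return i + 1, longer[i:i + gap]


for name, seq in [("HMG-I", HMG_I_HUMAN), ("HMG-Y", HMG_Y_HUMAN),
                  ("HMGI-C", HMGI_C_HUMAN)]:
    print(f"{name}: {len(seq)} residues")

start, deleted = single_internal_deletion(HMG_I_HUMAN, HMG_Y_HUMAN)
print(f"HMG-Y lacks HMG-I residues {start}-{start + len(deleted) - 1}: {deleted}")
```

On this transcription the check reports that HMG-Y lacks HMG-I residues 35-45 (VSPGTALVGSQ).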
A. Structure of the Proteins
The peptide domains of the HMG-I(Y) proteins that preferentially interact with B-form A·T-DNA (see Section II,B,2) have been experimentally determined, and a short synthetic peptide (T1-P2-K3-R4-P5-R6-G7-R8-P9-K10-K11) corresponding to a "consensus" binding domain (BD) sequence was found to footprint to the minor groove of a stretch of 5-6 bp (or one-half a helical turn) of A·T-DNA in a manner similar to binding of the intact protein (61). Each HMG-I(Y) protein has three separate BD motifs (also referred to as "A·T-hook" motifs) (Fig. 5), separated by stretches of flexible peptide backbone sequence. Thus, the tandem binding of all three BDs in an HMG-I(Y) protein should occupy the minor groove of ~15-18 bp (or about one and one-half helical turns) of contiguous A·T residues. Such a DNA-binding arrangement is predicted to induce secondary structural changes in the HMG-I(Y) protein, particularly in the flexible peptide regions
between BDs (61, 162), a speculation supported by preliminary two-dimensional solution ¹H NMR studies (163). Analogous to the situation for the "HMG-1 box" motif of the HMG-1/-2 family, amino acid sequences similar to the BD (or A·T-hook) of HMG-I(Y) are found in numerous other DNA-binding proteins present in many different organisms, including yeast, plants, sea urchins, insects, and mammals. Often multiple copies of these BD-like sequences are present within otherwise unrelated proteins. Many of these proteins bind preferentially to A·T-rich DNA sequences in vitro, and most are suspected of being transcription factors involved in gene regulation. A palindromic BD-like sequence, "P-R-G-R-P," flanked by basic residues (arginines or lysines), is present in most of these conserved motifs and likely represents the consensus "core" of the A·T-DNA-binding domains of these proteins (61, 164). As illustrated in Fig. 7, the peptide backbone of the consensus BD peptide is predicted (61) to have a planar, crescent-shaped structure that has general similarities to distamycin A and netropsin and to the fluorescent dye Hoechst 33258, ligands that also bind to the minor groove of A·T sequences. Spaced along this crescent peptide backbone, and projecting above and below its plane, are the positively charged side chains of Arg and Lys residues, which are so positioned (when the BD is bound to the minor groove of A·T-rich sequences) that they can interact with and neutralize the negatively charged phosphate residues on the two antiparallel strands of DNA. Evidence supporting a structural relatedness of the above minor groove ligands to the planar backbone of the BD peptide of HMG-I(Y) is provided by the striking similarity of their footprints on A·T-DNAs (165) and by their competition with each other for substrate binding both in vitro (61, 165, 166) and in vivo (162; unpublished data).
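The estimate in Section A that three tandem BDs cover ~15-18 bp, or about one and one-half helical turns, is simple arithmetic on the single-BD footprint of 5-6 bp. The sketch below assumes that the bound domains sit end to end with no spacer and that B-form DNA has a helical repeat of 10.5 bp per turn; `tandem_footprint` is a hypothetical helper.

```python
B_DNA_BP_PER_TURN = 10.5  # assumed helical repeat of B-form DNA


def tandem_footprint(bp_per_domain: float, n_domains: int = 3,
                     bp_per_turn: float = B_DNA_BP_PER_TURN) -> tuple[float, float]:
    """Base pairs covered, and the equivalent number of helical turns, if
    `n_domains` minor-groove binding modules sit end to end with no spacing."""
    total_bp = n_domains * bp_per_domain
    return total_bp, total_bp / bp_per_turn


for bp in (5, 6):  # single-BD footprint range given in the text
    total, turns = tandem_footprint(bp)
    print(f"{bp} bp per BD -> {total:g} bp, about {turns:.2f} helical turns")
```

Under these assumptions the range is roughly 1.4-1.7 turns, consistent with the text's "about one and one-half".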
Indeed, in vivo displacement of the HMG-I(Y) proteins by the antiviral and antitumor drugs netropsin and distamycin has been suggested to be, at least partially, the basis for their marked cellular toxicity (167). Two-dimensional ¹H NMR solution studies (168-170) have also directly confirmed crucial features of the proposed planar crescent-shaped backbone structure of the BD peptide, in particular showing that all of the proline residues are in the expected trans configuration, as well as demonstrating its minor groove binding to B-form linear A·T-DNA substrates (170). As discussed in Section II,B,2, the HMG-I(Y) protein, as well as the BD peptide itself, can bind preferentially to non-B-form DNAs, such as four-way junctions and supercoiled plasmids. How this is accomplished is unknown, but it is tempting to speculate that the inherent rotational flexibility of the glycine residue in the middle of the BD peptide allows enough pliancy to adopt certain alternative, thermally stable, backbone configurations (169) that could potentially accommodate
FIG. 7. Comparison of the predicted planar crescent-shaped backbone structure of (A) the consensus DNA-binding domain peptide of the HMG-I(Y) family of proteins with those of the minor groove A·T-DNA-binding ligands netropsin (B) and Hoechst 33258 (C). [Redrawn with modifications from Reeves and Nissen (61).]
TABLE I
PROTEINS WITH SEQUENCES SIMILAR TO THE HMG-I(Y) DNA-BINDING MOTIF
Columns: Protein; Peptide sequence; Binds A·T-DNA; Suspected transcription factor; Ref.
HMG-I/Y (human): Ref. 61
MLL (ALL; HRX) (human): Ref. 186, 203a-c
MIF2 (yeast): Ref. 203d
Datin (yeast): Ref. 203e
D1 (Drosophila): Ref. 203f
cHMGI (insect): Ref. 187
Histone H1 (sea urchin sperm): Ref. 171, 203g,h
Histone H2B (sea urchin sperm): Ref. 171, 203g,h
CHD1 (mouse): Ref. 203i
SB16A,B (soybean): Ref. 203j
ATBP-1 (pea): Ref. 203k
PF1 (oat): Ref. 203l
Consensus: Ref. 61, 164
Peptide sequences: TPKRPRGRPKK; SPRKPRGRPRIK; KIRPRGRPKIR; RPRGRPKK; (GRKP . . KIRRGRPKK; RPRGRP; (SITPRKIR); (SITPRKIR); KRPKKRGRPR; KRPIGRGRPKI; PK KI RRRIPGRPRI; PK RPRGRPKK
binding to such altered DNA structures. On the other hand, by analogy with the proposed mode of binding of isolated HMG boxes to 4WJ DNAs (58), the extended BD peptides of the HMG-I(Y) proteins may not have to vary much in overall conformation to accomplish minor groove binding to non-B-form structures. Future structural studies of HMG-I(Y) proteins complexed to different types of DNA substrates should resolve these issues. The 11 amino acids that comprise the "consensus" sequence of each of the three independent DNA-binding domains of the HMG-I(Y) proteins (Table I) seem to form a "unit" that is modular in both structure and function. The planar, extended conformation of the BD peptide backbone (Fig. 7) facilitates tight, general structural recognition of the minor groove of DNA (61). On the other hand, the conserved palindromic "P-R-G-R-P" core of the BD peptide (along with the positively charged flanking sequences) (Table I) probably confers specificity for the structure of the narrower minor groove of A·T-rich sequences (61). The generality of a consensus of this type for recognizing minor groove structure has been recognized (171), and it has been termed the GRP motif. Finally, as discussed below, the amino-terminal threonine residue of the BD peptide appears to function as a "regulatory" residue that modulates the affinity of association of the protein with substrate DNA through reversible phosphorylation.
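To see how the palindromic "P-R-G-R-P" core is distributed among the three BDs, one can search the consensus peptide and the human HMG-I sequence for exact matches. The sketch below assumes a transcription of human HMG-I from Fig. 5 (initiator Met as residue 1) and uses a lookahead with Python's `re.finditer` so that overlapping matches are reported; `motif_positions` is a hypothetical helper.

```python
import re

# Human HMG-I as transcribed from Fig. 5 (assumption; initiator Met = residue 1).
HMG_I_HUMAN = (
    "MSESSSKSSQPLASKQEKDGTEKRGRGRPRKQPP"
    "VSPGTALVGSQKEPSEVPTPKRPRGRPKGSKNKGAAKTRKTTT"
    "TPGRKPRGRPKKLEKEEEEGISQESSEEEQ"
)
BD_CONSENSUS = "TPKRPRGRPKK"  # consensus BD peptide T1 ... K11 from the text


def motif_positions(seq: str, motif: str) -> list[int]:
    """1-based start positions of every (possibly overlapping) exact match."""
    return [m.start() + 1 for m in re.finditer(f"(?={re.escape(motif)})", seq)]


for motif in ("PRGRP", "RGRP"):
    print(motif,
          "| consensus:", motif_positions(BD_CONSENSUS, motif),
          "| HMG-I:", motif_positions(HMG_I_HUMAN, motif))
```

On this transcription the full P-R-G-R-P core occurs at P5 of the consensus peptide and in two of the three BDs of human HMG-I (starting at residues 57 and 83), while the shorter R-G-R-P stretch occurs once in each BD (residues 26, 58, and 84), in line with the text's statement that the core is present in most, but not all, BD-like motifs.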
B. Interactions with DNA and Chromatin
1. HMG-I(Y) PREFERENTIALLY BINDS A·T-RICH DNA in Vivo AND in Vitro
Although one member of the HMG-1/-2 family, HMG-2a, also displays preferential binding in vitro to A·T-rich DNA fragments from a variety of sources (172), of all the other known HMG proteins only HMG-I and HMG-Y preferentially bind to A·T-rich DNA both in vitro and in vivo. By a combination of methylation interference, dI·dC base-pair substitutions, minor groove ligand-binding competition studies, and a variety of DNA footprinting techniques, these proteins have been shown to bind, in vitro, to the narrow minor groove of short stretches of A·T-rich B-form DNA (61, 140, 141, 165, 173). In vivo, the HMG-I(Y) proteins have been immunolocalized to the A·T-rich G/Q and C bands of mammalian metaphase chromosomes (174), suggesting that they may play an important role in chromosome structural changes during the cell cycle (162, 165). In vivo experiments employing high-resolution confocal laser microscopy and immunolocalization techniques have shown HMG-I(Y) to colocalize, along with topoisomerase II, to A·T-rich scaffold-associated regions (SARs) of mitotic chromosomes (175-177). Careful microscopic analyses have revealed that HMG-I(Y) is distributed along the longitudinal length of the backbone scaffolding, or "A·T queue," of native chromosomes, including the G/Q bands and C bands, which are postulated to represent tightly coiled SAR sequences (175, 176). These in vivo observations confirm and extend earlier in vitro data showing that purified HMG-I(Y) proteins preferentially bind to isolated SAR fragments (178) and, in fact, effectively out-compete histone H1 for binding to such A·T-rich sequences (162, 179).
2. HMG-I(Y) PROTEINS RECOGNIZE DNA STRUCTURE
DNA footprinting experiments employing purified proteins indicate that, in vitro, the HMG-I(Y) proteins do not bind to all stretches of A·T-rich DNA equally well, or with equal affinity, indicating that these proteins recognize the structure, rather than the sequence, of such DNA (61, 143, 165, 180-182). Recent polymerase chain reaction (PCR)-based DNA selection techniques (183) also demonstrate the marked differences in binding affinity of HMG-I(Y) for different types of A·T-DNA (184). In linear duplex B-form DNA, the affinity and specificity of HMG-I(Y) structural recognition are significantly influenced both by the length and sequence of the particular A·T stretches (141, 180, 185) and by the "context" of flanking or adjacent nucleotide sequences (165, 180-182, 185). Interestingly, HMG-I(Y) also has the capacity to recognize and preferentially bind to certain types of structures formed by non-A·T-rich DNA sequences. For example, in vitro, the whole HMG-I(Y) protein (80), as well as its DNA-binding domain (186, 187), binds to synthetic four-way junction (cruciform) structures in preference to linear duplex DNA molecules of identical sequence. Likewise, HMG-I(Y) recognizes and binds to non-B-form structures in supercoiled plasmids (188) as well as to distorted regions of DNA found on isolated nucleosome core particles (189). The mode of interaction of the HMG-I(Y) protein, or its DNA-binding domains, with these non-B-form DNA structures is presently unknown.
3. PHOSPHORYLATION OF HMG-I(Y) BY Cdc2 KINASE ALTERS ITS BINDING AFFINITY
The HMG-I(Y) proteins, along with histone H1, are among the most highly phosphorylated proteins in the nucleus, and the extent of such phosphorylation is cell cycle dependent (reviewed in 162). In mammals the extensive phosphorylation of histone H1 that occurs in proliferating cells is catalyzed by an enzyme homologous to the yeast cyclin-dependent kinase (cdk) p34cdc2/CDC28 [also called Cdc2 kinase; formerly referred to as growth-associated histone-H1 kinase (190)], the activity of which is sharply elevated at mitosis. Activated Cdc2 kinase phosphorylates serine or threonine residues within the consensus sequence Ser/Thr-Pro-(Xaa)-Lys/Arg, where Xaa is optional but, when present, is often a polar residue (191). An inspection of the sequences of the three DNA-binding domains of the different HMG-I proteins (Fig. 5) reveals that in the human protein two of the three BDs (at residues Thr-53 and Thr-78) have potential Cdc2 kinase phosphorylation sites, whereas in the mouse protein only one site (at residue Thr-53) conforms to the consensus phosphorylation sequence. Activated Cdc2 kinase isolated from mammalian cells (151, 152), as well as from starfish oocytes and sea urchin eggs (154), efficiently phosphorylates both human and murine HMG-I and HMG-Y proteins in vitro at the expected modification sites. Furthermore, in vivo ³²P-labeling studies of synchronized human and mouse cells show that these same Cdc2 consensus phosphorylation sites are radiolabeled in HMG-I(Y) proteins isolated from metaphase cells (but not from nonproliferating, G1, or S phase cells) (151, 152, 154). These results clearly indicate that the mammalian HMG-I(Y) proteins are in vivo substrates for Cdc2 kinase and demonstrate that the extent of DNA-binding domain phosphorylation varies in a cell cycle-dependent manner.
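The sequence inspection described above can be reproduced by scanning for the Ser/Thr-Pro-(Xaa)-Lys/Arg consensus. The sketch below is an illustration under stated assumptions: the sequences are transcribed from Fig. 5; the consensus is encoded as the regular expression `[ST]P[A-Z]?[KR]` (Xaa optional) inside a lookahead so that overlapping candidates are reported; and, because Fig. 5 shows only mouse HMG-Y, mouse positions are mapped onto HMG-I numbering by adding the 11 residues of the HMG-I-specific segment (residues 35-45), assuming that segment carries no additional site in the mouse protein. The helper names are hypothetical.

```python
import re

# Sequences transcribed from Fig. 5 (assumption; initiator Met = residue 1).
HMG_I_HUMAN = (
    "MSESSSKSSQPLASKQEKDGTEKRGRGRPRKQPP"
    "VSPGTALVGSQKEPSEVPTPKRPRGRPKGSKNKGAAKTRKTTT"
    "TPGRKPRGRPKKLEKEEEEGISQESSEEEQ"
)
HMG_Y_MOUSE = (
    "MSESGSKSSQPLASKQEKDGTEKRGRGRPRKQPP"
    "KEPSEVPTPKRPRGRPKGSKNKGAAKTRKVTT"
    "APGRKPRGRPKKLEKEEEEGISQESSEEEQ"
)

# Ser/Thr-Pro-(Xaa)-Lys/Arg with Xaa optional, as a zero-width lookahead.
CDC2_SITE = re.compile(r"(?=[ST]P[A-Z]?[KR])")

HMG_Y_DELETION_START = 35   # first HMG-I residue absent from HMG-Y
HMG_Y_DELETION_LENGTH = 11  # residues 35-45 of HMG-I


def cdc2_sites(seq: str) -> list[int]:
    """1-based positions of the Ser/Thr in each consensus match."""
    return [m.start() + 1 for m in CDC2_SITE.finditer(seq)]


def hmg_y_to_hmg_i(pos: int) -> int:
    """Map an HMG-Y residue number onto the HMG-I numbering used in the text."""
    if pos >= HMG_Y_DELETION_START:
        return pos + HMG_Y_DELETION_LENGTH
    return pos


print("human HMG-I:", cdc2_sites(HMG_I_HUMAN))
mouse = cdc2_sites(HMG_Y_MOUSE)
print("mouse HMG-Y:", mouse, "-> HMG-I numbering:",
      [hmg_y_to_hmg_i(p) for p in mouse])
```

On this transcription the scan finds Thr-53 and Thr-78 in human HMG-I, and a single site in mouse HMG-Y (residue 42) that corresponds to Thr-53 in HMG-I numbering; the mouse residue at the Thr-78 position is Ala.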
The in vivo effect of such modifications is uncertain, but in vitro phosphorylation of purified human recombinant HMG-I proteins by Cdc2 kinase greatly reduces (to 1/20 at physiological ionic strength) the binding affinity of the phosphorylated protein for A·T-DNA substrates, probably as a result of negative charge repulsion (152, 162). Nevertheless, as noted earlier, even during mitosis, when the HMG-I(Y) proteins are most highly phosphorylated, they do not completely dissociate from metaphase chromosomes (174), although their strength of DNA binding may well be weakened. Because in vitro mutagenesis experiments show that replacing the two conserved Cdc2 kinase-modifiable threonine residues in human HMG-I with nonphosphorylatable alanine residues does not change the binding affinity of the mutant protein for substrate DNA (192), it is likely that the threonine residues at the N-terminal ends of BD peptides are "regulatory residues" involved in reversibly modulating the affinity of association of the protein with substrate DNA at specific points in the cell cycle. Such modulations of binding affinity as a result of reversible Cdc2 kinase phosphorylation can reasonably be expected to have significant effects on the in vivo function(s) of HMG-I(Y) proteins, for example, during the extensive condensation and decondensation of chromosomes accompanying cell division (162).
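The 1/20 binding affinity reported for Cdc2-phosphorylated HMG-I can be expressed as a binding free-energy penalty. The sketch below assumes, beyond what the source states, that the 20-fold loss is a ratio of equilibrium dissociation constants (Kd) measured at 37 °C, so that ΔΔG = RT ln(Kd,phospho / Kd,unmodified); `ddg_from_affinity_loss` is a hypothetical helper.

```python
import math

GAS_CONSTANT_KCAL = 1.987e-3  # kcal mol^-1 K^-1


def ddg_from_affinity_loss(fold_reduction: float, temp_k: float = 310.15) -> float:
    """Free-energy penalty (kcal/mol) implied by a fold reduction in affinity,
    assuming the reduction is a ratio of dissociation constants:
    ddG = R * T * ln(Kd_phospho / Kd_unmodified)."""
    return GAS_CONSTANT_KCAL * temp_k * math.log(fold_reduction)


# 20-fold loss of affinity at 37 degrees C (310.15 K).
print(f"{ddg_from_affinity_loss(20):.2f} kcal/mol")
```

Under these assumptions a 20-fold affinity loss corresponds to just under 2 kcal/mol (about 1.85 kcal/mol).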
4. HMG-I(Y) INDUCES BENDS AND SUPERCOILS IN DNA
Circular dichroism measurements (193), circular permutation DNA bending analyses (184), and topoisomerase-I-mediated plasmid supercoiling assays (188) all indicate that HMG-I(Y) binding markedly alters DNA conformation by introducing bends, supercoils, and possibly other distortions in the substrates. Given the mode of interaction of the individual binding domains with the minor groove of linear DNA or relaxed plasmids, the most likely physical explanation for at least some of the HMG-I(Y)-induced bending is asymmetric charge neutralization (104) of the negative phosphate residues located on one face of the DNA helix by the positively charged Arg and Lys residues of the BD peptides (61, 162). In addition, HMG-I(Y)-mediated strand unwinding also appears to contribute significantly to the ability of the protein to introduce distortions in DNA (188). For example, recent studies employing relaxed circular plasmid DNAs, topoisomerase I, and HMG-I(Y) indicate that increasing concentrations of the nonhistone protein in the assay result in the introduction of increasing numbers of supercoils in the plasmid DNAs (188). Interestingly, at low input ratios HMG-I(Y) introduces positive supercoils in the plasmids, whereas at progressively higher concentrations the protein induces increasing numbers of negative supercoils. Detailed analyses of this phenomenon reveal that such changes in the sign of plasmid supercoiling probably result from a combination of HMG-I(Y)-induced DNA bending and strand unwinding. An additional finding of considerable interest from these studies is that an in vitro-produced mutant HMG-I protein lacking the negatively charged carboxyl-terminal domain binds A·T-DNA with approximately the same affinity as the full-length wild-type protein and yet is 8- to 10-fold more effective in introducing negative supercoils. This suggests that the highly acidic C-terminal region of the HMG-I(Y) proteins may function as a regulatory domain influencing the amount of topological change induced in DNA substrates by protein binding (188).
5. HMG-I(Y) BINDING TO CHROMATIN AND NUCLEOSOMES
Early studies (140, 194) investigating the chromatin organization of A·T-rich α-satellite DNA in CV-1 monkey cells demonstrated by two-dimensional electrophoretic methods that a distinct subpopulation of isolated monomer nucleosome core particles contained α-protein (also called HMG-I), in addition to HMG-14 and -17. In subsequent experiments, the same workers found that the pattern of in vitro binding of α-protein to bulk CV-1 mononucleosomes is strikingly similar to that of HMG-14/-17 binding (140). Both native and recombinant HMG-I(Y) proteins also bind to preferred regions on isolated avian nucleosome core particles containing ~146 bp of random-sequence DNA (189). Up to four discrete HMG-I(Y)·core particle complexes can be detected by electrophoretic mobility shift assays when increasing molar ratios of protein are associated with cores. In vitro and in vivo chemical cross-linking investigations indicate that HMG-I(Y) proteins bind to nucleosome core particles in close proximity to histones H2A, H2B, and H3. Thermal denaturation and DNase I protection studies in vitro show that when HMG-I(Y) is present at less than equimolar concentrations relative to mononucleosomes, the protein initially binds to DNA in the vicinity of the DNA termini at the entrance and exit points on the face of the particle. With increasing molar ratios of bound protein (up to ~4:1), DNase I footprinting shows that other preferred regions of DNA along the sides of the nucleosome particle are also protected. Both protein-DNA and protein-protein interactions are involved in HMG-I(Y)-core particle association. These findings, combined with other information, suggest that HMG-I(Y), like HMG-14 and -17 (195, 196), selectively binds to the front face of nucleosome core particles near the dyad axis, as well as near the entrance and exit of DNA from core particles, when the protein is bound at low molar ratios (<1:1 HMG-I(Y):core particles) (189).
Because not all random-sequence nucleosomes are expected to have A·T-rich sequences located in the preferred binding sites on the front face of core particles noted above, it seems plausible that the HMG-I(Y) protein is recognizing and binding to altered DNA structures in these locations (189). Additional support for this idea comes from subsequent studies (197) of HMG-I(Y) binding to in vitro reconstituted mono- and dinucleosomes containing DNAs of defined sequence that have various types of A·T stretches (bent, rigid, flexible) located at a particular site in the reconstituted substrates (198). The principal finding from these investigations is that
HMG-I(Y) protein preferentially binds to different sites on defined-sequence DNA depending on whether the duplex substrate is free in solution or has been distorted by being wrapped around a histone octamer core (197). In addition, these studies show that (1) HMG-I(Y) has the capacity to associate with certain types of A·T sequences even when they are located on the lateral sides of the reconstituted nucleosome and (2) on binding, the protein can induce a localized change in the rotational setting of the DNA on the core particle surface. In toto, these studies indicate that HMG-I(Y) binding to DNA associated with chromatin core particles in vitro is mediated, just as in the case of binding of the protein to free DNA substrates, by recognition of preferred DNA structures. Although HMG-I(Y) and HMG-14/-17 proteins share certain similarities in the way they bind nucleosomes, these two families of HMG proteins are distinctly different in many other important respects. For example, whereas HMG-14 and -17 bind to only two specific sites on each core particle (196, 199, 200), at high molar ratios (~4:1) HMG-I(Y) can form up to four discrete complexes with random-sequence core particles in vitro (189). Furthermore, in contrast to HMG-14 and -17, which bind more tightly to core particles than to naked DNA (7, 195, 196, 199, 200) and which also bind to nucleosomes in a cooperative manner (199-203), HMG-I(Y) binds more tightly to naked A·T-rich substrates (61) than to random-sequence core particles (189), and, so far, there is no evidence for cooperative HMG-I(Y) binding to core particles (189). Based on these differences in binding characteristics, it is expected that in chromatin containing A·T-rich linker regions, HMG-I(Y) would preferentially associate with the linker DNA, whereas HMG-14 and -17 would bind to nucleosomes.
On the other hand, in chromatin in which both the nucleosome and linker DNAs are of random sequence, it would not be unreasonable to expect simultaneous binding of both HMG-14/-17 and HMG-I(Y) to at least some fraction of the nucleosome core particles, as has previously been reported for the α-protein (140, 194).
HMG PROTEINS
6. SIMILARITIES OF THE HMG-I(Y) AND HMG-1/-2 PROTEIN FAMILIES
Given the marked differences in their amino-acid sequences and their folded peptide structures, there is a remarkable similarity in many of the in vitro DNA-binding characteristics of the HMG-1/-2 and HMG-I(Y) proteins. Both families of proteins bind to the minor groove of DNA and have the ability to induce bends and supercoils in DNA, as well as to recognize and preferentially bind to altered DNA structures, e.g., four-way junctions, cruciforms, and certain types of adducted, or non-B form, DNA conformations. This unusual constellation of shared capabilities suggests that the DNA-binding domains of the two families of proteins probably also share some important common features. At first glance, however, the three-dimensional L- or V-shaped arrowhead structure of the HMG box of the HMG-1/-2 proteins (Fig. 1) appears superficially to be quite different from the planar, crescent-shaped BD peptide of the HMG-I(Y) proteins (Fig. 7). Nevertheless, on closer inspection of these two motifs, there does appear to be a significant commonality in both the structure and the sequence of the peptides that actually interact with the minor groove of DNA. As outlined above (Section I,A,2), the first 12 residues of the N-terminal region of the HMG-1 box have been strongly implicated in binding to the minor groove of DNA and, significantly, just as in the case of the BD peptides of the HMG-I(Y) proteins (61), the peptide backbone of this region of the box is in an extended configuration compatible with preferential binding to a narrow minor groove (57, 58). Additionally, there is a highly conserved consensus sequence, P7-K8-R9-P10, present in the extended N-terminal peptide of HMG-1 boxes (Fig. 1) (57) that is also faithfully conserved (P2-K3-R4-P5) (Table I; 203a-2) in the BD motif of many HMG-I(Y) proteins. And, most importantly, all of the prolines present in both the BD peptide (61, 168, 169) and the N-terminal region of the HMG-1 box (57, 60) are in the trans configuration, a situation that facilitates both an extended peptide structure and minor groove binding (61). The available information therefore strongly argues for a preservation of similar peptide backbone structures as well as conservation of particular amino acid residues and conformations in the minor groove DNA-binding regions of the HMG-1 box and HMG-I(Y) proteins.
The preferential recognition capabilities of individual proteins, for either bent or four-way junction DNAs, for specific DNA sequences, for certain stretches of A·T-DNA, or for other types of unusual DNA structures, are probably imparted by a combination of the subtleties of the actual amino acid sequence and structure of a given HMG DNA-binding domain as well as by the particular flanking, or adjacent, peptide residues.
C. Cellular Functions
1. HMG-I(Y) IS AN in Vivo STRUCTURAL TRANSCRIPTION FACTOR
The in vivo function of the HMG-I(Y) family is much better understood than that of either the HMG-1/-2 or HMG-14/-17 families. Earlier studies (summarized in 152, 162), postulating a role for the HMG-I(Y) proteins in nucleosome phasing, metaphase chromosome condensation, DNA replication, and 3′-end processing of mRNA transcripts, have all generally been of a correlative nature, thus leaving unanswered the question of whether such
observations have in vivo biological significance. Recently, however, a series of reports have presented compelling evidence directly implicating HMG-I(Y) proteins in the in vivo transcriptional regulation (either positive or negative) of a number of mammalian genes lying in close proximity to A·T-rich promoter/enhancer sequences (Fig. 8). The first example of in vivo transcriptional regulation by HMG-I(Y) was reported (12) in studies of the promoter region of the murine lymphotoxin (LT; also called tumor necrosis factor-β) gene that is constitutively expressed in transformed B-cell lines. Mutation and promoter deletion analysis delineated a 5′ poly(dA·dT) upstream activating sequence (UAS) as an essential component of LT transcriptional activation in vivo. Additional experiments showed that recombinant HMG-I specifically binds this UAS element in vitro and that nuclear extracts from LT-expressing mouse cells contain an HMG-I-like protein with identical UAS binding characteristics. Electrophoretic mobility shift analyses (EMSAs) using LT promoter DNA incubated in nuclear extracts demonstrated that anti-HMG-I(Y) antibodies gave band “supershift” patterns identical to those observed when the antibodies reacted with recombinant HMG-I protein alone bound to the promoter DNA. And, finally, EMSA combined with antibody reactivity analyses revealed that at least one additional protein was present in the nuclear extracts that bound to both HMG-I and the UAS, suggesting that HMG-I (probably in combination with other proteins) facilitates the formation of an active promoter/enhancer transcription complex necessary for LT gene expression in vivo (12). Since this initial report, additional examples documenting the in vivo involvement of HMG-I(Y) in the positive induction of gene transcription have appeared. These include the human genes coding for β-interferon (13, 173), for the α-subunit of the interleukin-2 receptor (14), and for E-selectin (204, 205).
Examples are also known of instances where HMG-I(Y) binding to promoter regions seems to be involved in negative regulation of transcription, including the genes coding for human interleukin-4 (206) and GP91-PHOX (185), a component of the respiratory burst NADPH-oxidase complex of phagocytes, as well as the murine gene coding for heavy chain embryonic ε-immunoglobulin (ε-IgG) (207) (Fig. 8).
FIG. 8. Positive and negative in vivo regulation of gene transcription by HMG-I(Y) proteins.
Positive regulation: murine tumor necrosis factor-β (TNF-β) (12); human interferon-β (IFN-β) (13, 173); human IL-2 receptor-α (IL-2Rα) (14); human E-selectin (204, 205).
Negative regulation: human interleukin-4 (IL-4) (206); human GP91-PHOX (185); murine ε-immunoglobulin (ε-IgG) (207).
Several of the reports supporting an in vivo role for HMG-I(Y) in positive gene regulation suggest that the protein probably functions as an “architectural transcription factor” (16, 19, 208), both by bending DNA and by directly interacting with other transcription factors to facilitate formation of a stereospecific, multiprotein complex that brings together upstream promoter/enhancer elements with the proximal basal transcription apparatus during the process of transcription induction. Consistent with the basic tenets of such models is the fact that, in vitro, HMG-I(Y) bends and unwinds DNA substrates (see Section II,B,2). Furthermore, HMG-I(Y) also specifically associates, either free in solution or as part of a complex in nuclear extracts, with a number of known sequence-specific transcription factors, including NF-κB, ATF-2, IRF, and c-Jun (13, 173, 209, 210), and the lymphoid-specific factor Elf-1, an Ets family member (14). It should be noted, however, that direct experimental evidence supporting the presence of such stereospecific protein-DNA transcription initiation complexes in living mammalian cells has yet to be demonstrated. Nevertheless, two examples supporting the in vivo existence of such inducible HMG-I(Y) promoter complexes are of particular interest. One example comes from the recent studies of John et al. (14), who investigated the inducible expression of the gene coding for the α-subunit of the IL-2 receptor (IL-2R) in human T cells in response to mitogenic stimuli (Fig. 9). These workers identified and characterized a new positive regulatory region (PRRII) in the gene’s promoter (nucleotides -137 to -64) that binds both HMG-I(Y) and the lymphoid cell-specific factor Elf-1. Cell transfection experiments with an expression vector containing the IL-2Rα promoter ligated to the bacterial CAT reporter gene (Fig. 9A) demonstrated that mitogen-inducible expression of the promoter is inhibited when either the Elf-1 or the HMG-I(Y) binding sites in PRRII are specifically mutated. Furthermore, coexpression of both Elf-1 and HMG-I(Y) proteins in nonlymphoid COS-7 cells (which normally lack the Elf-1 protein) containing the same CAT reporter construct activated transcription from the PRRII element. Previous work from the same group had also identified another mitogen-inducible promoter element (PRRI) farther upstream of the transcription start site (at nucleotides -276 to -244) that contained binding sites for two additional transcription factors, serum response factor (SRF) and NF-κB. Importantly, when specific antibodies [anti-Elf-1, anti-HMG-I(Y), anti-NF-κB, etc.] against various putative components of the transcriptional system were employed in coimmunoprecipitation or EMSA supershift assays using either nuclear extracts or recombinant proteins free in solution, a direct physical interaction was found between Elf-1 and HMG-I(Y) as well as between Elf-1 and the NF-κB p50/c-rel heterodimer, suggesting that protein-protein interactions functionally coordinate the actions of the upstream
(PRRI) and downstream (PRRII) positive regulatory elements to form a protein complex necessary for inducible IL-2Rα gene expression (Fig. 9B). Another example comes from the laboratory of Maniatis and colleagues (13, 173, 209, 210), who demonstrated in vivo that HMG-I(Y) plays a causal role in the virus-induced expression of the human β-interferon gene (IFN-β). Induction of IFN-β depends on the simultaneous binding of both HMG-I(Y) and transcription factors NF-κB and ATF-2/c-Jun to two separate “positive regulatory domains” (PRDII and PRDIV) located in the gene’s 5′ promoter/enhancer region. HMG-I(Y) also interacted directly with both NF-κB and ATF-2 as free proteins in solution and thereby significantly increased the binding affinity of these transcription factors for their cognate DNA recognition sites in vitro. In this experimental system the HMG-I(Y) protein is also proposed to function as a mediator for the assembly of a stereospecific protein complex [including NF-κB, ATF-2, c-Jun, and HMG-I(Y)] involving the two different upstream enhancer domains, as well as the basal promoter region, that is required for virus-induced transcription of the IFN-β gene. In this system, HMG-I(Y) can either stimulate or inhibit the in vitro binding of different ATF-2 isoform proteins to the PRDIV site, depending on whether these isoforms contain a short stretch of basic amino-acid residues, located near the leucine zipper dimerization motif, that is necessary for HMG-I(Y) binding (209). This differential association of HMG-I(Y) with different ATF-2 isoforms determines whether a functional ATF-2 dimer is formed that is capable of PRDIV enhancer binding and thus, by inference, whether a functional, inducible transcription complex is formed on the IFN-β promoter. The HMG-I(Y) protein significantly increases the affinity of binding of both NF-κB (13, 173) and ATF-2 (209, 210) for their recognition sequences in the IFN-β promoter.
In the case of the NF-κB site in PRDII, various footprinting techniques have shown that the NF-κB p50/p65 heterodimer binds to the terminal regions of a 10-bp regulatory sequence through contacts in the major groove, while HMG-I(Y) recognizes the central region of the same sequence through contacts in the minor groove; thus, the recognition sites of these two proteins overlap but their binding occurs in opposite grooves of the DNA (173). Because both proteins are proposed to occupy their respective PRDII binding sites simultaneously during initiation complex formation, a necessary prediction of such a model is that binding of NF-κB to the major groove will not interfere with HMG-I(Y) binding to the minor groove. That this prediction may indeed be correct is suggested by the recently determined X-ray crystallographic structure of an NF-κB p50 homodimer bound to a κB site (211, 212), showing that binding of the butterfly-shaped dimer to the major groove leaves the minor groove open for potential binding by HMG-I(Y) (Fig. 10, see color plate). These X-ray structures do not, unfortunately, provide any clues as to how HMG-I(Y) binding in the minor groove might facilitate increased NF-κB affinity for binding in the major groove. HMG-I(Y) is not the only HMG protein that facilitates increased binding affinity of NF-κB for its recognition site. Purified HMG-1 (or HMG-2) stimulates, by greater than 19-fold, the site-specific binding of all forms of NF-κB (p50, p52, and p65 homodimers as well as p50/p65 heterodimers), with significant binding enhancements being observed with nearly stoichiometric amounts of HMG-1 to NF-κB protein (134). Intriguingly, although HMG-1 greatly facilitates the binding of NF-κB to its recognition sequence, based on the failure of anti-HMG-1-specific antibodies to cause an electrophoretic “supershift” of the NF-κB-DNA complex, it does not appear that HMG-1 is part of the final ternary complex formed in these in vitro experiments (134). These findings are reminiscent of a previous report (10) describing the capacity of HMG-1 to enhance dramatically (>10-fold) the binding affinity of purified human progesterone receptor (PR) for DNA fragments containing the progesterone response element (PRE) without being incorporated into the final PR·PRE complex.
FIG. 9. [Figure panels: (A) "Human IL-2 receptor-α promoter," with the positive regulatory regions PRRI and PRRII and HMG-I binding sites marked; (B) resting T-cells versus activated T-cells, showing HMG-I(Y) molecules.] (A) Diagram of the human IL-2Rα gene 5′ regulatory region between nucleotides -472 and +109, including the upstream and downstream positive regulatory regions (PRRI and PRRII) attached to a bacterial chloramphenicol acetyltransferase (CAT) reporter gene used for in vivo expression assays. The binding sites for transcription factors NF-κB, serum response factor (SRF), Elf-1, and HMG-I(Y) are indicated. [Redrawn with modification from John et al. (14).] (B) Diagrammatic model of the promoter region of the human interleukin-2 receptor α chain gene (IL-2Rα) before (resting T cells) and after (activated T cells) mitogen stimulation, indicating direct interactions between NF-κB, Elf-1, and HMG-I(Y) proteins. Two possibilities are indicated for the activated state: the upper schematic depicts direct Elf-1-NF-κB interactions, whereas the lower diagram additionally shows the possibility that HMG-I(Y) may also enhance Elf-1-NF-κB interactions. It is possible that both models depicting the activated state exist at the same time. [Redrawn with modification from John et al. (14).]
One interpretation of these combined experiments is that HMG-1 perhaps functions by a “hit-and-run” mechanism whereby the protein induces some type of structural change in the target DNA that facilitates transcription factor binding, but thereafter is not required for the maintenance of such binding and therefore readily dissociates from the complex. An alternative possibility, however, is that HMG-1 is actually part of the final ternary transcription factor/DNA complex but is so loosely associated that it readily dissociates from the complex during gel electrophoresis. In either case, the remarkable fact that both the HMG-I(Y) and HMG-1/-2 protein families are able to facilitate enhanced transcription factor binding in vitro again reinforces the notion of an overall general similarity of DNA-binding capacities and possible biological functions of these two groups of proteins. A certain degree of caution should be exercised, nevertheless, in interpreting the results of experiments in which basic proteins such
as HMG-I(Y) or HMG-1 are shown to increase the in vitro DNA-binding affinity of NF-κB. In several reported cases such in vitro results have been interpreted as demonstrating that the observed increase in NF-κB binding affinity is the direct result of ancillary protein-induced DNA bending (13, 14, 173, 209, 210). However, because similar stimulations of NF-κB binding affinity can also be induced in vitro by certain proteins that do not cause DNA bending (134), the question of the actual role played by such ancillary proteins in stimulating NF-κB binding remains unclear.
2. HMG-I(Y) PROTEINS AND CANCER
In light of the compelling evidence demonstrating that HMG-I(Y) proteins are structural transcription factors in vivo, it is not surprising that a number of laboratories have observed a striking correlation between high levels of HMG-I(Y) gene expression and neoplastic transformation of normal cells and/or increased metastatic potential of tumor cells. In normal differentiated somatic cells, HMG-I(Y) mRNAs and proteins are expressed at only very low (142-144, 213, 214) or nondetectable (215, 216) levels. In contrast, in neoplastically transformed cells (215, 217-223), as well as in embryonic cells that have not yet undergone differentiation (215, 216, 224), levels of HMG-I(Y) gene products are often exceptionally high. Spontaneously derived tumors, or normal cells experimentally transformed by chemicals, by ionizing or UV radiation, or by viral oncogenes (v-src, v-ras, v-mos, v-myc), contain abnormally high levels of HMG-I(Y) proteins and mRNAs. Because cellular levels of HMG-I(Y) mRNAs are known to vary with the rate of proliferation in normal cells, being very low in nondividing or quiescent cells and increasing about fourfold during exponential growth (213), it is important to emphasize that the elevated HMG-I(Y) product levels found in tumors appear to be relatively independent of cellular growth rates, because untransformed normal cells proliferating at about the same rate as their transformed counterparts consistently contain much lower levels of HMG-I(Y) (220-222). Estimates have been made (142, 144, 213) that certain malignant cell lines constitutively contain 15 to 50 times the level of HMG-I(Y) mRNAs found in nontransformed normal cells.
The correlation between cancerous transformation and high constitutive levels of HMG-I(Y) gene products is so striking that Goodwin and colleagues (215, 217, 218) have suggested that elevated concentrations of these proteins are a characteristic and diagnostic feature of the transformed cellular phenotype. Schalken’s laboratory (220) has also identified increased levels of HMG-I(Y) mRNAs as a progression marker for prostate cancer metastasis in the Dunning rat model system, demonstrating that the extent of HMG-I(Y) overexpression directly correlates with the degree of metastatic aggressiveness of the tumors rather than with their growth rates. More recent studies
have extended these findings to human prostate cancers in a retrospective in situ RNA hybridization study of HMG-I(Y) mRNA levels in paraffin-embedded materials obtained from patients presenting different Gleason grades of metastatic prostate cancer (222). Likewise, retrospective studies have also correlated high levels of HMG-I(Y) protein expression with the malignant phenotype of human thyroid neoplasias (225). Similarly, increased levels of HMG-I(Y) mRNA and protein have been reported to be reliable biochemical markers for different stages of tumor progression in a well-characterized mouse mammary epithelial cell system (221). The reverse situation also appears to be true, namely, that when undifferentiated, highly aggressive mouse teratocarcinoma cells are induced to undergo overt cellular differentiation, they lose both their high constitutive levels of HMG-I(Y) gene products and their in vivo tumorigenic potential (224). But perhaps of greater biological significance is the recent report (223) that inhibition of HMG-I(Y) protein synthesis by gene antisense methodology suppresses the ability of transforming retroviruses (carrying v-mos or v-ras-Ki) to induce neoplastic transformation in rat thyroid cells. Together these reports provide strong experimental support for involvement of the HMG-I(Y) proteins in both neoplastic transformation and increased metastatic tumor potential. However, HMG-I(Y) genes do not behave like classical transforming oncogenes in that their transfection into normal cells does not usually lead to transformation (223), suggesting that in many cases their overexpression may be necessary, but not sufficient, to achieve the neoplastic phenotype; the activation of other factors, as well as alterations in the way the HMG-I(Y) protein functions as an architectural transcription factor, may also be required.
Specific chromosome translocations are frequently found in human lymphomas and leukemias (139, 226), and recently the human mixed-lineage leukemia (MLL) gene (186) [also called ALL-1 (203a) or HRX (203b)], involved in a number of such rearrangements, has been isolated and sequenced. Significantly, the N-terminal region of the MLL (ALL/HRX) gene was found to code for an amino-acid sequence almost identical to the “A·T-hook” DNA-binding motif of the HMG-I(Y) proteins, and it is this region of the gene that is frequently translocated in human leukemias (139, 203a,b). These findings raise the intriguing possibility that in certain human cancers, chromosomal translocation and fusion of an A·T-hook-like motif to a new cellular protein may convert the resulting hybrid into a transforming oncoprotein as a result of DNA mistargeting. Compelling support for such a scenario has recently been provided by two additional observations: (1) the demonstration that the HMG-I(Y) A·T motif peptide found in the MLL gene, which is involved in many aberrant chromosomal translocations (reviewed in 139), can specifically bind to both A·T-rich sequences and cruciform structures in vitro (186); and (2) chromosomal rearrangements at the site of the HMGI-C gene on human chromosome 12 result in the fusion of the A·T-hook motifs of this HMG-I(Y) family member to new transcriptional trans-activating regulatory domains during the formation of benign lipomas (227).
3. HMG-I(Y), HISTONE H1, AND THE OPENING OF CHROMATIN DOMAINS
Another recently postulated function of the HMG-I(Y) proteins relates to their in vivo roles as structural transcription factors and their intimate relationship to the binding of histone H1 and nucleosomes to substrate DNAs. It has been known for some time that if H1 histones (228, 229) and/or nucleosomes (reviewed in 2, 230, 231) bind to gene promoter/enhancer regions, transcription of the associated gene by RNA polymerase is usually either repressed or greatly inhibited. It is of some importance, then, that, like the BD peptides of HMG-I(Y), the peptide tails of H1 histones also bind preferentially to the narrow minor groove of stretches of A·T-DNA (reviewed in 171). Furthermore, in vitro, HMG-I(Y) out-competes histone H1 for such DNA binding (162, 179). And, as previously mentioned, HMG-I(Y) also binds ~50 times more tightly to free A·T-DNA than to chromatin core particles. It was therefore suggested (162) that one of the likely in vivo functions of the HMG-I(Y) proteins is to act as an antirepressor molecule that out-competes, or displaces, inhibitory histone H1 and/or nucleosomes for A·T-DNA binding, thus assisting in the establishment of an open or accessible chromatin structure over important gene regulatory regions. Once such an “open” chromatin structure has been formed by HMG-I(Y) binding, this accessible configuration can potentially be propagated from one cellular interphase to the next as both HMG-I(Y) and histone H1 change their Cdc2-kinase-induced phosphorylation levels, and hence their relative DNA-binding strengths, in a coordinated manner during mitosis (162). Considerable support for the above scenario has recently come from the in vitro demonstration (179) that HMG-I(Y) not only acts as an antirepressor molecule by preventing histone H1 binding to isolated SAR sequences, but also functions as a true derepressor by displacing previously bound proteins, thereby relieving histone H1-mediated repression of reporter gene transcription.
Based on the ability of HMG-I(Y) to function as a derepressor molecule in vitro, a model has been presented (166, 179) for the involvement of both SARs and HMG-I(Y) in establishing the overall pattern(s) of inactive and transcriptionally competent chromatin domains during cellular differentiation.
In this model, inactive chromosome loops or domains (232, 233) are proposed to be compacted and stabilized by “nucleating” histone H1 molecules that initially bind tightly to A·T-rich SAR sequences located at the base of chromatin loops and then, through subsequent cooperative H1-H1 protein interactions, “spread their inhibitory influence” throughout a topologically defined domain. The compact, H1-containing domains thus formed remain transcriptionally inactive until HMG-I(Y) (or another “distamycin-like” D-protein) binds to the SARs and “mobilizes” or displaces histone H1; i.e., HMG-I(Y) binding is proposed to interfere with the ability of SARs to serve as nucleation sites for cooperative histone H1 assembly, leading to chromatin domain activation (179). As a consequence of HMG-I(Y) binding, the equilibrium of histone H1 association is postulated to shift toward a reduction in occupancy of nucleosome linker regions in the domain, thus resulting in its “opening” into a transcriptionally competent or active region (166, 179). Although this attractive model of domain activation is of considerable intrinsic interest, it should be kept in mind that the in vitro experiments on which it is based were not performed in a nucleosomal chromatin context, and therefore the in vivo biological relevance of the findings remains to be established.
III. The HMG-14/-17 Family
Chromosomal proteins HMG-14 and HMG-17 are closely related proteins present in the cells of most higher eukaryotes. They have a high content of lysine, alanine, and proline and lack aromatic amino acid residues. Their amino-acid composition is reminiscent of the H1 linker histones, except that they have a significantly lower ratio of basic to acidic amino acids. Although they are ubiquitous in higher organisms, the HMG-14/-17 proteins have not been detected in yeast or other lower eukaryotes. Fish tissues have one protein, named H6, which contains all of the evolutionarily conserved domains of this protein family (see Section III,A,1). Avian erythrocytes contain two types of HMG-14 proteins. The main component, HMG-14a, has a higher molecular weight than most HMG-14/-17 proteins, whereas the minor component, named HMG-14b, is the homolog of mammalian HMG-14. In the chicken genome, single-copy genes code for each of the HMG-14/-17 proteins. The functional genes coding for both the human and chicken HMG-14 and HMG-17 have been isolated and fully sequenced (see 7). Structural analyses of these genes suggest that they evolved from a common ancestor. Mammalian genomes contain multiple retropseudogenes for both HMG-14 and HMG-17; these are among the largest known retropseudogene families in mice and humans (234). The presence of HMG-14 and HMG-17 proteins in all the tissues of higher eukaryotes is perhaps the strongest argument favoring the possibility that this HMG family is necessary for proper cellular function. Furthermore, all cells contain both HMG-14 and HMG-17, suggesting that the proteins are involved in distinguishable functions. Although their exact cellular function and mode of action are still not fully understood, results from many types of experiments are consistent with the possibility that the HMG-14/-17 proteins modulate the effect of chromatin on transcription. Insight into their cellular function has been obtained from studies on their structure, their mode of interaction with the nucleosome cores, and their effect on the transcriptional potential of chromatin templates assembled under controlled conditions.
A. Structure of the Proteins
1. CONSERVED STRUCTURAL DOMAINS IN THE HMG-14/-17 PROTEIN FAMILY
Alignment of all the HMG-14/-17 protein sequences reveals structural motifs that are characteristic of this protein family. A sequence logo (234a) depicting the conserved amino-acid positions is shown in Fig. 11. This logo is based on a multiple alignment of the 12 known HMG-14/-17 protein sequences. Gaps have been introduced to maximize the homology between the members of the HMG-14/-17 protein group. Therefore, the sequence logo contains more amino-acid positions than an alignment of either the HMG-14 or the HMG-17 protein subgroup alone, which contain 98 and 89 amino acids, respectively. From the sequence logo, it is apparent that the HMG-14/-17 protein group has four regions with high sequence information content. The first region, with the sequence PKRK, consists of the first 4 amino acids from the N terminus of the proteins. The second conserved region consists of amino acids 17 to 47; the third region, spanning positions 64 to 69, consists of 5 amino acids with the sequence GK(KR)G; and the fourth region, positions 87 to 94, consists of 8 amino acids. In addition, residues 109 to 111 are also highly conserved. Residue 109 is negatively charged except in H6, where it is an asparagine. Residue 110 is alanine in every sequence except the chicken HMG-14b, where it is a valine. Further analysis of the alignment indicates an uneven distribution of charged amino-acid residues along the polypeptide chain. The HMG-14/-17 proteins can be subdivided into three regions. The first region, containing
FIG. 11. Sequence logo of multiple alignment of HMG-14/-17 proteins. The sequence logo is derived from a multiple alignment of the sequences obtained from SWISSPROT version 31.0. (For accession numbers see Fig. 13.) The information content, in bits, is determined at each position. The size of each letter is proportional to the information content (in bits) for that amino acid, which is a graphical representation of the frequency of an amino acid at a given position. Thus, taller letters represent high information content (i.e., positions 1-3). Shorter letters, or the absence of a letter, indicate positions with a variable content of amino acids, i.e., low information content. The logo was constructed by David Landsman (NCBI, NLM, NIH), using the methods described by Schneider and Stephens (234a).
residues 1-17, has a slight net positive charge of +2. The central region of the proteins, from residue 17 to residue 73, has a net positive charge of +16 for HMG-14 and +13 for the HMG-17 subgroup. The C-terminal region of the proteins is negatively charged and has a net charge of -8 and -3, respectively, for HMG-14 and HMG-17. An outline of the conserved domains and the charge distribution in the HMG-14/-17 protein family is presented in Fig. 12. The asymmetric distribution of charged residues along the polypeptide chain is reminiscent of the structure of certain transcription
FIG. 12. [Schematic: net charges of the three regions, HMG-14: +1, +16, -8; HMG-17: +2, +13, -3; exons I-VI shown beneath the protein domains.] The structure of the HMG-14/-17 protein family. The evolutionarily conserved amino-acid residues are clustered into four major domains. The positions (the amino-acid position corresponds to that of the sequence logo in Fig. 11) of the domain boundaries are indicated. Note the correspondence between these domains and the organization of the gene. Thus, domain A is at the 3′ end of exon I; domain B is encoded by exons III and IV and domain D is located at the 3′ end of exon V. The charged residues are also clustered, giving rise to regions of low and high cationic charge. The C-terminal regions of the molecules are negatively charged.
factors in which the positively and negatively charged residues are clustered into domains. Furthermore, as in acidic transcription factors, the negatively charged C-terminal regions of the HMG-14/-17 proteins have the potential to form α-helices with negatively charged surfaces. However, in spite of these structural similarities, experimental evidence suggests that HMG-14/-17 proteins do not act as “classical” transcriptional activators (137). Figure 12 also illustrates an interesting correlation between the structure of the HMG-14/-17 genes and the conserved protein domains. The 3’ end of exon I codes for domain A, the 3’ end of exon II codes for 3 amino acids at the N-terminal region of domain B, the 3’ end of exon V codes for domain D, and the 3’ end of exon VI codes for the conserved residues at positions 109-111. Exons III and IV code for most of domain B, a 30-amino-acid evolutionarily conserved sequence that is the nucleosomal binding domain of the HMG-14/-17 protein family (195). Exon III codes for a decapeptide in which 9 positions are absolutely conserved. HMG-14 and -17 are positively charged proteins: HMG-14 contains 21 lysine and 5 arginine residues, and HMG-17 contains 21 lysine and 4 arginine residues. The N-terminal half of domain B, encoded by exon III, contains 3 of the arginine residues and can therefore be considered an arginine-rich cassette inserted into lysine-rich proteins. The 17 amino acids at the C-terminal end of domain B are encoded by exon IV. This region contains an invariant motif, KPKKA, which is also present in H1 histones but not in any other known protein. This motif is similar to that of domain D, KGK(KR)G.
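The charge bookkeeping used in this section can be sketched by counting charged side chains. This is a simplifying assumption, not the authors' stated method: lysine and arginine count +1, aspartate and glutamate count -1, and histidine, the chain termini, and pKa shifts are ignored. The example segments are the HMG-17 consensus sequences of domain B encoded by exons III and IV, taken from Fig. 14 and written in uppercase.

```python
def net_charge(segment: str) -> int:
    """Crude net charge: +1 per K or R, -1 per D or E.

    Histidine, the termini, and pKa shifts are ignored (simplifying assumption).
    """
    positive = sum(segment.count(aa) for aa in "KR")
    negative = sum(segment.count(aa) for aa in "DE")
    return positive - negative


def charge_profile(sequence: str, boundaries: list[int]) -> list[int]:
    """Net charge of consecutive regions; boundaries are 0-based cut points."""
    cuts = [0, *boundaries, len(sequence)]
    return [net_charge(sequence[a:b]) for a, b in zip(cuts, cuts[1:])]


# HMG-17 consensus for domain B (Fig. 14): exon III part and exon IV part.
exon_iii = "PQRRSARLSA"
exon_iv = "KPAPPKPEPKPKKAPAK"
print(net_charge(exon_iii))  # 3 (three arginines)
print(net_charge(exon_iv))   # 5 (six lysines, one glutamate)
print(charge_profile(exon_iii + exon_iv, [len(exon_iii)]))  # [3, 5]
```

Applied to full-length sequences with region boundaries such as those in Fig. 12, the same profile function gives a per-region charge summary; the exon III result also reflects the three arginines of the arginine-rich cassette.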
[Fig. 13 alignment: (A) HMG-14 sequences from human, calf, mouse, and chicken (HMG-14a and HMG-14b); (B) HMG-17 sequences from chicken, human, calf, rat, pig, and mouse, together with trout H6. Exon boundaries (II to VI) are marked along the alignment.]
In summary, the HMG-14/-17 family of proteins contains four evolutionarily conserved domains. The charged amino-acid residues are unevenly distributed along the polypeptide chain. There seems to be a correlation between the structure of the gene and that of the protein; some of the evolutionarily conserved protein domains are encoded by distinct exons. According to the “exon shuffling” hypothesis (235), it is conceivable that structural motifs similar to those present in HMG-14/-17 may be found in other proteins. Indeed, one of the proteins interacting with the thyroid hormone receptor in a hormone-dependent manner is highly homologous to HMG-14/-17 (236).
2. STRUCTURAL SPECIFICITY OF HMG-14 AND HMG-17 PROTEINS
Although HMG-14 and HMG-17 proteins may have evolved from a common ancestor and have many features in common, structural analysis reveals a clear distinction between them. The two subgroups share less than 60% of their sequence. Multiple alignment of the protein sequences within each group (Fig. 13) indicates a high degree of sequence conservation among the HMG-17 proteins and among the HMG-14 proteins. In the HMG-17 group, the chicken, human, calf, rat, pig, and mouse sequences differ from each other by less than 3%. Trout H6 is 62-67% similar to the various members of the HMG-17 group. The HMG-14 group is less conserved. The hydropathy index of the two protein groups is about 20 (indicative of a high content of hydrophilic amino acids); however, their hydropathy index profiles are clearly different, suggesting that the structures of the proteins are distinct from each other (237). Particularly noteworthy is the difference between the two protein groups in the 17 amino acids comprising the C-terminal half of their nucleosomal binding domain, which is encoded by exon IV (Fig. 14): in the HMG-17 group this region contains 7 prolines, whereas in the HMG-14 group it contains only 3. In summary, although the HMG-14 and HMG-17 chromosomal proteins
FIG. 13. Multiple alignment of HMG-14 and HMG-17 proteins. The protein sequences obtained from SWISSPROT version 31.0 were aligned with the MACAW program, and the alignments were optimized visually. The accession numbers of the sequences are as follows: P02316, HMG14-BOVIN; P12274, HMG14-CHICK; P12902, HMG15-CHICK; P05114, HMG14-HUMAN; P18608, HMG14-MOUSE; P02313, HMG17-BOVIN; P02314, HMG17-CHICK; P05204, HMG17-HUMAN; P09602, HMG17-MOUSE; P80272, HMG17-PIG; P18437, HMG17-RAT; P02315, H6-ONCMY. Amino acids in the conserved domains are indicated by bold letters. Note that in chicken HMG-14a the region encoded by exon IV is identical to that encoded by exon IV of the HMG-17 group.
MICHAEL BUSTIN AND RAYMOND REEVES
                            Exon III     Exon IV
HMG-17 (from residue 19)    PqRRSARLSA   KPAPpKpEpKPKKApAK
HMG-14 (from residue 14)    PkRRSARLSA   KPAPaKvE( )KPKKAaGK
FIG. 14. Differences between the HMG-14 and HMG-17 protein groups in the consensus sequence of their nucleosomal binding domains. Lowercase letters indicate positions at which the amino-acid residues differ between HMG-14 and HMG-17. Note that in the C-terminal portion encoded by exon IV, all the differences involve proline residues.
are similar in many respects, the two subgroups are clearly distinct. The high degree of sequence conservation, especially in the HMG-17 subgroup, suggests that the proteins are architectural elements in chromatin and that most of the primary sequence is necessary for their proper function. The structural differences between the proteins and their copresence in every tissue raise the possibility that the two proteins participate in specific interactions, each of which is necessary for proper cellular function.
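The proline difference in the exon IV region can be checked directly on the consensus sequences of Fig. 14. A minimal sketch: the sequences are uppercased, and the HMG-14 gap shown as "( )" in the figure is written as "-" (an assumption made only to keep the two strings column-aligned).

```python
def proline_positions(segment: str) -> list[int]:
    """1-based column positions of proline (P) in an aligned segment."""
    return [i for i, aa in enumerate(segment, start=1) if aa == "P"]


# Exon IV part of the nucleosomal binding domain (Fig. 14 consensus), uppercased.
# The HMG-14 gap is written as "-" so that both strings stay column-aligned.
hmg17_exon_iv = "KPAPPKPEPKPKKAPAK"
hmg14_exon_iv = "KPAPAKVE-KPKKAAGK"
assert len(hmg17_exon_iv) == len(hmg14_exon_iv) == 17

p17 = proline_positions(hmg17_exon_iv)
p14 = proline_positions(hmg14_exon_iv)
print(len(p17), len(p14))           # 7 3
print(sorted(set(p17) - set(p14)))  # [5, 7, 9, 15]
```

The columns at which only HMG-17 carries a proline are the lowercase positions of the HMG-17 consensus in Fig. 14.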
B. Interaction with DNA and Chromatin
1. COOPERATIVE INTERACTIONS WITH NUCLEOSOME CORES
Chromosomal proteins HMG-14 and HMG-17 are located in the nucleus, associated with the chromatin fiber. HMG-14/-17 are the only nuclear proteins known to specifically recognize the 146-bp nucleosomal core particle (199, 200, 238). Both proteins bind to nucleosome cores without any specificity for the underlying DNA sequence, suggesting that they recognize structural features specific to these chromatin subunits. Specific interactions between these proteins and nucleosomal core particles can be detected by mobility-shift assays. At low ionic strength, the binding of HMG-14 or HMG-17 protein to nucleosomal cores produces two additional bands of lower mobility, corresponding to complexes containing either one or two molecules of HMG protein per core particle. Under cooperative conditions, only complexes containing two HMG molecules per core particle are observed. The dissociation constant for the binding of the proteins to cores at low ionic strength (1.0 x 10^-9) is about &th of that at higher ionic strength (1.0x lo-') (201). The ionic-strength-dependent differences in the affinity constants could be explained by assuming that binding at low ionic strength is stabilized by nonspecific ionic interactions between the protein and the charged residues in the nucleosome core particle. Higher ionic strength would weaken these interactions and make binding more dependent on the stringently conserved residues in the binding domain. Indeed, the nucleosomal binding domain of the protein is highly conserved during evolution, and single-point mutations in this domain reduce the binding constant of the
proteins to nucleosomes (201). These results suggest that a distinct protein conformation is required for proper binding. Because the proteins behave as random coils in solution, it seems likely that the nucleosomal binding site induces a conformational change in them. The ion concentration required for cooperative binding is close to physiological, suggesting that in the nucleus HMG proteins bind to chromatin cooperatively. Post-translational modifications of the HMG-14/-17 proteins may affect their interaction with nucleosomes. Of particular interest is phosphorylation of Ser-6 in HMG-14, which is one of the first molecular events associated with the induction of immediate-early genes upon mitogenic stimulation (239). Phosphorylation reduces the affinity of HMG-14 for nucleosome core particles (240); therefore, this post-translational modification might result in structural changes in chromatin regions containing HMG-14 protein. As shown in Fig. 15, the cooperative interaction of HMG-14/-17 proteins
[Fig. 15 schematic: core particles (CP) plus HMG-14/-17 form CP-HMG complexes, shown for three outcomes: only heterodimers, random mixture, only homodimers.]
FIG. 15. Possible complexes between HMG-14/-17 and core particles. Under cooperative binding conditions, at ionic strength close to physiological, HMG-14/-17 proteins form nucleosome complexes containing two molecules of HMG protein. The interaction of core particles with an equimolar mixture of HMG-14 and HMG-17 could potentially lead to three types of complexes. A nucleosome core could bind exclusively one molecule of HMG-14 and one of HMG-17 to form heterodimers. A second possibility is that the binding is totally random. The third possibility is that the proteins segregate to form homodimer complexes. Recent results indicate that the interaction of core particles with an equimolar mixture of HMG-14 and HMG-17 proteins yields complexes containing, exclusively, either two molecules of HMG-14 or two molecules of HMG-17. The proteins "cross-talk" by inducing allosteric transitions in the nucleosome core particle (241).
with nucleosome cores could lead to nucleosome complexes containing a random mixture of these HMGs, complexes containing exclusively heterodimers (i.e., one molecule of HMG-14 and one of HMG-17), or complexes containing exclusively homodimers of either HMG-14 or HMG-17. Recent results indicate that the binding of HMG-14/-17 to nucleosome cores is not random and that this interaction produces complexes containing either two molecules of HMG-14 or two molecules of HMG-17 (241). These results suggest that in chromatin these proteins may be clustered and associated with specific DNA sequences. Studies with deletion mutants suggest that the formation of homodimeric HMG complexes does not depend on contacts between the nucleosome-bound HMG-14/-17 proteins. Most probably the nucleosome-bound proteins “cross-talk” by inducing specific allosteric transitions in the chromatin subunits.
2. THE NUCLEOSOMAL BINDING DOMAIN OF THE HMG-14/-17 PROTEINS
The HMG-14/-17 proteins bind to nucleosomes through a positively charged domain spanning residues 17 to 47 in the HMG-17 family and residues 12 to 41 in the HMG-14 family (195, 242). This region is evolutionarily conserved and has a characteristic amino-acid composition; however, the HMG-14 subgroup is clearly distinct from the HMG-17 subgroup (see Section III,A,2). Studies with synthetic peptides indicated that a 30-amino-acid peptide corresponding to the nucleosomal binding domain of HMG-17 binds specifically to nucleosome cores and retains many of the binding characteristics of the intact protein. Point mutations in this protein region reduced the affinity of the protein for cores (201). Removal of histone tails by trypsin digestion of nucleosomes abolishes the binding of both the peptide and the intact protein, suggesting that the histone tails are required for binding (195).
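The "random mixture" alternative in Fig. 15 can be made quantitative. A minimal sketch, assuming that under cooperative conditions every core carries two HMG molecules and that each site is filled independently in proportion to the mixture composition (an illustrative binomial model, not one taken from the cited work): for an equimolar mixture it predicts that half of all complexes would be heterodimers, in contrast with the exclusive homodimers actually observed (241).

```python
from fractions import Fraction


def random_pairing(frac_hmg14: Fraction) -> dict[str, Fraction]:
    """Expected complex composition if both sites on a core particle are filled
    independently, each by HMG-14 with probability frac_hmg14 (illustrative model)."""
    frac_hmg17 = 1 - frac_hmg14
    return {
        "HMG-14 homodimer": frac_hmg14 ** 2,
        "heterodimer": 2 * frac_hmg14 * frac_hmg17,
        "HMG-17 homodimer": frac_hmg17 ** 2,
    }


for kind, share in random_pairing(Fraction(1, 2)).items():
    print(kind, share)
# HMG-14 homodimer 1/4
# heterodimer 1/2
# HMG-17 homodimer 1/4
```

Exact fractions are used so the three shares visibly sum to 1; under the "only homodimers" outcome, the heterodimer share would instead be zero.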
The finding that a protein region can act as an independent functional domain suggests that the HMG proteins are modular proteins containing several functional motifs. Experiments in progress indicate that the negatively charged C-terminal domain is involved in transcriptional activation (243).
3. THE ORGANIZATION OF HMG-14/-17 IN NUCLEOSOME CORES
FIG. 16. A model of the organization of HMG-14/-17 proteins in nucleosome core particles. Two molecules of HMG contact the DNA approximately 25 bp from the entry/exit point of the core (the histones in the octamer are depicted as spheres) and in the two major grooves flanking the dyad axis of the particle (+). Thus, the HMG proteins may stabilize the structure of the nucleosome by bridging the two DNA strands looping around the histone octamer. Part of the HMG proteins may be in contact with, and cause structural changes in, the histone octamer.
A model of the location of HMG-14/-17 proteins in nucleosomes is presented in Fig. 16. This model is based mainly on data obtained by DNase-I and hydroxyl-radical footprinting (196) and on the analysis of DNA-protein and protein-protein cross-links in HMG-nucleosome core complexes. In this schematic model, two HMG molecules are bound by their N-terminal regions to the DNA 20 to 30 base pairs from the ends of the core particle DNA, in the region where the DNA starts and ends looping around the histone octamer. The protein loops under one of the DNA strands and emerges on the surface of the central DNA strand in the major groove neighboring the nucleosomal dyad axis. In this way, the protein forms a bridge across two adjacent DNA strands on the front surface of the core particle. As elaborated elsewhere (196), this model is based on the following experimental results: (1) mobility-shift assays and DNA cross-linking experiments indicating that each core particle has two binding sites for either HMG-14 or HMG-17 (195, 199-201, 203, 238); (2) DNase-I digestion and DNA-protein cross-linking experiments indicating that the two HMGs bind to a region about 20 base pairs away from the end of the core particle DNA (195, 196, 199, 200, 244); (3) DNA-protein cross-linking experiments indicating that part of the HMG proteins is located at the inner surface of the DNA that faces the histone octamer (244); (4) immunochemical experiments indicating that the DNA-binding domain of the protein is sterically hindered, and the C-terminal region exposed, to antibody binding (245); (5) NMR spectroscopy experiments indicating that the proteins interact with the core particles through their central, positively charged region (242, 246);
(6) mobility-shift, thermal denaturation, and DNase-I digestion assays indicating that a peptide corresponding to the positively charged binding domain (residues 17-47 of HMG-17) mimics the binding of the entire molecule (195); (7) protein cross-linking experiments indicating preferential interaction with histones H2A (247) and H3 (248); (8) protein cross-linking experiments indicating that the central region of histone H3 is near the central region of the HMGs, suggesting that they are located near the dyad axis of the core particle (248). This model is consistent with some of the observations on the effect of HMG-14/-17 on the structure of nucleosome cores and chromatin (see Section III,B,4). In addition, as discussed in Section III,C, the model raises the possibility that interactions between histone H1 and HMG-14/-17 may affect the structure and the transcription potential of the chromatin fiber.
4. EFFECT OF HMG-14/-17 ON THE STRUCTURE OF NUCLEOSOMES AND CHROMATIN
The binding of HMG-14/-17 to chromatin subunits increases the stability of these particles and is accompanied by only small changes in the radius of gyration of the chromatin subunit, perhaps due to minor conformational changes (reviewed in 7, 196). The model in Fig. 16 is consistent with these findings: HMG-14/-17 proteins bridge two adjacent DNA strands on the surface of the core and could therefore stabilize the structure of the nucleosome core particles by inhibiting the unraveling of the DNA from the histone octamer. The binding of the proteins would not necessarily cause significant changes in the size or structure of the particle. Neutron scattering experiments on the binding of HMG-14/-17 to salt-washed chromatin suggest that the proteins decrease the mass per unit length of the chromatin fiber without changing the chromatin fiber repeat distance (249).
These results are in agreement with studies suggesting that the proteins render the chromatin fiber more susceptible to digestion by several nucleases (11). However, the proteins do not prevent the formation of higher order chromatin structure (250). In summary, the binding of HMG-14/-17 proteins to nucleosomes induces minor structural changes in these particles. These proteins stabilize the structure of the nucleosome subunits and at the same time destabilize the higher order structure of the chromatin fiber. In vitro studies in which HMGs are added to preassembled chromatin may result in structures different from those assembled in intact cells. Chromatin assembly and maturation is an orderly process involving sequential deposition of the H3-H4 histone tetramer, followed by the deposition of two H2A-H2B dimers and establishment of proper nucleosomal spacing (reviewed in 251, 252). Furthermore, the assembly of components into the final chromatin structure may be facilitated by specific factors and could depend
on the concentration of the components in the assembly mixture. For example, competition between transcription factors and histones for binding during chromatin assembly on replicating DNA affects the transcriptional potential of the resulting chromatin template (253, 254). Therefore, studies on the effect of HMG-14/-17 on the structure of chromatin must take into account that these proteins are an integral part of the chromatin fiber and that the kinetics of their assembly into the nucleosome may determine their effect on the structure of chromatin. Indeed, recent studies with chromatin assembled in extracts prepared from Xenopus eggs indicate that HMG-14/-17 proteins are incorporated into nucleosomes prior to completion of chromatin assembly (11, 255). At present, the effect of HMG-14/-17 on the nucleosomal repeat is controversial. Assembly of minichromosomes from double-stranded DNA and an extract prepared from either Xenopus eggs (11, 255) or Drosophila embryos (256) suggests that the proteins increase the length of the nucleosomal repeat and may serve as spacing factors (259, 260). On the other hand, studies in which minichromosomes were assembled from single-stranded M13 plasmids and an extract prepared from Xenopus eggs suggest that the proteins do not affect the nucleosomal repeat (11, 255, 257). The differences in interpretation may reflect minor differences in the experimental systems. In addition, interpretations of the effects of HMG on the nucleosomal repeat must take into account the molecular events known to occur during the digestion of chromatin by micrococcal nuclease. As elaborated elsewhere (11, 257), because of the exonucleolytic activity of this enzyme and the tendency of nucleosome cores to slide, the length of the nucleosomal repeat gradually decreases during the course of digestion (258).
Because HMG-14/-17 stabilize the position of the nucleosome core, they could protect the core from exonucleolytic attack and minimize nucleosome “sliding.” Thus, the oligonucleosomes derived from chromatin assembled in the presence of these proteins would be somewhat longer than those derived from chromatin assembled in their absence. The HMG-dependent increase in the length of the nucleosome multimers could be interpreted as an indication that HMG-14/-17 can act as nucleosomal spacing factors (259, 260). However, as elaborated above and elsewhere (11, 255), this interpretation is difficult to reconcile with the kinetics of chromatin digestion by micrococcal nuclease and with other, contradictory results. Further studies are needed to determine whether HMG-14/-17 proteins alter nucleosomal spacing in the nucleus. The minichromosomes assembled from M13 DNA in the presence of HMG proteins have a more extended conformation than those assembled in their absence (11). It has been suggested that the HMGs could
[Fig. 17 schematic: DNA and histones undergo chromatin assembly, giving either structure A (assembled with HMG) or structure B (assembled without HMG).]
FIG. 17. Effect of HMG-14/-17 proteins on chromatin structure. Cellular chromatin is assembled during replication. Assembly in the absence of HMG yields structure B, which is more compact than structure A, which represents chromatin assembled in the presence of HMG. It is important to note that the length x of the linker region (i.e., the nucleosomal repeat) has not changed. The concept is similar to that presented by Hansen and Ausio (261) for core histone termini. HMG-14/-17 may unfold chromatin by interacting with the termini of core histones (11), with histone H1 (263), or with both. By unfolding the chromatin template, HMG-14/-17 proteins enhance the transcriptional potential of chromatin.
unfold the minichromosomes without changing the nucleosomal repeat by interacting with core histone tails, which may play a role in chromatin folding (11, 255, 261). Likewise, it is possible that HMG-14/-17 proteins unfold the chromatin fiber by modifying the interaction of the linker histone H1 with nucleosomes near the dyad axis (196, 262). Indeed, recent studies with SV40 minichromosomes provide direct evidence that an interplay between HMG-14 and histone H1 affects the rate of RNA polymerase II elongation on the chromatin template (263). Figure 17 presents a scheme of the effect of HMG-14/-17 on chromatin structure. In summary, studies on the interaction of HMG-14/-17 with chromatin have to take into account the kinetics of chromatin assembly that occurs during DNA replication. Addition of HMG to preassembled chromatin may give a structure similar, but not identical, to that assembled under more physiological conditions (see also Section III,C). Incorporation of HMG-14/-17 into chromatin during replication unfolds the chromatin fiber without significantly affecting the nucleosomal repeat. These effects may be mediated by interaction with the termini of the core histones or with histone H1. Conceivably, by unfolding the higher order chromatin structure, the proteins may increase the accessibility of target sequences to the transcriptional apparatus and facilitate transcription through a nucleosome.
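The micrococcal nuclease argument earlier in this section depends on how the nucleosomal repeat is read from a digestion ladder. A common approach, sketched here as an assumption rather than as the protocol of the cited studies, is to fit band length against the number of nucleosomes per band and take the slope as the repeat length. The two ladders are hypothetical numbers chosen only to show a repeat that appears shorter at a later digestion time.

```python
def repeat_length(band_sizes_bp: list[float]) -> float:
    """Estimate the nucleosomal repeat (bp) from a micrococcal nuclease ladder.

    band_sizes_bp[i] is the DNA length of the band containing i + 1 nucleosomes.
    The repeat is taken as the least-squares slope of length versus nucleosome
    number (one common way to read a ladder; assumed here).
    """
    n = len(band_sizes_bp)
    if n < 2:
        raise ValueError("need at least two bands (mono- and dinucleosome)")
    xs = list(range(1, n + 1))
    mean_x = sum(xs) / n
    mean_y = sum(band_sizes_bp) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, band_sizes_bp))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical ladders from an early and a later digestion time point.
early = [192, 381, 570, 762]
late = [175, 355, 535, 712]
print(round(repeat_length(early), 1))  # 189.9
print(round(repeat_length(late), 1))   # 179.1
```

Because the apparent repeat drifts with digestion time, comparisons between chromatin assembled with and without HMG-14/-17 are only meaningful at matched extents of digestion.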
C. Cellular Function and Mechanism of Action
1. HMG-14/-17 IN ACTIVE GENES
The presence of HMG-14 and HMG-17 proteins in all the cells of higher eukaryotes suggests that both of these proteins are necessary for proper cellular function; however, in spite of numerous experiments, their role is not fully understood. Most probably, their role in cellular function depends on specific interactions with nucleosomes in chromatin, perhaps through the evolutionarily conserved domains characteristic of this protein family (see Section III,A). Many of the available experimental data (for a comprehensive review of previous experiments see 1-9) are consistent with the possibility that the proteins are involved in some aspect of transcriptional regulation. Weintraub and collaborators were the first to suggest that HMG-14/-17 may modulate the chromatin structure of active genes (264). This proposal remained controversial because differences between HMG-free and HMG-bound particles could not be demonstrated, and because these proteins did not always affect the DNase-I sensitivity of active genes. The finding that the structure and transcriptional potential of chromatin depend on the kinetics of chromatin assembly (11, 255) rather than on the composition of the assembled chromatin, together with the tendency of these HMG proteins to migrate and rearrange even at low ionic strength (265), could account for some of the discrepancies in the results obtained by various laboratories. Reconstitution experiments with isolated nucleosomes revealed that HMG-14/-17 proteins preferentially bind to particles enriched in sequences from transcribed genes (199, 266). However, studies with mononucleosomes of the avian β-globin cluster suggested that, although HMG-17 binds to isolated nucleosome cores in a tissue-specific manner, this interaction is not always correlated with DNase-I hypersensitivity or active gene transcription (267). Thus, nucleosomes containing HMG-14/-17 may have unique features that are preserved even when the proteins have been removed.
For example, HMGs may recognize particles enriched in acetylated histones or with an increased length of linker DNA (238, 268). In these reconstitution experiments it is not clear whether the HMG-14/-17 proteins indeed reassociated with the same sequences they were originally bound to in chromatin. Immunochemical approaches have been used to assess the intracellular distribution of nucleosome-bound HMG proteins. Immunofluorescence studies indicated that antibodies against HMG-14 preferentially stain transcriptionally active regions in polytene chromosomes of Chironomus pallidivittatus (269). Microinjection of antibodies to HMG-17 into human fibroblasts inhibited transcription (270). These results are in agreement with
the suggestion that the two proteins are preferentially associated with transcriptionally active chromatin. Immunoaffinity chromatography experiments indicate that chromatin regions containing transcribable genes are only two- to threefold enriched in HMG-14/-17 as compared to total nuclear DNA (271-273). Immune precipitation experiments suggested that HMG-17 protein is clustered downstream from the transcription start site, a region that is itself depleted of nucleosomes and HMG proteins (272). These experiments must be viewed with caution because the ionic conditions used could have led to protein rearrangements. The problems associated with protein rearrangements can be minimized by cross-linking the proteins prior to fractionating the chromatin. Using this approach, it was found that the transcribed chromatin of the chicken embryonic β-globin gene has a 1.5- to 2.5-fold increase in HMG-14/-17 content and a 2-fold lower density of H1 (274). Because histone H1 compacts the structure of the chromatin fiber, whereas HMG-14/-17 may induce a more open conformation, these compositional differences suggest that the chromatin structure of a transcriptionally active gene is indeed significantly different from that of untranscribed genes. The results are also consistent with nucleosome footprinting studies (Section III,B,3) and recent studies with SV40 minichromosomes (263), which indicate that an interplay between HMG-14/-17 and histone H1 may affect the transcription potential of chromatin.
2. CHANGES IN HMG-14/-17 DURING CELLULAR DIFFERENTIATION
Cellular differentiation is often accompanied by a programmed change in the repertoire of expressed genes. In view of the putative role of HMG-14/-17 in chromatin structure and gene expression, it was of interest to study the expression of these HMGs during differentiation (reviewed in 15). Analyses of mRNA levels during the course of erythropoiesis (275), myogenesis (276), osteoblast differentiation (277), and the differentiation of several additional cell lines (278) indicate that undifferentiated cells synthesize more HMG mRNA than do differentiated cells. The differentiation-related down-regulation in HMG-14/-17 mRNA levels is not due to cell-cycle-associated events: inhibitors of DNA synthesis do not significantly affect HMG-14/-17 mRNA levels. However, there seems to be a positive correlation between the rate of cellular DNA synthesis and the rate of HMG mRNA synthesis, suggesting that the levels of HMG-14/-17 mRNA may also be regulated by cell-cycle events. The biological significance of the differentiation-related down-regulation in HMG-14/-17 expression is not obvious, in that it is difficult to ascertain whether these changes are a prerequisite for, or a consequence of, the differentiation program. This question was addressed in a study in which myoblasts
were transfected with plasmids expressing HMG-14 under the control of the dexamethasone-sensitive MMTV promoter (279). Low levels of dexamethasone do not affect the differentiation of myoblasts into myotubes. The transfected cells differentiated normally in the absence of the inducer. However, addition of dexamethasone to these cells induced the synthesis of HMG-14 mRNA and inhibited the myogenic process. Revertants of these cells, which had lost the ability to synthesize HMG-14 mRNA, were not affected by addition of dexamethasone. These results suggest that myogenic differentiation may require regulated levels of HMG-14 protein. The gene coding for human HMG-14 protein is located on chromosome 21, in a region whose triplication is associated with the etiology of Down syndrome, one of the most common human birth defects. The levels of HMG-14 mRNA and protein are elevated in tissues taken from individuals suffering from Down syndrome (280) and in trisomy-16 mice, an animal model for this human syndrome (279). Because HMG-14 may modulate the structure of active chromatin, an imbalance in this gene may have pleiotropic effects on gene expression, resulting in the complex phenotype characteristic of Down syndrome. However, recent studies indicate that transgenic mice overexpressing human HMG-14 have only very mild abnormalities in their thymus (287). Thus, the experimental data do not suggest that overexpression of HMG-14 by itself has a deleterious effect on differentiation. Perhaps synergistic interactions between elevated levels of HMG-14 and other proteins encoded by genes located on chromosome 21 contribute to the etiology of Down syndrome.
3. HMG-14/-17 ARE NOT CLASSICAL TRANSCRIPTION FACTORS
Because the structure of HMG-14/-17 proteins is reminiscent of that of certain transcription factors, and because HMG-14/-17 proteins enhance the transcription potential of chromatin templates (see Section III,C,4), it is possible that these proteins function as transcription factors. This possibility has been examined in Saccharomyces cerevisiae cells expressing LexA-HMG fusion proteins, which bind to reporter plasmids containing the β-galactosidase gene downstream from the lexA operator (137). The LexA-HMG fusion protein did not elevate the level of β-galactosidase expressed in the yeast cells, suggesting that the HMG proteins do not function as classical transcription activators.
4. HMG-14/-17 INCREASE THE TRANSCRIPTIONAL POTENTIAL OF CHROMATIN TEMPLATES
New insights into the possible role of HMG-14/-17 in the structure and transcriptional potential of chromatin were obtained using minichromosomes assembled in extracts from Xenopus eggs or Drosophila embryos, and using SV40 minichromosomes isolated from CV-1 cells. Although some of the components in these assembly systems are not fully characterized, chromatin assembly in cell extracts may provide insights that cannot be obtained from chromatin templates reconstituted from purified components. In a reconstituted Xenopus laevis egg extract chromatin assembly system, in which Xenopus N1/N2·(H3,H4) complexes and chicken histones H2A and H2B were assembled onto double-stranded DNA, phosphorylated HMG-14/-17 extracted from human placenta stimulated transcription, perhaps by replacing histones H2A and H2B (281). However, other studies with similar extracts, in which the minichromosomes were assembled from single-stranded templates (11, 255), as well as studies in which Drosophila embryo extracts were used to assemble minichromosomes from double-stranded DNA (256), did not find a requirement for phosphorylation and failed to detect an HMG-14/-17-related decrease in the amount of histones H2A and H2B present in the chromatin templates.
Ding et al. introduced the human HMG-14 cDNA into CV-1 cells, which are permissive to SV40 infection, and established cell lines expressing elevated levels of HMG-14 (282). Minichromosomes isolated from these cell lines contain elevated levels of HMG-14 protein. In these minichromosomes, transcription from the early and late SV40 promoters was increased 2.5-fold and 5.5-fold, respectively, compared to control minichromosomes. Transcription was elevated from chromatin, but not from deproteinized DNA templates. HMG-14 stimulated the rate of RNA polymerase-II elongation but not the level of transcription initiation. Transcriptional enhancement was also observed in experiments in which recombinant HMG-14 protein was added to purified minichromosomes isolated from nontransfected, parental CV-1 cells.
In this experimental protocol, a HeLa cell extract supplies all the components necessary to support RNA polymerase-II transcription from SV40 chromatin templates. HMG-14 may alleviate the inhibitory effects of a component present either in the HeLa extract or in the isolated minichromosomes. Recent results suggest that HMG-14 stimulates transcription by negating the repressive effects of the linker histone H1 (263).
Similar results were obtained by analyzing the effects of HMG-14/-17 proteins on the RNA polymerase-III-driven transcription of the Xenopus borealis 5-S RNA gene, which was assembled into minichromosomes in a Xenopus laevis egg extract (11, 255). In these extracts, single-stranded M13 plasmids carrying the 5-S RNA gene are converted into double-stranded DNA and assembled into minichromosomes. During this process, transcription factors compete with histones for binding to promoter regions. Transcription occurs from only the small fraction of templates in which the transcription factors prevent the assembly of nucleosomes on the promoter regions. Addition of recombinant human HMG-14 or HMG-17 protein to the extracts increases the transcription potential of these minichromosomes, but not that of “naked” double-stranded DNA. The increase in transcription potential is observed only if the HMG proteins are present in the extract during chromatin assembly; addition of HMG-14/-17 to preassembled minichromosomes did not affect their transcription potential. Single-round transcription assays indicated that the proteins stimulate transcription by increasing the specific activity, and not the number, of transcribed templates. Structural analysis of these minichromosomes suggested that the specific activity of the template increased because the HMG-14/-17 proteins reduced the compactness of the template. By decreasing the compactness of the templates, the proteins facilitate the access of RNA polymerase, and perhaps of additional transcription factors, to their target sequences.
Similar results were recently described in another experimental system, in which minichromosomes were assembled by a Drosophila embryo extract using double-stranded DNA and exogenously added histones (256). In these experiments, recombinant HMG-17 protein, in conjunction with the sequence-specific activator GAL4-VP16, stimulated transcription by RNA polymerase II from chromatin, but not from DNA templates. In agreement with the previous results, the protein stimulated transcription initiation only when it was assembled into chromatin together with histones. Thus, experiments using various assembly systems indicate that HMG-14/-17 proteins can stimulate transcription from chromatin, but not from DNA templates. In most cases, the timing of incorporation of the HMGs into chromatin is important.
In spite of some variations in the results, most of the data are consistent with the possibility that HMG-14/-17 proteins stimulate transcription by unfolding the chromatin template (11). The ability of HMG-14/-17 to enhance transcription from chromatin templates provides a functional assay for these proteins. Studies with N-terminal and C-terminal deletion mutants revealed that the negatively charged C-terminal region of the proteins is involved in the transcription activation function (11). A peptide corresponding to the nucleosomal binding domain of the protein failed to enhance transcription. In fact, addition of this peptide to an assembly system inhibited the ability of the intact proteins to enhance the transcription potential of chromatin, suggesting that the peptide competitively inhibited the assembly of the intact protein into chromatin. Subsequent studies with shorter peptides indicated that the minimal nucleosomal binding domain spans residues 17-40 of HMG-17. These results suggest that HMG-14/-17 proteins are modular and that the structural domains of this protein family (see Fig. 12) may correspond to distinct functional motifs. A modular structure may be advantageous for proteins that participate in multiple cooperative interactions.
What is the mechanism whereby HMG-14/-17 proteins reduce the compactness of the chromatin fiber? One possibility is that the proteins increase the nucleosomal spacing and reduce the density of the nucleosomes along the DNA fiber (257, 259, 260). Most of the physical measurements and the micrococcal nuclease digestion studies are not consistent with this possibility (11, 255). A more plausible alternative is that the proteins modify the interaction of histones with DNA. HMG-14/-17 may affect either the interaction of the core histone tails with DNA (11) or the binding of the linker histone H1 to nucleosomes. The latter interaction is suggested by footprinting studies indicating that both histone H1 and HMG-14/-17 interact with nucleosomes near the dyad axis (196, 262), by immunofractionation studies suggesting that chromatin regions enriched in HMG-14/-17 are depleted of H1 (274), and by recent experiments with SV40 minichromosomes demonstrating that HMG-14 relieves an H1-mediated inhibition of transcriptional elongation (263). Interactions of HMG-14/-17 proteins with histone H1 and with the termini of core histones are not mutually exclusive. Both interactions could act synergistically to reduce the compactness of the chromatin fiber and enhance the transcriptional potential of a chromatin template.
In view of the many similarities between HMG-14 and HMG-17, it is puzzling that all cells contain both proteins. It is well documented that the binding of HMG-14/-17 to nucleosomes is associated with structural changes in these chromatin subunits.
Recent findings that these proteins bind to nucleosomes to form specific complexes that contain either two molecules of HMG-14 or two molecules of HMG-17 (241) suggest that each of the proteins induces specific allosteric transitions in the particles. Thus, HMG-14 and HMG-17 may be involved in different functions or affect the transcription of different sets of genes. Indeed, mitogenic stimulation of immediate-early gene transcription is associated with rapid and extensive phosphorylation of HMG-14 but not of HMG-17 (239). It would therefore appear that both HMG-14 and HMG-17 are necessary for proper function; however, a gene deletion experiment suggests that HMG-17 protein is not necessary for the in vitro growth of chicken DT40 cells (283).
How do these proteins, which bind to chromatin without any specificity for the DNA sequence, recognize transcriptionally active regions in chromatin? One possibility is that the proteins bind to unique regions in chromatin, perhaps those with a unique nucleotide composition or those enriched in histone variants. Indeed, immunoaffinity chromatographic studies suggest that the proteins are preferentially associated with nucleosomes enriched in acetylated histones (268). A second possibility is that the deposition of HMG proteins into chromatin is regulated by cell-cycle events. Because the levels of HMG-14/-17 mRNA rise sharply at the G1/S boundary (284), it is conceivable that, at this point in the cell cycle, the level of newly synthesized HMG protein also increases. Transcriptionally active genes are preferentially replicated early in S phase; therefore, they may preferentially assemble into nucleosomes containing HMG proteins. Thus, the HMG content of various chromatin regions may depend on a coupling between the synthesis of the protein and the replication of specific DNA sequences. Such a coupling between the timing of protein synthesis and chromatin assembly may provide a general mechanism whereby structural proteins can be targeted to chromatin regions containing specific DNA sequences.
In summary, most of the data suggest that HMG-14/-17 proteins are indeed associated with transcriptionally active regions in chromatin and that they modify the structure of chromatin so as to facilitate transcription. The content of HMG-14/-17 in active chromatin is approximately twice that in inactive chromatin. This seemingly small enrichment may have significant effects on the local chromatin structure, especially if the presence of the proteins interferes with, or modulates, the binding of histone H1 and is associated with regions enriched in acetylated core histones. Most of the data suggest that the proteins do not function as classical transcription factors. Rather, they seem to function as architectural components of chromatin; that is, they modify its structure so as to facilitate a function. By reducing the compactness of chromatin, they facilitate transcription without actually being part of the transcription complex.
IV. Summary and Perspective
A survey of the literature pertaining to the function of the HMG proteins does not provide a clear answer as to the particular function of these proteins. Most of the data suggest that they are associated with selected regions in chromatin; however, the binding does not seem to depend on the DNA sequence. Thus, HMG-1/-2 bind preferentially to regions containing unique DNA conformations or bends, HMG-14/-17 recognize structural features specific to nucleosomes, and HMG-I(Y) binds preferentially to AT-rich regions. Quantitative considerations make it clear that the proteins are associated with only a subset of the genome. Thus, a major question pertaining to the HMG proteins is how these proteins are targeted to restricted regions in chromatin. We have suggested that cell-cycle events, in which protein synthesis, or modification, is coupled to chromatin assembly, may serve as a mechanism whereby architectural proteins can be targeted to specific regions independently of the DNA sequence (257).
Historically, the HMG proteins were somewhat arbitrarily categorized as a protein group based on certain shared chemical and physical properties (7, 8), without any preconceived notion that the various members of the group might also be related in other ways, such as by their common ability to recognize variations in DNA structure. Furthermore, it is now apparent that these proteins, as a group, can also modify the structure of DNA or chromatin and, by doing so, facilitate specific functions. The question arises whether HMG proteins function only as nonessential “facilitators” that improve a cellular process or whether they are components necessary for cell survival. For example, it has been suggested that HMG-1/-2 proteins function as DNA chaperones that bend the DNA and facilitate chromatin assembly (285); yet nucleosome cores and even chromatin can be assembled in the absence of these proteins. Likewise, HMG-14/-17 proteins enhance the transcription potential of chromatin (11, 255, 282); yet transcription also occurs from templates lacking these proteins. The widespread occurrence of these proteins seems to argue that their presence is obligatory for cell survival; yet HMG-17 is not necessary for survival of chicken DT40 cells (283). All higher cells contain not only all the classes of HMG proteins, but also each of the structural homologs (i.e., HMG-1 and HMG-2; HMG-14 and HMG-17; HMG-I and HMG-Y). This strongly suggests that all these proteins are in fact obligatory components and that each member of the family is involved in a particular function or associated with a discrete set of genes. Indeed, immunofluorescence studies indicated that the HMG-1/-2 variants are differentially distributed in Chironomus polytene chromosomes (286).
Likewise, in vitro binding studies revealed that HMG-14 and HMG-17 bind to nucleosomes to form complexes containing either two molecules of HMG-14 or two molecules of HMG-17 (241). Thus, a second important problem is to classify the genomic regions associated with each type of HMG protein and to determine whether some of these interactions can be altered. It is important to note, however, that in some cases the effect of HMG proteins on the activity of a template depended on the kinetics of their assembly into chromatin (11, 255). Thus, studies on the function of HMG proteins must take into account not only their location in the genome but also their pathway of assembly into the final chromatin structure. In conclusion, most of the available data suggest that HMG proteins are associated with chromatin and that this association affects the architecture and increases the structural complexity of the chromatin fiber. Studies on their function are therefore relevant to understanding the role of chromatin in regulating the genetic information encoded in DNA.
ACKNOWLEDGMENT
We thank Ms. Sabrina Ferguson for editorial assistance.
REFERENCES
1. K. E. van Holde, “Chromatin.” Springer-Verlag, New York, 1989.
2. A. Wolffe, “Chromatin: Structure and Function.” Academic Press, San Diego, CA, 1992.
3. T. Owen-Hughes and J. L. Workman, CRC Crit. Rev. Gene Expression 11, 1 (1994).
4. S. M. Paranjape, R. T. Kamakaka and J. T. Kadonaga, ARB 63, 265 (1994).
5. M. Grunstein, Annu. Rev. Cell Biol. 6, 643 (1990).
6. A. P. Wolffe, Cell 77, 13 (1994).
7. M. Bustin, D. A. Lehn and D. Landsman, BBA 1049, 231 (1990).
8. E. W. Johns, “The HMG Chromosomal Proteins.” Academic Press, London, 1982.
9. L. Einck and M. Bustin, Exp. Cell Res. 156, 295 (1985).
10. S. A. Onate, P. Prendergast, J. P. Wagner, M. Nissen, R. Reeves, D. E. Pettijohn and D. P. Edwards, MCBiol 14, 3376 (1994).
11. L. Trieschmann, P. J. Alfonso, M. P. Crippa, A. P. Wolffe and M. Bustin, EMBO J. 14, 1478 (1995).
12. S. J. Fashena, R. Reeves and N. H. Ruddle, MCBiol 12, 894 (1992).
13. D. Thanos and T. Maniatis, CSHSQB 58, 73 (1993).
14. S. John, R. Reeves, J.-X. Lin, R. Child, J. M. Leiden, C. B. Thompson and W. J. Leonard, MCBiol 15, 1786 (1995).
15. M. Bustin, M. P. Crippa and J. M. Pash, CRC Crit. Rev. Eukaryotic Gene Expression 2, 137 (1992).
16. R. Grosschedl, K. Giese and J. Pagel, Trends Genet. 10, 94 (1994).
16a. G. H. Goodwin and M. Bustin, in “Architecture of Eukaryotic Genes” (G. Kahl, ed.), p. 187. VCH Press, Germany, 1988.
17. A. D. Baxevanis, S. H. Bryant and D. Landsman, NARes 23, 1019 (1995).
18. A. D. Baxevanis and D. Landsman, NARes 23, 1604 (1995).
19. D. Landsman and M. Bustin, BioEssays 15, 539 (1993).
20. M. Stros, S. Nishikawa and G. H. Dixon, EJB 225, 581 (1994).
21. L. Wen, J. K. Huang, B. H. Johnson and G. R. Reeck, NARes 17, 1197 (1989).
22. A. Majumar, D. Brown, S. Kerby, I. Rudzinski, T. Polte, Z. Randawa and M. M. Seidman, NARes 19, 6643 (1991).
23. M. Kinoshita, S. Hatada, M. Arashima and M. Noda, FEBS Lett. 352, 191 (1994).
24. H. Shirakawa, K.-I. Tsuda and M. Yoshida, Bchem 29, 4419 (1990).
25. C. R. Wagner, K. Hamana and S. C. R. Elgin, MCBiol 12, 1915 (1992).
26. J. R. Wisniewski and E. Schulze, JBC 267, 17170 (1992).
27. S. S. Ner, M. E. A. Churchill, M. A. Searles and A. A. Travers, NARes 21, 4369 (1993).
28. K. D. Grasser and G. Feix, NARes 19, 2573 (1991).
29. K. D. Grasser, Plant J. 7, 185 (1995).
30. T. Hayashi, H. Hayashi and K. Iwai, J. Biochem. 105, 577 (1989).
31. I. G. Schulman, T. Wang, M. Wu, J. Bowen, R. G. Cook, M. A. Gorovsky and C. D. Allis, MCBiol 11, 166 (1991).
32. D. Kolodrubetz and A. Burgum, JBC 265, 3234 (1990).
33. J. F. X. Diffley and B. Stillman, PNAS 88, 7864 (1991).
34. S. Ferrari, L. Ronfani, S. Calogero and M. E. Bianchi, JBC 269, 28803 (1994).
35. H. Shirakawa and M. Yoshida, JBC 267, 6641 (1992).
36. K. Nightingale, S. Dimitrov, R. Reeves and A. P. Wolffe, unpublished (1996).
37. S. S. Ner, Curr. Biol. 2, 208 (1992).
38. G. R. Reeck, P. J. Isackson and D. C. Teller, Nature 300, 76 (1982).
39. M. Carballo, P. Puigdomenech and J. Palau, EMBO J. 2, 1759 (1983).
40. P. D. Cary, C. H. Turner, E. Mayes and C. Crane-Robinson, EJB 131, 367 (1983).
41. M. E. Bianchi, L. Falciola, S. Ferrari and D. M. Lilley, EMBO J. 11, 1055 (1992).
42. M. Stros and M. Vorlickova, Int. J. Biol. Macromol. 12, 282 (1990).
43. L. A. Kohlstaedt, E. C. Sung, A. Fujishige and R. D. Cole, JBC 262, 524 (1987).
44. L. A. Kohlstaedt and R. D. Cole, Bchem 33, 570 (1994).
45. L. A. Kohlstaedt and R. D. Cole, Bchem 33, 12702 (1994).
46. M. Stros, J. Stokrova and J. O. Thomas, NARes 22, 1044 (1994).
47. H. M. Jantzen, A. Admon, S. P. Bell and R. Tjian, Nature 344, 830 (1990).
48. V. Laudet, D. Stehelin and H. Clevers, NARes 21, 2493 (1993).
49. M. A. Parisi and D. A. Clayton, Science 252, 965 (1991).
50. A. H. Sinclair, P. Berta, M. S. Palmer, J. R. Hawkins, B. L. Griffiths, M. J. Smith, J. W. Foster, A. M. Frisch, B. R. Lowell and P. N. Goodfellow, Nature 346, 240 (1990).
51. J. Gubbay, J. Collignon, P. Koopman, B. Capel, A. Economou, A. Musterberg, N. Vivian, P. Goodfellow and B. R. Lovell, Nature 346, 245 (1990).
52. A. Travis, A. Amsterdam, C. Bélanger and R. Grosschedl, Genes Dev. 5, 880 (1991).
53. M. van de Wetering, M. Oosterwegel, D. Dooijes and H. Clevers, EMBO J. 10, 123 (1991).
54. D. Kolodrubetz, W. Haggren and A. Burgum, FEBS Lett. 238, 175 (1988).
54a. D. Kolodrubetz and A. Burgum, JBC 265, 3234 (1990).
55. S. L. Bruhn, P. M. Pil, J. M. Essigmann, D. E. Housman and S. J. Lippard, PNAS 89, 2307 (1992).
56. M. Shirakata, K. Huppi, K. Okazaki, K. Yoshida and H. Sakano, MCBiol 11, 4528 (1991).
57. H. Weir, P. J. Kraulis, C. S. Hill, A. R. C. Raine, E. D. Laue and J. O. Thomas, EMBO J. 12, 1311 (1993).
58. C. M. Read, P. D. Cary, C. Crane-Robinson, P. C. Driscoll and D. G. Norman, NARes 21, 3427 (1993).
59. D. N. Jones, M. A. Searles, G. L. Shaw, M. E. Churchill, S. S. Ner, J. Keeler, A. Travers and D. Neuhaus, Structure 2, 609 (1994).
60. M. H. Werner, J. R. Huth, A. M. Gronenborn and G. M. Clore, Cell 81, 705 (1995).
61. R. Reeves and M. S. Nissen, JBC 265, 8573 (1990).
62. A. D. Baxevanis, S. H. Bryant and D. Landsman, NARes 23, 1019 (1995).
63. S. P. Bell, C. S. Pikaard, R. H. Reeder and R. Tjian, Cell 59, 489 (1989).
64. R. P. Fisher, M. A. Parisi and D. A. Clayton, Genes Dev. 3, 2202 (1989).
65. C. S. Pikaard, L. K. Pape, S. L. Henderson, K. Ryan, M. Paalman, M. A. Lopata, R. H. Reeder and B. Sollner-Webb, Cell Mol. Biol. 10, 4816 (1990).
66. S. Ferrari, V. R. Harley, A. Pontiggia, P. N. Goodfellow, R. Lovell-Badge and M. E. Bianchi, EMBO J. 11, 4497 (1992).
67. M. van de Wetering and H. Clevers, EMBO J. 11, 3039 (1992).
68. J. Guesem, A. Amsterdam and R. Grosschedl, Genes Dev. 5, 2567 (1991).
69. N. Nasrin, C. Buggs, X. F. Kong, J. Carnazza, M. Goebl and M. Alexander-Bridges, Nature 354, 317 (1991).
70. J. L. Kim, D. B. Nikolov and S. K. Burley, Nature 365, 520 (1993).
71. Y. C. Kim, J. H. Geiger, S. Hahn and P. B. Sigler, Nature 365, 512 (1993).
72. D. B. Starr and D. K. Hawley, Cell 67, 1231 (1991).
73. N. C. Seeman, J. M. Rosenberg and A. Rich, PNAS 73, 804 (1976).
74. C. M. Read, P. D. Cary, N. S. Preston, M. Lenicek-Allen and C. Crane-Robinson, EMBO J. 13, 5639 (1994).
75. V. R. Harley, D. I. Jackson, P. J. Hextall, J. R. Hawkins, G. D. Berkovitz, S. Sockanathan, R. Lovell-Badge and P. Goodfellow, Science 255, 453 (1992).
76. C. O. Pabo and R. T. Sauer, ARB 61, 1053 (1992).
77. M. E. Bianchi, EMBO J. 7, 843 (1988).
78. M. E. Bianchi, M. Beltrame and G. Paonessa, Science 243, 1056 (1989).
79. M. E. Bianchi, Mol. Microbiol. 14, 1 (1994).
80. L. Falciola, D. Hill, R. Reeves and M. E. Bianchi, unpublished observations (1995).
81. M. E. Bianchi, in “DNA-Protein: Structural Interactions” (D. M. J. Lilley, ed.). IRL, Oxford, 1995.
82. M. E. Bianchi and D. M. J. Lilley, Nature 375, 532 (1995).
83. L. Falciola, A. I. H. Murchie, D. M. J. Lilley and M. E. Bianchi, NARes 22, 285 (1994).
84. E. Bonnefoy, M. Takahashi and J. R. Yaniv, JMB 242, 116 (1994).
85. A. M. Segall, S. D. Goodman and H. A. Nash, EMBO J. 13, 4536 (1994).
86. D. M. J. Lilley, Nature 357, 282 (1992).
87. S. L. Bruhn, P. M. Pil, J. M. Essigmann, D. E. Housman and S. J. Lippard, PNAS 89, 2307 (1992).
88. C. S. Chow, C. M. Barnes and S. J. Lippard, Bchem 34, 2956 (1995).
89. P. M. Pil and S. J. Lippard, Science 256, 234 (1992).
90. S. F. Bellon and S. J. Lippard, Biophys. Chem. 35, 179 (1990).
91. D. Locker, M. Decoville, J. C. Maurizot, M. E. Bianchi and M. Leng, JMB 246, 243 (1995).
92. J. C. Huang, D. B. Zamble, J. T. Reardon, S. J. Lippard and A. Sancar, PNAS 91, 10394 (1994).
93. D. K. Treiber, X. Zhai, H.-M. Jantzen and J. M. Essigmann, PNAS 91, 5672 (1994).
94. K. Giese, J. Cox and R. Grosschedl, Cell 69, 185 (1992).
95. K. Giese, C. Kingsley, J. R. Kirshner and R. Grosschedl, Genes Dev. 9, 995 (1995).
96. T. T. Paull, M. J. Haykinson and R. C. Johnson, Genes Dev. 7, 1521 (1993).
97. T. T. Paull and R. C. Johnson, JBC 270, 8744 (1995).
98. C. S. Chow, J. P. Whitehead and S. J. Lippard, Bchem 33, 15124 (1994).
99. P. M. Pil, C. S. Chow and S. J. Lippard, PNAS 90, 9465 (1993).
100. J. P. Wagner, D. M. Quill and D. E. Pettijohn, JBC 270, 7394 (1995).
101. T. S. Elton and R. Reeves, Anal. Biochem. 149, 315 (1985).
102. C.-Y. King and M. A. Weiss, PNAS 90, 11990 (1993).
103. C. M. Haqq, C.-Y. King, E. Ukiyama, S. Falsafi, T. N. Haqq, P. K. Donahoe and M. A. Weiss, Science 266, 1494 (1994).
104. K. Strauss and J. Maher, Science 266, 1829 (1994).
105. L. G. Sheflin, N. W. Fucile and S. W. Spaulding, Bchem 32, 3238 (1993).
106. L. G. Sheflin and S. W. Spaulding, Bchem 28, 5658 (1989).
107. M. Stros, J. Reich and A. Kolibalova, FEBS Lett. 344, 201 (1994).
108. C. H. Hu, B. McStay, S.-W. Jeong and R. H. Reeder, MCBiol 14, 2871 (1994).
109. D. P. Bazett-Jones, B. Leblanc, M. Herfort and T. Moss, Science 264, 1134 (1994).
110. C. D. Putnam, G. P. Copenhaver, M. L. Denton and C. S. Pikaard, MCBiol 14, 6476 (1994).
111. A. Pontiggia, R. Rimini, V. R. Harley, P. N. Goodfellow, R. Lovell-Badge and M. E. Bianchi, EMBO J. 13, 6115 (1994).
112. J. B. Jackson, J. M. Pollock and R. L. Rill, Bchem 18, 3739 (1979).
113. J. B. Jackson and R. L. Rill, Bchem 20, 1042 (1981).
114. J. Zlatanova and K. E. van Holde, J. Cell Sci. 103, 889 (1992).
115. F. Watt and P. Molloy, NARes 16, 1471 (1988).
116. D. J. Tremethick and P. L. Molloy, JBC 261, 6986 (1986).
117. D. J. Tremethick and P. L. Molloy, NARes 16, 11107 (1988).
118. J. Singh and G. H. Dixon, Bchem 29, 6295 (1990).
119. S. Aizawa, H. Nishino, K. Saito, K. Kimura, H. Shirakawa and M. Yoshida, Bchem 33, 14690 (1994).
120. H. Ge and R. G. Roeder, JBC 269, 17136 (1994).
121. G. Seltzer, A. Goppelt, F. Lottspeich and M. Meisterernst, MCBiol 14, 4712 (1994).
122. K. E. van Holde and J. Zlatanova, BioEssays 16, 59 (1994).
123. D. Krylov, S. Leuba, K. E. van Holde and J. Zlatanova, PNAS 90, 5052 (1993).
124. P. Varga-Weisz, K. E. van Holde and J. Zlatanova, JBC 268, 20699 (1993).
125. P. Varga-Weisz, J. Zlatanova, S. Leuba, G. P. Schroth and K. E. van Holde, PNAS 91, 3525 (1994).
126. E. von Kitzing, D. M. J. Lilley and S. Diekman, NARes 18, 2671 (1990).
127. P. Varga-Weisz, K. E. van Holde and J. Zlatanova, BBRC 203, 1904 (1994).
128. R. Tsanev, G. Russev, G. Pashev and J. Zlatanova, in “Replication and Transcription of Chromatin,” p. 124. CRC Press, Boca Raton, FL, 1992.
129. R. Tjian and T. Maniatis, Cell 77, 5 (1994).
130. A. A. Travers, S. S. Ner and M. E. A. Churchill, Cell 77, 167 (1994).
131. S. Waga, S. Mizuno and M. Yoshida, BBRC 153, 334 (1988).
132. S. Waga, S. Mizuno and M. Yoshida, JBC 265, 19424 (1990).
133. B. M. Shykind, J. Kim and P. A. Sharp, Genes Dev. 9, 1354 (1995).
134. J. P. Wagner, C. Kunsch and D. E. Pettijohn, in preparation (1996).
135. S. Zwilling, H. Konig and T. Wirth, EMBO J. 14, 1198 (1995).
136. Y. Ogawa, S. Aizawa, H. Shirakawa and M. Yoshida, JBC 270, 9272 (1995).
137. D. Landsman and M. Bustin, MCBiol 11, 4483 (1991).
138. T. Lund, J. Holtlund, M. Fredriksen and S. G. Laland, FEBS Lett. 152, 163 (1983).
139. T. H. Rabbitts, Cell 67, 641 (1991).
140. F. Strauss and A. Varshavsky, Cell 37, 889 (1984).
141. M. Solomon, F. Strauss and A. Varshavsky, PNAS 83, 1276 (1986).
142. K. A. Johnson, D. A. Lehn and R. Reeves, MCBiol 9, 2114 (1989).
143. T. S. Elton and R. Reeves, Anal. Biochem. 157, 53 (1986).
144. K. R. Johnson, D. A. Lehn, T. S. Elton, P. J. Barr and R. Reeves, JBC 263, 18338 (1988).
145. G. Manfioletti, V. Giancotti, A. Bandiera, E. Buratti, P. Sautiere, P. Cary, C. Crane-Robinson, B. Coles and G. A. Goodwin, NARes 19, 6793 (1991).
146. U. A. Patel, A. Bandiera, G. Manfioletti, V. Giancotti, K.-Y. Chau and C. Crane-Robinson, BBRC 201, 63 (1994).
147. R. Eckner and M. L. Birnstiel, NARes 17, 5947 (1989).
148. M. Friedmann, L. T. Holth, H. Y. Zoghbi and R. Reeves, NARes 21, 4259 (1993).
149. T. Lund, J. Holtlund and S. G. Laland, FEBS Lett. 180, 275 (1985).
150. T. Lund, B. S. Skalhegg, J. Holtlund, H. K. Blomhoff and S. G. Laland, EJB 166, 21 (1987).
151. R. Reeves, T. A. Langan and M. S. Nissen, PNAS 88, 1671 (1991).
152. M. S. Nissen, T. A. Langan and R. Reeves, JBC 266, 19945 (1991).
153. T. Lund and S. G. Laland, BBRC 171, 342 (1990).
154. L. Meijer, A.-C. Ostvold, S. I. Walaas, T. Lund and S. G. Laland, EJB 196, 557 (1991).
155. K. R. Johnson, S. A. Cook and M. T. Davisson, Genomics 12, 503 (1992).
156. X. Xiang, K. F. Benson and K. Chada, Science 247, 967 (1990).
157. K. F. Benson and K. Chada, Genet. Res. 64, 27 (1995).
158. X. Zhou, K. F. Benson, H. R. Ashar and K. Chada, Nature 376, 771 (1995).
159. A. Lanahan, J. B. Williams, L. K. Sanders and D. Nathans, MCBiol 12, 3919 (1992).
160. S. A. Ogram and R. Reeves, JBC 270, 14235 (1995).
161. L. T. Holth and R. Reeves, unpublished.
162. R. Reeves, Curr. Opin. Cell Biol. 4, 413 (1992).
163. R. Reeves and J. N. S. Evans, unpublished observations (1995).
164. J. R. Karlson, E. Mork, J. Holtlund, S. Laland and T. Lund, BBRC 158, 646 (1989).
165. M. Z. Radic, M. Saghbini, T. S. Elton, R. Reeves and B. Hamkalo, Chromosoma 101, 602 (1992).
166. E. Kas, L. Poljak, Y. Adachi and U. K. Laemmli, EMBO J. 12, 115 (1993).
167. M. Wegner and F. Grummt, BBRC 166, 1110 (1990).
168. J. N. S. Evans, M. S. Nissen and R. Reeves, Bull. Magn. Reson. 14, 171 (1992).
169. J. N. S. Evans, J. Zajicek, M. S. Nissen, G. Munske, V. Smith and R. Reeves, Int. J. Pept. Protein Res. 45, 554 (1995).
170. B. H. Geierstanger, B. F. Volkman, W. Kremer and D. E. Wemmer, Bchem 33, 5347 (1994).
171. M. E. A. Churchill and A. A. Travers, TIBS 16, 92 (1991).
172. J. W. Brown and J. A. Anderson, JBC 261, 1349 (1986).
173. D. Thanos and T. Maniatis, Cell 71, 777 (1992).
174. J. E. Disney, K. R. Johnson, N. S. Magnuson, S. R. Sylvester and R. Reeves, JCBiol 109, 1975 (1989).
175. Y. Saitoh and U. K. Laemmli, Cell 76, 609 (1994).
176. Y. Saitoh and U. K. Laemmli, CSHSQB 58, 755 (1993).
177. S. M. Gasser and U. K. Laemmli, Trends Genet. 3, 16 (1987).
178. T. S. Elton, Ph.D. Thesis, Washington State University, Pullman (1986).
179. K. Zhao, E. Kas, E. Gonzalez and U. K. Laemmli, EMBO J. 12, 3237 (1993).
180. R. Reeves, T. S. Elton, M. S. Nissen, D. Lehn and K. R. Johnson, PNAS 84, 6531 (1987).
181. T. S. Elton, M. S. Nissen and R. Reeves, BBRC 143, 260 (1987).
182. R. H. Russnak, E. P. M. Candido and C. R. Astell, JBC 263, 6392 (1988).
183. C. Tuerk and L. Gold, Science 249, 505 (1990).
184. G. Schroth and R. Reeves, unpublished data (1991).
185. D. G. Skalnik and E. J. Neufeld, BBRC 187, 563 (1992).
186. N. J. Zeleznik-Le, A. M. Harden and J. D. Rowley, PNAS 91, 10610 (1994).
187. P. Claus, E. Schulze and J. R. Wisniewski, JBC 269, 33042 (1994).
188. M. S. Nissen and R. Reeves, JBC 270, 4355 (1995).
189. R. Reeves and M. S. Nissen, JBC 268, 21137 (1993).
190. T. A. Langan, J. Gautier, M. Lohka, R. Hollingworth, S. Moreno, P. Nurse, M. Mallet and R. A. Sclafani, MCBiol 9, 3860 (1989).
191. S. Moreno and P. Nurse, Cell 61, 549 (1990).
192. S. Siino, M. S. Nissen and R. Reeves, BBRC 207, 497 (1995).
193. D. A. Lehn, T. S. Elton, K. R. Johnson and R. Reeves, Biochem. Int. 16, 963 (1988).
194. K. Wu, F. Strauss and A. Varshavsky, JMB 170, 93 (1983).
195. M. P. Crippa, P. J. Alfonso and M. Bustin, JMB 228, 442 (1992).
196. P. J. Alfonso, M. P. Crippa, J. J. Hayes and M. Bustin, JMB 236, 189 (1994).
197. R. Reeves and A. P. Wolffe, unpublished.
198. A. P. Wolffe and H. R. Drew, PNAS 86, 9817 (1989).
199. G. Sandeen, W. I. Wood and G. Felsenfeld, NARes 8, 3757 (1980).
200. J. K. W. Mardian, A. E. Paton, G. J. Bunick and D. E. Olins, Science 209, 1534 (1980).
201. Y. V. Postnikov, D. Lehn, R. C. Robinson, F. K. Friedman, J. Shiloach and M. Bustin, NARes 22, 4520 (1994).
202. A. E. Paton, S. E. Wilkinson and D. E. Olins, JBC 258, 13221 (1983).
203. H. Schroter and J. Bode, EJB 127, 429 (1982).
203a. Y. Gu, T. Nakamura, H. Alder, R. Prasad, O. Canaani, G. Cimino, C. M. Croce and E. Canaani, Cell 71, 701 (1992).
203b. D. C. Tkachuk, S. Kohler and M. L. Cleary, Cell 71, 691 (1992).
203c. N. R. McCabe, R. C. Burnett, H. J. Gill, M. J. Thirman, D. Mbangkollo, M. Kipiniak, E. van Melle, S. Ziemin-van der Poel, J. D. Rowley and M. Diaz, PNAS 89, 11794 (1992).
203d. M. T. Brown, L. Goetsch and L. H. Hartwell, JCBiol 123, 387 (1993).
203e. E. Winter and A. Varshavsky, EMBO J. 18, 1876 (1989).
203f. C. T. Ashley, C. G. Pendelton, W. W. Jennings, A. Saxena and C. V. C. Glover, JBC 264, 8394 (1989).
203g. D. L. Poccia and G. R. Green, TIBS 17, 223 (1992).
203h. M. Suzuki, EMBO J. 8, 797 (1989).
203i. V. Delmas, D. G. Stokes and R. P. Perry, PNAS 90, 2414 (1993).
203j. T. Laux, J. Seurinck and R. B. Goldberg, NARes 19, 4768 (1991).
203k. G. Tjaden and G. M. Coruzzi, Plant Cell 6, 107 (1994).
203l. J. Nieto-Sotelo, A. Ichida and P. H. Quail, NARes 22, 1115 (1994).
204. M. Z. Whitley, D. Thanos, M. A. Read, T. Maniatis and T. Collins, MCBiol 14, 6464 (1994).
205. H. Lewis, W. Kaszubska, J. F. DeLamarter and J. Whelan, MCBiol 14, 5701 (1994).
206. S. Chuvpilo, C. Schomberg, R. Gerwig, A. Heinfling, R. Reeves, F. Grummt and E. Serfling, NARes 21, 5694 (1993).
207. J. Kim, R. Reeves, P. Rothman and M. Boothby, Eur. J. Immunol. 25, 298 (1995).
208. D. Thanos and T. Maniatis, Cell 80, 529 (1995).
209. W. Du and T. Maniatis, PNAS 91, 11318 (1994).
210. W. Du, D. Thanos and T. Maniatis, Cell 74, 887 (1993).
211. G. Ghosh, G. van Duyne, S. Ghosh and P. B. Sigler, Nature 373, 303 (1995).
212. C. W. Mueller, F. A. Rey, M. Sodeoka, G. L. Verdine and S. C. Harrison, Nature 373, 311 (1995).
213. K. R. Johnson, J. E. Disney, C. R. Wyatt and R. Reeves, Exp. Cell Res. 187, 69 (1990).
214. J. R. Lundberg, J. R. Karlson, K. Ingebrigtsen, J. Holtlund, T. Lund and S. G. Laland, BBA 1009, 277 (1989).
215. V. Giancotti, B. Pani, P. D'Andrea, M. T. Berlingieri, P. P. DiFiore, A. Fusco, G. Vecchio, R. Philip, C. Crane-Robinson, R. H. Nicolas, C. A. Wright and G. H. Goodwin, EMBO J. 6, 1981 (1987).
216. B. V. Giancotti, M. T. Berlingieri, P. P. DiFiore, A. Fusco, G. Vecchio and C. Crane-Robinson, Cancer Res. 45, 6051 (1985).
217. V. Giancotti, E. Buratti, L. Perissin, S. Zorzet, A. Balmain, G. Portella, A. Fusco and G. H. Goodwin, Exp. Cell Res. 184, 538 (1989).
218. V. Giancotti, A. Bandiera, E. Buratti, A. Fusco, R. Marzari, B. Coles and G. H. Goodwin, EJB 198, 211 (1991).
219. S. D. Goodman, S. C. Nicholson and H. A. Nash, PNAS 89, 11910 (1992).
220. M. J. G. Bussemakers, W. J. M. van de Ven, F. M. J. Debruyne and J. A. Schalken, Cancer Res. 51, 606 (1991).
221. T. Ram, R. Reeves and H. Hosick, Cancer Res. 53, 2655 (1993).
222. Y. Tamimi, H. G. van der Poel, M. Denyn, R. Umbas, H. F. M. Karthaus, F. M. J. Debruyne and J. A. Schalken, Cancer Res. 53, 5512 (1993).
HMG PROTEINS
223. M. T. Berlingieri, G. Manfioletti, M. Santoro, A. Bandiera, R. Visconti, V. Giancotti and A. Fusco, MCBiol 15, 1545 (1995). 224. E. Vartiainen, J. Palvimo, A. Mahonen, A. Linnala-Kankkunen and P. Maenpaa, FEBS Lett. 228, 45 (1988). 225. G. Chiappetta, A. Bandiera, M. T. Berlingieri, R. Visconti, G. Manfioletti, S. Battista, F. J. Martinez-Tello, M. Santoro, V. Giancotti and A. Fusco, Oncogene 10, 1307 (1995). 226. M. L. Cleary, Cell 66, 619 (1991). 227. H. R. Ashar, M. S. Fejzo, A. Tkachenko, X. Zhou, J. A. Fletcher, S. Weremowicz, C. C. Morton and K. Chada, Cell 82, 1 (1995). 228. G. E. Croston, L. A. Kerrigan, L. M. Lira, D. R. Marshak and J. T. Kadonaga, Science 251, 643 (1991). 229. P. J. Laybourn and J. T. Kadonaga, Science 254, 238 (1992). 230. M. Grunstein, Trends Genet. 6, 395 (1990). 231. G. Felsenfeld, Nature 355, 219 (1992). 232. U. K. Laemmli, E. Kas, L. Poljak and Y. Adachi, Curr. Opin. Genet. Dev. 2, 275 (1992). 233. W. T. Garrard, in “Nucleic Acids and Molecular Biology” (F. Eckstein and D. M. Lilley, eds.), p. 163. Springer-Verlag, Heidelberg, 1990. 234. T. D. Srikantha and M. Bustin, JMB 197, 405 (1987). 234a. T. D. Schneider and R. M. Stephens, NARes 18, 6097 (1990). 235. W. Gilbert, CSHSQB 52, 901 (1987). 236. J. W. Lee, H. S. Choi, J. Gyuris, R. Brent and D. D. Moore, Mol. Endocrinol. 9, 243 (1995). 237. D. Landsman and M. Bustin, JBC 261, 16087 (1986). 238. S. C. Albright, J. M. Wiseman, R. A. Lange and W. T. Garrard, JBC 255, 3673 (1980). 239. J. M. Barratt, C. A. Hazzalin, E. Cano and L. C. Mahadevan, PNAS 91, 4781 (1994). 240. S. W. Spaulding, N. W. Fucile, D. P. Bofinger and L. G. Sheflin, Mol. Endocrinol. 5, 42 (1991). 241. Y. V. Postnikov, L. Trieschmann, A. Rickers and M. Bustin, JMB 252, 423 (1995). 242. G. R. Cook, M. Minch, G. P. Schroth and E. M. Bradbury, JBC 264, 1799 (1989). 243. L. Trieschmann, Y. Postnikov, A. Rickers and M. Bustin, Mol. Cell Biol. 15, 6663 (1995). 244. V. V. Shick, A. V. Belyavsky and A. D. Mirzabekov, JMB 185, 329 (1985). 245. M. Bustin, M. P. Crippa and J. M. Pash, JBC 265, 20077 (1990). 246. B. D. Abercrombie, G. G. Kneale, C. Crane-Robinson, E. M. Bradbury, G. H. Goodwin, J. M. Walker and E. W. Johns, EJB 84, 173 (1978). 247. G. R. Cook, P. Yau, H. Yasuda, R. R. Traut and E. M. Bradbury, JBC 261, 16185 (1986). 248. J. V. Brawley and H. G. Martinson, Bchem 31, 364 (1992). 249. V. Graziano and V. Ramakrishnan, JMB 214, 897 (1990). 250. J. D. McGhee, D. C. Rau and G. Felsenfeld, NARes 10, 2007 (1982). 251. G. Almouzni and A. P. Wolffe, Exp. Cell Res. 205, 1 (1993). 252. S. Smith and B. Stillman, EMBO J. 10, 971 (1991). 253. G. Almouzni, M. Mechali and A. P. Wolffe, EMBO J. 9, 573 (1990). 254. J. Svaren and R. Chalkley, Trends Genet. 6, 52 (1990). 255. M. P. Crippa, L. Trieschmann, P. J. Alfonso, A. P. Wolffe and M. Bustin, EMBO J. 12, 3855 (1993). 256. S. M. Paranjape, A. Krumm and J. T. Kadonaga, Genes Dev. 9, 1978 (1995). 257. M. Bustin, L. Trieschmann and Y. V. Postnikov, Semin. Cell Biol. 6, 267 (1995). 258. J. S. Godde and J. Widom, JMB 226, 1009 (1992). 259. H. R. Drew, JMB 230, 824 (1993). 260. D. J. Tremethick and H. R. Drew, JBC 268, 11389 (1993). 261. J. C. Hansen and J. Ausio, TIBS 17, 187 (1992).
MICHAEL BUSTIN AND RAYMOND REEVES
262. D. Z. Staynov and C. Crane-Robinson, EMBO J. 7, 3685 (1988). 263. H. F. Ding, M. Bustin and U. Hansen, unpublished (1995). 264. S. Weisbrod and H. Weintraub, PNAS 76, 630 (1979). 265. D. Landsman, E. Mendelson, S. Druckmann and M. Bustin, Exp. Cell Res. 163, 95 (1986). 266. T. W. Brotherton and G. D. Ginder, Bchem 25, 3447 (1986). 267. T. W. Brotherton, J. Reneker and G. D. Ginder, NARes 18, 2011 (1990). 268. N. Malik, M. Smulson and M. Bustin, JBC 259, 699 (1984). 269. R. Westermann and U. Grossbach, Chromosoma 90, 355 (1984). 270. L. Einck and M. Bustin, PNAS 80, 6735 (1983). 271. T. Dorbic and B. Wittig, NARes 14, (1986). 272. T. Dorbic and B. Wittig, EMBO J. 6, 2393 (1987). 273. S. Druckmann, E. Mendelson, D. Landsman and M. Bustin, Exp. Cell Res. 166, 486 (1986). 274. Y. V. Postnikov, V. V. Shick, A. V. Belyavsky, K. R. Khrapko, K. L. Brodolin, T. A. Nikolskaya and A. D. Mirzabekov, NARes 19, 717 (1991). 275. M. P. Crippa, J. M. Nikol and M. Bustin, JBC 266, 2712 (1991). 276. J. M. Pash, J. S. Bhorjee, B. M. Patterson and M. Bustin, JBC 265, 4197 (1990). 277. A. R. Shakoori, T. A. Owen, V. Shalhoub, J. L. Stein, M. Bustin, G. S. Stein and J. B. Lian, J. Cell. Biochem. 51, 479 (1993). 278. M. P. Crippa, J. M. Pash, B. I. Gerwin, T. E. Smithgall, R. I. Glazer and M. Bustin, Cancer Res. 50, 2022 (1990). 279. J. M. Pash, P. J. Alfonso and M. Bustin, JBC 268, 13632 (1993). 280. J. M. Pash, T. Smithgall and M. Bustin, Exp. Cell Res. 193, 232 (1991). 281. D. J. Tremethick, JBC 269, 28436 (1994). 282. H. F. Ding, S. Rimsky, S. C. Batson, M. Bustin and U. Hansen, Science 265, 796 (1994). 283. Y. Li and J. B. Dodgson, Mol. Cell Biol. 15, 5516 (1995). 284. M. Bustin, N. Soares, D. Landsman, T. Srikantha and J. M. Collins, NARes 15, 3549 (1987). 285. A. A. Travers, S. S. Ner and M. E. A. Churchill, Cell 77, 167 (1994). 286. U. Grossbach, Semin. Cell Biol. 6, 237 (1995). 287. M. Bustin et al., DNA Cell Biol. 14, 997 (1995).
FIG. 2. The interaction of hSRY-HMG with DNA as determined by solution NMR [reproduced with permission from Werner et al. (60)]. Three views (A–C) of the co-complex between the hSRY-HMG peptide and its specific recognition sequence (5′-…ACAAAC) are displayed. The protein is shown as a schematic ribbon drawing in green, and the color coding used for the DNA bases is red for A, lilac for T, dark blue for G, and light blue for C. Side chains that contact the DNA bases are depicted in yellow in (C). (D) shows the same view as in (C) with the molecular surface of the protein shown in gray and the DNA atoms in yellow. The patches of blue on the protein surface indicate the location of the side chains of four of the seven residues that interact with the DNA bases.
FIG. 10. Surface representation of an X-ray crystallographic image of the butterfly-shaped NF-κB p50 homodimer protein (composed of monomer subunits I and II) bound to its recognition site in the major groove, as viewed down the longitudinal axis of the DNA. The unobstructed minor groove at the bottom of the figure (arrow) is the putative binding site for the DNA-binding domain of the HMG-I(Y) proteins in the human β-interferon promoter (13). Reprinted with permission from Nature (Ref. 209). Copyright 1995 Macmillan Magazines Limited.
Homologous Genetic Recombination in Xenopus: Mechanism and Implications for Gene Manipulation¹
DANA CARROLL
Department of Biochemistry
University of Utah School of Medicine
Salt Lake City, Utah 84132
I. Recombination of DNAs Injected into Xenopus Oocyte Nuclei
II. Mechanism of Recombination in Oocytes
III. Marker Recovery and Mismatch Repair
IV. Recombination Activities during Xenopus Development
V. Natural Function of SSA
VI. A Model Gene-targeting Experiment
VII. Summary
References
There are many styles and at least two functions of homologous recombination of chromosomal DNA. In meiosis, crossing over between homologs is required for proper chromosome alignment and segregation (1). In somatic or vegetative cells of many different organisms, recombination is one mode available for repairing damage incurred by DNA, particularly double-strand breaks (DSBs) (2). Although we seek common features in these processes, it is certainly true that recombination events detected in different settings may be mediated by many different mechanisms. In addition, within any particular cell the applicable mechanism depends on the substrates presented: the answer you get depends on how you phrase the question. In this essay, I describe the capabilities of oocytes and eggs from the South African clawed frog, Xenopus laevis, for causing the recombination of exogenous DNA molecules. The focus is on the mechanism of homology-dependent recombination as elucidated mostly through experimental results obtained in my laboratory. Recombination in oocytes proceeds by exonuclease resection and the annealing of complementary strands. This mecha-

¹ Abbreviations: DSB, double-strand break; GV, germinal vesicle (the oocyte nucleus); SSA, single-strand annealing.
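The resection-and-annealing mechanism named above can be illustrated at the level of sequence strings. The sketch below is a deliberately simplified model, not the author's method: it assumes a linear DNA carrying two perfect direct repeats that flank a double-strand break, resection that proceeds far enough to expose the nearest repeat copy on each side, and it ignores strand polarity, flap trimming, and gap filling. The function name `single_strand_annealing` and the example sequence are hypothetical. It shows why annealing of the exposed complementary repeat copies leaves a single repeat and deletes everything between the two copies.

```python
from typing import Optional


def single_strand_annealing(seq: str, repeat: str, break_pos: int) -> Optional[str]:
    """Toy sequence-level model of SSA at a double-strand break (DSB).

    Hypothetical assumptions: resection on each side of the break exposes the
    nearest copy of `repeat`, and those two exposed copies anneal. Strand
    polarity, flap trimming, and gap filling are ignored.
    """
    left, right = seq[:break_pos], seq[break_pos:]
    i = left.rfind(repeat)  # nearest repeat copy upstream of the break
    j = right.find(repeat)  # nearest repeat copy downstream of the break
    if i == -1 or j == -1:
        return None  # no homology on one side: SSA cannot join the ends
    # Annealing fuses the two repeat copies into one; the DNA between them
    # (including the break site) is lost from the product.
    return left[:i] + repeat + right[j + len(repeat):]


if __name__ == "__main__":
    repeat = "GATTACA"
    substrate = "TTT" + repeat + "CCCC" + "GGGG" + repeat + "AAA"
    product = single_strand_annealing(substrate, repeat, break_pos=14)
    print(product)  # TTTGATTACAAAA
```

The product keeps one copy of the repeat and loses the intervening "CCCCGGGG", which is the deletion outcome expected of annealing between direct repeats.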
3 kb). Recently, intramolecular recombination has been studied by examining deletion between tandem direct repeats within the tetA gene in pBR322 derivatives (Fig. 2). Surprisingly, these tandem direct repeats mediate efficient recA-independent recombination (2, 3) whose homology requirement is very limited (3). As shown in Fig. 3, recombination between tandem repeats (S1) increases sharply when the repeats are lengthened from 14 bp up to 100 bp, with virtually the same frequency in recA- and recA+ cells. Increasing the length of the repeats beyond 100–300 bp gradually increases recombination in recA+ cells without significantly affecting recombination in recA- cells. Therefore, it appears that recombination has a limited dependence on RecA when the repeats are large (>300 bp) (e.g., recombination in recA+ cells is about three- to fivefold more efficient than that in recA- cells when the repeats are ~1 kb long). It is noteworthy that efficient recA-independent recombination can occur between tandem repeats as short as 14 bp. RecA plays a central role in general homologous recombination both as a structural protein and as a reaction catalyst (reviewed in 23–26). It promotes homologous pairing of DNA molecules and catalyzes strand-exchange reactions leading to the formation of heteroduplex DNA in vitro (reviewed in 23–26). It is not surprising that illegitimate, or nonhomologous, recombination, which requires little or no homology, is independent of RecA (reviewed in 31). However, efficient recA-independent recombination between substantial homologies (up to 1 kb) is unexpected. Moreover, recombination between tandem direct repeats is also independent of other functions that are important for general recombination, including RecBCD and RecF (2).

FIG. 2. Structure of pBR322-based plasmid substrates for recombination between direct repeats. The coordinates are those of pBR322. Open arrow, the open reading frame of the tetA gene; filled arrows, direct repeats within the tetA gene; thick line, intervening sequence between the direct repeats. See text for description of the D-7 region and ori. The plasmid substrate contains the bla gene for ampicillin (Ap) resistance (ApR), but the tetA gene is disrupted, so that the host is Ap resistant and tetracycline (Tc) sensitive (ApR TcS). Deletion between the direct repeats regenerates the tetA gene, and the host becomes ApR TcR.

FIG. 3. recA-independent recombination and recA-dependent recombination between direct repeats are differentially affected by the length of the repeat and the distance between the repeats. Direct repeats in a substrate are shown as open arrows. S1 represents a series of pBR322-based plasmids with tandem direct repeats of various lengths (x) within the tetA gene (see Fig. 2). The vertical bars indicate any pair of homologous segments within the repeats; the distance between them is the length of the repeat (x). S2 represents a series of plasmids derived from S1 plasmids by inserting a 3872-bp sequence between the direct repeats (see Fig. 2). The frequency of recombination (log scale) of the S1 and S2 series of plasmids is plotted as a function of the length of the repeat (x, 0–700 bp). Thin lines, recombination in recA+ cells; thick lines, recombination in recA- cells. This figure summarizes some of the results described in (3).
C. recA-independent Recombination between Direct Repeats of DNA Is Reduced by Increasing the Distance between the Repeats

One important feature of recA-independent recombination between direct repeats is that it is affected by the distance separating the repeats (3, 7, 8). In a recA- strain, recombination between direct repeats of various lengths (from 14 to 606 bp) is sharply reduced (to less than 2%) by inserting a 3872-bp-long sequence between the repeats (3) (Fig. 3; compare the thick curves). Shorter insertions exerted a lesser effect (3). This strongly indicates that recA-independent recombination between direct repeats, long or short, depends on the distance between the repeats. In a recA+ strain, however, no such distance effect was observed when the repeats were longer than 300 bp, but an increasingly strong distance effect was observed as the repeats were shortened (3) (Fig. 3; compare the thin curves). This is probably because recombination between direct repeats in a recA+ strain consists of two components, one recA independent and the other mediated by RecA. When the repeats are long (>300 bp), RecA-mediated recombination is at least as efficient as, if not predominant over, recA-independent recombination and is insensitive to the distance. When the repeats are short (… 4 kb), and therefore the overall recombination appeared recA dependent.
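The two-component explanation above can be written out as a simple additive model. This is an illustrative assumption, not a formula from the cited work: let f_ind(x, d) be the recA-independent recombination frequency, which depends on both the repeat length x and the distance d between the repeats, and let f_RecA(x) be the RecA-mediated component, assumed to be insensitive to d. Comparing a short separation d1 with a longer one d2 (d1 < d2) then gives:

```latex
\begin{aligned}
f_{recA^-}(x,d) &= f_{\mathrm{ind}}(x,d),\\
f_{recA^+}(x,d) &= f_{\mathrm{ind}}(x,d) + f_{\mathrm{RecA}}(x),\\
\frac{f_{recA^+}(x,d_2)}{f_{recA^+}(x,d_1)}
  &= \frac{f_{\mathrm{ind}}(x,d_2) + f_{\mathrm{RecA}}(x)}{f_{\mathrm{ind}}(x,d_1) + f_{\mathrm{RecA}}(x)}
  \;\longrightarrow\;
  \begin{cases}
    1, & f_{\mathrm{RecA}}(x) \gg f_{\mathrm{ind}}(x,d_1),\\[4pt]
    \dfrac{f_{\mathrm{ind}}(x,d_2)}{f_{\mathrm{ind}}(x,d_1)}, & f_{\mathrm{RecA}}(x) \ll f_{\mathrm{ind}}(x,d_2).
  \end{cases}
\end{aligned}
```

Under this assumption, the recA- strain always shows the full distance effect f_ind(x, d2)/f_ind(x, d1) (below 2% for the 3872-bp insertion), whereas in recA+ cells with long repeats, where the RecA-mediated term dominates, the ratio approaches 1 and the distance effect disappears.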
D. recA-independent Recombination between Direct Repeats of DNA Yields Multiple Forms of Products by an Intramolecular Mechanism(s)

1. THREE BASIC FORMS OF PRODUCTS OF recA-INDEPENDENT RECOMBINATION BETWEEN PLASMID-BORNE DIRECT REPEATS

Intramolecular deletion of plasmid-borne direct repeats is predicted to generate a monomeric product with one of the repeats plus any intervening sequence deleted (see M in Fig. 4). Indeed, M has been found to be a major product of recA-independent recombination of plasmid substrates in several studies (2, 3, 12). However, besides M, dimeric forms of plasmids have also been observed as the result of recA-independent recombination (2, 3, 9–12). In some cases, surprisingly, dimers are the only products of recombination (10–12). Two special dimeric forms, named 1+2 and 1+3 (10–12; Fig. 4), have been observed. Form 1+2 is a head-to-tail dimer consisting of a monomeric substrate and a monomeric product (M), whereas 1+3 is structurally identical to the product of an intermolecular unequal crossover between the direct repeats. The products M, 1+2, and 1+3 may be formed by different mechanisms or through a common pathway.

FIG. 4. Structures of the products of recA-independent recombination between plasmid-borne direct repeats. Filled arrows, direct repeats; thick line, the intervening sequence between the repeats. The open arrow indicates the orientation of the plasmid sequence outside the direct repeats and the intervening sequence; M, the monomeric product of intramolecular deletion; 1+2 and 1+3, the dimeric products, each with deletion and other rearrangements.

2. recA-INDEPENDENT RECOMBINATION LEADING TO FORMATION OF THE DIMERIC PRODUCTS IS INTRAMOLECULAR

The dimeric nature of the products 1+2 and 1+3 may indicate that their formation involves intermolecular recombination. In theory, 1+3 can be formed by an unequal crossover between the direct repeats of two substrate plasmids, whereas 1+2 can be formed through recombination between M and a substrate molecule. However, intermolecular recombination is unlikely to be responsible for the formation of 1+2 and 1+3, for the following reasons.
1. Intermolecular recombination is rare in recA- strains. Virtually no intermolecular conjugational recombination has been observed in recA- strains (32). Oligomer formation from monomeric plasmid is recA dependent (13, 14), and recombination between compatible plasmids is greatly reduced in recA- strains (13, 17).
2. If the hybrid dimers were formed by an intermolecular recombination event between the repeats, increasing the length of the intervening sequence should not affect their formation. However, as discussed above, recombination between direct repeats is greatly reduced as the intervening sequence increases.
3. Examination of recombination between two compatible plasmid substrates in the same cell has provided further evidence for the intramolecular nature of recA-independent formation of dimeric products (2, 32a).
E. Structural Factors That Can Influence the Formation of Various Products of recA-independent Recombination
As discussed in Sections I,B and I,C, the overall frequency of recA-independent recombination between direct repeats is affected by both the length of the repeats and the distance between them. These factors also differentially influence the relative abundance of each form of product of recA-independent recombination (32b), as follows. (1) Recombination between very short tandem repeats (e.g., 14 bp in length) yields exclusively the monomeric product (M). (2) Lengthening the tandem repeats gradually increases the abundance of the dimeric products, most of which are 1+2. For example, when the repeat is 100–600 bp long, 60–70% of the products are M, 20–30% are 1+2, and only 0–3% are 1+3. (3) Increasing the distance separating the repeats sharply reduces the abundance of M and increases the abundance of 1+2. When tandem repeats of 559 bp are separated by intervening sequences of 100 bp or longer, the abundance of M is reduced to a few percent or to zero, whereas 1+2 becomes the predominant product (>90%), and the abundance of 1+3 remains low (…
recA-INDEPENDENT RECOMBINATION
the requirement that the intervening sequence be relatively short for the RSS model (discussed in Section II,C). The RSS model also explains the formation of the special dimer of the H circle. As shown in Fig. 15D–H, in a replicating H circle monomer, reciprocal switching of the leading and lagging strands between the inverted repeats flanking segment b (or a), and subsequent resolution of the junction, divide the replication bubble into two parts (Fig. 15F and G). Completion of replication of the entire circle generates the special dimer of the H circle (Fig. 15H), which has been observed (81). Theoretically, 2ⁿ-mers can be formed in this manner, but trimers and other oligomers cannot (see Fig. 14). This fits remarkably well with the previously unexplained result that only the special dimers and tetramers of H circles have been observed, whereas trimeric circles have never been found (81). Our RSS model also explains why H circles can be formed in wild-type cells of Leishmania receiving no drug treatment (81), whereas gene amplification in normal mammalian cells is rare but occurs in tumor cells and drug-treated cells. According to the RSS model, the preexisting inverted repeats at each end of the H locus can mediate the formation of the H circles. Without such extensive inverted repeats, switching may occur only when the replication forks are arrested under abnormal physiological conditions, such as tumorigenesis and drug treatment of the cells. In conclusion, reciprocal switching of the leading and lagging strands of DNA synthesis may underlie the mechanism(s) of certain genome rearrangement and gene amplification events.
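The counting argument that strand switching can produce dimers, tetramers, and other power-of-two oligomers of the H circle, but never trimers, can be sketched as a reachability check. The sketch assumes, as a simplification of the RSS model described above, that the only way to enlarge an H circle is a strand-switching event during replication of an existing n-mer, which yields a special 2n-mer when replication is completed (the monomer-to-dimer step of Fig. 15E–H applied again to its product); fusion of unequal circles is excluded by assumption. The function name is hypothetical.

```python
from typing import List


def reachable_oligomer_sizes(max_size: int) -> List[int]:
    """Oligomer sizes reachable from an H-circle monomer by repeated doubling.

    Hypothetical simplification: one strand-switching event during replication
    of an n-mer yields a special 2n-mer; no other oligomerization route exists.
    """
    sizes = [1]  # start from the excised monomeric H circle
    while sizes[-1] * 2 <= max_size:
        sizes.append(sizes[-1] * 2)  # each round of switching doubles the circle
    return sizes


sizes = reachable_oligomer_sizes(16)
print(sizes)       # [1, 2, 4, 8, 16]
print(3 in sizes)  # False: trimers are never produced
```

Under this assumption every reachable size is a power of two, matching the observation that dimers and tetramers, but not trimers, of H circles are found.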
ACKNOWLEDGMENTS
We thank Shanhong Wan for assistance in preparing the manuscript and Jiaxi Wu for a critical reading of it. Research on DNA recombination in our laboratory is supported by National Institutes of Health Grant GM27731 to L. F. L.

FIG. 15. Reciprocal switching of the leading and lagging strands of DNA replication: a model for formation of the H circles in Leishmania. (A) The two strands of the H locus of the chromosome of Leishmania. Thick bars, inverted repeats flanking the segments a and b. (B) The H locus in a replication bubble. Both pairs of the inverted repeats are shown to be near a replication fork. The symbols are as described in Fig. 12, except that the template strands are also drawn as thin lines here for clarity. (C) Reciprocal switching of the leading and lagging strands of DNA replication within the pair of the inverted repeats at each fork of the bubble. Note that the two switching events at the two forks do not have to occur simultaneously. (D) Resolving the junctions formed by strand switching leads to excision of the H circle from the chromosome. (E–H) During replication of the H circle (monomer), strand switching within any pair of the inverted repeats can eventually lead to the formation of the special dimer of the H circle. See Fig. 12 for a detailed illustration.
XIN BI AND LEROY F. LIU
REFERENCES
1. R. D. Porter, in “Genetic Recombination” (R. Kucherlapati and G. R. Smith, eds.), p. 1. American Society for Microbiology, Washington, DC, 1988. 2. S. T. Lovett, P. T. Drapkin, V. A. Sutera, Jr. and T. J. Gluckman-Peskind, Genetics 135, 631 (1993). 3. X. Bi and L. F. Liu, JMB 235, 414 (1994). 4. X. Bi and L. F. Liu, PNAS 93, 819 (1996). 5. G. R. Smith, Microbiol. Rev. 52, 1 (1988). 6. G. R. Smith, Cell 58, 807 (1989). 7. S. T. Lovett, T. J. Gluckman, P. J. Simon, V. A. Sutera, Jr. and P. T. Drapkin, MGG 245, 294 (1994). 8. F. Chédin, E. Dervyn, R. Dervyn, S. D. Ehrlich and P. Noirot, Mol. Microbiol. 12, 561 (1994). 9. T.-M. Yi, D. Stearns and B. Demple, J. Bact. 170, 2898 (1988). 10. G. L. Dianov, A. V. Kuzminov, A. V. Mazin and R. I. Salganik, MGG 228, 153 (1991). 11. A. V. Mazin, A. V. Kuzminov, G. L. Dianov and R. I. Salganik, MGG 228, 209 (1991). 12. X. Bi, Y. L. Lyu and L. F. Liu, JMB 247, 890 (1995). 13. J. R. Bedbrook and F. M. Ausubel, Cell 9, 707 (1976). 14. H. Potter and D. Dressler, PNAS 74, 4168 (1977). 15. R. A. Fishel, A. A. James and R. Kolodner, Nature 294, 184 (1981). 16. A. A. James, P. T. Morrison and R. D. Kolodner, JMB 160, 411 (1982). 17. A. Laban and A. Cohen, MGG 184, 200 (1981). 18. A. Laban and A. Cohen, MGG 189, 189 (1983). 19. R. Kolodner, PNAS 77, 4847 (1980). 20. C. Luisi-DeLuca, S. T. Lovett and R. D. Kolodner, Genetics 122, 269 (1989). 21. S. K. Mahajan, in “Genetic Recombination” (R. Kucherlapati and G. R. Smith, eds.), p. 87. American Society for Microbiology, Washington, DC, 1988. 22. A. J. Clark and K. B. Low, in “The Recombination of Genetic Material” (K. B. Low, ed.), p. 155. Academic Press, San Diego, CA, 1988. 23. M. M. Cox and I. R. Lehman, ARB 56, 229 (1987). 24. C. M. Radding, in “Genetic Recombination” (R. Kucherlapati and G. R. Smith, eds.), p. 193. American Society for Microbiology, Washington, DC, 1988. 25. S. C. West, ARB 61, 603 (1992). 26. S. C. Kowalczykowski and A. K. Eggleston, ARB 63, 991 (1994). 27. S. D. Hall, M. F. Kane and R. D. Kolodner, J. Bact. 175, 277 (1993). 28. S. D. Hall and R. D. Kolodner, PNAS 91, 3205 (1994). 29. R. D. Kolodner, S. D. Hall and C. Luisi-DeLuca, Mol. Microbiol. 11, 23 (1994). 30. M. J. Doherty, P. T. Morrison and R. Kolodner, JMB 167, 539 (1983). 31. N. D. Allgood and T. J. Silhavy, in “Genetic Recombination” (R. Kucherlapati and G. R. Smith, eds.), p. 309. American Society for Microbiology, Washington, DC, 1988. 32. B. Low, PNAS 60, 160 (1968). 32a. X. Bi and L. F. Liu, unpublished result. 32b. X. Bi and L. F. Liu, unpublished result. 33. G. Streisinger, Y. Okada, J. Emrich, J. Newton, A. Tsugita, E. Terzaghi and M. Inouye, CSHSQB 31, 77 (1966). 34. N. C. Franklin, in “The Bacteriophage Lambda” (A. D. Hershey, ed.), p. 175. CSHLab, CSH, NY, 1971. 35. F.-L. Lin, K. Sperle and N. Sternberg, MCBiol 4, 1020 (1984).
36. S. N. Cohen and A. J. Clark, J. Bact. 167, 327 (1986). 37. R. Seelke, B. Kline, R. Aleff, P. D. Porter and M. S. Shield, J. Bact. 169, 4841 (1987). 38. J.-F. Viret, A. Bravo and J. C. Alonso, Microbiol. Rev. 55, 675 (1991). 39. M. Matfield, R. Badawi and W. J. Brammar, MGG 199, 518 (1985). 40. P. Balbas, X. Soberon, F. Bolivar and R. L. Rodriguez, in “Vectors: A Survey of Molecular Cloning Vectors” (R. L. Rodriguez and D. T. Denhardt, eds.), p. 5. Butterworth, Boston, 1987. 41. D. Brutlag, K. Fry, T. Nelson and P. Hung, Cell 10, 509 (1977). 42. R. D. Wells and R. R. Sinden, in “Genome Analysis” (K. Davies and S. Warren, eds.), Vol. 7, p. 107. CSHLab, CSH, NY, 1993. 43. S. N. Thibodeau, G. Bren and D. Schaid, Science 260, 816 (1993). 44. Y. Ionov, M. A. Peinado, S. Malkhosyan, D. Shibata and M. Perucho, Nature 363, 558 (1993). 45. L. Thompson, in “Genetic Recombination” (R. Kucherlapati and G. R. Smith, eds.), p. 597. American Society for Microbiology, Washington, DC, 1988. 46. O. Oishi, in “The Recombination of Genetic Material” (K. B. Low, ed.), p. 445. Academic Press, San Diego, CA, 1988. 47. S. Wolff and P. Perry, Chromosoma 48, 341 (1974). 48. N. L. Craig and N. Kleckner, in “Escherichia coli and Salmonella typhimurium. Cellular and Molecular Biology” (F. C. Neidhardt et al., eds.), p. 1054. American Society for Microbiology, Washington, DC, 1987. 49. A. C. Glasgow, K. T. Hughes and M. I. Simon, in “Mobile DNA” (D. E. Berg and M. M. Howe, eds.), p. 637. American Society for Microbiology, Washington, DC, 1989. 50. D. J. Savic, J. Bact. 140, 311 (1979). 51. D. J. Savic, S. P. Romac and S. D. Ehrlich, J. Bact. 155, 943 (1983). 52. N. Kleckner and D. G. Ross, JMB 144, 215 (1980). 53. D. G. Ennis, S. K. Amundsen and G. R. Smith, Genetics 115, 11 (1987). 54. J. M. Louarn, J. P. Bouche, F. Legendre, J. Louarn and J. Patte, MGG 201, 467 (1985). 55. J.-E. Rebollo, V. François and J.-M. Louarn, PNAS 85, 9391 (1988). 56. Y. Komoda, M. Enomoto and A. Tominaga, Genetics 129, 639 (1991). 57. M. Faelen and A. Toussaint, J. Bact. 142, 391 (1980). 58. C. W. Hill and B. W. Harnish, PNAS 78, 7069 (1981). 59. C. W. Hill and J. A. Gray, Genetics 119, 771 (1988). 60. M. A. Schofield, R. Agbunag and J. H. Miller, Genetics 132, 295 (1992). 61. J. Zieg, M. Hilmen and M. Simon, Cell 15, 237 (1978). 62. R. C. Johnson, M. B. Bruist, M. B. Glaccum and M. I. Simon, CSHSQB 49, 751 (1984). 63. P. C. Weber, M. Levine and J. C. Glorioso, J. Bact. 170, 4972 (1988). 64. K. E. Kennedy, S. Iida, J. Meyer, M. Stålhammar-Carlemalm, R. Hiestand-Nauer and W. Arber, MGG 189, 413 (1983). 65. F. Bolivar, Gene 4, 121 (1978). 66. P. Prentki, F. Karch, S. Iida and J. Meyer, Gene 14, 289 (1981). 67. S. Iida, J. Meyer and W. Arber, CSHSQB 45, 27 (1980). 68. A. Kornberg and T. Baker, “DNA Replication,” 2nd ed. Freeman, New York, 1992. 69. K. J. Marians, ARB 61, 673 (1992). 70. R. Holliday, Genet. Res. 5, 282 (1964). 71. L. T. Chow, N. Davidson and D. E. Berg, JMB 86, 69 (1974). 72. D. E. Berg, JMB 86, 59 (1974). 73. D. K. Nag and D. E. Berg, MGG 207, 395 (1987). 74. J. Nalbantoglu and M. Meuth, NARes 14, 8361 (1986).
75. O. Hyrien, M. Debatisse, G. Buttin and B. Robert de Saint Vincent, EMBO J. 7, 407 (1988). 76. G. R. Stark, M. Debatisse, E. Giulotto and G. M. Wahl, Cell 57, 901 (1989). 77. M. Fried, S. Feo and E. Heard, BBA 1090, 143 (1991). 78. M. Fried, S. Feo and E. Heard, in “Gene Amplification in Mammalian Cells” (R. E. Kellems, ed.), p. 447. Dekker, New York, 1993. 79. G. H. Nonet, S. M. Carroll, M. L. DeRose and G. M. Wahl, Genomics 15, 543 (1993). 80. S. M. Beverley, J. A. Coderre, D. V. Santi and R. T. Schimke, Cell 38, 431 (1984). 81. T. C. White, F. Fase-Fowler, H. van Luenen, J. Calafat and P. Borst, JBC 263, 16977 (1988). 82. M. Ouellette, E. Hettema, D. Wust, F. Fase-Fowler and P. Borst, EMBO J. 10, 1009 (1991). 83. M. I. Aladjem and S. Lavi, Mutat. Res. 276, 339 (1992). 84. C. Passananti, G. Davies, M. Ford and M. Fried, EMBO J. 6, 1697 (1987). 85. A. B. Futcher, J. Theor. Biol. 119, 197 (1986). 86. A. W. Sjostedt, M. Alatalo, J. Wahlstrom, U. von Dobeln and R. Olegard, Hereditas 111, 115 (1989).
The Elongation Phase of Protein Synthesis
JOHN CZWORKOWSKI AND PETER B. MOORE
Department of Chemistry
Department of Molecular Biophysics and Biochemistry
Yale University
New Haven, Connecticut 06520
I. The Elongation Cycle
A. The Two-site Model
B. How Many Sites Are There?
C. Alternatives to the Two-site Model
D. The Fidelity of Protein Synthesis
E. Factor-free Translation
F. Rates, States, and Ener…
…
B. The EF-G Cycle
C. Antibiotic Inhibitors
D. Elongation Factor Interactions: The 1060 Region
E. Elongation Factor Interactions: The SRL
F. Elongation Factor Interactions: The 30-S Subunit
G. Structures of the Factors
III. On the Mechanism of Elongation
A. On the Placement of Ribosomal Sites
B. Factor and tRNA Orientations
C. Models for Translocation
IV. Concluding Remarks
References
Elongation is the phase of the protein-synthesis pathway responsible for the growth of nascent polypeptide chains. Not surprisingly, many reviews of this critically important area of biochemistry have appeared in the four decades since it was discovered (1–3). Another review is appropriate now because the structures of several of the macromolecules involved in elongation have been solved recently: the sarcin/ricin loop from 23-S/28-S rRNAs (4, 5), the elongation factor Tu·GTP complex (6, 7), the elongation factor Tu·GTP·aminoacyl-tRNA complex (8), and elongation factor G in both its nucleotide-free (9) and GDP forms (10). Our goal is to integrate this new information with what is already known about elongation.

Progress in Nucleic Acid Research and Molecular Biology, Vol. 54
Copyright © 1996 by Academic Press, Inc. All rights of reproduction in any form reserved.
I. The Elongation Cycle
The emergence of the elongation cycle from obscurity began in the late 1950s as investigations of the mechanism of protein synthesis got under way. By the early 1960s, ribosomes, tRNA, and mRNA, the principal components of the protein-synthesis apparatus, had all been discovered, and their roles were understood. The “two-site model” for protein synthesis, elaborated by Watson in 1964 (11), summarized what had been learned, and it has provided the conceptual framework for discussions of protein synthesis ever since.
A. The Two-site Model
The two-site model postulates that the ribosome has two sites for tRNA binding: one that binds aminoacyl-tRNAs preferentially [the A(mino acid) site] and a second that is specific for peptidyl-tRNAs [the P(eptide) site]. In the earlier literature, the A site is sometimes called the acceptor site, and the P site the donor site. It was proposed that elongation of a polypeptide chain by a single amino acid is accomplished by the sequence of steps shown in Fig. 1. The cycle starts with a peptidyl-tRNA bound to the ribosome’s P site and with its A site vacant. The A site is then filled by an aminoacyl-tRNA whose anticodon is complementary to the mRNA codon that programs the A site. Peptide transfer then ensues: the amino group of the A-site aminoacyl-tRNA attacks the carbonyl group of the ester bond that links the peptide chain to the P-site tRNA, which transfers the nascent chain to the A-site tRNA and thus extends it by one residue. In the final (translocation) step, the deacylated P-site tRNA is ejected from the ribosome, the peptidyl-tRNA in the A site moves to the P site, and the mRNA advances by one codon. This returns the system to its initial state, and the elongation cycle is repeated as long as the advance of mRNA across the ribosome presents “sense” codons to the A site.
Four important properties of the elongation system were recognized in the years immediately following Watson’s review. First, it was discovered that the large ribosomal subunit has peptidyl transferase activity; the ribosome is an enzyme (12, 13). Second, the division of labor between the two subunits of the ribosome was clarified. The small subunit binds mRNA and mediates mRNA-tRNA interactions. In addition to being the peptidyl transferase, the large subunit is involved in all aspects of the ribosome-tRNA interaction that are not strictly mRNA dependent. Third, peptidyl-tRNA and mRNA move across the ribosome in synchrony during translocation, as the model requires (14-16). Fourth, two soluble proteins promote elongation in the cell: elongation factor Tu (EF-Tu) (EF-1α in eukaryotes), which delivers aminoacyl-tRNAs to the A site, and elongation factor G (EF-G) (EF-2 in eukaryotes), which catalyzes translocation (17). Both consume GTP.
ELONGATION PHASE OF PROTEIN SYNTHESIS
FIG. 1. The two-site model for elongation. This diagram of the two-site model for elongation employs an iconography used in other figures in this chapter. The ribosome is represented as a rectangle, the upper two-thirds of which is the 50-S subunit and the lower third the 30-S subunit. The tRNA binding sites on the 50-S subunit are explicitly identified, in this instance the A site and the P site. When the two subunits are aligned, as they are here, the corresponding 30-S sites lie immediately below their 50-S counterparts. tRNAs are represented as bars and are distinguished by their shading. Amino acids are small circles, usually found associated with the tops of tRNAs (= aminoacyl-tRNA); they are shaded the same as the tRNAs with which they are associated. mRNA is a line that crosses the 30-S subunit horizontally. In some instances, it includes shaded segments that stand for distinct codons. Note that when a tRNA interacts with a codon shaded the way it is, a cognate codon-anticodon interaction is implied. Factors are squares (EF-Tu) or circles (EF-G). Their shading indicates whether they are in the GTP or GDP conformation.
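The two-site cycle described above can be mimicked with a toy program, offered only as a sketch under stated assumptions: the names `Ribosome`, `elongate`, and `TOY_CODE` are hypothetical, the codon table is a tiny illustrative subset rather than the genetic code, elongation factors and any third (E) site are ignored, and each loop iteration stands for one pass through A-site filling, peptide transfer, and translocation.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical, deliberately tiny codon table (not a complete genetic code).
# A value of None marks a nonsense (stop) codon.
TOY_CODE = {"UUU": "Phe", "AAA": "Lys", "GGG": "Gly", "UAA": None}


@dataclass
class Ribosome:
    """Toy two-site ribosome: a P site holding peptidyl-tRNA and an A site."""
    mrna: str
    peptide: List[str] = field(default_factory=list)  # chain on the P-site tRNA
    p_codon: int = 0                                   # index of the codon in the P site
    a_site: Optional[str] = None                       # amino acid on the A-site tRNA

    def a_site_codon(self) -> str:
        start = 3 * (self.p_codon + 1)
        return self.mrna[start:start + 3]

    def fill_a_site(self) -> bool:
        """Bind the aminoacyl-tRNA cognate to the A-site codon, if it is a sense codon."""
        amino_acid = TOY_CODE.get(self.a_site_codon())
        if amino_acid is None:  # stop codon, unknown codon, or end of message
            return False
        self.a_site = amino_acid
        return True

    def transfer_peptide(self) -> None:
        """Model peptide transfer: the chain grows by the A-site residue."""
        self.peptide.append(self.a_site)

    def translocate(self) -> None:
        """Deacylated P-site tRNA leaves; peptidyl-tRNA and mRNA advance one codon."""
        self.a_site = None
        self.p_codon += 1


def elongate(ribosome: Ribosome) -> List[str]:
    """Repeat the cycle while the A site is presented with sense codons."""
    while ribosome.fill_a_site():
        ribosome.transfer_peptide()
        ribosome.translocate()
    return ribosome.peptide


if __name__ == "__main__":
    r = Ribosome(mrna="AUGUUUAAAGGGUAA", peptide=["fMet"])
    print(elongate(r))  # ['fMet', 'Phe', 'Lys', 'Gly']
```

Each pass adds exactly one residue per sense codon and stops at the first nonsense codon, which is the one-amino-acid-per-triplet behavior the model implies.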
B. How Many Sites Are There?
By the late 1960s, there were voices arguing in favor of additional tRNA sites. Before reviewing these claims, it is important to remember that “sites” and “states” have always been conflated in the protein-synthesis field. A “site” is a place where tRNA binds to the ribosome. A “state” is a set of biochemical properties characterizing a ribosome-bound tRNA. tRNAs change state during elongation; they enter as aminoacyl-tRNAs, are transformed into peptidyl-tRNAs, and exit as uncharged tRNAs. The number of sites they occupy in the process is less obvious, and the use of “site” as a synonym for “state” has tended to confuse phenomena with their explanations. Any ribosome-bound peptidyl-tRNA, or peptidyl-tRNA surrogate, that reacts with puromycin (an aminoacyl-tRNA analog) to form peptidyl puromycin is, by definition, in the P state. Again by definition, A-state aminoacyl-tRNAs accept peptides from P-state peptidyl-tRNAs, but because peptide transfer normally occurs as soon as an aminoacyl-tRNA enters the A state, the A state can be observed only under conditions that prevent peptide transfer. For that reason, “A site” is often no more than a synonym for “unreactive with puromycin.” There is compelling evidence that the A state and the P state correlate with distinct ribosomal sites. Nascent peptides are attached to the ribosome at all times through the tRNAs to which they are covalently bonded, and peptidyl-tRNAs must interact with aminoacyl-tRNAs to form new peptide bonds. Because the ribosome must accommodate (at least) two tRNA molecules simultaneously, there must be (at least) two nonoverlapping sites for tRNA binding on the ribosome. As early as 1965, there was evidence for a third site, which is specific for deacylated tRNA (18). It is called the E(xit) site, and there is clear reason for thinking of it as a site, not just a state: poly(U)-programmed 70-S ribosomes bind three equivalents of deacylated phenylalanine tRNA (19, 20).
The existence of the E state/site has been challenged (21, 22), but is now generally accepted (see 23, 24). The deacylated, P-site-bound tRNAs created by peptide transfer enter the E site during translocation and leave the ribosome from that site (25). Some feel that there is a fourth site, through which aminoacyl-tRNAs pass on their way to the A site. It has been called the E(ntry) site (26), the R(ecognition) site (27), or the T(ransfer) site (28). There are certainly grounds for believing in a T (or R or E) state; aminoacyl-tRNAs bound to ribosomes complexed with EF-Tu do not accept peptides from peptidyl-tRNA (29). It follows that tRNAs in this condition are not in the A state. However, it is not obvious that tRNAs in the T state occupy a T site.
The amino-acid end of aminoacyl-tRNA makes such intimate contact with protein in EF-Tu·GTP·aa-tRNA complexes (8; see Section II,G) that it might not participate in peptide transfer even if T-state aminoacyl-tRNAs occupy the A site. The argument that EF-Tu binds to the ribosome far from the A site (27) is also questionable (see Section III,A). Furthermore, chemical protection data suggest that the anticodon end of T-state tRNAs occupies the A site on the 30-S subunit (28). The T state and the A state are distinguished entirely by differences in acceptor-end protections, which are likely to be affected by the presence of EF-Tu in any case. The only evidence for the T site that we find at all persuasive is the observation that the initial phase of EF-Tu·GTP·aa-tRNA binding to the ribosome is not inhibited when the A site is occupied by tRNA (30), but it is unclear (to us) that the transient state seen when the A site is empty has the same spectral properties as that observed when the A site is occupied by a tRNA. Nor is it obvious that the transient in question is stable enough to justify identifying it with a distinct site.
Entry into both the A site and the P site is controlled by mRNA interactions, which implies that small-subunit interactions contribute to both (31, 32). Because both P-site and A-site tRNAs must interact with the peptidyl transferase, those sites must have large-subunit components also. All are agreed that the large subunit contributes much of the E site; the binding of tRNA to 50-S subunits reported in the early 1960s (33) is almost certainly E-site binding (34). Furthermore, E-site binding is not seen if alterations are made to the CCA end of tRNA, which interacts with the large subunit during elongation (35, 36). It does appear, however, that E-site binding is stabilized by mRNA interactions, which implies some degree of small-subunit involvement (25, 37, 38).
A fifth site specific for deacylated tRNA, the S site, was adumbrated recently (39), but because this site does not appear to be involved directly in elongation, it is not considered further.
C. Alternatives to the Two-site Model
Although it is evident that the two-site model for elongation is no longer adequate, no consensus has emerged about what should replace it. Two models are currently under active consideration: the “hybrid-sites model” (28) and the “allosteric three-sites model” (24).
Study of the effects of tRNA and elongation factor binding on the reactivity of rRNA bases has generated the findings on which the hybrid-sites model is based (40). tRNAs appear to interact with ribosomes primarily at their ends. The 16-S-rRNA-reactivity changes observed when tRNAs bind to ribosomes depend entirely on interactions involving the anticodon stem/loop of tRNA, and the 23-S-rRNA alterations seen reflect interactions with the CCA end of tRNA only. Further, tRNAs bound to the ribosome in the A, P, E, and T states can be distinguished by their protection patterns. The A, P, and T states have signatures on both subunits, whereas the E state has a large-subunit fingerprint only. [“State” is used here because there is no evidence that the reactivity patterns observed report directly on tRNA position.] Finally, elongation involves hybrid states, i.e., states that can be explained by postulating that tRNAs bind to one site on the small subunit while they bind to a different site on the large subunit (28). Just prior to peptide-bond formation, for example, the aminoacyl-tRNA is in the A state at both ends, and the peptidyl-tRNA is in the P state at both ends. Immediately following peptidyl transfer, the acceptor end of the newly deacylated tRNA is in the E state on the large subunit while its anticodon stem/loop remains in the P state on the small subunit. At the same time, the CCA end of the new peptidyl-tRNA is in the P state on the large ribosomal subunit, while its anticodon end remains in the A state on the small subunit. EF-G catalyzes the resolution of these hybrid states; after translocation, the deacylated tRNA is exclusively in the E state, while the peptidyl-tRNA is now in the P state at both ends. If one equates “states” with “sites,” the hybrid-sites model emerges (Fig. 2).
There is physical evidence that tRNAs move when peptide transfer occurs, but that the peptide hardly moves at all, as the hybrid-sites model requires (41-43). It should also be noted that it was proposed almost a decade ago that the large ribosomal subunit contains a “tunnel” connecting the peptidyl transferase region with the back side of the subunit, where nascent peptides first become exposed to solvent (44-46). If nascent peptides must be threaded through this tunnel, they ought not to move very much during a single iteration of the elongation cycle.
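The sequence of hybrid states just described can be written down as data and checked mechanically. The sketch below is illustrative only: the tuple notation (small-subunit site, large-subunit site), the labels `tRNA1`/`tRNA2`, and the helper functions are our own, and the after-translocation entry for the deacylated tRNA is a simplification that records an E-state tRNA on the large subunit only, following the protection fingerprint summarized above. It tests two properties attributed to the model: each tRNA shifts only one end per transition, so it is never released mid-cycle, and the CCA end carrying the peptide stays in the 50-S P site.

```python
from typing import Dict, List, Optional, Tuple

# (30-S site of the anticodon end, 50-S site of the CCA end); None = not bound.
Position = Tuple[Optional[str], Optional[str]]

# One elongation cycle in hybrid-sites notation: (label, positions, peptide carrier).
# tRNA1 starts as the peptidyl-tRNA, tRNA2 as the incoming aminoacyl-tRNA.
CYCLE: List[Tuple[str, Dict[str, Position], str]] = [
    ("before peptide transfer", {"tRNA1": ("P", "P"), "tRNA2": ("A", "A")}, "tRNA1"),
    ("after peptide transfer",  {"tRNA1": ("P", "E"), "tRNA2": ("A", "P")}, "tRNA2"),
    ("after translocation",     {"tRNA1": (None, "E"), "tRNA2": ("P", "P")}, "tRNA2"),
]


def ends_moved(before: Position, after: Position) -> int:
    """Number of tRNA ends (anticodon, CCA) whose site changed."""
    return sum(b != a for b, a in zip(before, after))


def one_end_at_a_time(cycle) -> bool:
    """True if no tRNA changes both of its sites in a single transition."""
    for (_, prev, _), (_, curr, _) in zip(cycle, cycle[1:]):
        if any(ends_moved(pos, curr[name]) > 1 for name, pos in prev.items()):
            return False
    return True


def always_anchored(cycle) -> bool:
    """True if every tRNA has at least one end bound at every stage."""
    return all(any(site is not None for site in pos)
               for _, positions, _ in cycle for pos in positions.values())


def peptide_sites(cycle) -> List[Optional[str]]:
    """50-S site of the CCA end that carries the nascent peptide, stage by stage."""
    return [positions[carrier][1] for _, positions, carrier in cycle]


if __name__ == "__main__":
    print(one_end_at_a_time(CYCLE), always_anchored(CYCLE), peptide_sites(CYCLE))
    # True True ['P', 'P', 'P']
```

A two-site-style jump, in which a tRNA moves both ends at once, fails `one_end_at_a_time`; that failure is the distinction the processivity argument rests on.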
The hybrid-sites model explains the processivity of protein synthesis, which the two-site model does not: transfer RNAs are attached to the ribosome at one end or the other throughout their passage across its surface. In addition, if one hypothesizes that the creation and annihilation of hybrid sites depend on relative motions of the two ribosomal subunits, the two-subunit architecture of the ribosome is rationalized as well. However, it is not clear that ribosomal subunits move during elongation. On the one hand, two states of the ribosome have been identified functionally, a pretranslocational state and a posttranslocational state (see Section I,F), and there is physical evidence that they differ conformationally (47). On the other hand, there is almost nothing in the protection data that correlates with that difference in state; the protection data suggest that tRNAs “walk” across a stationary ribosomal surface rather than being moved by subunit rearrangements.
Another virtue claimed for the hybrid-sites model is its capacity to explain why puromycin reacts with A-site-bound peptidyl-tRNA, albeit very slowly (48, 49). It reacts, the argument goes, because the CCA end of A-site-bound peptidyl-tRNA occupies the P site on the 50-S subunit immediately following peptide transfer, leaving the 50-S A site open for puromycin. However, until we know where puromycin is when it reacts with peptidyl-tRNA, both before and after translocation, the interpretation of this observation will remain unclear.
FIG. 2. A three-site, hybrid-sites model for elongation. This diagram of the three-site, hybrid-sites model for elongation conforms to the iconography described in the legend for Fig. 1 in most respects. Note that three tRNA sites are identified on the 50-S subunit: an A site, a P site, and an E site. Note also that the pretranslocational and posttranslocational states of the ribosome are distinguished. The former is represented as a ribosome whose two subunits do not align, and the latter by a ribosome whose subunits do align. The post- to pretranslocational transition is postulated to occur as aminoacyl-tRNA is delivered to the ribosome, preserving the parallelism that is presumed to exist between the mechanisms of action of EF-Tu and EF-G (see Section II,G). This distorts the way tRNAs associate with the ribosome; the tRNAs are slanted in the diagram. This distortion is relieved on peptide-bond formation; the tRNAs return to the upright position. Release of deacylated tRNA from the E site is postulated to occur as the EF-Tu ternary complex binds to the ribosome. Obviously, other three-site, hybrid-sites models for elongation consistent with existing data could be proposed.
The allosteric three-sites model was formulated to account both for the existence of the E site and for evidence that occupancy of the E site by deacylated tRNA reduces the affinity of the A site for aminoacyl-tRNA, and vice versa (24, 50). Among other things, the negative cooperativity of the two sites explains why the number of tRNAs bound to elongating ribosomes never exceeds two: when aminoacyl-tRNA binds to the A site of a ribosome that already has tRNAs in its P and E sites, the affinity of the E site for deacylated tRNA drops, and the tRNA leaves the ribosome. However, in vitro protein-synthesizing systems are notoriously sensitive to experimental conditions, and others have found it difficult to confirm that the negative cooperativity on which the model depends actually exists (51).
Although advocates of the hybrid-sites and allosteric three-sites models appear to regard them as being in competition, there is no reason for doing so. The two models explain nonoverlapping bodies of data: chemical probing data in the case of the hybrid-sites model, and enzymatic data in the case of the allosteric three-sites model. Even if tRNAs bind to hybrid sites during elongation, there is no reason why the A site and the E site might not interact the way the allosteric three-sites model requires. The problems that need to be addressed are whether tRNAs really do occupy hybrid sites during elongation, and whether the A site and E site really do interact.
How robust are the Moazed-Noller protection experiments (40), for example? Surprisingly, this group’s conclusions about prokaryotic elongation factor binding (52) are not supported by mammalian data reported recently (53). In addition, the intermediates they characterized were trapped under a wide variety of conditions; to what extent do protection patterns depend on conditions? Finally, until the groups studying interactions between the E site and the A site resolve their differences, we should reserve judgment about the allosteric component of the allosteric three-sites model.
D. The Fidelity of Protein Synthesis
It has been understood since the early 1960s that the accuracy with which mRNA sequences are translated into amino-acid sequences depends ultimately on base-pairing between mRNA codons and tRNA anticodons. If the rate at which elongating ribosomes make errors [about one wrong amino acid incorporated for every 10^3 to 10^4 residues of protein synthesized (54)] were determined entirely by codon-anticodon affinities, the difference in binding free energy between cognate tRNAs and closely related, noncognate tRNAs for mRNA-programmed ribosomes would have to be at least 18 kJ/mol (at 310 K). In solution, the entire free energy of codon-anticodon binding is barely that large (55), and the differences in free energy between cognate and near-cognate pairings are 10 kJ/mol or less (56). However, base-pairing energies are context dependent, and in the context provided by the ribosome, codon-anticodon discriminations exceeding 18 kJ/mol can be observed, but only if measurements are made under conditions that prevent peptide-bond formation (57).
Although the capacity of mRNA-programmed ribosomes to bind tRNAs differentially is fundamental to fidelity, error rates cannot be predicted on the basis of binding constants alone. Fidelity is a kinetic phenomenon, not a thermodynamic one, and no translating system that makes protein at a finite rate and that uses a single binding step to discriminate between tRNAs can possibly have an error rate as favorable as that predicted from differences in equilibrium binding constants (58, 59). Indeed, as expected, actively translating ribosomes bind tRNAs with a discrimination that is much lower than the thermodynamic ideal (57). It is easy to understand why fidelity depends on the rate of peptide-bond formation. Suppose a Michaelis-Menten mechanism explained amino-acid incorporation:
$$\mathrm{A_c} + \mathrm{R_c} \underset{k_{-1}}{\overset{k_1}{\rightleftharpoons}} \mathrm{A_cR_c} \xrightarrow{k_2} \mathrm{P} \qquad (1)$$
where $\mathrm{A_c}$ is a charged (cognate) tRNA, $\mathrm{R_c}$ is a ribosome carrying a peptidyl-tRNA in its P site that is programmed to accept $\mathrm{A_c}$, and P is the ribosome after its nascent polypeptide chain has been extended by one residue. Suppose in addition that the corresponding rate constants for a noncognate aminoacyl-tRNA, $\mathrm{A_n}$, binding to the same ribosome are $k'_1$, $k'_{-1}$, and $k_2$. If the two kinds of aminoacyl-tRNA compete for the same site on the ribosome and the rate of peptide-bond formation is the same for all aminoacyl-tRNAs once they reach the A site, then when $[\mathrm{A_c}] = [\mathrm{A_n}]$, the steady-state ratio of “wrong” amino acids incorporated to “right” amino acids incorporated, $E$, will be:
$$E = \frac{k'_1\,(k_{-1} + k_2)}{k_1\,(k'_{-1} + k_2)}.$$
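A minimal numerical sketch of the expression for $E$, assuming the scheme in Eq. (1) and using made-up rate constants chosen only for illustration (not measured values): the on-rates are equal and the noncognate off-rate is set 1000-fold higher than the cognate one. The helper `ddg_for_error` converts an equilibrium error ratio into the binding free-energy difference $RT\ln(1/E)$; for an error ratio of $10^{-3}$ at 310 K it gives about 17.8 kJ/mol, consistent with the 18 kJ/mol figure quoted at the start of this section. All function names are hypothetical.

```python
import math

GAS_CONSTANT = 8.314  # J mol^-1 K^-1


def error_ratio(k1: float, k_minus1: float, k1_nc: float, k_minus1_nc: float,
                k2: float) -> float:
    """E = k1'(k-1 + k2) / [k1 (k-1' + k2)] for equal cognate/noncognate concentrations."""
    return k1_nc * (k_minus1 + k2) / (k1 * (k_minus1_nc + k2))


def thermodynamic_limit(k1: float, k_minus1: float, k1_nc: float,
                        k_minus1_nc: float) -> float:
    """Ratio of equilibrium binding constants, the slow-k2 limit of E."""
    return (k1_nc * k_minus1) / (k1 * k_minus1_nc)


def ddg_for_error(error: float, temperature: float = 310.0) -> float:
    """Binding free-energy difference (kJ/mol) whose Boltzmann factor equals `error`."""
    return GAS_CONSTANT * temperature * math.log(1.0 / error) / 1000.0


if __name__ == "__main__":
    # Hypothetical constants: equal on-rates (M^-1 s^-1), 1000-fold different off-rates (s^-1).
    k1 = k1_nc = 1.0e7
    k_minus1, k_minus1_nc = 1.0, 1000.0
    print("thermodynamic limit:", thermodynamic_limit(k1, k_minus1, k1_nc, k_minus1_nc))
    for k2 in (1e-2, 1.0, 1e2, 1e4, 1e6):  # peptide-bond formation rate, s^-1
        e = error_ratio(k1, k_minus1, k1_nc, k_minus1_nc, k2)
        print(f"k2 = {k2:8.0e} s^-1  ->  E = {e:.2e}")
    print(f"ddG for E = 1e-3 at 310 K: {ddg_for_error(1e-3):.1f} kJ/mol")
```

As $k_2$ increases, the printed $E$ climbs from the thermodynamic limit ($10^{-3}$ with these inputs) toward the ratio of on-rates (1 here), which is the speed/accuracy trade-off at issue.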
If $k_2$ is slow compared to the off-rates for cognate and noncognate aminoacyl-tRNAs, $E = (k'_1 k_{-1})/(k_1 k'_{-1})$, which is the ratio of the equilibrium binding constants for the two aminoacyl-tRNAs, “the thermodynamic limit.” If $k_2$ is fast compared to the off-rates for the two aminoacyl-tRNAs, the error rate becomes the ratio of on-rates ($k'_1/k_1$), which is likely to be close to 1 because on-rates depend on the frequency of tRNA encounters with ribosomes. Thus, the faster such a system makes peptide bonds, the less accurately it translates.
These considerations notwithstanding, mechanisms can be proposed that would enable the translation system to achieve a fidelity exceeding the thermodynamic limit (58, 59). All such “proofreading” mechanisms include one or more intermediate steps between the formation of the initial ribosome·aminoacyl-tRNA complex and the stage at which peptide-bond formation becomes possible. Provided noncognate aminoacyl-tRNAs leave the ribosome faster than cognate aminoacyl-tRNAs at each such step, and provided also that these dissociations are made irreversible by coupling to a free-energy-releasing reaction (e.g., GTP hydrolysis), fidelities significantly greater than the thermodynamic limit can be achieved. For mechanisms with $n$ energy-dissipating branch steps, the maximum fidelity possible, which is still achievable only in the slow-synthesis limit, corresponds to an error ratio of $[(k'_1 k_{-1})/(k_1 k'_{-1})]^{n+1}$.
Some advocates of the concept that proofreading occurs during protein synthesis have proposed that the steps in the elongation cycle responsible for fidelity are (1) the binding of EF-Tu·GTP·aa-tRNA to the ribosome and (2) peptide transfer (60). Because aminoacyl-tRNAs cannot form peptide bonds when complexed with EF-Tu (see Section I,B), and GTP cleavage is required for EF-Tu release, the two steps are cleanly separated, as they must be if proofreading is to occur. Plausible as this hypothesis may seem, it is unlikely to be true. The error rate for the peptidyl-transfer step will be $(k_d + k_p)/(k'_d + k_p)$, where $k_p$ is the rate constant for peptide-bond formation, and $k_d$ and $k'_d$ are the rate constants for the dissociation of cognate and noncognate aminoacyl-tRNAs from the A site, respectively. Discrimination will be achieved only if $k_d$ is slow compared to $k_p$ and $k'_d$ is fast compared to $k_p$. Unfortunately, the half-lives of (cognate) tRNAs bound to the A and P sites are measured in hours (61), whereas the half-life for peptide-bond formation is tens of milliseconds (see Section I,F). If noncognate tRNAs had half-lives $10^{-6}$ of that of cognate tRNAs, as they would have to if peptide transfer were to be selective, there would be no need for proofreading in the first place (see 62 for further discussion).
Recently, advocates of the allosteric three-sites model have suggested that the reason the proofreading step in the elongation cycle has not been identified is that proofreading does not occur (24). Evidence has been advanced that the allosteric decrease in A-site affinity for aminoacyl-tRNAs induced by E-site occupancy affects generic tRNA-ribosome interactions, not the codon-dependent interactions on which tRNA discrimination depends. If the strength of these nonspecific interactions is reduced, the argument goes, the capacity of the A site to discriminate between cognate and noncognate tRNAs should go up. Consistent with this argument, E-site occupancy does indeed appear to increase fidelity (63). It would be premature to conclude that proofreading does not occur, however. If E-site binding resulted in an increase of $k_{-1}$ and $k'_{-1}$ relative to $k_1$, $k'_1$, and $k_2$, fidelity could indeed increase, but the tRNA dissociation-rate data just discussed imply that the effect would have to be huge for ribosomes to operate as close to the thermodynamic limit as they must to achieve observed error rates. Those skeptical of proofreading must also explain why misincorporation is associated with increased GTP consumption by EF-Tu, as the proofreading hypothesis predicts. In nontranslocating systems, incorporation of near-cognate amino acids is associated with a 10-fold increase in GTP hydrolysis per amino acid incorporated, relative to cognate incorporation, and, interestingly, aminoacyl-tRNAs whose anticodons do not pair at all with an mRNA do not stimulate GTP consumption (64). For systems translocating normally, the level of stimulation of GTP cleavage by near-cognate tRNAs is about 50-fold relative to that of cognate tRNAs (65). Furthermore, the “extra” GTP hydrolysis associated with miscoding ceases in the presence of streptomycin, which is known to stimulate miscoding (60, 66). We conclude that proofreading occurs during translation, but are acutely aware of our inability to specify how.
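The two quantitative arguments in this section can be sketched numerically. The first function gives the idealized $(n+1)$th-power error bound for $n$ energy-dissipating proofreading steps. The others redo the peptidyl-transfer argument under round numbers that we have assumed, not taken from the cited work: a 1-h half-life for an A-site-bound cognate tRNA (“measured in hours”), a 20-ms half-life for peptide-bond formation (“tens of milliseconds”), and a target of only tenfold discrimination at that step. Function names are hypothetical.

```python
import math


def proofreading_error_bound(e0: float, n: int) -> float:
    """Idealized minimum error with n energy-dissipating branch steps: e0 ** (n + 1)."""
    return e0 ** (n + 1)


def rate_from_half_life(t_half: float) -> float:
    """First-order rate constant (s^-1) for a half-life given in seconds."""
    return math.log(2) / t_half


def transfer_step_error(k_d: float, k_d_nc: float, k_p: float) -> float:
    """Error of the peptidyl-transfer step: (k_d + k_p) / (k_d' + k_p)."""
    return (k_d + k_p) / (k_d_nc + k_p)


def required_noncognate_half_life(target_error: float, k_d: float, k_p: float) -> float:
    """Half-life a noncognate A-site tRNA would need for the step to reach target_error."""
    k_d_nc = (k_d + k_p) / target_error - k_p
    return math.log(2) / k_d_nc


if __name__ == "__main__":
    cognate_half_life = 3600.0                    # assumed: 1 h
    k_d = rate_from_half_life(cognate_half_life)
    k_p = rate_from_half_life(0.020)              # assumed: 20 ms
    t_nc = required_noncognate_half_life(0.1, k_d, k_p)
    print(f"noncognate half-life needed: {t_nc * 1e3:.1f} ms")
    print(f"ratio to cognate half-life:  {t_nc / cognate_half_life:.1e}")
    print(f"bound with e0 = 1e-2, n = 1: {proofreading_error_bound(1e-2, 1):.0e}")
```

With these assumptions the required noncognate half-life comes out near 2 ms, on the order of $10^{-6}$ of the cognate value, which is the scale of discrimination that would make proofreading unnecessary.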
E. Factor-free Translation
In the 1960s, it was discovered that translation can occur in the absence of factors (67-69). Ribosomes bind mRNAs such as poly(U) spontaneously; once an mRNA is in place, the A and the P sites readily accept tRNAs in an mRNA-encoded fashion in the absence of EF-Tu. If the P site is filled with a peptidyl-tRNA or a peptidyl-tRNA surrogate (the P site always fills first) and the A site is filled by an aminoacyl-tRNA, the ribosome will catalyze peptide-bond formation, and translocation can then ensue in the absence of EF-G. This process, which is called “factor-free” or “nonenzymatic” translation, is much slower than factor-assisted translation at physiological temperatures, and it occurs only with a few mRNAs; poly(U) works, most other messengers do not. However, it is sensitive to the antibiotics that inhibit protein synthesis except, predictably, those that interfere with factor function. In addition, nonhydrolyzable GTP analogs, which inhibit elongation factors, have no effect, and the length of the peptides produced depends on mRNA length in the normal manner: one amino acid for every mRNA triplet (70). It is likely, therefore, that factor-free translation is mechanistically similar to normal translation. One concludes that elongation factors must not endow ribosomes with (qualitative) properties that they would otherwise lack, e.g., the capacity to translocate. Factors facilitate processes that are inherent in the ribosome.
Curiously, the rate of factor-free translation increases significantly if ribosomes are pretreated with p-(chloromercuri)benzoate (PCMB). Ribosomal protein S12 is the target of PCMB action, and ribosomes lacking S12 also show a high rate of factor-free translation (71). It is interesting that S12 can be cross-linked to EF-G (72, 73), and there is also a connection between S12 and fidelity. Many mutants resistant to streptomycin, an antibiotic that stimulates miscoding, are mutants in ribosomal protein S12, and some of these make coding errors at rates below normal (74). Even more curious, PCMB-stimulated, factor-free translation is characterized by a miscoding level well below that seen when factors are present (70). This could be due to the slow rate of factor-free translation, but it is far from obvious that this is the case.
F. Rates, States, and Energies
Ideally, one would like to know the rate constants for all of the steps of the elongation cycle in a translation system that is making protein at physiological rates. Unfortunately, most of the kinetic data in the literature were obtained under suboptimal conditions, with systems inhibited from making protein in some way so that individual steps could be studied. The interpretation of those data is further complicated by the difficulty of preparing ribosome populations in which even a majority of particles are active, and by the sensitivity of rate constants to ionic conditions. The only in vivo rate we know is the overall rate of polypeptide elongation; it is between 10 and 20 residues per second. We also know that elongation rates of that order can be achieved in vitro (75).
Nevertheless, important aspects of protein synthesis have been illuminated by kinetic and thermodynamic measurements. For example, it has long been believed that ribosomes alternate between a pretranslocational state and a posttranslocational state during elongation. The classic experiments done to validate this concept demonstrated the expected variation in the factor-binding properties of ribosomes at different stages in the elongation cycle (76). However, because they were done using ribosomes that had tRNA bound, and because the occupancy of the A and P sites changes during elongation, it was possible that tRNA rearrangements, not ribosomal conformation changes per se, were responsible for what was seen. Both EF-G and EF-Tu have GTPase activities that are stimulated by tRNA-free ribosomes, their so-called uncoupled (to translation) GTPase activities (17; see Sections II,A and II,B), and the recent finding that the uncoupled GTPase activities of the two factors interact synergistically has provided persuasive evidence for the validity of the two-state hypothesis. When elongation factors are purified so that cross-contamination is eliminated and the GTPase activities of the two factors are measured under multiturnover conditions where ribosomes are limiting, the activity observed when both EF-Tu and EF-G are mixed with ribosomes is greater than the sum of the activities of the same amounts of the two factors measured separately (77). Because EF-Tu and EF-G compete for the same ribosomal binding site (78-81), the only way they can enhance each other's activities is if each prepares the ribosome to accept the other. Thus EF-Tu must drive empty ribosomes from the posttranslocational state to the pretranslocational state, and EF-G must accomplish the reverse. Furthermore, in order for either factor to have an uncoupled GTPase activity in the absence of the other when ribosomes are limiting (which each does), empty ribosomes must cycle between the two states spontaneously at an appreciable rate. Existing data suggest that at 37°C, the spontaneous post- to pretranslocational transition rate is of the order of 1.5 sec⁻¹ (77). The data are less clear for the pre- to posttranslocational transition, which may proceed at a rate as low as 0.15 sec⁻¹. This implies that empty ribosomes prefer the pretranslocational state, consistent with earlier observations (82), and that the stimulatory effect of EF-G on EF-Tu activity is greater than the reverse, as expected. Data exist for eukaryotic ribosomes that could be interpreted the same way, but the picture is not as clear at this point (83). The same isomerization has been studied using ribosomes that have mRNA and tRNA bound. The equilibrium constant for the pre- to posttranslocational transition has not been measured directly under these conditions, but it is known that the activation energy for the pre- to posttranslocational transition in the absence of elongation factors is about the same as that for the reverse.
This indicates that the free-energy difference between the two states is small relative to the activation energy, but the activation energy is quite large, about 85 kJ/mol (69, 84, 85). The rate of factor-free translocation is about the same as that estimated for the pre- to posttranslocational transition in empty ribosomes, about 0.08 sec⁻¹ at 37°C (85). The activation energy for EF-G-catalyzed translocation is about 30 kJ/mol (69). EF-Tu delivers aminoacyl-tRNAs to the A site of posttranslocational ribosomes. Once this happens, peptide-bond formation usually follows immediately and is accompanied by a shift from the post- to the pretranslocational conformation; which comes first is unknown. The activation energy associated with placement of aminoacyl-tRNA in the A site by EF-Tu has been measured using ribosomes carrying a deacylated tRNA in the presumed canonical P site, so that peptide-bond formation cannot occur. It is about 35 kJ/mol (85). It appears that this step is rate limiting in elongation (85-87).
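Two back-of-the-envelope calculations implied by the numbers in this section, offered as a sketch. The first treats empty ribosomes as a simple two-state system with the quoted spontaneous rates (1.5 sec⁻¹ post to pre, and 0.15 sec⁻¹ pre to post, the latter a lower estimate) and returns the equilibrium constant, the fraction in the pretranslocational state, and the relaxation time. The second uses the Arrhenius equation to ask how much a 10-K drop from 310 K slows a step with an activation energy of 85 kJ/mol (factor-free translocation) versus 30 kJ/mol (EF-G-catalyzed translocation). The Arrhenius comparison assumes a temperature-independent pre-exponential factor, and the function names are hypothetical.

```python
import math

GAS_CONSTANT_KJ = 8.314e-3  # kJ mol^-1 K^-1


def two_state(k_post_to_pre: float, k_pre_to_post: float):
    """Post <-> pre equilibrium: (K = [pre]/[post], fraction pre, relaxation time in s)."""
    K = k_post_to_pre / k_pre_to_post
    fraction_pre = k_post_to_pre / (k_post_to_pre + k_pre_to_post)
    relaxation_time = 1.0 / (k_post_to_pre + k_pre_to_post)
    return K, fraction_pre, relaxation_time


def arrhenius_slowdown(ea_kj: float, t_high: float, t_low: float) -> float:
    """k(t_high) / k(t_low) from the Arrhenius equation, assuming a constant prefactor."""
    return math.exp(ea_kj / GAS_CONSTANT_KJ * (1.0 / t_low - 1.0 / t_high))


if __name__ == "__main__":
    K, frac_pre, tau = two_state(1.5, 0.15)
    print(f"K = {K:.0f}, fraction pretranslocational = {frac_pre:.2f}, tau = {tau:.2f} s")
    for ea in (85.0, 30.0):  # factor-free vs EF-G-catalyzed translocation
        fold = arrhenius_slowdown(ea, 310.0, 300.0)
        print(f"Ea = {ea:.0f} kJ/mol: rate falls {fold:.1f}-fold from 310 K to 300 K")
```

With these inputs roughly nine empty ribosomes in ten sit in the pretranslocational state at equilibrium, in line with the preference stated above, and the factor-free step slows about threefold over 10 K, compared with about 1.5-fold for the EF-G-catalyzed step.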
II. Elongation Factors
A. The EF-Tu Cycle
EF-Tu is the most abundant protein in bacterial cells (5 to 10% of the total). There are about 10 copies per ribosome, about as many as there are aminoacyl-tRNAs, and most of the EF-Tu in the cell is found complexed with aminoacyl-tRNA (88, 89). In addition to catalyzing the delivery of aminoacyl-tRNAs to mRNA-programmed ribosomes, EF-Tu protects the aminoacyl-tRNA complexed with it from deacylation (90). Experiments done in the 1960s led to the formulation of an EF-Tu cycle (91, 92) that has endured as a paradigm, much as Watson’s two-site model has for elongation (Fig. 3).
EF-Tu is a member of the regulatory GTPase, or “G protein,” family; its N-terminal domain is a guanine-nucleotide-binding domain (93, 94). As with other G proteins, the affinity of EF-Tu for its macromolecular ligands is determined by whether it has GDP or GTP bound to it. When complexed with GTP, EF-Tu binds to aminoacyl-tRNA, and that ternary complex has high affinity for the A site (or T site?) of the ribosome, provided its tRNA interacts properly with the mRNA codon present in the small-subunit A site. The ribosome functions as a GTPase activator protein (GAP) for EF-Tu·GTP·aa-tRNA. Thus, shortly after a cognate EF-Tu·GTP·aa-tRNA binds to the ribosome, its GTP is cleaved. EF-Tu·GDP dissociates from the ribosome because it has relatively low affinity both for ribosomes and for aminoacyl-tRNA. The ribosomal phase of the EF-Tu cycle has been dissected kinetically into the following steps: (1) initial binding of the ternary complex, which is not codon dependent; (2) codon recognition, which triggers an alteration in the conformation of the tRNA D loop and anticodon; (3) GTP hydrolysis, which is activated by conformational changes induced by codon-anticodon recognition; (4) release of EF-Tu from the ribosome; and (5) full tRNA entry into the A site, which involves a further conformational change in the tRNA (51, 95).
One presumes, tentatively, that proofreading occurs at some point during this sequence of steps. The replacement of EF-Tu-bound GDP by GTP is catalyzed by a specific guanine-nucleotide exchange factor (GEF) called EF-Ts. Without it, EF-Tu does not recycle, because the affinity of EF-Tu for GDP is considerably higher than its affinity for GTP. [The kinetic and thermodynamic constants for the interactions of the EF-Tu cycle have been determined (96-98).] The GTPase activity of EF-Tu, which is normally activated by the ribosome in the presence of aminoacyl-tRNA and mRNA, can also be induced by monovalent cations (99), free ribosomes (100), 50-S ribosomal subunits, 50-S core particles to which ribosomal protein L7/L12 has been added (101), and
ELONGATION PHASE OF PROTEIN SYNTHESIS
FIG. 3. The EF-Tu cycle. The only new iconography introduced in this diagram is the symbol for EF-Ts, which is represented as an oval. This rendering of the EF-Tu cycle assumes that only one GTP is cleaved per aminoacyl-tRNA delivered to the ribosome, but otherwise conforms to the version of the elongation cycle shown in Fig. 2.
the antibiotic kirromycin (102). The ribosome-dependent GTPase of EF-Tu is stimulated by 3’ fragments of aminoacyl-tRNA as small as aminoacyladenosine, and even by unacylated tRNA missing its 3’ CCA end (103-108). If kirromycin and aminoacyl-tRNA are present, either ribosomal subunit can also stimulate EF-Tu’s GTPase (109). As expected, the G domain of EF-Tu has nucleotide-binding and GTPase activities in the absence of the rest of the protein, but it does not interact productively with tRNA, EF-Ts, or the ribosome. However, the G domain of EF-Tu can be cross-linked to 23-S rRNA (40, 93, 110). Although it has long been believed that 1:1 complexes of EF-Tu, GTP, and aminoacyl-tRNA deliver aminoacyl-tRNA to the ribosome (e.g., 111, 112), there is evidence that, at least under some circumstances, it takes two EF-Tus and two GTPs to deliver a single aminoacyl-tRNA. This has been demonstrated for poly(U)-programmed translation systems both kinetically and by experiments using EF-Tu mutants specific for xanthosine triphosphate (75, 113-116). There is physical evidence for the existence of complexes of the form (EFTu·GTP)₂·aa-tRNA; the stoichiometry of the EFTu·tRNA complex depends on temperature (117). In this connection, it is interesting that EF-Tu can form large, filamentous polymers (118, 119) that interact with both nucleotides and EF-Ts (120). Furthermore, dimers of EFTu·EFTs have been observed in Thermus thermophilus (121, 122). It has been suggested that multiple GTP cleavages by EF-Tu occur only in response to specific mRNA sequences (123). There may be a relationship between messages that include homopolymeric stretches, “extra” cleavage of GTP, and frameshifting. Be that as it may, additional studies are essential to determine how many GTPs per amino acid incorporated are normally consumed by EF-Tu. Until this issue is resolved, it will be impossible to give a satisfying account of the mechanism of elongation.
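The bookkeeping behind this open question can be made explicit. A minimal worked equation, assuming the figure of one GTP cleaved by EF-G per translocation event quoted in the next section (128), and ignoring any GTP spent on ternary complexes rejected during proofreading (an assumption, not a measurement), where $n_{\mathrm{Tu}}$ is the number of GTPs cleaved by EF-Tu per aminoacyl-tRNA delivered and $n_{\mathrm{GTP}}$ is the factor-dependent total per elongation cycle:

```latex
\begin{aligned}
n_{\mathrm{GTP}} &= n_{\mathrm{Tu}} + n_{\mathrm{G}}, \qquad n_{\mathrm{G}} = 1 \\
n_{\mathrm{Tu}} = 1 &\;\Rightarrow\; n_{\mathrm{GTP}} = 2 \\
n_{\mathrm{Tu}} = 2 &\;\Rightarrow\; n_{\mathrm{GTP}} = 3
\end{aligned}
```

Under these assumptions, the 2:1 stoichiometry would raise the factor-dependent GTP cost of each elongation cycle by half, from two GTPs to three.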
B. The EF-G Cycle
EF-G, like EF-Tu, is a G protein that has a single site that binds both GDP (K_D = 6.7 × 10⁻⁷ M) and GTP (K_D = 1.2 × 10⁻⁵ M). [The parameters quoted are for T. thermophilus (121).] Under physiological conditions, that site has a GTPase activity that is strongly stimulated when EF-G interacts with empty ribosomes. The uncoupled rate is about 1 GTP per EF-G molecule per second in T. thermophilus. Estimates of the degree of its stimulation by ribosomes differ enormously because contaminating GTPase activities make it hard to measure the unstimulated rate; the stimulation may be as large as 100,000-fold (124). Both 70-S couples and 50-S subunits can stimulate this activity, provided ionic conditions are appropriate (see 1). The 30-S subunits appear to enhance the 50-S effect by stabilizing the interaction between EF-G and the 50-S subunit (125). Because the GTPase activity of EF-G is also stimulated by solvents like isopropanol, it is probable that EF-G, like EF-Tu, contains all of the groups responsible for catalyzing GTP hydrolysis (126, 127). It is also likely that the uncoupled GTPase activity of EF-G is related to its translocase activity. When protein synthesis is in progress, one GTP is cleaved by EF-G per translocation event (128).
G nucleotides modulate the affinity of EF-G for the ribosome by controlling its conformation. The affinity of EF-G for the ribosome is much higher when GTP analogs such as GMPPNP, GMPPCP, and presumably GTP are bound (K_D = 3.6 × 10⁻⁵ M; 129) than when it is complexed with GDP (K_D too large to measure). Furthermore, the binding of EFG·GMPPCP to pretranslocational ribosomes causes translocation (1, 130). Because GMPPCP cannot be hydrolyzed, EF-G remains bound to the ribosome, and because EF-Tu cannot bind to a ribosome that has EF-G bound (78, 81), elongation stops.
The picture of translocation that emerges is outlined in Fig. 4. It begins classically with a ribosome in the pretranslocational state, with an empty E site, a discharged tRNA in its P site, and a peptidyl-tRNA in its A site. [The ribosome-bound tRNAs may well be in hybrid states, of course.] The binding of EFG·GTP causes translocation; the resulting shift in ribosome conformation to its posttranslocational state is sensed by EF-G, and its GTPase is activated. Activation of the GTPase activity of EF-G may be associated with the entry of discharged tRNA into the E site: ribosomes carrying deacylated tRNA in their P sites stimulate the GTPase activity of EF-G more strongly than empty ribosomes (131), but this stimulation disappears if the CCA end of the P-site tRNA is damaged (or missing) so that it cannot bind to the E site (132, 133). Pairing of the CCA end of E-site tRNA with 23-S rRNA may be part of the mechanism that triggers GTP hydrolysis by EF-G. In addition, EF-G does not stimulate P-site tRNA release unless the A site is occupied, and the E site does not release tRNAs properly unless EF-G is able to cleave GTP and alter its conformation properly (134). As judged by their anticodon-to-anticodon separation, A-site and P-site tRNAs appear to be translocated simultaneously, but after translocation, the distance between the anticodons of what are now E-site and P-site tRNAs slowly increases (82, 135).
GTP cleavage stimulates dissociation of EF-G from the ribosome by enabling EF-G to adopt its low-affinity, GDP conformation in solution. The binding of GDP to EF-G is loose enough that passive exchange will replace it with fresh GTP, readying it for another round of translocation. Note that EF-G, unlike EF-Tu and all other G proteins except its eukaryotic homolog EF-2, has no guanine-nucleotide exchange protein; replacement of GDP by GTP occurs spontaneously. As is the case for many other large proteins, domains of EF-G can be isolated from the intact molecule by partial proteolysis (136). Systematic investigation of the enzymatic properties of these domains has recently shown that the N-terminal domain of EF-G, which is its G domain, binds GTP, as expected, but lacks GTPase activity both in the presence and in the absence of ribosomes. Remarkably, the C-terminal half of the molecule, which remains after the G domain has been removed and does not interact with G nucleotides, promotes (slow) translocation (137). It is important to note that peptidyl-tRNAs can be translocated by EF-G in the absence of mRNA; ribosome-catalyzed, factor-dependent synthesis of poly(lysine) from lysyl-tRNA can be achieved in the absence of poly(A) (138).
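The two dissociation constants quoted for T. thermophilus EF-G can be compared directly. The sketch below is a hypothetical illustration that assumes a single site for which GTP and GDP compete at equilibrium, ligand in excess over protein, and no ribosome present; the function names are invented here. It shows that EF-G binds GDP about 18-fold more tightly than GTP, so GTP-bound and GDP-bound EF-G are equally populated only when free [GTP]/[GDP] is about 18. This is an equilibrium statement only; the claim that passive exchange suffices also has a kinetic side (GDP must dissociate quickly), which K_D values alone cannot establish.

```python
from typing import Tuple

# Dissociation constants quoted in the text for T. thermophilus EF-G (121).
KD_GDP = 6.7e-7  # M, EF-G·GDP
KD_GTP = 1.2e-5  # M, EF-G·GTP


def occupancy(gtp: float, gdp: float) -> Tuple[float, float]:
    """Equilibrium fractions of EF-G carrying GTP and GDP (competitive single-site binding).

    gtp, gdp: free nucleotide concentrations in M; any values plugged in are hypothetical.
    """
    g = gtp / KD_GTP
    d = gdp / KD_GDP
    total = 1.0 + g + d  # the 1.0 term represents nucleotide-free EF-G
    return g / total, d / total


def break_even_ratio() -> float:
    """Free [GTP]/[GDP] at which GTP-bound and GDP-bound EF-G are equally populated."""
    return KD_GTP / KD_GDP


if __name__ == "__main__":
    print(f"GDP binds {break_even_ratio():.1f}-fold more tightly than GTP")
```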
FIG. 4. EF-G-driven translocation. The events that occur when EF-G interacts with the ribosome are shown in this diagram (see Section II,B). At least one inhibitor is known for each of the steps identified, and one such inhibitor is identified to the right of each step (see Section II,C).
This suggests that the translocation mechanism operates on tRNAs and that mRNA is dragged along passively. Also, even though there is no evidence that the free energy of EF-G-associated GTP hydrolysis is captured by the elongating ribosome directly (see 2), the coupling of the activation of the GTPase activity of EF-G to the posttranslocational state ensures that the posttranslocational state is favored thermodynamically by the presence of EFG·GTP. This guarantees that the factor-binding site will be vacated by EF-G when the ribosome is in the posttranslocational state, ready to receive the next EFTu·aa-tRNA complex.
C. Antibiotic Inhibitors
Important insights into factor function have been obtained from studies of antibiotics that inhibit EF-G and EF-Tu. Arguably, the most interesting are fusidic acid and kirromycin (139, 140). Fusidic acid is an EF-G inhibitor. In its presence, a single translocation step occurs and GTP is hydrolyzed, but then elongation stops because EFG·GDP·ribosome·fusidic acid complexes do not dissociate (141). Fusidic acid binds neither to ribosomes nor to EF-G separately, and it will not bind in the presence of GMPPCP. It binds only to ribosome·EFG·GDP complexes (142). Furthermore, all known fusidic acid-resistant mutations are EF-G mutations (143). Thus, at the stage at which fusidic acid binds, the conformation of EF-G must differ from that of ribosome-free EFG·GTP or EFG·GDP, and the transition from the fusidic acid-competent state to the EFG·GDP state must be required for EF-G discharge. [Note that some believe that fusidic acid acts after EF-G has altered its conformation in response to GTP cleavage (134).] It is interesting that fluoroaluminates, which are inorganic phosphate analogs, have the same effect on elongation as fusidic acid; EFG·GDP·AlFₓ sticks to the ribosome (144).
The effect of kirromycin on EF-Tu mirrors the effect of fusidic acid on EF-G. It inhibits protein synthesis by preventing the release of EFTu·GDP from the ribosome after it has delivered aminoacyl-tRNA to the A site (145, 146). The amino acid of an aminoacyl-tRNA delivered to the ribosome in the presence of kirromycin cannot participate in peptide transfer; apparently it remains protected by EF-Tu. Nevertheless, puromycin can still react with the peptide moiety of the peptidyl-tRNA in the P site (109). In the absence of ribosomes, kirromycin activates the latent GTPase activity of EF-Tu and greatly reduces the affinity of EFTu·GTP for aminoacyl-tRNA (102, 147). EF-Tu activity is also inhibited by pulvomycin.
It alters the affinity of EF-Tu for G nucleotides, promoting exchange of GDP for GTP; it inhibits EF-Tu’s GTPase; and it weakens the affinity of EFTu·GTP for aminoacyl-tRNAs (148).
Thiostrepton, siomycin, micrococcin, and their relatives have also received a great deal of attention. The members of this family of antibiotics bind to a site that is part of the 1060 loop in the 50-S subunit (see Section II,D) (149). In the presence of thiostrepton, EFG·GTP does not form a stable complex with the ribosome, and the stimulatory effect that ribosomes normally have on its GTPase activity is abolished. The effects are reciprocal; thiostrepton does not affect ribosomes to which EF-G is already bound (150). Thiostrepton also inhibits the ribosome-binding activity of the EFTu·aa-tRNA·GTP complex and the nonenzymatic binding of aminoacyl-tRNA to the A site, but, revealingly, it has no effect on the uncoupled GTPase activity of EF-Tu, either in the presence or absence of aminoacyl-tRNA (104, 140, 151). Finally, both EF-G-catalyzed and nonenzymatic translocation are inhibited by thiostrepton (152). Thus it appears that the thiostrepton site not only interacts with elongation factors, but is part of the molecular machinery that enables ribosomes to translocate in the first place.
Micrococcin competes with thiostrepton for binding to the ribosome and has many of the same effects. It differs in one interesting respect, however: micrococcin stimulates the uncoupled GTPase activity of EF-G rather than inhibiting it (153). This may not be as surprising as it sounds. By weakening the interaction of EFG·GDP with the ribosome more than it weakens the binding of EFG·GTP, micrococcin could stimulate turnover rather than inhibit it. On this hypothesis, the only difference between the two drugs would be that thiostrepton is a stronger inhibitor of binding. (It is worth pointing out that “does not bind” or “does not interact” in one report can be equivalent to “binds weakly” in another. Furthermore, there are species-dependent variations in the sensitivity of translation systems to inhibitors. For both reasons, reports about the physiological effect of an antibiotic can differ quantitatively from species to species, even though it affects the translation system in fundamentally the same way in all.)
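The micrococcin argument is essentially kinetic, and a toy model makes it concrete. The sketch below treats the uncoupled EF-G GTPase cycle as three irreversible steps in series (binding of EFG·GTP to the ribosome, GTP cleavage, release of EFG·GDP), for which the steady-state turnover rate is the reciprocal of the summed mean step times. All rate constants are hypothetical numbers chosen only to illustrate the reasoning, and modeling weakened binding as a slower binding step is itself a simplifying assumption. If release is rate-limiting, a drug that speeds release a lot while slowing binding a little raises turnover; one that also slows binding severely lowers it.

```python
def turnover_rate(k_bind: float, k_hydrolysis: float, k_release: float) -> float:
    """Steady-state turnover (s^-1) of a cycle of three irreversible steps in series.

    The mean cycle time is the sum of the mean step times (1/k for each step),
    so the turnover rate is 1 / (1/k_bind + 1/k_hydrolysis + 1/k_release).
    """
    return 1.0 / (1.0 / k_bind + 1.0 / k_hydrolysis + 1.0 / k_release)


# Hypothetical rate constants (s^-1); none of these values comes from the text.
baseline = turnover_rate(k_bind=10.0, k_hydrolysis=50.0, k_release=1.0)
# Micrococcin-like drug: binding slowed 2-fold, EF-G·GDP release sped up 20-fold.
micrococcin_like = turnover_rate(k_bind=5.0, k_hydrolysis=50.0, k_release=20.0)
# Thiostrepton-like drug: same effect on release, but binding slowed 200-fold.
thiostrepton_like = turnover_rate(k_bind=0.05, k_hydrolysis=50.0, k_release=20.0)

if __name__ == "__main__":
    for name, rate in [("baseline", baseline),
                       ("micrococcin-like", micrococcin_like),
                       ("thiostrepton-like", thiostrepton_like)]:
        print(f"{name}: {rate:.3f} per second")
```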
Viomycin has binding sites on both ribosomal subunits (154); in its presence both initiation and elongation are abnormal (139). In protein-synthesizing systems poisoned with viomycin, peptidyl-tRNA is found in the A site; translocation is inhibited (155). However, EF-G still interacts with the ribosome, and its uncoupled GTPase activity is undiminished. It appears that viomycin allows EFG·GTP to interact with the ribosome but uncouples translocation from activation of the GTPase activity of EF-G. Viomycin also inhibits EF-Tu-dependent delivery of aminoacyl-tRNA to ribosomes whose E sites are occupied, but not to those whose E sites are empty (152). Thus, viomycin may inhibit the pre- to posttranslocational transition in both directions in the presence of factors; it does so in factor-free translocation systems (156).
Spectinomycin is an aminoglycoside antibiotic that appears to act on the 30-S subunit. Although it does not induce translational errors, many of its resistance mutations map to ribosomal protein S5, which is a member of the S4-S5-S12 group that influences the fidelity of translation. Its mechanism of action appears to be connected with that of fusidic acid: some mutants resistant to spectinomycin show enhanced sensitivity to fusidic acid, and some fusidic acid-resistant mutants have reduced sensitivity to spectinomycin (157). What this phenomenology emphasizes is that translocation involves both subunits.
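The step order described for EF-G-driven translocation, and the points at which the agents discussed above halt it, can be summarized as a small lookup model. This is a hypothetical sketch: the step names and condition strings are invented here, each condition is a simplified reading of the text rather than a kinetic description (for example, thiostrepton is treated as preventing any stable EFG·GTP binding), and it does not attempt to reproduce the inhibitor assignments drawn in Fig. 4. Elongation can continue only if translocation occurs and EF-G vacates the factor site.

```python
from enum import Enum
from typing import List, Optional


class Step(Enum):
    BIND = "EF-G·GTP binds the pretranslocational ribosome"
    TRANSLOCATE = "A- and P-site tRNAs move to the P and E sites"
    HYDROLYZE = "GTP is cleaved on EF-G"
    RELEASE = "EF-G·GDP dissociates from the ribosome"


def ef_g_round(condition: Optional[str] = None) -> List[Step]:
    """Steps completed in one EF-G round under a given (simplified) condition.

    'thiostrepton': no stable EF-G·GTP·ribosome complex forms
    'viomycin':     EF-G binds and cleaves GTP, but translocation fails
    'GMPPCP':       translocation occurs; the analog is not hydrolyzed, EF-G stays bound
    'fusidic acid': one translocation and GTP cleavage; EF-G·GDP stays bound
    """
    if condition is None:
        return [Step.BIND, Step.TRANSLOCATE, Step.HYDROLYZE, Step.RELEASE]
    if condition == "thiostrepton":
        return []
    if condition == "viomycin":
        return [Step.BIND, Step.HYDROLYZE, Step.RELEASE]
    if condition == "GMPPCP":
        return [Step.BIND, Step.TRANSLOCATE]
    if condition == "fusidic acid":
        return [Step.BIND, Step.TRANSLOCATE, Step.HYDROLYZE]
    raise ValueError(f"unknown condition: {condition!r}")


def elongation_continues(condition: Optional[str] = None) -> bool:
    """True only if translocation happened and the factor site was vacated."""
    steps = ef_g_round(condition)
    return Step.TRANSLOCATE in steps and Step.RELEASE in steps
```

For viomycin the model still records GTP cleavage and release, in line with the observation that the uncoupled GTPase of EF-G is undiminished even though translocation is blocked.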
D. Elongation Factor Interactions: The 1060 Region
The observations that led to the identification of the 1060 region with factor binding began with the discovery that ribosomes lacking proteins L7/L12 do not stimulate the uncoupled GTPase activities of the two elongation factors (158-160). Note, however, that EFTu·aa-tRNA·GTP bound to appropriately programmed 30-S subunits will hydrolyze its GTP if 50-S subunits lacking L7/L12 are added, but turnover does not occur (161). Ribosomes normally contain a tetramer of L7/L12, which is the only ribosomal protein present in multiple copies. Not only is L7/L12 important for factor activity; cross-links have also been observed between EF-Tu and L7/L12 (162), and monoclonal antibodies against L7/L12 prevent both EFTu·GTP·aa-tRNA binding to ribosomes and the ribosome stimulation of EF-Tu GTPase activity (163, 164). L7/L12 can also be cross-linked to EF-G in the presence of GMPPCP (165), but not in the presence of fusidic acid and GDP (166, 167). In parallel with this observation, L7/L12 is not required for EF-G binding in the presence of fusidic acid, but is required for formation of the EFG·GMPPCP·ribosome complex (168). There is additional evidence that the conformation of L7/L12 is affected by the state of EF-G. The L7/L12 in EFG·fusidic acid·ribosome complexes is resistant to trypsin, but in a GMPPCP-stabilized complex it is not (169). The interaction of EF-G with L7/L12 can also be visualized spectroscopically. L7/L12 contributes sharp resonances to the otherwise broad-line spectrum of free ribosomes, indicative of independent mobility. In the presence of EF-G and GMPPCP, that signal disappears (170); L7/L12 becomes immobilized. Could it be that conformational changes in L7/L12 induce the GTPase in EF-G and EF-Tu?
Ribosomes contain a tetramer of L7/L12 complexed with L10, another of the proteins that is sometimes picked up in factor cross-linking experiments (171), and L10, in turn, binds directly to the 1060 region of 23-S rRNA (140, 172). Strong evidence exists that factor activity depends directly on the 1060 region. Thiostrepton binds directly to the 1060 region of 23-S rRNA, as pointed out above (Section II,C), and its binding is enhanced by the presence of L11, which binds to the same region. In addition, EF-G has been cross-linked to RNA residues belonging to the 1060 region (173). It seems likely that the 1060 region interacts directly with elongation factors and is directly involved in ribosomal state switching.
E. Elongation Factor Interactions: The SRL
The major RNA in all large ribosomal subunits includes a 12-nucleotide sequence that is totally conserved across taxonomic groups: A2654-A2665 in the 23-S rRNA of Escherichia coli (174) and A4318-A4329 in the 28-S rRNA of the rat (175). It is called the “sarcin/ricin loop” (SRL) because most of what we know about its role in protein synthesis has emerged from studies of the toxicity of two proteins, ricin and α-sarcin. Ricin is toxic because it inactivates ribosomes by catalyzing the depurination of a single adenosine residue in 23-S/28-S rRNAs (A2660 in E. coli, A4324 in the rat) (176). α-Sarcin is similarly fastidious in vivo; it catalyzes the hydrolysis of a single phosphodiester bond in 23-S/28-S rRNA, the one between G2661 and A2662 in E. coli (G4325-A4326 in the rat) (175). The effect of α-sarcin on ribosome activity is the same as that of ricin, and cells exposed to either toxin die because they can no longer make protein: their large ribosomal subunits are inactive. Neither eukaryotic (177, 178) nor prokaryotic (179) ribosomes bind elongation factors properly following ricin or α-sarcin treatment, but they are normal in virtually every other respect. Chemical protection results suggest that factors interact with the SRL, the 1060 region, and remarkably little else in bacterial ribosomes. EF-G binding protects A1067, G2655, A2660, and G2661 (E. coli); EF-Tu protects G2655, A2660, and A2665 (52). Further, the SRL is protected from α-sarcin by the prior binding of EFG·GDP and fusidic acid (180). [It should be noted that quite different results have been obtained with EF-2 in eukaryotes. It protects residues in 5-S rRNA, in 18-S rRNA, in the 28-S rRNA homolog of the 1060 loop, in the helix 72-75 region of 28-S rRNA, and near the peptidyl transferase loop (53). Nothing is seen in the SRL, which is reported to be inaccessible to small reagents, even though the SRL in eukaryotic ribosomes reacts with ricin.]
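Because the 12-nucleotide SRL is totally conserved, positions within it map between E. coli 23-S and rat 28-S numbering by a constant offset. The sketch below derives that offset from the endpoints quoted above (A2654 corresponds to A4318) and checks it against the other correspondences given in the text; the names are invented for illustration, and the mapping is assumed valid only inside the conserved 12-nt stretch, not elsewhere in the two rRNAs.

```python
# Endpoints quoted in the text: A2654-A2665 (E. coli 23-S) = A4318-A4329 (rat 28-S).
ECOLI_SRL = range(2654, 2666)  # 12 positions, A2654..A2665 inclusive
OFFSET = 4318 - 2654           # constant numbering offset inside the SRL


def ecoli_to_rat(position: int) -> int:
    """Map an E. coli 23-S rRNA position inside the SRL onto rat 28-S numbering."""
    if position not in ECOLI_SRL:
        raise ValueError(f"{position} is outside the conserved 12-nt SRL")
    return position + OFFSET


# Correspondences stated in the text, used as a consistency check.
STATED_PAIRS = {
    2660: 4324,  # ricin depurination site
    2661: 4325,  # 5' side of the alpha-sarcin cleavage site
    2662: 4326,  # 3' side of the alpha-sarcin cleavage site
    2665: 4329,  # 3' end of the conserved sequence
}

if __name__ == "__main__":
    for ecoli, rat in STATED_PAIRS.items():
        assert ecoli_to_rat(ecoli) == rat, (ecoli, rat)
    print("offset", OFFSET, "is consistent with all stated pairs")
```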
Although generally referred to as a “loop,” the SRL is highly structured (4, 5). Its GAGA sequence (G2659-A2662), where ricin and α-sarcin attack, is part of a GNRA tetraloop, which is closed by a Watson-Crick C-G pair. The remaining bases in the sequence, four on the 5’ side and three on the 3’ side, form a tightly organized structure in which A2657 pairs side-by-side with G2664, U2656 makes a reversed-Hoogsteen pair with A2665, and G2655 (the “bulged G”) reaches across the major groove so that its imino proton can hydrogen-bond to an oxygen on the phosphate that links G2664 to A2665. [The same bulged-G motif occurs in loop E of eukaryotic 5-S rRNA (181).] Ricin recognizes the GAGA tetraloop of the SRL (182), and α-sarcin recognizes its bulged-G motif (A. Gluck and I. G. Wool, personal communication).
The data obtained on SRL function by site-directed mutagenesis in vivo are puzzling. Replacement of G2661 by a C is tolerated without much effect, except in strains that carry a streptomycin-resistant mutation in ribosomal protein S12, where the interaction of the ribosome with EFTu·tRNA·GTP is abnormal (183). The lethal effect of this combination is abolished by mutations in EF-Tu (184). Even more curious is the recent observation that G2655, the bulged G, can be replaced by an A (but not by a pyrimidine) without killing cells (A. Gluck and I. G. Wool, personal communication). If ribosomes tolerate sequence changes like these, why have variants of the SRL sequence never appeared in nature?
SRL function has also been examined by exposing ribosomes to DNA oligonucleotides having complementary sequences. Ribosomes bind such sequences grudgingly, and then only when they are actively translating (185). Binding is an inactivating event (186), but the reproducibility of recent observations that oligonucleotides complementary to the 3’ half of the SRL and its associated helical stem bind strongly to large ribosomal subunits and have dramatic effects on activity (84, 187) has been questioned (I. G. Wool, personal communication).
On balance, it seems unlikely that the SRL is part of the machinery that enables ribosomes to switch between the pretranslocational and posttranslocational states in the absence of factors: α-sarcin-treated ribosomes are active in factor-free translation (179). However, an intact SRL is obviously required for elongation-factor activity. It could be that factor-induced changes in its conformation affect the rate of state switching, as Wool and colleagues have long proposed (188). Nierhaus has recently proposed that the conformational change in question might be a melting of the secondary structure of the SRL (189). It could also be that changes in the SRL’s tertiary or quaternary interactions with other ribosomal components are critical for factor-induced state switching.
The SRL may be part of the factor-binding site, but as far as we know, no one has been able to demonstrate an interaction between EF-G (or EF-Tu) and SRL sequences in isolation (I. G. Wool, personal communication). It has been reported recently that the SRL is located in the vicinity of the peptidyl transferase site, not the 1060 region (190).
F. Elongation Factor Interactions: The 30-S Subunit
As we have repeatedly remarked, many lines of evidence indicate that the factor site on the ribosome includes 30-S components, and that elongation involves interaction between the two subunits. In the presence of aminoacyl-tRNA and an appropriate mRNA, EF-Tu binds so tightly to isolated 30-S subunits that the complex can be visualized in the electron microscope (191). It binds to the side of the head of the 30-S subunit away from the platform, in the vicinity of S4, S5, and S12. Consistent with this finding, and also with the biochemical observation that the factors compete for a single binding site, the reactive cysteine in E. coli EF-G can be cross-linked to S12 in high yield by mild oxidation of EFG·GTP·fusidic acid·ribosome complexes (72). S12 can also be cross-linked to EF-G under the same conditions, in lesser yield, using iminothiolane (167). [Aryl azides attached to that same cysteine also efficiently cross-link EF-G to 23-S rRNA (192).] Skold, too, obtained cross-links from GMPPCP-stabilized complexes of EF-G and ribosomes to both 50-S (L6, L7/L12, L14) and 30-S components (S12, S19) (73). There is also genetic evidence linking S12 with the SRL. As mentioned in Section II,E, sequence changes in the SRL interact with mutations in S12 (183). In addition, some mutants in S12 are abnormal with respect to their capacity to stimulate the GTPase activity of EF-Tu, and others affect the response of translating systems to kirromycin (193, 194). A functional connection between S12 and elongation is also indicated by the observation that reaction of cysteines in S12 with p-chloromercuribenzoate activates factor-free translation (see Section I,E). It should also be noted that mutation of 16-S rRNA at position 530 (in a universally conserved region) prevents EF-Tu-dependent binding of aminoacyl-tRNA to the ribosome, but not EF-Tu-independent (i.e., nonenzymatic) association of aminoacyl-tRNA with the classical A site (195).
G. Structures of the Factors
Although crystals were first reported in the 1960s, crystal structures for both elongation factors have become available only recently. As mentioned earlier, structures are available today for EF-Tu complexed with GDP (196), GTP (6, 7), and both GTP and aminoacyl-tRNA (8), and for nucleotide-free EF-G (9) and EF-G complexed with GDP (10). It had long been realized that their GTP-binding domains are homologous. It is now clear that their homology is far more extensive than that.
EF-Tu is composed of three domains. Its large, N-terminal domain is a classic G-nucleotide-binding domain, of which numerous examples are known both in prokaryotes and eukaryotes (197). Its small second and third domains are composed of beta sheet, and their placement relative to the G domain is determined by the nucleotide bound to the G domain. If that nucleotide is a GTP, the three domains assume the compact conformation shown on the right-hand side in Fig. 5 (197a, b). If the nucleotide is GDP, the protein opens up dramatically. This conformational change is triggered by (relatively) small alterations in G-domain conformation that are coupled to the placement of residues in its nucleotide-binding site (6, 7). It is interesting that mutations of EF-Tu that confer resistance to kirromycin cluster at the interface between the G domain and the third domain. Perhaps kirromycin “glues” the two domains together, forcing EF-Tu to maintain a GTP-like conformation regardless of the state of its nucleotide (198, 199). Pulvomycin-resistant mutants cluster in a different region, the region of EF-Tu where its three domains come together (200, 201).
FIG. 5. The structures of EFG·GDP and EFTu·GTP. This figure compares ribbon diagrams of the structures of EFG·GDP (left) and EFTu·GTP (right). The two molecules are shown with their nucleotide-binding sites facing the reader and their nucleotides in the same orientation. The EFG·GDP coordinates used are those of the Yale group (10). Coordinates for EFTu·GTP were obtained from the group at the University of Aarhus (6). This figure was created using MolScript and Raster3D (197a, b).
tRNA binds to the nucleotide-binding-site side of EF-Tu (8) (Fig. 6, top) (201a). The third domain binds the elbow of the tRNA, and the anticodon stem runs across the third domain toward the second domain. The CCA end tucks into the gap between the second domain and the G domain, where it is hidden from solvent, which explains why EF-Tu inhibits the hydrolysis of aminoacyl-tRNAs. This structure also explains most prior chemical and enzymatic protection and cross-linking studies (202, 203).
FIG. 6. The EF-Tu ternary complex compared with EF-G. Space-filling surface models of EF-G (top) and EFTu·GTP·aa-tRNA (bottom) are compared, oriented so that their nucleotide-binding sites would superimpose if the two molecules were laid on top of each other, in about the same orientation as shown in Fig. 5. The structure that projects down from the body of the EF-Tu ternary complex is the anticodon stem/loop of the tRNA. Its acceptor stem is included in the “bottom” of the body of the complex. The structure of the ternary complex is described in Ref. 8. We thank Morten Kjeldgaard for enabling us to make use of its coordinates prior to publication. This figure was drawn using Grasp (201a).
EF-Ts appears to interact with all of the domains of EF-Tu (204-207), but its binding does not interfere with the formation of the EFTu·GTP·aa-tRNA complex (205) nor, in E. coli, with the association of the EFTu·GTP·aa-tRNA complex with the ribosome up to the point where GTP is hydrolyzed (208). EF-Ts must bind to EF-Tu somewhere on the surface of the molecule that is distal to the tRNA-binding site, and that region must be unimportant for other aspects of EF-Tu activity.
EF-G, which is considerably larger than EF-Tu, consists of five domains, the relative arrangement of which is essentially the same in the nucleotide-free protein as it is in the EFG·GDP complex. Preliminary data suggest that the conformational difference between EFG·GTP and EFG·GDP is much smaller than the corresponding difference in EF-Tu (J. Czworkowski and P. B. Moore, unpublished data). This conclusion has been questioned, but it should be pointed out that in the case of EF-Tu, GTP cleavage must reduce not only the affinity of EF-Tu for the ribosome but also the affinity of that protein for tRNA. For EF-G, GTP cleavage controls only its affinity for the ribosome. The N-terminal domain of EF-G, like that of EF-Tu, is a G domain, but it is distinguished from others of its class, including that of EF-Tu, by a 90-residue insert, which forms a number of “extra” secondary structure elements at the “top” of the molecule. The second domain of EF-G is homologous to the second domain of EF-Tu, but its remaining three domains are alpha-beta domains that are unrelated to the third domain of EF-Tu. The fourth domain of EF-G has a fold like that of ribosomal protein S5 (209), whereas the fifth domain closely resembles ribosomal protein S6 (9). (The C-terminal fragment of EF-G, whose activity in translocation was noted earlier, corresponds to domains 2, 3, 4, and 5.)
A moderate-sized literature exists describing the properties of EF-G molecules that have been modified chemically in one way or another. The mechanistic interpretation of many of those data is problematic at this point because most of them speak to the details of the interactions between EF-G and ribosomes. Suffice it to say that these data provide ample reason for believing that the nucleotide-binding-site face of the G domain of EF-G interacts with the ribosome (1, 210-213). Furthermore, experiments done on the mechanism of action of diphtheria toxin on EF-2 (214-216) and on the effect of tyrosine-modifying reagents on EF-G (217) indicate that the distal end of the fourth domain is also vital for EF-G function. Mutations in EF-G that confer fusidic acid resistance are found in its G domain and in domains 3 and 4, predominantly surrounding an interdomain gap in the structure (143). The failure of EFG·GDP to release from ribosomes when fusidic acid is present could be the product of a kirromycin-like locking of the relationship between its domains. Fusidic acid may inhibit the conformational change responsible for release, which normally follows GTP cleavage.
The most remarkable discovery to emerge from these studies so far is the finding that EF-G resembles the EF-Tu ternary complex; at low resolution the two are effectively isosteric (8). When EF-G and EFTu·aa-tRNA·GTP are aligned to maximize the overlap of the conserved regions of their nucleotide-binding sites, their second domains superimpose (Fig. 6). More than that, domains 3, 4, and 5 of EF-G fill nearly the same space as the tRNA of the ternary complex. Domain 4 of EF-G corresponds to the anticodon stem/loop of the tRNA, and domains 5 and 3 correspond to the elbow and the acceptor arm, respectively. The correspondence is so close that it is inconceivable that it is coincidental. EF-G must be an all-protein analog of the EF-Tu ternary complex. EF-G and EFTu·GTP·aa-tRNA must bind to the ribosome similarly, and their functions must be related in some way also.
111. On the Mechanism of Elongation A. O n the Placement of Ribosomal Sites Figure 7 (217a) shows the large ribosomal subunit, with the face of the subunit that interacts with small subunits in ribosomal couples oriented toward the viewer. High-resolution electron-microscopy analyses show that a gap exists between the two subunits large enough to accommodate tRNAs (45, 46), and it is generally agreed that protein synthesis occurs in that gap. The anticodon ends of ribosome-bound tRNAs interact with mRNA at the base of the small subunit cleft (218), and the peptidyl transferase site is located on the 50-S subunit in the region between L1 and the central protuberance (219). It is likely that the anticodon stems of ribosome-bound tRNAs run along the 504 face of the head of the 30-S subunit, and that the codon-anticodon interaction occurs at the cleft of the 30-S subunit. Their acceptor stems cross the intersubunit gap, placing their CCA ends in the neighborhood of the peptidyl transferase site. There is strong evidence that the anticodon ends of A-site and P-site tRNAs are also closely associated by virtue of their binding to adjacent mRNA codons (220,221). A recent modeling study supports the now generally accepted conclusion that the two tRNAs are arranged relatively in the “S” configuration (222, 223). Thus if the A site is located to the right of the P site (in Fig. 7), then tRNAs in those sites must be oriented so that their anticodons point either toward the 504 subunit or toward the body
ELONGATION PHASE OF PROTEIN SYNTHESIS
[Figure 7 artwork: labels “50-S subunit” and “30-S subunit”]
FIG. 7. Ribosomal sites. The 50-S subunit is shown with its subunit-interface surface oriented toward the reader. The subunit-interface surface of the 30-S subunit is oriented away from the reader. The shapes of both subunits have been sketched using models obtained recently by electron microscopy as a guide (45, 46). (Authority for the positions assigned to binding sites on the two subunits may be found in Ref. 217a.)
of the 30-S subunit, depending on their rotational orientations about an axis running through their elbows, normal to the molecular plane. Most commentators believe, as we do, that the A site does indeed lie to the right of the P site in Fig. 7 (217a, 223a). This arrangement is required, among other things, by what is known about the elongation-factor binding site. One speaks of a “site” rather than “sites” because EF-G and EF-Tu compete with each other, and the structural similarity of EF-G and the EF-Tu ternary complex implies that they bind to the ribosome in a similar, if not identical, fashion. There is overwhelming evidence that EF-G binds to ribosomes at the base of the L7/L12 stalk (219), that EF-Tu binds to the 30-S subunit on the side that faces the L7/L12 stalk in 70-S couples (191), and that EF-Tu can be cross-linked to L7/L12 (224). The L7/L12 stalk lies well to the right of the peptidyl transferase site. [Note, however, that demonstrations that something can be cross-linked to L7/L12 do not constrain ribosomal placements very strongly. The L7/L12 stalk is flexible, and one of the two dimers of L7/L12 is associated with the central protuberance of the 50-S subunit, not the stalk itself (167).] Because EF-Tu delivers tRNAs to the ribosome, tRNAs must enter the A site from the right. Monoclonal antibodies against L2 and L9, which are in the peptidyl
JOHN CZWORKOWSKI AND PETER B. MOORE
transferase neighborhood, also interfere with EF-Tu function (164, 225), and EF-Tu can be cross-linked to proteins in the same region: L1, L5, and L15 (162, 226). EF-G also cross-links to components associated with the same region (160). In our view, these data do not conflict with the conclusion just drawn. Because the second domain of EF-Tu interacts with the CCA end of aminoacyl-tRNAs, that part of EF-Tu can hardly fail to approach the peptidyl transferase region when the EF-Tu ternary complex binds to the ribosome; EF-Tu must reach deep into the A site. That domain 2 is important in EF-Tu/ribosome interactions is supported by the existence of a domain-2 mutant that interferes with ribosome binding (227). The steric similarity of the EF-Tu ternary complex and EF-G argues that the second domain of EF-G ought to interact with the ribosome in the same region. If the A site lies to the right of the P site, then the site where aminoacyl-tRNA first encounters the ribosome, the T site, if it exists, ought to be to the right of the A site. The position of the E site follows almost by elimination: it must be immediately to the left of the P site, on the L1 side of the peptidyl transferase site.
B. Factor and tRNA Orientations
There is ample evidence that EF-Tu delivers tRNAs to the ribosome oriented the same way they are in the A site, and that it binds in a manner compatible with peptide transfer. EF-Tu·GTP·aa-tRNA complexes in which amino acids are cross-linked to EF-Tu are active in mRNA-directed aminoacyl-tRNA binding and in GTP hydrolysis (227a), and a 20-Å cross-link between the variable loop of the aminoacyl-tRNA and EF-Tu permits message-dependent binding, GTP hydrolysis, and peptide transfer (228). The EF-Tu ternary complex has an RNA-rich side, which includes its tRNA and its GTP binding site and which must face the peptidyl transferase when the complex is bound to the ribosome. The opposite side of the ternary complex, its protein-rich side, must point toward the L7/L12 stalk. Note that if the tRNA side of the ternary complex were oriented the other way, the CCA end of the tRNA would point away from the peptidyl transferase center. Although T-site enthusiasts might find this geometry gratifying, because that rotational difference would definitively distinguish the T site from the A site, it is hard to understand how a tRNA could interact satisfactorily with the same codon in both sites, as it must. Thus we believe that tRNA is delivered to the ribosome in an A-site-like orientation and works its way across the subunit interface from right to left during elongation. In addition to moving across the face of the ribosome during elongation, tRNAs rotate. The plane of the L of an A-site-bound tRNA intersects the corresponding plane of its P-site-bound neighbor at an angle of about 50° (229-231). Thus after peptide-bond formation, but before the next amino
acid is incorporated, not only must an A-site-bound peptidyl-tRNA move to the P site, it must also rotate about an axis joining its 3’ end and its anticodon. It is generally assumed that this happens during translocation, but, as has been pointed out recently, rotation could occur when EF-Tu delivers the next aminoacyl-tRNA to the ribosome (K. H. Nierhaus, personal communication). It is generally believed, on the basis of little real evidence, that the rotational orientation of E-site-bound tRNA resembles that of P-site-bound tRNA. These arguments require that EF-G bind to the 70-S ribosome in the same place as the EF-Tu ternary complex, oriented so that its tRNA-like parts fill the same ribosomal region as the tRNA of the ribosome-bound ternary complex. The experimental data that speak most directly to the orientation of EF-G on the ribosome come from cross-linking studies in which both EF-G residues and ribosomal components have been identified (see Sections II,D and II,F), but they are not decisive. Some findings suggest that the nucleotide binding face of EF-G contacts the 30-S subunit, but other findings could be interpreted as proving the opposite.
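The 50° figure is an interplanar angle: each L-shaped tRNA defines a plane through, for example, its anticodon, elbow, and 3’ end, and the angle between two such planes is the angle between their normals. The Python sketch below is a hypothetical illustration, not an analysis from Refs. 229-231; the function names and the landmark coordinates are invented. It computes that angle and shows that rotating an L by 50° about the axis joining its anticodon and 3’ end, the kind of rotation the text says must accompany movement from the A site to the P site, tilts its plane by exactly 50°.

```python
import math


def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def unit(v):
    n = math.sqrt(dot(v, v))
    return tuple(a / n for a in v)


def plane_normal(anticodon, elbow, end3):
    """Unit normal of the plane of an L-shaped tRNA, from three landmarks."""
    return unit(cross(sub(anticodon, elbow), sub(end3, elbow)))


def interplanar_angle_deg(l1, l2):
    """Angle between two tRNA planes, folded into [0, 90] degrees."""
    c = abs(dot(plane_normal(*l1), plane_normal(*l2)))
    return math.degrees(math.acos(min(c, 1.0)))


def rotate_about_axis(points, p0, p1, angle_deg):
    """Rotate points about the line through p0 and p1 (Rodrigues' formula)."""
    k = unit(sub(p1, p0))
    th = math.radians(angle_deg)
    out = []
    for p in points:
        v = sub(p, p0)
        kxv = cross(k, v)
        kdv = dot(k, v)
        out.append(tuple(p0[i] + v[i] * math.cos(th) + kxv[i] * math.sin(th)
                         + k[i] * kdv * (1 - math.cos(th)) for i in range(3)))
    return out


# Hypothetical landmarks (arbitrary units): anticodon, elbow, 3' end of an A-site tRNA.
anticodon, elbow, end3 = (0.0, -7.0, 0.0), (0.0, 0.0, 0.0), (7.0, 0.0, 0.0)
a_site = (anticodon, elbow, end3)
# Rotate the whole L by 50 degrees about the anticodon-to-3'-end axis.
p_site = tuple(rotate_about_axis(a_site, anticodon, end3, 50.0))
print(round(interplanar_angle_deg(a_site, p_site), 1))  # 50.0
```

Because the angle between planes is folded into 0-90°, a rotation of 130° about the same axis would also report 50°; the plane angle alone does not say which way the tRNA turned.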
C. Models for Translocation
The elongation cycle is driven by a switching of the ribosome between its pre- and posttranslocational states, which is catalyzed by the two elongation factors. The two factors do not operate in a perfectly parallel manner, however. Catalysis of state switching is the sole function of EF-G, whereas EF-Tu both catalyzes state switching and delivers aminoacyl-tRNAs to the ribosome. Nevertheless, the fact that EF-G and the EF-Tu ternary complex are isosteric (at low resolution) makes it plausible to propose that they facilitate state switching in the same way. If, during switching, the ribosome were to adopt a conformation stabilized by the binding of either the EF-Tu ternary complex or EF-G, and if that intermediate conformation were the transition state for the conformational isomerization in question, factor binding would lower the activation energy for state switching and hence increase its rate, as observed. If this concept is correct, then EF-G and EF-Tu need differ as conformational catalysts in only one way: the GTPase of EF-Tu must be triggered when the ribosome is in its pretranslocational state, whereas that of EF-G must be activated in response to the posttranslocational state of the ribosome. (Note that the elongation scheme presented in Fig. 2 shows ternary complex binding to posttranslocational ribosomes, causing the post- to pretranslocational conformational change, consistent with this proposal.) It could be argued that the component of both factors critical for the catalysis of state switching is the second domain. The tRNA portion of the EF-Tu ternary complex cannot be critical, because EF-Tu alone appears to be able to trigger the conformational switching of ribosomes in the absence of tRNA
(Section II,F). In addition, as mentioned earlier, polypeptides consisting of domains 2, 3, 4, and 5 of EF-G catalyze state switching. Domain 2 is the only structure EF-Tu and this fragment of EF-G have in common. (The third EF-Tu domain partially overlaps domain 5 of EF-G when the two molecules are superimposed on their G domains, but they show no structural homology.) Domain 2 ought to be the portion of both factors that approaches the peptidyl transferase site of the 50-S subunit most closely. Note also that the SRL is believed to be located in that neighborhood (190). It is not hard to devise proposals for translocation consistent with these ideas in the context of the hybrid-sites model for elongation. EF-G binds to the ribosome immediately after peptide-bond formation, when the new peptidyl-tRNA occupies the P site on the 50-S subunit and the A site on the 30-S subunit, and the CCA end of the newly deacylated tRNA is in the 50-S E site while its anticodon end occupies the 30-S P site. If EF-G were to bind so that its third domain occupied the 50-S A site, its fourth domain could displace the anticodon stem/loop of the peptidyl-tRNA from the 30-S A site (Fig. 8). If tRNAs are driven from site to site by motions of the ribosomal subunits, it is easy to understand how tRNA displacement might occur. When EF-G binds to the ribosome, the A site of the 30-S subunit is not aligned with the A site on the 50-S subunit, either because of conformational changes that occur when EF-Tu binds or because of changes that accompany peptide-bond formation. Domain 4, by hypothesis, occupies the region the 30-S A site will arrive at after translocation is complete. If interactions between domain 4 and the 30-S subunit were to stabilize the posttranslocational conformation of the ribosome, its presence would facilitate translocation. The anticodon end of the peptidyl-tRNA would be forced to migrate to the P site because the A site is filled by domain 4.
Its displacement, which drags the mRNA with it, would favor migration of the anticodon end of the deacylated tRNA in the same way. It is harder to visualize what happens if tRNAs work their way across a fixed ribosome surface. In that case, EF-G would be unable to bind to the ribosome immediately after peptide-bond formation, because the 30-S A site, where domain 4 must go, would be occupied by the anticodon stem/loop of the new peptidyl-tRNA. The only way this would work is if translocation and EF-G binding were simultaneous, and mechanistic proposals of this type are necessarily vague about why EF-G effects the displacement required. Mechanisms of this class have one attractive feature, however: they make EF-G binding the same as translocation, which is consistent with the enzymology of EF-G insofar as we now know it. No one has isolated an EF-G·ribosome·mRNA·tRNA complex that was not in the posttranslocational state; the subunit-motion model implies that such complexes could exist. In both models, the rearrangement of the ribosome brought about by translocation triggers the latent GTPase activity of EF-G. Cleavage of the GTP bound to EF-G favors a conformation of EF-G that has low affinity for the ribosome, and EF-G departs, leaving the ribosome ready to accept the next EF-Tu ternary complex.
FIG. 8. The mechanism of elongation. This figure depicts the events that may transpire during the elongation cycle. It is a hybrid-sites model premised on the hypotheses that there is relative motion of the subunits during elongation, and that EF-G is an all-protein mimic of the EF-Tu ternary complex. The elongation cycle is dissected into seven steps. Step 1 is the binding of the EF-Tu ternary complex to a posttranslocational ribosome. This binding induces an adjustment in the relationship between the two subunits (heavy, broken arrow; step 2) that switches the ribosome from its posttranslocational to its pretranslocational state. This conformational change induces the GTPase of EF-Tu; its GTP is cleaved, and EF-Tu leaves the ribosome (step 3). Peptide transfer occurs (step 4), and this causes the two tRNAs bound to the ribosome to enter hybrid states. EF-G·GTP then binds (step 5). Its binding causes the reverse of the conformational change depicted in step 2 (step 6). The ribosome enters the posttranslocational state, which induces the GTPase of EF-G. The GTP bound to EF-G is hydrolyzed, and EF-G leaves the ribosome (step 7) so that the cycle can begin again.
We note that because the chirality of the elongation factors is now established beyond question, our analysis of their placement and orientation on
the ribosome depends on the handedness of the ribosome model we have used. This model is derived from projection images obtained by transmission electron microscopy, and left hands look like right hands in projection. Experiments have been done to determine the absolute hand of the ribosome (232; Joachim Frank, personal communication); the enantiomorph shown here is the one believed to be correct. If the chirality of current ribosome models were found to be incorrect, however, our proposal would have to be reformulated.
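Why projection images cannot by themselves fix the hand of a model can be shown in one line. Take the viewing direction along z, let P be the orthographic projection onto the image plane and M the mirror reflection through that plane (the notation is ours, not the authors'):

```latex
P=\begin{pmatrix}1&0&0\\0&1&0\end{pmatrix},\qquad
M=\begin{pmatrix}1&0&0\\0&1&0\\0&0&-1\end{pmatrix},\qquad \det M=-1,
\qquad
PM=P\;\Rightarrow\;P(M\mathbf{x})=P\mathbf{x}\quad\text{for every point }\mathbf{x}.
```

Because det M = -1, M turns a model into its enantiomorph, which no rotation can superimpose on the original, yet every point of the mirrored model projects exactly where the corresponding point of the original does. Settling the hand therefore needs information beyond a single projection, for instance views recorded at known tilt angles; that is a general point, not a summary of the experiments cited above.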
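The transition-state argument of Section III,C (factor binding stabilizes the intermediate conformation and so speeds state switching) is stated qualitatively in the text. As a sketch only, assuming that ordinary transition-state (Eyring) theory applies to the conformational isomerization and that factor binding changes only the activation free energy, the rate enhancement follows directly; the subscripts 0 (no factor) and "factor" (factor bound) are our notation:

```latex
k=\kappa\,\frac{k_{\mathrm B}T}{h}\,\exp\!\left(-\frac{\Delta G^{\ddagger}}{RT}\right),
\qquad
\frac{k_{\text{factor}}}{k_{0}}
=\exp\!\left(\frac{\Delta G^{\ddagger}_{0}-\Delta G^{\ddagger}_{\text{factor}}}{RT}\right)
=\exp\!\left(\frac{\Delta\Delta G^{\ddagger}}{RT}\right)
```

The prefactor, including the transmission coefficient κ, is assumed to be the same with and without the factor, so it cancels. At 298 K, RT is about 2.48 kJ mol⁻¹, so each RT ln 10 ≈ 5.7 kJ mol⁻¹ of transition-state stabilization buys a further tenfold increase in the switching rate.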
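The seven steps of Fig. 8 can be read as a small state machine. The Python sketch below is a hypothetical illustration, not code or notation from the review: the class and function names are invented, and tRNA positions are written as (30-S site, 50-S site) pairs following the hybrid-site description in Section III,C. Two details are assumptions of the sketch: the placeholder "EF-Tu" for an acceptor end still held by the factor, and the E/E position assigned to the exiting tRNA. The checks in `ef_tu_gtpase` and `ef_g_gtpase` encode the proposal that the GTPase of EF-Tu is triggered only in the pretranslocational state and that of EF-G only in the posttranslocational state.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple

# A tRNA position as (30-S site, 50-S site); ("A", "P") is the A/P hybrid state.
Site = Tuple[str, str]


@dataclass(frozen=True)
class Ribosome:
    conformation: str           # "post" or "pre" (translocational state)
    factor: Optional[str]       # elongation factor bound, or None
    peptidyl: Site              # tRNA carrying the nascent chain
    aminoacyl: Optional[Site]   # incoming aminoacyl-tRNA, if any
    deacylated: Optional[Site]  # tRNA that has just given up its peptide, if any


def require(condition: bool, message: str) -> None:
    if not condition:
        raise ValueError(message)


def bind_ternary_complex(r: Ribosome) -> Ribosome:  # step 1
    require(r.conformation == "post" and r.factor is None, "needs a free post ribosome")
    # "EF-Tu" marks the acceptor end still held by the factor (placeholder, not a site).
    return replace(r, factor="EF-Tu.GTP", aminoacyl=("A", "EF-Tu"))


def switch_to_pre(r: Ribosome) -> Ribosome:  # step 2: factor binding drives post -> pre
    require(r.factor == "EF-Tu.GTP", "no ternary complex bound")
    return replace(r, conformation="pre")


def ef_tu_gtpase(r: Ribosome) -> Ribosome:  # step 3: EF-Tu GTPase fires in the pre state
    require(r.conformation == "pre" and r.factor == "EF-Tu.GTP", "EF-Tu GTPase not triggered")
    return replace(r, factor=None, aminoacyl=("A", "A"))


def peptide_transfer(r: Ribosome) -> Ribosome:  # step 4: both tRNAs enter hybrid states
    require(r.aminoacyl == ("A", "A"), "no accommodated aminoacyl-tRNA")
    # Any tRNA left in the E site by the previous cycle is assumed to have dissociated.
    return replace(r, peptidyl=("A", "P"), aminoacyl=None, deacylated=("P", "E"))


def bind_ef_g(r: Ribosome) -> Ribosome:  # step 5
    require(r.factor is None, "factor site occupied")
    return replace(r, factor="EF-G.GTP")


def translocate(r: Ribosome) -> Ribosome:  # step 6: reverse of step 2, pre -> post
    require(r.conformation == "pre" and r.factor == "EF-G.GTP", "EF-G not bound")
    # Placing the exiting tRNA at E/E is an assumption of this sketch.
    return replace(r, conformation="post", peptidyl=("P", "P"), deacylated=("E", "E"))


def ef_g_gtpase(r: Ribosome) -> Ribosome:  # step 7: EF-G GTPase fires in the post state
    require(r.conformation == "post" and r.factor == "EF-G.GTP", "EF-G GTPase not triggered")
    return replace(r, factor=None)


CYCLE = [bind_ternary_complex, switch_to_pre, ef_tu_gtpase, peptide_transfer,
         bind_ef_g, translocate, ef_g_gtpase]


def run_cycle(r: Ribosome) -> list:
    """Apply the seven steps in order and return every intermediate state."""
    history = [r]
    for step in CYCLE:
        r = step(r)
        history.append(r)
    return history


start = Ribosome("post", None, ("P", "P"), None, None)
history = run_cycle(start)
for i, s in enumerate(history[1:], start=1):
    print(i, s.conformation, s.factor, s.peptidyl, s.deacylated)
```

A second pass through the cycle ends in exactly the state the first pass ended in, and calling `ef_g_gtpase` before step 6 raises `ValueError`; that is how the sketch expresses the claim that EF-G hydrolyzes its GTP only once the ribosome has returned to the posttranslocational state.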
IV. Concluding Remarks
When we decided to write this review, we hoped that insights gained from new elongation factor structures, combined with facts gleaned from a close reading of the literature, would lead us to a definitive proposal for the mechanism of elongation. It is obvious to us now that the existing data are not sufficient to permit a result that grandiose to be achieved. We hope that the speculations we have indulged in to fill the gaps will provoke the experiments needed to solve this fascinating problem.
ACKNOWLEDGMENTS
We are indebted to Morten Kjeldgaard for supplying us with coordinates for the EF-Tu ternary complex prior to publication, and we thank our many colleagues who responded to our request for reprints and preprints. Those acquainted with the elongation field will have noted that we have not cited all the references relevant to each point raised. Considerations of length made this impossible from the outset, and we apologize to all who feel their contributions have been slighted. We also acknowledge the many discussions about elongation we have had with Anders Liljas and Arnthor Aevarsson. Their insights have shaped our thinking, but they bear no responsibility for the opinions expressed. This review was prepared with support from a grant from the National Institutes of Health (AI09167).
REFERENCES
1. Y. Kaziro, BBA 505, 95 (1978).
2. A. S. Spirin, This Series 32, 75 (1985).
3. O. Nygard and L. Nilsson, EJB 191, 1 (1990).
4. A. A. Szewczak, P. B. Moore, Y.-L. Chan and I. G. Wool, PNAS 90, 9581 (1993).
5. A. A. Szewczak and P. B. Moore, JMB 247, 81 (1995).
6. M. Kjeldgaard, P. Nissen, S. Thirup and J. Nyborg, Structure 1, 35 (1993).
7. H. Berchtold, L. Reshetnikova, C. O. A. Reiser, N. K. Schirmer, M. Sprinzl and R. Hilgenfeld, Nature 365, 126 (1993).
8. P. Nissen et al., Science 270, 1464 (1995).
9. A. Aevarsson, E. Brazhnikov, M. Garber, J. Zheltonorova, Yu. Chirgadze, S. Al-Karadaghi, L. A. Svensson and A. Liljas, EMBO J. 13, 3669 (1994).
10. J. Czworkowski, J. Wang, T. A. Steitz and P. B. Moore, EMBO J. 13, 3661 (1994).
11. J. D. Watson, Bull. Soc. Chim. Biol. 46, 1399 (1964).
12. R. E. Monro, JMB 26, 147 (1967).
13. B. E. H. Maden, R. R. Traut and R. E. Monro, JMB 35, 333 (1968).
14. S. L. Gupta, J. Waterson, M. L. Sopori, S. M. Weissman and P. Lengyel, Bchem 10, 4410 (1971).
15. S. S. Thach and R. E. Thach, PNAS 68, 1791 (1971).
16. D. Beyer, E. Skripkin, J. Wadzack and K. H. Nierhaus, JBC 269, 30713 (1994).
17. F. Lipmann, Science 164, 1024 (1969).
18. F. O. Wettstein and H. Noll, JMB 11, 35 (1965).
19. H.-J. Rheinberger and K. H. Nierhaus, Biochem. Int. 1, 297 (1980).
20. H.-J. Rheinberger, H. Sternbach and K. H. Nierhaus, PNAS 78, 5310 (1981).
21. A. S. Spirin, FEBS Lett. 165, 280 (1984).
22. V. I. Baranov and L. A. Ryabova, Biochimie 70, 259 (1988).
23. K. H. Nierhaus, Bchem 29, 4997 (1990).
24. H.-J. Rheinberger, U. Geigenmuller, A. Gnirke, T.-P. Hausner, J. Remme, H. Saruyama and K. H. Nierhaus, in “The Ribosome: Structure, Function and Genetics” (W. E. Hill, A. Dahlberg, R. A. Garrett, P. B. Moore, D. Schlessinger and J. R. Warner, eds.), p. 318. American Society for Microbiology, Washington, DC, 1990.
25. A. Gnirke, U. Geigenmuller, H.-J. Rheinberger and K. H. Nierhaus, JBC 264, 7291 (1989).
26. B. Hardesty, W. Culp and W. McKeehan, CSHSQB 34, 331 (1969).
27. J. A. Lake, PNAS 74, 1903 (1977).
28. D. Moazed and H. F. Noller, Nature 342, 142 (1989).
29. L. Skogerson and K. Moldave, ABB 125, 497 (1968).
30. M. V. Rodnina, R. Fricke and W. Wintermeyer, in “The Translational Apparatus: Structure, Function, Regulation and Evolution” (K. H. Nierhaus, F. Franceschi, A. R. Subramanian, V. A. Erdmann and B. Wittmann-Liebold, eds.), p. 317. Plenum, New York, 1993.
31. S. V. Kirillov, V. I. Makhno and Y. P. Semenkov, NARes 8, 183 (1980).
32. M. C. Ganoza, C. Cunningham, D. G. Chung and T. Neilson, Mol. Biol. Rep. 15, 33 (1991).
33. M. Cannon, R. Krug and W. Gilbert, JMB 7, 360 (1963).
34. A. Gnirke and K. H. Nierhaus, JBC 261, 14506 (1986).
35. R. Lill, A. Lepier, F. Schwagele, M. Sprinzl, H. Vogt and W. Wintermeyer, JMB 203, 699 (1988).
36. D. Moazed and H. F. Noller, PNAS 88, 3725 (1991).
37. H.-J. Rheinberger, H. Sternbach and K. H. Nierhaus, JBC 261, 9140 (1986).
38. R. Lill and W. Wintermeyer, JMB 196, 137 (1987).
39. M. V. Rodnina and W. Wintermeyer, JMB 228, 450 (1992).
40. H. F. Noller, ARB 60, 191 (1991).
41. B. Hardesty, O. W. Odom and H.-Y. Deng, in “Structure, Function and Genetics of Ribosomes” (B. Hardesty and G. Kramer, eds.), p. 495. Springer-Verlag, New York, 1986.
42. O. W. Odom, W. D. Picking and B. Hardesty, Bchem 29, 10734 (1990).
43. O. W. Odom and B. Hardesty, JBC 267, 19117 (1992).
44. A. Yonath, K. R. Leonard, S. Weinstein and H. G. Wittmann, CSHSQB 52, 729 (1987).
45. H. Stark, F. Mueller, E. V. Orlova, M. Schatz, P. Dube, T. Erdemir, F. Zemlin, R. Brimacombe and M. van Heel, Structure 3, 815 (1995).
46. J. Frank, J. Zhu, P. Penczek, Y. Li, S. Srivastava, A. Verschoor, M. Rademacher, R. Grassucci, R. K. Lata and R. K. Agrawal, Nature 376, 441 (1995).
47. I. Serdyuk, V. Baranov, T. Tsalkova, D. Gulyamova, M. Pavlov, A. S. Spirin and R. May, Biochimie 74, 299 (1992).
48. Y. Semenkov, T. Shapkina, V. Makhno and S. Kirillov, FEBS Lett. 296, 207 (1992).
49. K. H. Nierhaus, in preparation (1995).
50. H.-J. Rheinberger and K. H. Nierhaus, JBC 261, 9133 (1986).
51. M. V. Rodnina, R. Fricke and W. Wintermeyer, Bchem 33, 12267 (1994).
52. D. Moazed, J. M. Robertson and H. F. Noller, Nature 334, 362 (1988).
53. L. Holmberg and O. Nygard, Bchem 33, 15159 (1994).
54. C. G. Kurland and M. Ehrenberg, Annu. Rev. Biophys. 16, 291 (1987).
55. J. Eisinger, B. Feuer and T. Yamane, Nature NB 231, 120 (1971).
56. H. J. Grosjean, S. de Henau and D. M. Crothers, PNAS 75, 610 (1978).
57. R. C. Thompson and A. M. Karim, PNAS 79, 4922 (1982).
58. J. J. Hopfield, PNAS 71, 4135 (1974).
59. J. Ninio, Biochimie 57, 587 (1975).
60. R. C. Thompson, D. B. Dix, R. B. Gerson and A. M. Karim, JBC 256, 6676 (1981).
61. R. Lill, J. M. Robertson and W. Wintermeyer, Bchem 25, 3245 (1986).
62. C. G. Kurland and M. Ehrenberg, This Series 31, 191 (1984).
63. U. Geigenmuller and K. H. Nierhaus, EMBO J. 9, 4527 (1990).
64. R. C. Thompson and P. J. Stone, PNAS 74, 198 (1977).
65. T. Ruusala, M. Ehrenberg and C. G. Kurland, EMBO J. 1, 741 (1982).
66. T. Ruusala and C. G. Kurland, MGG 198, 100 (1984).
67. J. Gordon and F. Lipmann, JMB 23, 23 (1967).
68. S. Pestka, JBC 243, 2810 (1968).
69. S. Pestka, JBC 244, 1533 (1969).
70. L. P. Gavrilova, O. E. Kostiashkina, V. E. Koteliansky, N. M. Rutkevitch and A. S. Spirin, JMB 101, 537 (1976).
71. L. P. Gavrilova and A. S. Spirin, FEBS Lett. 39, 13 (1974).
72. A. S. Girshovich, E. S. Bochkareva and Y. A. Ovchinnikov, JMB 151, 229 (1981).
73. S. E. Skold, EJB 127, 225 (1982).
74. P. Strigini and L. Gorini, JMB 47, 517 (1970).
75. M. Ehrenberg, N. Bilgin and C. G. Kurland, in “Ribosomes and Protein Synthesis: A Practical Approach” (G. Spedding, ed.), p. 101. IRL Press, Oxford, 1990.
76. C. Nombela and S. Ochoa, PNAS 70, 3556 (1973).
77. J. R. Mesters, A. P. Potapov, J. M. de Graaf and B. Kraal, JMB 242, 644 (1994).
78. J. W. Bodley and L. Lin, Nature 227, 60 (1970).
79. D. Richter, BBRC 46, 1850 (1972).
80. D. L. Miller, PNAS 69, 752 (1977).
81. N. Richman and J. W. Bodley, PNAS 69, 686 (1972).
82. J. M. Robertson, C. Urbanke, G. Chinali, W. Wintermeyer and A. Parmeggiani, JMB 189, 653 (1986).
83. O. Nygard and L. Nilsson, EJB 179, 603 (1989).
84. K. H. Nierhaus, S. Schilling-Bartetzko and T. Twardowski, Biochimie 74, 403 (1992).
85. S. Schilling-Bartetzko, A. Bartetzko and K. H. Nierhaus, JBC 267, 4703 (1992).
86. R. Mikkola and C. G. Kurland, Biochimie 73, 1061 (1991).
87. I. Tubulekas and D. Hughes, Mol. Microbiol. 7, 275 (1993).
88. A. V. Furano, PNAS 72, 4780 (1975).
89. M. Gouy and R. Grantham, FEBS Lett. 115, 151 (1980).
90. L. Beres and J. Lucas-Lenard, Bchem 12, 3998 (1973).
91. J. Lucas-Lenard and F. Lipmann, ARB 40, 409 (1971).
92. D. L. Miller and H. Weissbach, in “Nucleic Acid-Protein Recognition” (H. J. Vogel, ed.), p. 409. Academic Press, New York, 1977.
93. A. Weijland, K. Harmark, R. H. Cool, P. H. Anborgh and A. Parmeggiani, Mol. Microbiol. 6, 683 (1992).
94. M. Sprinzl, Trends Biochem. Sci. 19, 245 (1994).
95. M. V. Rodnina, R. Fricke, L. Kuhn and W. Wintermeyer, EMBO J. 14, 2613 (1995).
96. G. Romero, V. Chau and R. L. Biltonen, JBC 260, 6167 (1985).
97. K. L. Manchester, Biochem. Int. 5, 929 (1991).
98. K. L. Manchester, Biochem. Int. 27, 311 (1992).
99. O. Fasano, E. De Vendittis and A. Parmeggiani, JBC 257, 3145 (1982).
100. J. Gordon, JBC 244, 5680 (1969).
101. G. Sander, R. C. Marsh, J. Voigt and A. Parmeggiani, Bchem 14, 1805 (1975).
102. H. Wolf, G. Chinali and A. Parmeggiani, PNAS 71, 4910 (1974).
103. G. Parlato, R. Pizzano, D. Picone, J. Guesnet, O. Fasano and A. Parmeggiani, JBC 258, 995 (1983).
104. K. Takahashi, S. Ghang and S. Chladek, Bchem 25, 8330 (1986).
105. M. Tezuka and S. Chladek, BBA 950, 463 (1988).
106. S. Campuzano and J. Modolell, PNAS 77, 905 (1980).
107. G. Sander, EJB 75, 523 (1977).
108. D. Picone and A. Parmeggiani, Bchem 22, 4400 (1983).
109. H. Wolf, G. Chinali and A. Parmeggiani, EJB 75, 67 (1977).
110. A. Parmeggiani, G. W. Swart, K. K. Mortensen, M. Jensen, B. F. Clark, L. Dente and R. Cortese, PNAS 84, 3141 (1987).
111. K. Bensch, U. Pieper, G. Ott, N. Schirmer, M. Sprinzl and A. Pingoud, Biochimie 73, 1045 (1991).
112. R. Leberman, FEBS Lett. 358, 71 (1995).
113. A. Weijland and A. Parmeggiani, Trends Biochem. Sci. 19, 188 (1994).
114. J. Scoble, N. Bilgin and M. Ehrenberg, Biochimie 76, 69 (1994).
115. A. Weijland, G. Parlato and A. Parmeggiani, Bchem 33, 10711 (1994).
116. V. Dincbas, N. Bilgin, J. Scoble and M. Ehrenberg, FEBS Lett. 357, 19 (1995).
117. N. Bilgin and M. Ehrenberg, Bchem 34, 715 (1995).
118. B. D. Beck, P. G. Arscott and A. Jacobson, PNAS 75, 1250 (1978).
119. M. Wurtz, G. R. Jacobson, A. C. Steven and J. P. Rosenbusch, EJB 88, 593 (1978).
120. B. D. Beck, EJB 97, 495 (1979).
121. K. Arai, Y. Otga, N. Arai, S. Nakamura, C. Henneke, T. Oshima and Y. Kaziro, EJB 92, 521 (1978).
122. K. Arai, Y. Otga, N. Arai, S. Nakamura, C. Henneke, T. Oshima and Y. Kaziro, EJB 92, 509 (1978).
123. M. V. Rodnina and W. Wintermeyer, PNAS 92, 1945 (1995).
124. M. S. Rohrbach, M. E. Dempsey and J. W. Bodley, JBC 249, 5094 (1974).
125. A. Parmeggiani and G. Sander, Mol. Gen. Biochem. 31, 129 (1981).
126. E. De Vendittis, M. Masullo and V. Bocchini, JBC 261, 4445 (1986).
127. M. Masullo, G. Parlato, E. de Vendittis and V. Bocchini, BJ 261, 725 (1989).
128. A. R. Dahlfors and C. G. Kurland, JMB 216, 311 (1990).
129. Y. Kaziro, N. Inoe, Y. Kuriki, K. Mizumoto, M. Tanaka and M. Kawakita, CSHSQB 34, 385 (1969).
130. N. Inoue-Yokosawa, C. Ishikawa and Y. Kaziro, JBC 249, 4321 (1974).
131. G. Chinali and A. Parmeggiani, EJB 125, 415 (1982).
132. R. Lill, J. M. Robertson and W. Wintermeyer, EMBO J. 8, 3933 (1989).
133. W. Wintermeyer, R. Lill and J. M. Robertson, in “The Ribosome: Structure, Function and Evolution” (W. E. Hill, A. Dahlberg, R. A. Garrett, P. B. Moore, D. Schlessinger and J. R. Warner, eds.), p. 348. American Society for Microbiology, Washington, DC, 1990.
134. J. M. Robertson and W. Wintermeyer, JMB 198, 133 (1987).
135. H. Paulsen and W. Wintermeyer, Bchem 25, 2749 (1986).
136. Y. B. Alakhov, O. A. Stengrevics, V. V. Filiminov and S. Yu. Venyaminov, EJB 99, 585 (1979).
137. C. Borowski, C. Niess and W. Wintermeyer, in preparation (1995).
138. N. V. Belitsina, G. Z. Tnalina and A. S. Spirin, FEBS Lett. 131, 289 (1981).
139. E. F. Gale, E. Cundliffe, P. E. Reynolds, M. H. Richmond and M. J. Waring, “The Molecular Basis of Antibiotic Action.” Wiley, London, 1981.
140. E. Cundliffe, in “The Ribosome: Structure, Function, and Genetics” (W. E. Hill, A. Dahlberg, R. A. Garrett, P. B. Moore, D. Schlessinger and J. R. Warner, eds.), p. 479. American Society for Microbiology, Washington, DC, 1990.
141. J. W. Bodley, F. J. Zieve, L. Lin and S. T. Vieve, JBC 245, 5656 (1970).
142. G. R. Willie, N. Richman, W. O. Godtfredsen and J. W. Bodley, Bchem 14, 1713 (1975).
143. U. Johanson and D. Hughes, Gene 143, 55 (1994).
144. J. R. Mesters, J. M. de Graaf and B. Kraal, FEBS Lett. 321, 149 (1993).
145. A. Parmeggiani and G. W. M. Swart, Annu. Rev. Microbiol. 39, 557 (1985).
146. A. Weijland, K. Harmark, P. H. Anborgh and A. Parmeggiani, in “The Translational Apparatus: Structure, Function, Regulation, Evolution” (K. H. Nierhaus, F. Franceschi, A. R. Subramanian, V. A. Erdmann and B. Wittmann-Liebold, eds.), p. 295. Plenum, New York, 1993.
147. J. P. Abraham, M. J. van Raaij, G. Ott, B. Kraal and L. Bosch, Bchem 30, 6705 (1991).
148. H. Wolf, D. Assmann and E. Fischer, PNAS 75, 5324 (1978).
149. E. Cundliffe and P. D. Dixon, Antimicrob. Agents Chemother. 8, 1 (1975).
150. J. H. Highland, L. Lin and J. W. Bodley, Bchem 10, 4404 (1971).
151. J. Modolell, B. Cabrer, A. Parmeggiani and D. Vazquez, PNAS 68, 1796 (1971).
152. T. P. Hausner, U. Geigenmuller and K. H. Nierhaus, JBC 263, 13103 (1988).
153. E. Cundliffe and J. Thompson, EJB 118, 47 (1981).
154. M. Misumi, N. Tanaka and T. Shibata, BBRC 82, 971 (1978).
155. J. Modolell and D. Vazquez, EJB 81, 491 (1977).
156. T.-P. Haussner, U. Geigenmuller and K. H. Nierhaus, JBC 263, 13103 (1988).
157. U. Johanson and D. Hughes, NARes 23, 464 (1995).
158. K. W. Kischa, W. Moller and G. Stoeffler, Nature NB 233, 62 (1971).
159. E. Hamel, M. Koka and T. Nakamoto, JBC 247, 805 (1972).
160. W. Moller and J. A. Maassen, in “Structure, Function, and Genetics of Ribosomes” (B. Hardesty and G. Kramer, eds.), p. 309. Springer-Verlag, New York, 1986.
161. J. A. Langer, F. Jurnak and J. A. Lake, Bchem 23, 6171 (1984).
162. C. San Jose, C. G. Kurland and G. Stoeffler, FEBS Lett. 71, 133 (1976).
163. B. Nag, D. S. Tewari, A. Sommer, H. M. Olson, D. G. Glitz and R. R. Traut, JBC 262, 9681 (1987).
164. B. Nag, S. S. Akella, P. A. Cann, D. S. Tewari, D. G. Glitz and R. R. Traut, JBC 266, 22129 (1991).
165. A. S. Acharya, P. B. Moore and F. M. Richards, Bchem 12, 3108 (1973).
166. J. A. Maassen and W. Moller, PNAS 71, 1277 (1974).
167. R. R. Traut, D. S. Tewari, A. Sommer, G. R. Gavino, H. M. Olson and D. G. Glitz, in “Structure, Function and Genetics of Ribosomes” (B. Hardesty and G. Kramer, eds.), p. 286. Springer-Verlag, Berlin, 1986.
168. P. I. Schrier, Ph.D. Thesis, University of Leiden (1977).
169. A. T. Gudkov and G. M. Gongadze, FEBS Lett. 176, 32 (1984).
170. L. A. Ryabova, O. M. Selivano, V. I. Baranov, V. D. Vasiliev and A. S. Spirin, FEBS Lett. 226, 255 (1988).
171. A. Liljas, Prog. Biophys. Mol. Biol. 40, 161 (1982).
172. J. Egebjerg, N. Larsen and R. A. Garrett, in “The Ribosome: Structure, Function, and Evolution” (W. E. Hill, A. Dahlberg, R. A. Garrett, P. B. Moore, D. Schlessinger and J. R. Warner, eds.), p. 168. American Society for Microbiology, Washington, DC, 1990.
173. S. E. Skold, NARes 11, 4923 (1983).
174. H. F. Noller, J. Kop, V. Wheaton, J. Brosius, R. R. Gutell, A. M. Kopylov, F. Dohme, W. Herr, D. A. Stahl, R. Gupta and C. R. Woese, NARes 9, 6167 (1981).
175. Y. Endo and I. G. Wool, JBC 257, 9054 (1982).
176. Y. Endo, M. Mitsui, M. Motizuki and K. Tsurugi, JBC 262, 5908 (1987).
177. C. Fernandez-Puentes and D. Vazquez, FEBS Lett. 78, 143 (1977).
178. A. N. Hobden and E. Cundliffe, BJ 170, 57 (1978).
179. T.-P. Hausner, J. Atmadja and K. H. Nierhaus, Biochimie 69, 911 (1987).
180. S. P. Miller and J. W. Bodley, NARes 19, 1657 (1991).
181. B. Wimberly, G. Varani and I. Tinoco, Jr., Bchem 32, 1078 (1993).
182. A. Gluck, Y. Endo and I. G. Wool, JMB 226, 411 (1992).
183. W. E. Tapprich and A. E. Dahlberg, EMBO J. 9, 2649 (1990).
184. S. Tapio and L. A. Issaksson, EJB 202, 981 (1991).
185. C. A. White, T. Wood and W. E. Hill, NARes 16, 10817 (1988).
186. W. E. Hill, J. Weller, T. Gluick, C. Merryman, R. T. Marconi, A. Tassanakajohn and W. E. Tapprich, in “The Ribosome: Structure, Function, and Evolution” (W. E. Hill, A. Dahlberg, R. A. Garrett, P. B. Moore, D. Schlessinger and J. R. Warner, eds.), p. 253. American Society for Microbiology, Washington, DC, 1990.
187. K. H. Nierhaus, R. Adlung, T.-P. Hausner, S. Schilling-Bartetzko, T. Twardowski and F. Triana, in “The Translational Apparatus: Structure, Function, Regulation, Evolution” (K. H. Nierhaus, F. Franceschi, A. R. Subramanian, V. A. Erdmann and B. Wittmann-Liebold, eds.), p. 263. Plenum, New York, 1993.
188. I. G. Wool, A. Gluck and Y. Endo, Trends Biochem. Sci. 17, 266 (1992).
189. K. H. Nierhaus, T.-P. Adlung, S. Hausner, S. Schilling-Bartetzko, T. Twardowski and F. Triana, in “The Translational Apparatus: Structure, Function, Regulation, Evolution” (K. H. Nierhaus, F. Franceschi, A. R. Subramanian, V. A. Erdmann and B. Wittmann-Liebold, eds.), p. 263. Plenum, New York, 1993.
190. B. S. Cooperman, P. Muralikrishna and R. W. Alexander, in preparation (1995).
191. J. A. Langer and J. A. Lake, JMB 187, 617 (1986).
192. A. S. Girshovich, E. S. Bochkareva and A. T. Gudkov, FEBS Lett. 150, 99 (1982).
193. N. Bilgin, F. Claesens, H. Pahverk and M. Ehrenberg, JMB 224, 1011 (1992).
194. I. Tubulekas, R. H. Buckingham and D. Hughes, J. Bacteriol. 173, 3635 (1991).
195. T. Powers and H. F. Noller, PNAS 90, 1364 (1993).
196. M. Kjeldgaard and J. Nyborg, JMB 223, 721 (1992).
197. H. R. Bourne, D. A. Sanders and F. McCormick, Nature 349, 117 (1975).
197a. P. J. Kraulis, J. Appl. Crystallogr. 24, 946 (1991).
197b. D. J. Bacon and W. F. Anderson, J. Mol. Graphics 6, 219 (1988).
198. F. Abdulkarim, L. Liljas and D. Hughes, FEBS Lett. 352, 118 (1994).
199. J. R. Mesters, L. A. H. Zeef, R. Hilgenfeld, J. M. de Graaf, B. Kraal and L. Bosch, EMBO J. 13, 4877 (1994).
200. A. Pingoud, W. Block, C. Urbanke and H. Wolf, EJB 123, 261 (1982).
201. L. A. H. Zeef, L. Bosch, P. H. Anborgh, R. Cetin, A. Parmeggiani and R. Hilgenfeld, EMBO J. 13, 5113 (1994).
201a. A. Nicholls, K. A. Sharp and B. Honig, Proteins 11, 281 (1991).
202. A. E. Johnson, F. Janiak, V. A. Dell and J. K. Abrahamson, in “Structure, Function and Genetics of Ribosomes” (B. Hardesty and G. Kramer, eds.), p. 541. Springer-Verlag, New York, 1986.
203. F. P. Wikman, G. E. Siboska, H. U. Petersen and B. F. Clark, EMBO J. 1, 1095 (1982).
204. M. Jensen, R. H. Cool, K. K. Mortensen, B. F. Clark and A. Parmeggiani, EJB 182, 247 (1989).
205. N. K. Schirmer, C. O. Reiser and M. Sprinzl, EJB 200, 295 (1991).
206. Y. W. Hwang, M. Carter and D. L. Miller, JBC 267, 22198 (1992).
207. M. E. Peter, C. O. A. Reiser, N. K. Schirmer, T. Kiefhaber, G. Ott, N. W. Grillenbeck and M. Sprinzl, NARes 18, 6889 (1990).
208. M. G. Bubunenko, M. L. Kireeva and A. T. Gudkov, Biochimie 74, 419 (1992).
209. A. G. Murzin, Nat. Struct. Biol. 2, 25 (1995).
210. Yu. B. Alakhov, L. P. Motuz, O. A. Stengrevics, L. M. Vinokurov and Yu. A. Ovchinnikov, Bioorg. Khim. 3, 1333 (1977).
211. Yu. A. Ovchinnikov, Yu. B. Alakhov, Yu. P. Bundulis, M. A. Bundule, N. V. Dovgas, V. P. Kozlov, L. P. Motuz and L. M. Vinokurov, FEBS Lett. 139, 130 (1982).
212. N. Arai, K. Arai, S. Nakamura and Y. Kaziro, J. Biochem. 82, 695 (1977).
213. D. Guillot, J.-P. Lavergne and J.-P. Reboud, JBC 268, 26082 (1993).
214. R. J. Collier, Bacteriol. Rev. 39, 54 (1975).
215. E. A. Robinson, O. Henriksen and E. S. Maxwell, JBC 249, 5088 (1974).
216. B. G. Van Ness, J. B. Howard and J. W. Bodley, JBC 255, 10710 (1980).
217. Yu. B. Alakhov, I. K. Zalite and I. A. Kashparov, EJB 105, 531 (1980).
217a. J. Wower, P. Scheffer, L. A. Sylvers, W. Wintermeyer and R. A. Zimmermann, EMBO J. 12, 617 (1993).
218. P. Gonichi, K. Nurse, W. Hellmann, M. Boublik and J. Ofengand, JBC 249, 10493 (1984).
219. G. Stoeffler and M. Stoeffler-Meilicke, in “Structure, Function, and Genetics of Ribosomes” (B. Hardesty and G. Kramer, eds.), p. 28. Springer-Verlag, New York, 1986.
220. R. H. Fairclough and C. R. Cantor, JMB 132, 575 (1979).
221. A. J. M. Matzke, A. Barta and E. Kuechler, PNAS 77, 5110 (1980).
222. T. R. Easterwood, F. Major, A. Malhotra and S. C. Harvey, NARes 22, 3779 (1994).
223. V. Lim, C. Venclovas, A. S. Spirin, R. Brimacombe, P. Mitchell and F. Muller, NARes 20, 2627 (1992).
223a. J. Wower and R. A. Zimmermann, Biochimie 73, 961 (1991).
224. A. S. Girshovich, E. S. Bochkareva and V. D. Vasiliev, FEBS Lett. 197, 192 (1986).
225. B. Nag, D. S. Tewari, J. R. Etchinson, A. Sommer and R. R. Traut, JBC 261, 13892 (1986).
226. U. Fabian, FEBS Lett. 71, 256 (1976).
227. I. Tubulekas and D. Hughes, J. Bacteriol. 175, 240 (1993).
227a. A. E. Johnson, D. L. Miller and C. R. Cantor, PNAS 75, 3075 (1978).
228. T. Kao, D. L. Miller, M. Abo and J. Ofengand, JMB 166, 383 (1983).
229. A. E. Johnson, H. J. Adkins, E. A. Matthews and C. R. Cantor, JMB 156, 113 (1982).
230. H. Paulsen, J. M. Robertson and W. Wintermeyer, JMB 167, 411 (1983).
231. R. Rigler and W. Wintermeyer, Annu. Rev. Biophys. Bioeng. 12, 475 (1983).
232. K. R. Leonard and J. A. Lake, JMB 129, 155 (1979).
Signals in Eukaryotic DNA Promote and Influence Formation of Nucleosome Arrays

ARNOLD STEIN
Department of Biological Sciences
Purdue University
West Lafayette, Indiana 47906-1392
I. Detection and Analysis of Nucleosome Arrays (p. 334)
II. Models for the Formation of Periodic Nucleosome Arrangements and Their Implications (p. 338)
III. In Vitro Chromatin Assembly Systems (p. 343)
A. Chromatin Assembly Using Crude Extracts (p. 344)
B. Chromatin Assembly in a System Consisting of Purified Components (p. 346)
IV. Relationship between … Array and Chromatin Higher Order Structure (p. 356)
V. Signals in Geno… Nucleosome Alignment (p. 358)
A. Introns of the Chicken Ovalbumin Gene Promote Nucleosome Alignment in Vitro (p. 359)
B. Signals in Chicken β-Globin DNA Influence Chromatin Assembly in Vitro (p. 363)
C. Rat Growth Hormone Gene Introns Stimulate Nucleosome Alignment in Vitro and in Transgenic Mice and Increase Transcription Efficiency (p. 366)
VI. Chromatin Assembly on Plasmids in Transfected Cells (p. 374)
VII. Perspective (p. 377)
References (p. 378)
Progress in Nucleic Acid Research and Molecular Biology, Vol. 54
Copyright © 1996 by Academic Press, Inc. All rights of reproduction in any form reserved.

One of the hallmarks of chromatin structure is the periodic arrays of nucleosomes revealed in electron micrographs of chromatin spread at low ionic strength, or by the ladders of bands in gels following micrococcal nuclease (MNase) digestion and electrophoresis of the purified fragmented DNA. The periodic nucleosome arrays must in some way reflect or influence the higher order chromatin structures in which they reside, even though, because of the very tight association of the core histone octamers with DNA, the arrays persist when the higher order structure is unfolded. In turn, the higher order structures of chromatin are thought to be responsible both for organizing the folding of DNA in chromosomes and, in cooperation with transcription factors, for regulating access to the genetic material in a cell-type-specific way (1-3). It is known from in vitro studies, for example, that direct addition of linker histone H1 to chromatin that was gently depleted of histone H1 regenerates the native "30-nm fiber," provided that the native internucleosome spacings were not disturbed (4, 5). If the native nucleosome spacings along the DNA are perturbed, adding the highly basic histone H1 leads to nonnative chromatin structures and, generally, nucleoprotein aggregation. Despite being a fundamental property of chromatin, possibly one intimately associated with chromatin structure and function, nucleosome-array formation is still poorly understood. There seem to be several reasons why this problem has not been solved. First, it is not obvious what the typical MNase ladder is really telling us about the apparently well-ordered nucleosome arrays. As discussed in Section II,A, the bands on a gel are much broader than DNA restriction fragments, there is a considerable background signal between the bands, and the periodic signal generally damps out (bands fail to be resolved from adjacent ones) at about the decamer of the ladder. These characteristics allow several interpretations of the degree of order actually present in arrays of nucleosomes and of how that order arises. Additionally, there are technical difficulties in examining the chromatin structures of single-copy DNA sequences in higher cells by Southern hybridization. Signals are weak, necessitating large amounts of genomic DNA per gel lane (which can affect band resolution and DNA fragment mobility) and long exposure times (which can lead to high background signals). Also, cross-hybridization to similar sequences, or the presence of repetitive DNA in the probe, can result in detection of essentially the bulk chromatin repeat, masking the signal arising from the desired genomic region.
Finally, completely defined in vitro systems capable of assembling any DNA into chromatin with the properties of bulk cellular chromatin have not been developed. The systems that have been developed are described here. It seems likely that nucleosome arrays can be formed through several different mechanisms, even though the final products might appear to be very similar (see Section III). In this essay, I review what is currently known about nucleosome-array formation, discuss the implications of different models, and present some speculative new ideas concerning the genomic organization of chromatin.
I. Detection and Analysis of Nucleosome Arrays

Nucleosomes in chromatin consist of histone cores (octamers containing two each of the four core histones), about which 146 bp of DNA is tightly wrapped, and DNA linkers that connect adjacent particles (1). The 146 bp associated with the nucleosome core particle is considerably more resistant than the linker DNA to cleavage by MNase (as well as by most other nucleases and DNA-cleaving reagents). Hence, limited digestion generates a mixture of differently sized oligonucleosomes, and the purified DNA fragments isolated from these nucleosome oligomers appear to be multiples of a unit repeat, corresponding to the average nucleosome size (core plus linker). Because such experiments involve chromatin or nuclei from a large number of cells, each containing a large amount of DNA, and because all linkers can be cleaved with approximately equal probability, each band of an "MNase ladder" on a gel contains a large number of different DNA sequences, all derived from a particular oligomer size (e.g., the trimer). Thus, an average oligomer size (in base pairs) is obtained. Of course, attention can be focused on a particular region of the genome using Southern blotting and specific hybridization to a selected DNA probe. In this case only the oligomers that contain the probe sequence, or a portion of it, will be detected. Generally, such an experiment assesses the relative nucleosome positions on the sequences corresponding to the probe, together with contributions to the pattern from up to approximately 1.5 kb on either side of the probe midpoint. The nucleosome spacing periodicity (repeat length) differs among some species and cell types (1). For example, the repeat length in baker's yeast is only about 160 bp, indicating that the nucleosomes are very close together (average linker length of 14 bp); in most animal tissues it is about 200 bp (54-bp average linker); and in sea urchin sperm it is about 240 bp, indicating average linkers of approximately 94 bp.
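The repeat-length arithmetic in this paragraph (average linker = repeat length minus the 146 bp of core-particle DNA) can be written as a small Python helper. This is only an illustrative sketch: the function name and the dictionary of examples are hypothetical, and the repeat lengths are the approximate values quoted above.

```python
# Illustrative sketch: average linker length from the nucleosome repeat length.
CORE_BP = 146  # DNA tightly wrapped in one nucleosome core particle (bp)


def average_linker(repeat_bp: float, core_bp: float = CORE_BP) -> float:
    """Average linker DNA per nucleosome = repeat length - core-particle DNA."""
    if repeat_bp < core_bp:
        raise ValueError("repeat length cannot be shorter than the core particle")
    return repeat_bp - core_bp


# Approximate repeat lengths quoted in the text (bp).
REPEATS = {"baker's yeast": 160, "most animal tissues": 200, "sea urchin sperm": 240}

for source, repeat in REPEATS.items():
    print(f"{source}: repeat {repeat} bp -> average linker {average_linker(repeat)} bp")
```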
Interestingly, nucleosome periodicity variations have also been detected on different DNA sequences within the same cells (6-9). The nucleosome spacing periodicity is best measured by first determining very carefully the mean sizes of the nucleosome oligomer DNA bands on an extended ladder, from a gel lane run adjacent to a lane with size standards, and then plotting size versus oligomer number (see later, for example, Fig. 16C). The best straight line through the points gives the repeat length (average number of base pairs per nucleosome). This method (10) largely corrects for the end-trimming of the excised oligonucleosomes, which increases with the extent of digestion: as digestion proceeds, the DNA length contained in each nucleosome oligomer decreases by approximately the same amount, giving a straight line with the same slope as one obtained from a less extensive digestion, but shifted downward. The negative intercept corresponds to the amount of DNA trimmed from both ends of an oligomer. In general, positive intercepts should not be obtained; if present, they might reflect some type of heterogeneity in the sample, or the presence of nonnucleosomal protein-DNA complexes close to the hybridization probe. It is also best to omit the monomer from the analysis, because mononucleosomes readily lose histone H1 and become trimmed to 146-bp core particles.

Thus far we have considered only the arrangement of nucleosomes relative to each other. However, even a highly ordered nucleosome array need not have any particular relationship between the nucleosome positions and the underlying DNA sequence. To examine nucleosome positioning with respect to DNA, the indirect end-label method (11, 12) can be used. In this type of experiment, DNA purified from MNase-digested chromatin is cut to completion with a restriction enzyme, providing a known reference point. Southern hybridization with a short probe abutting the restriction site then allows mapping of the MNase cuts that occurred at particular distances (in one direction) from the reference point. Figure 1 shows such an experiment. Differences between the cutting patterns of chromatin (lanes 1 and 2) and a naked DNA control (lane D) provide information on nucleosome positioning with respect to the base sequence. For example, protection in chromatin of sites that are preferred cutting sites on naked DNA, and distances between chromatin cuts that are consistent with the size of a nucleosome, both indicate nucleosome positioning with respect to DNA. For the data shown in Fig. 1, the region of the gel marked by the vertical bar is evidence for an array of five positioned nucleosomes that formed on the 3' end of the rat growth hormone gene in an in vitro chromatin assembly system (13); the cuts in the chromatin were 200 bp apart.
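The repeat-length measurement described at the start of this section (plot mean band size against oligomer number, omit the monomer, and take the slope of the best straight line) is an ordinary least-squares fit. A minimal Python sketch follows; the function name is hypothetical, and the band sizes in the usage example are made-up numbers chosen to lie exactly on a line, not data from this chapter.

```python
def repeat_length(band_sizes_bp: dict[int, float]) -> tuple[float, float]:
    """Least-squares fit of mean band size (bp) against oligomer number.

    Keys are oligomer numbers (1 = monomer, 2 = dimer, ...). The monomer is
    dropped, as recommended in the text, because mononucleosomes lose H1 and
    are trimmed to 146-bp cores. Returns (slope, intercept): the slope is the
    repeat length; a negative intercept reflects trimming of the oligomer ends.
    """
    points = [(n, size) for n, size in band_sizes_bp.items() if n >= 2]
    if len(points) < 2:
        raise ValueError("need at least two oligomers of dimer size or larger")
    k = len(points)
    mean_n = sum(n for n, _ in points) / k
    mean_s = sum(s for _, s in points) / k
    sxx = sum((n - mean_n) ** 2 for n, _ in points)
    sxy = sum((n - mean_n) * (s - mean_s) for n, s in points)
    slope = sxy / sxx
    intercept = mean_s - slope * mean_n
    return slope, intercept


# Hypothetical mean band sizes (bp) read off a gel; the monomer value is ignored.
bands = {1: 165, 2: 370, 3: 570, 4: 770, 5: 970}
repeat, trim = repeat_length(bands)
print(f"repeat length {repeat:.0f} bp, intercept {trim:.0f} bp")  # 200 bp, -30 bp
```

Because a more extensive digestion shifts every band down by roughly the same amount, it changes the intercept but leaves the slope, and hence the estimated repeat, essentially unchanged.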
On the other hand, nucleosomes that are randomly positioned with respect to the DNA sequence should generate essentially the same digestion patterns for chromatin and naked DNA (Fig. 1, lower region of gel), because each of the preferred MNase cutting sites on naked DNA should be accessible on some molecules. The failure to detect, by the indirect end-label method, nucleosome arrays that are uniquely positioned with respect to the DNA sequence does not, however, imply that nucleosomes are randomly arranged. Nucleosomes spaced at apparently regular intervals with respect to each other, but not with respect to the DNA sequence, could arise in two ways. First, there could be multiple, mutually exclusive positioning frames on different molecules. In this case, any particular molecule would possess a highly ordered positioned array, using one of the frames. In the hypothetical example illustrated in Fig. 2A, about half of the molecules have frame a and half have frame b (14). Cutting would be detected at all four of the preferred MNase sites (arrows) shown for naked (n) DNA, leading to a result similar to that expected for random nucleosome positioning. Alternatively, the degree of order might be high enough to generate an extended MNase ladder, but not high enough to satisfy the stringent criterion for nucleosome positioning with respect to DNA (complete protection of all of the preferred naked-DNA MNase sites that should be contained in nucleosomes), as illustrated in Fig. 2B. Here, the nucleosome arrangement shown in frame a protects all of the preferred MNase cleavage sites (arrows) shown on the naked (n) DNA. However, in each of the other arrangements shown (frames b-e), one nucleosome is either missing or displaced, exposing MNase sites 1-4, respectively. For a mixture of such arrangements, MNase digestion would generate many fragment lengths that are multiples of, say, 200 bp, producing a ladder in a simple probing experiment. Nevertheless, cutting at all four of the MNase sites shown would be detected in an indirect end-label experiment, again leading to a result similar to that expected from random nucleosome positioning.

FIG. 1. Indirect end-label analysis for nucleosome positioning with respect to the rat growth-hormone gene DNA sequence. (Referring to Fig. 13, construct a was assembled into chromatin in vitro. Samples were digested lightly with MNase, deproteinized, then digested to completion with XhoI; the Southern blot was probed with probe 11.) Here, lane D is the naked DNA control; lane M, labeled size markers. The thick line at the left directs attention to the region of the gel (lanes 1 and 2) where cleavage sites differ from those of the naked DNA. These sites are separated by 200 bp, consistent with a regularly spaced, positioned nucleosome array. The positions of these cutting sites on the gene are indicated by arrows on the map in Fig. 13a. (Reprinted with permission from Ref. 13.)

FIG. 2. Hypothetical nucleosome arrangements illustrating how simple probing might detect ordering of nucleosomes with respect to each other, while indirect end-label analysis would indicate a lack of nucleosome positioning with respect to the base sequence. Arrows indicate preferred MNase sites on naked DNA (n). (A) The sample consists of a mixture of molecules with two different "phasing frames" (a and b). (B) The sample consists of a mixture of molecules including a perfectly ordered positioned array (a) and imperfect arrays with one nucleosome either missing or displaced (b-e). (Reprinted with permission from Ref. 14.)
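The argument of Fig. 2A can be checked with a toy calculation. In the Python sketch below, each molecule carries a regular 200-bp array in one of two phasing frames, and a preferred MNase site is cut on a molecule only if it lies outside every core particle. The site coordinates and nucleosome start positions are hypothetical, chosen so that each frame protects two of the four sites; pooled over the population, all four sites are cut, just as on naked DNA.

```python
CORE_BP = 146  # DNA in one nucleosome core particle (bp)

# Hypothetical preferred MNase sites on naked DNA (bp from the reference site).
NAKED_SITES = (100, 200, 500, 600)

# Two hypothetical phasing frames: regular 200-bp arrays offset by 100 bp.
FRAMES = {
    "a": [20, 220, 420, 620],
    "b": [120, 320, 520, 720],
}


def is_protected(site: int, starts: list[int], core_bp: int = CORE_BP) -> bool:
    """A site is protected if it falls inside any core particle on the molecule."""
    return any(start <= site < start + core_bp for start in starts)


def exposed_sites(starts: list[int]) -> set[int]:
    """Preferred sites left in linkers (and so cuttable) on one molecule type."""
    return {s for s in NAKED_SITES if not is_protected(s, starts)}


def sites_seen_in_population(frames: dict[str, list[int]]) -> set[int]:
    """Sites cut on at least one molecule type, i.e., bands in an end-label experiment."""
    seen: set[int] = set()
    for starts in frames.values():
        seen |= exposed_sites(starts)
    return seen


for name, starts in FRAMES.items():
    print(f"frame {name}: exposed sites {sorted(exposed_sites(starts))}")
print(f"population: {sorted(sites_seen_in_population(FRAMES))}; naked DNA: {sorted(NAKED_SITES)}")
```

Each molecule on its own is perfectly ordered, yet the pooled cutting pattern matches naked DNA, which is the point made for Fig. 2A.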
II. Models for the Formation of Periodic Nucleosome Arrangements and Their Implications

The finding that nucleosome linker lengths vary in chromatins from different sources suggested that linker lengths might also vary within a particular cell type. This idea was consistent with the breadths of the bands seen in MNase ladders. However, an alternative explanation for the appearance of the ladder was that the linkers were really homogeneous, but that cleavage left DNA tails of various sizes on the nucleosome oligomers excised from chromatin. For example, the dimer excised from a homogeneous 200-bp array might contain DNA sizes ranging from 346 bp (two 146-bp completely trimmed cores plus one 54-bp internal linker) to 454 bp (346-bp trimmed dimer plus two 54-bp full-length linker tails), with a mean size of 400 bp (346-bp trimmed dimer plus two 27-bp tails from linkers cut at their centers). This possibility was eliminated in a classic experiment by Prunell and Kornberg (15). Using exonuclease III followed by digestion of the remaining single-stranded DNA, under conditions where the initially broad monomer band was trimmed to a sharp 146 bp, they showed that the dimer band remained broad, indicating that linkers in rat liver chromatin must in fact be heterogeneous in length. It has been more difficult to determine whether the linker length variation is simply statistical, with the full range of variation possible within any group of consecutive nucleosomes; whether different types of arrays exist, each with fairly uniform linker lengths of particular values; or whether linker lengths vary in a way defined by the DNA base sequence.
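The dimer size range quoted above follows from simple bookkeeping, which generalizes to any oligomer under the (subsequently rejected) hypothesis of homogeneous linkers with variable-length tails. A Python sketch; the function name is hypothetical, and the default 146-bp core and 54-bp linker are the values used in the text's example of a 200-bp repeat.

```python
def tail_model_sizes(n: int, linker_bp: int = 54, core_bp: int = 146) -> tuple[int, int, int]:
    """(min, max, mean) DNA size of an n-mer excised from a perfectly regular array
    when only the end tails vary:
      min  - both flanking linkers trimmed away completely,
      max  - both flanking linkers retained in full,
      mean - each flanking linker cut at its midpoint (two half-linker tails).
    """
    if n < 1:
        raise ValueError("oligomer number must be >= 1")
    trimmed = n * core_bp + (n - 1) * linker_bp  # cores plus internal linkers only
    return trimmed, trimmed + 2 * linker_bp, trimmed + linker_bp


for n in (1, 2, 3):
    lo, hi, mean = tail_model_sizes(n)
    print(f"{n}-mer: {lo}-{hi} bp, mean {mean} bp")
```

Under this model, exhaustive trimming would collapse every band to its minimum size; the persistence of a broad dimer band after trimming is what rules it out.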
A. Statistical Positioning of Nucleosomes

It has been argued that, although a degree of sequence specificity exists in the histone octamer-DNA interaction, the specificity cannot be very high, because essentially all DNA sequences in eukaryotes are packaged into nucleosomes. Moreover, the nucleosome spacing periodicity can vary among cell types of an organism, which contain the same DNA. For example, in chicken liver the nucleosome repeat is 195 ± 5 bp, whereas in chicken erythrocytes it is 207 ± 5 bp (1). The bands on MNase ladders prepared from these two tissues are clearly seen to go out of phase with each other as one proceeds up the ladder, consistent with this repeat difference. To account for these observations, along with the observations mentioned above that MNase ladders contain a background signal and that the bands generally damp out at about the decamer, it was proposed that nucleosomes might form randomly on DNA, constrained only by the ratio of total histone to DNA in the chromatin (16). Apparent ordering then arises simply from the relatively high density of nucleosomes on DNA, even though individual linker lengths can have any value. Only the average linker length, taken over a very large number of nucleosomes, corresponds to the physiological value observed for that tissue: for example, 50 bp (196 − 146 bp) for a typical 196-bp nucleosome repeat. To examine this idea, a statistical mechanical formulation was used to derive expressions giving the probability of obtaining DNA fragments of any length, for small values of a simulated random cutting frequency (16). The computed probability vs. length curves can be compared with densitometer scans of MNase ladders obtained from real experiments.
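The statistical positioning idea can also be explored numerically. The Python sketch below is a Monte Carlo stand-in, not the analytical statistical mechanical expressions of ref. 16: it places non-overlapping 146-bp cores uniformly at random at a fixed overall density, cuts linkers at a low random frequency, and histograms the resulting fragment lengths. The function names, the 20-bp bin width, the 2000-bp range, and the cutting probability are assumptions made for illustration.

```python
import random
from collections import Counter

CORE_BP = 146  # DNA in one nucleosome core particle (bp)


def random_array(n_nuc: int, mean_linker: float, rng: random.Random) -> list[float]:
    """Start positions of n_nuc non-overlapping cores placed uniformly at random.

    Sorted uniform points in the total linker ("free") length, each shifted by
    i * CORE_BP, give a uniformly random arrangement of hard particles on a line:
    only the overall nucleosome density (histone:DNA ratio) is constrained.
    """
    free_bp = n_nuc * mean_linker
    points = sorted(rng.uniform(0.0, free_bp) for _ in range(n_nuc))
    return [p + i * CORE_BP for i, p in enumerate(points)]


def mnase_fragments(starts: list[float], p_cut: float, rng: random.Random) -> list[float]:
    """Light digestion: each linker is cut with probability p_cut at a random point
    inside it; return the lengths of the fragments between consecutive cuts."""
    cuts = [
        rng.uniform(left + CORE_BP, right)
        for left, right in zip(starts, starts[1:])
        if rng.random() < p_cut
    ]
    return [b - a for a, b in zip(cuts, cuts[1:])]


def ladder_profile(fragments: list[float], bin_bp: int = 20, max_bp: int = 2000) -> list[float]:
    """Fraction of fragments (shorter than max_bp) in each length bin: a crude
    stand-in for a densitometer scan on a linear length scale."""
    counts = Counter(int(f // bin_bp) for f in fragments if f < max_bp)
    total = sum(counts.values()) or 1
    return [counts.get(b, 0) / total for b in range(max_bp // bin_bp)]


rng = random.Random(1)
starts = random_array(20000, mean_linker=50.0, rng=rng)
frags = mnase_fragments(starts, p_cut=0.1, rng=rng)
profile = ladder_profile(frags)
realized = sum(b - a - CORE_BP for a, b in zip(starts, starts[1:])) / (len(starts) - 1)
print(f"{len(frags)} fragments; realized mean linker {realized:.1f} bp")
```

Plotting `profile` against bin position gives a trace analogous to one panel of Fig. 3; rerunning with `mean_linker` set to 35 or 95 should show the sharper or more washed-out peaks that the model predicts.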
Figure 3 shows simulated "densitometer scans" from statistically positioned nucleosomes for four different average linker lengths, corresponding approximately to those determined experimentally from a range of organisms (1): 35 bp (HeLa cells), 50 bp (rat liver), 65 bp (chicken erythrocyte), and 95 bp (sea urchin sperm), in panels a-d, respectively. The simulated scan for the 50-bp linker length (panel b), which is close to the value found in most animal tissues, looks remarkably similar to data from an actual nuclease digestion experiment. Therefore, statistically positioned nucleosomes could account for the periodicities observed experimentally from a typical chromatin. It is important to keep in mind that, according to this model, any particular nucleosome array would possess little order; the order is illusory, arising from the large number of nucleosomes being analyzed in the sample. Although completely random nucleosome positioning (constrained by the overall nucleosome density on DNA) can account for the appearance of a typical "200-bp" MNase ladder, there are indications that this model may not, in general, be true. The model necessarily predicts that as the average repeat length becomes shorter, the background signal must diminish and more bands (peaks) should be resolved. Conversely, as the average repeat length becomes longer, the background signal must increase and fewer bands should be resolved. This effect is clearly demonstrated in Fig. 3 for the 35-bp (panel a) and 95-bp (panel d) linker lengths in comparison with the 50-bp linker length (panel b). We have performed a large number of MNase digests on nuclei isolated from HeLa cells and other cell lines where the average linker length is approximately 35 bp, and have never observed a significant reduction in the background signal or resolved any more bands than from rat liver chromatin (Fig. 3b). Also, I am not aware of any published work where this effect has been observed in chromatin from cultured cells. Perhaps more significantly, we also do not see an increase in the background signal, or resolve fewer bands, from sea urchin sperm chromatin, as simulated in Fig. 3d. Scans from actual MNase ladders from HeLa and sea urchin sperm chromatin are shown in Fig. 4.

FIG. 3. Simulation of an MNase digestion experiment for low extents of digestion. The relative number of DNA fragments (ordinate) with lengths in the range between 0 and 2000 bp (abscissa) produced from a random nucleosome arrangement on a very long DNA molecule was computed for average linker lengths of 35, 50, 65, and 95 bp in a-d, respectively. These plots can be taken to represent densitometer scans of MNase ladders, except that the length scale is linear instead of logarithmic, as would be obtained from an actual gel. [Reproduced from Nucleic Acids Res. 16, 6677-6690 (1988), by permission of Oxford University Press.]
In each case, the number of bands resolved and the background signals are about the same as those from rat liver chromatin. An alternative to the "statistical positioning" model, consistent with all of the data, is a "mosaic" model in which well-ordered regions of chromatin, generally containing fewer than about 10 nucleosomes, are interrupted by less ordered regions. Well-ordered chromatin regions possessing different nucleosome spacing periodicities would also contribute to the damping of the MNase ladder signal. I should point out that the choice of experimental conditions is important for obtaining good ladders from sea urchin sperm chromatin. When digestions are performed without added sodium chloride, relatively poor ladders are obtained, apparently because excess histone H1, released from small nucleosome oligomers excised during the digestion, binds nonspecifically to the remaining chromatin. Sea urchin histone H1 binds particularly strongly to DNA.
[Figure 4: densitometer traces; panel label "Sea Urchin Sperm"; horizontal axis: direction of electrophoresis]

FIG. 4. Densitometer scans of actual MNase ladders. The HeLa and sea urchin sperm chromatins had nucleosome spacing periodicities of 185 ± 5 and 241 ± 5 bp, corresponding to a and d, respectively, in Fig. 3. Samples were run on different gels.
B. DNA Sequence-directed Chromatin Structures

Why should it matter how nucleosome arrays are formed, as long as the DNA gets packaged? It matters because if information in the DNA can direct the formation of particular chromatin structures in different chromosomal regions, or if different chromatin structures can be induced to form in different cell types, these structures can have functional significance. For example, the pairing of homologous chromosomes that occurs during meiosis might conceivably require the formation of, and interactions between, specialized chromosomal structures that are encoded in DNA. Similarly, centromere function might require a specialized chromosomal structure encoded in centromeric DNA. Also, it is very plausible that certain types of more condensed chromatin higher order structures can be induced to spread (or, alternatively, prevented from spreading) over adjacent DNA, as is thought to mediate the phenomenon of position effect variegation in Drosophila (17). Here, heterochromatin appears to spread its structure to varying extents over (potentially active) euchromatin that has adventitiously become joined to heterochromatin as a result of a chromosomal rearrangement or an insertion. The extent of heterochromatin spreading in a particular cell determines whether adjoining genes are encompassed and thereby inactivated. The active or inactive state then becomes incorporated into the chromatin structure and is clonally inherited, giving rise to patches of cells in which a gene is either active or inactive: the "variegated" phenotype. Direct molecular evidence supporting this model has recently been obtained (18). Chromatin analysis by MNase digestion revealed a more regular nucleosome array for a transgene inserted next to heterochromatin (which could not be activated when induced) than for the same transgene inserted into euchromatin. This apparent spreading of a more regular chromatin structure over a transgene from the adjoining heterochromatin suggests that DNA sequence context can in some way influence chromatin structure and gene expression. Additionally, the nature of the chromatin higher order structure formed in a particular DNA domain or bounded region could depend strongly on the nucleosome linker length, and on the degree of linker heterogeneity, of the nucleosome array contained therein (see Section IV). There is considerable evidence for the existence of cell type-specific DNA domains (19-21). Domains are demarcated by the interactions of the domain boundary sequences with specialized proteins. Hence, if there are signals in DNA that can influence nucleosome array formation, then by apportioning particular DNA regions (or different DNA lengths) into different domains in different cell types, the signals contained in that DNA region could serve as inputs that direct the formation of a particular type of chromatin higher order structure in that domain. This model is attractive because it requires only a small number of regulatory proteins (those that interact with domain boundary sequences) to induce the formation of relatively large chromatin domains with either more "open" or more condensed chromatin structures. Such a mechanism could be of fundamental importance in gene regulation.
In contrast, if the DNA sequence does not influence nucleosome array formation or higher order chromatin structure, as would be the case if nucleosome arrangements were entirely statistical in nature, then nucleosomes could not transmit information throughout an array, and the type of mechanism discussed above could not operate.
III. In Vitro Chromatin Assembly Systems

A few in vitro systems have been developed. Some assemble chromatin with physiologically spaced nucleosomes (average linker lengths around 50 bp) and generate MNase ladders comparable to, or more extended than, those observed in native chromatin. These systems are useful for studying what factors affect nucleosome array formation, and provide information on the mechanisms involved.
A. Chromatin Assembly Using Crude Extracts

Chromatin containing ordered, physiologically spaced nucleosomes was first assembled in a cell-free system more than 18 years ago (22). It was shown that a high-speed supernatant fraction from Xenopus laevis eggs could assemble relaxed SV40 DNA (a 5.2-kb circle) into chromatin. The egg supernatant contained endogenous histones, which are present in Xenopus eggs as a stored histone pool. Significantly, chromatin assembly did not require DNA replication. Further progress with this system was limited because the reaction worked best with the extract in its crudest form. In fact, the very small reaction volumes used could not even be scaled up without deleterious effects, owing to the need for a large surface-to-volume ratio to maintain the proper pH. Some years later, reproducible scaled-up reaction conditions were defined for the Xenopus oocyte supernatant, after the realization that the reaction requires Mg²⁺ and ATP and is strongly affected by the concentrations of these components, as well as by temperature (23). It was then demonstrated that chromatin can be assembled well on small plasmids, irrespective of the DNA sequence, in reactions requiring incubation for about 6 hours at 27°C. Interestingly, the oocyte extract does not contain histone H1 and (at 27°C) generates nucleosomes spaced, on average, at approximately 180-bp intervals; at 37°C, nonphysiological 160-bp spacings result. When exogenous histone H1 is added to the extract at the beginning of the reaction, significantly longer nucleosome repeats, up to 220 bp, are generated, the repeat length depending on the amount of H1 added. Apparently, the number of nucleosomes on the plasmid determines the average repeat: H1-containing chromatin with longer repeats has correspondingly fewer nucleosomes on the plasmid.
However, the number of nucleosomes on the plasmid, with or without H1, is heterogeneous, following a Gaussian distribution about a mean value. It was suggested (23) that this heterogeneity in nucleosome number is responsible for the smearing (loss of band resolution) generally observed about halfway up the extended MNase ladders produced. Circular plasmid molecules containing different numbers of nucleosomes generate ladders with different numbers of bands, in such a way that at the extremes of the ladder the length differences between corresponding oligomers (e.g., 2 or n − 2, where n is the total nucleosome number on the plasmid) are small compared with the length difference between those and the next oligomer (e.g., 3 or n − 3). Thus, bands arising from oligomers 2 and 3, or n − 2 and n − 3, are resolved. Toward the center of the ladder, however, the length differences between corresponding oligomers are too great, and the overlap with the next oligomer too extensive, to produce discrete bands. This has been called a vernier effect, by analogy with what occurs on a precision measuring device when the zero division marks of the main scale and the vernier scale are aligned and the two scales are compared.

Recently, efficient cell-free chromatin assembly systems have been developed from Drosophila embryos (24, 25). Embryos are homogenized in a small volume of extraction buffer in order to minimize dilution of cytoplasmic components and to remain as close as possible to physiological conditions. As with the Xenopus oocyte system, the extract utilizes endogenous histones bound to specialized carrier proteins, although it can be supplemented to some extent with purified embryonic histones. The requirements for ATP, Mg²⁺, and a 26°C incubation temperature, as well as the characteristics of the assembly reactions, are very similar to those of the Xenopus oocyte system, suggesting that the assembly mechanism is the same in the two systems. The characteristics of these reactions suggest that nucleosomes simply tend to become distributed over the plasmid, constrained only by the nucleosome density on the DNA, in just the way predicted by the statistical positioning model. Perhaps the ATP requirement is associated with a mechanism that induces nucleosome "sliding" along the DNA, allowing the nucleosomes to overcome DNA sequence effects and distribute evenly (possibly randomly) throughout the plasmid. This idea is consistent with the recent exciting findings that ATP-driven nucleosome "sliding" may be involved in generating nuclease-hypersensitive sites in chromatin, which are important for gene activation (3, 26, 27). A limitation of these systems is that the chromatin assembly extracts are crude, containing a large number of proteins and enzymatic activities.
Fractionation of the extracts while maintaining good chromatin assembly activity is far from trivial and has not yet been accomplished. Additionally, the histones of Xenopus oocyte and Drosophila embryo extracts include some unusual variants that might be required for the reaction to proceed, and there is no histone H1 in these cells. These differences from the ordinary chromatin of somatic cells may be adaptations for the extremely rapid replication that occurs in both Xenopus oocytes and Drosophila embryos. For example, the fly genome is replicated and packaged once every 9 minutes using a maternal pool of stored histones (28). In contrast, most somatic cells divide roughly every 24 hours, histone pools are absent, and histone H1 is present (29). Thus, the chromatin assembly mechanism used in Xenopus oocytes and Drosophila embryos might be a special histone H1-independent mechanism adapted to rapidly dividing cells, and may differ from the mechanism used in ordinary cells.
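The vernier effect invoked at the start of this section can be made concrete with a toy calculation. The sketch below is an illustration under simplifying assumptions, not the analysis of Ref. 23: it assumes identical circular plasmids of a hypothetical length (3500 bp) carrying different whole numbers n of evenly spaced nucleosomes, so that an oligomer of k nucleosomes is k·L/n long; it ignores the Gaussian weighting (only the range of n matters here); and it uses a hypothetical resolution criterion, namely that a band is sharp when its length spread across the population is less than half the mean repeat. The function names are illustrative only.

```python
from statistics import mean


def band_spread(plasmid_bp: float, n_values: list[int], k: int) -> float:
    """Length spread (bp) of the k-nucleosome oligomer across the plasmid population.

    Assumes evenly spaced nucleosomes, so on a plasmid carrying n nucleosomes the
    k-mer is k * plasmid_bp / n long. The (n - k)-mer counted from the top of the
    ladder is plasmid_bp - k * plasmid_bp / n long, so its spread is identical.
    """
    lengths = [k * plasmid_bp / n for n in n_values]
    return max(lengths) - min(lengths)


def resolved_oligomers(plasmid_bp: float, n_values: list[int]) -> list[int]:
    """Oligomer sizes k, counted from either end, whose bands stay sharp.

    Hypothetical criterion: a band counts as resolved when its spread is less
    than half of the mean nucleosome repeat.
    """
    half_repeat = plasmid_bp / mean(n_values) / 2
    return [k for k in range(1, min(n_values))
            if band_spread(plasmid_bp, n_values, k) < half_repeat]


# Hypothetical 3.5-kb plasmid carrying 15-19 nucleosomes per molecule.
print(resolved_oligomers(3500, [15, 16, 17, 18, 19]))  # [1, 2]
# A narrower spread of nucleosome numbers keeps more bands sharp at each end.
print(resolved_oligomers(3500, [16, 17, 18]))          # [1, 2, 3, 4]
```

Because the spread of the k-mer grows linearly with k, only the first few bands counted from either end remain sharp, and the middle of the ladder smears, which is the pattern the vernier explanation predicts.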
ARNOLD STEIN
B. Chromatin Assembly in a System Consisting of Purified Components
1. GENERAL CONSIDERATIONS
It has been known for some time that it is a simple matter to form nucleosomes on DNA in vitro when the core histones are prepared by salt extraction, avoiding denaturing conditions. One simply lowers the NaCl concentration gradually from 2.0 M, at which histones and DNA do not interact, to physiological salt concentrations (or lower), at which the core histones bind to DNA essentially irreversibly and nucleosomes form spontaneously (30). This material does not resemble chromatin, however, because the nucleosome arrangement is highly irregular, with many nucleosomes packed closely together at physiological ratios of histone to DNA. MNase digestion of such material generally produces a continuum of DNA fragment lengths, on which multiples of about 150 bp, the close-packing periodicity, can be detected. Including histone H1 with the core histones only makes matters worse, because H1 cooperatively and nonspecifically coats regions of free DNA and leads to nucleoprotein aggregation (31, 32). Moreover, by the time the NaCl concentration has fallen to 0.50 M, the core histones are already very tightly bound as octamers, before histone H1 interacts with DNA at all (33).
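The gradual salt step-down described above can be planned with the ordinary dilution relation C1V1 = C2V2. The sketch below is a hypothetical planning aid, not a published protocol: it assumes the NaCl is lowered by adding salt-free buffer (salt-gradient dialysis is the other common route), a linear series of concentration steps, and 0.15 M NaCl as the physiological endpoint. The 0.5 M flag only marks the point by which, according to the text, the core histones are already tightly bound.

```python
def salt_dilution_schedule(start_m: float = 2.0, end_m: float = 0.15, steps: int = 4,
                           start_ml: float = 1.0, core_bound_m: float = 0.5):
    """Plan a linear NaCl step-down by adding salt-free buffer (hypothetical helper).

    Returns one (NaCl molarity, total volume in mL, core histones already bound?)
    tuple per step. Only salt-free buffer is added, so the amount of NaCl is
    constant and C1 * V1 = C2 * V2 gives the total volume at each step.
    """
    schedule = []
    for i in range(steps + 1):
        conc = start_m + (end_m - start_m) * i / steps
        total_ml = start_m * start_ml / conc
        # Per the text, core histones are tightly bound as octamers by 0.5 M NaCl,
        # i.e., before histone H1 would interact with the DNA at all.
        schedule.append((round(conc, 4), round(total_ml, 3), conc <= core_bound_m))
    return schedule


for molar, volume, core_bound in salt_dilution_schedule():
    print(f"{molar:.4f} M NaCl  total {volume:.3f} mL  core histones bound: {core_bound}")
```

With these default values the final volume is about 13.3 mL per mL of starting mixture, so the histones and DNA are diluted about 13-fold as well. This is one practical reason dialysis is often preferred.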
2. POLYGLUTAMATE-MEDIATED REACTIONS
The early attempts to fractionate Xenopus egg extracts, although unsuccessful with regard to reconstituting spaced nucleosome arrays, identified a very abundant acidic nuclear protein, nucleoplasmin, which chaperones histones and maintains the stored histone pools (29). In vitro, nucleoplasmin serves as a nucleosome assembly factor by discouraging nucleoprotein aggregation and allowing nucleosomes to form readily at physiological ionic strength. It was subsequently shown that the acidic polypeptide sodium polyglutamate behaves very much like nucleoplasmin in in vitro nucleosome assembly and is very effective in discouraging nucleoprotein aggregation (34). It was further demonstrated that polyglutamate permits histone H1 to restore nucleosome ordering and spacing at physiological salt concentrations in H1-stripped chromatin whose native nucleosome arrangement had been perturbed by nucleosome "sliding" at elevated salt concentrations (35). The reaction appears to work because polyglutamate provides alternative H1 binding sites, thereby discouraging H1 binding to regions of naked DNA, which leads to aggregation. Histone H1 prefers to bind to nucleosomes rather than to naked DNA (36). H1-nucleosome interactions bring closely spaced adjacent nucleosomes to physiological spacing, provided that spacing is not hampered by another nucleosome occupying the same stretch of DNA. This type of interference occurs when one tries to add histone H1 (in the presence of polyglutamate) to nucleosomes reconstituted, at physiological ratios of core histone to DNA, on most plasmids or on randomly sheared mixed-sequence vertebrate DNA. Additionally, some of the randomly deposited nucleosomes remain too far apart. Thus, only about three or four nucleosomes in a row become properly spaced.
The mechanism whereby histone H1 moves closely spaced nucleosomes apart is unclear, because exactly how H1 interacts with the nucleosome is still uncertain. In one model (37-41), the central globular domain of H1 binds to the DNA segments that enter and exit a nucleosome, causing them to cross over or stabilizing the crossover. This mode of binding could increase the amount of DNA wrapped around the histone core in each of a pair of closely spaced nucleosomes, requiring these nucleosomes to move apart. In another model (42), histone H1 binds asymmetrically to nucleosomal DNA, extending partially over one nucleosome linker. This mode of binding could also conceivably drive adjacent nucleosomes apart by stiffening the linker DNA, causing it to unwrap from a closely spaced nucleosome and forcing that nucleosome to slide over.
3. SYNTHETIC POLYNUCLEOTIDE POLY(dA-dT)·POLY(dA-dT) SPONTANEOUSLY ASSEMBLES INTO CHROMATIN-LIKE STRUCTURES IN THE POLYGLUTAMATE-MEDIATED SYSTEM
An interesting result was obtained when the synthetic polynucleotide poly(dA-dT) was reconstituted with core histones and then incubated with chicken erythrocyte linker histone H5, an H1 analog, at physiological ionic strength in the presence of polyglutamate (43). In contrast with what was observed with plasmids or sheared vertebrate DNA, the polynucleotide permitted the initially randomly distributed and closely packed nucleosomes to become spaced at physiological intervals and to form chromatin-like higher order structures. Figure 5B (44) shows a typical MNase ladder, compared with that obtained from HeLa chromatin (Fig. 5A) on the same gel system. A denaturing gel system is required to prevent the poly(dA-dT) fragments from forming a variety of hairpins and other secondary structures. The nucleosome repeat obtained from analysis of the gel (Fig. 5B) was 210 ± 5 bp, slightly longer than a typical native chromatin repeat. For comparison, the repeat of native HeLa chromatin measured on the denaturing gel (Fig. 5A) was 185 ± 5 bp, the same as that obtained from an ordinary agarose gel.
FIG. 5. MNase ladders from native and reconstituted chromatin on denaturing formamide-containing (4%) polyacrylamide gels. (A) Single-stranded DNA fragments produced from HeLa chromatin digested with micrococcal nuclease for increasing times (lanes 1-6, respectively). (B) Poly(dA-dT) fragments produced from chromatin reconstituted in vitro using chicken erythrocyte histones and digested with micrococcal nuclease for 30 seconds (lane 1) or 1 minute (lane 2). Lanes labeled L are 123-nucleotide ladders; several marker DNA fragment sizes in nucleotides are indicated. (Reprinted with permission from Ref. 44.)
Figure 6 shows electron micrographs of poly(dA-dT) chromatin before (a) and after (b) incubation with histone H5. Before incubation, only randomly arranged nucleosomes are seen, whereas after incubation, nucleosomes have condensed into solenoid-like structures closely resembling those of native chromatin. Additionally, in the first two panels of Fig. 6b, naked poly(dA-dT) duplex can be seen extending from solenoid-like structures that contain 30 or more nucleosomes. These results show that linker histone H5 (or H1) has a strong tendency to space nucleosomes apart and to package the regularly spaced array into chromatin-like structures. In contrast with the Xenopus oocyte and Drosophila embryo systems, nucleosome ordering here depends on the presence of linker histone. In the absence of linker histone, nucleosomes form neither an ordered arrangement nor higher order structures (Fig. 6a), and in MNase digests only the close-packing periodicity can be detected (43).
FIG. 6. Electron micrographs of poly(dA-dT) chromatin lacking (a) or containing (b) linker histone H5. Bar = 200 nm. (Reprinted with permission from Ref. 43.)
There are several reasons why poly(dA-dT) assembles into chromatin spontaneously. First, the monotonous base sequence (repeating A-T) eliminates nucleosome positioning preferences for particular sequences. Unless preferred positioning sites are arranged periodically (see Section V), they would be expected to interfere with the reaction. Second, poly(dA-dT) turns out to have an unusually high affinity for histone H1 (45). Third, it can be demonstrated (43) that nucleosomes readily slide along the poly(dA-dT) duplex at physiological ionic strength, in contrast with natural DNA, for which sliding is limited to 10 or 20 bp (46). This property is consistent with the electron micrographs (Fig. 6b) showing that, on some molecules, the nucleosomes appear to have migrated along the polynucleotide and coalesced into a growing solenoid-like higher order structure. Nucleosome sliding may be a feature shared with the Xenopus oocyte and Drosophila embryo systems, although in those systems ATP may be required for it to occur.
Further studies (47) show that nucleosome arrays with spacing periodicities spanning the whole physiological range can be obtained with this system. Here, the spacing periodicity is controlled by the initial average nucleosome packing density rather than by the histone H1 type. For example, chicken erythrocyte core histones plus histone H5 can produce the full range of periodicities observed in nature. However, different H1 types differ in how efficiently they recruit nucleosomes into higher order structures. For example, the unusually long and basic sea urchin sperm H1 is particularly efficient and can package nucleosomes at low packing densities, thereby generating arrays with a long nucleosome repeat (up to 240 bp). At higher packing densities, shorter repeats are generated. In contrast, typical H1 types require higher packing densities to form regular arrays and higher order structures. These differences between H1 types can be explained by their relative effectiveness in neutralizing polynucleotide charge. If arrays with long linkers tend to form because the nucleosome packing density is low, a longer or more basic H1 is required to neutralize enough charge for stable array formation. For the short linkers obtained at higher nucleosome packing densities, typical H1 types suffice.
Even though the nucleosome spacing periodicity depends on the nucleosome packing density, the mechanism of poly(dA-dT) chromatin formation appears to differ from "statistical positioning." In poly(dA-dT) chromatin assembly, a particular packing density is required for a particular histone H1 type to nucleate a stable array or structure into which other nucleosomes can condense. In "statistical positioning," nucleosomes are randomly arranged, constrained only by the overall packing density. However, the degree to which the poly(dA-dT) chromatin assembly mechanism resembles that of cellular chromatin is currently unclear.
4. CHROMATIN ASSEMBLY ON NATURAL DNA USING THE POLYGLUTAMATE-MEDIATED SYSTEM
The work presented thus far suggests that although linker histones (H1) have a strong tendency to induce spontaneous chromatin assembly (when nucleoprotein aggregation pathways are discouraged), something more is required to package natural DNA sequences. For example, ATP-dependent "machinery" might be required to slide nucleosomes along the DNA to overcome positioning preferences for sequences that are incompatible with the formation of a regular array. However, a serendipitous finding suggested some additional possibilities. When one particular DNA construct was subjected to the chromatin assembly procedure described above for poly(dA-dT) and the chromatin was analyzed by MNase digestion, an unexpected and interesting result was obtained (48). Figure 7a (lane 1) shows a remarkably strong MNase ladder obtained from the in vitro-assembled chromatin. Seventeen bands that were multiples of 210 ± 4 bp could be resolved. This periodicity is very close to that observed for chicken erythrocyte (CE) chromatin, but the ladder extends considerably further than for the native chromatin. The ladder is also quite different from the digestion pattern obtained from the naked DNA, demonstrating that it is a property of the chromatin. The gel photograph on the right shows a completely independent experiment in which the ratio of core histone to DNA was slightly too high. In the absence of linker histone (Fig. 7b), essentially a continuum of DNA fragment sizes is observed, and many of the bands correspond to the preferred cleavage sites on the naked DNA (D) at the lowest extent of digestion (lane 1). At the highest extent of digestion (lane 3), multiples of 150 bp, indicative of nucleosome close packing, can be detected.
This plasmid construct was pBR327 (3.3 kb) containing a 301-bp insert. It turned out that the size of the insert is very important, but its sequence is not. For example, Fig. 7c shows the chromatin assembly reaction obtained with the plasmid vector alone, without an insert. Only three physiologically spaced nucleosomes could be detected, the same result obtained with almost all other DNA samples tested. It seemed significant that the DNA length occupied by 17 nucleosomes with a 210-bp repeat, 17 × 210 bp = 3570 bp, fit nearly perfectly with the size of the length-adjusted circular DNA, 3575 bp. By contrast, if the 210-bp repeat were invariant, an integer number of nucleosomes could not fit neatly on the pBR327 vector (3274 bp). However, plasmid size alone was not sufficient for chromatin assembly: adjusting pUC plasmids, or even pBR322 (the parent of pBR327), to lengths close to integer multiples of 210 bp, with appropriately sized inserts or by making deletions, did not produce nucleosome alignment. By making a series of deletions from different regions of pBR327, such that all lengths were close to multiples of 210 bp, it was demonstrated that one particular region of the plasmid was necessary. This approximately 800-bp region, termed the chromatin organizing region (COR), fortuitously positioned two nucleosomes about 40 bp apart in the absence of linker histone. In the presence of linker histone (H5 or H1), nucleosomes became precisely positioned on the COR at 210-bp intervals, whereas away from the COR, nucleosome positioning with respect to the DNA sequence rapidly diminished.
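The length argument above is simple arithmetic. Assuming, as the argument does, an invariant 210-bp repeat, the sketch below finds the nearest whole number of nucleosomes for a circular DNA and the leftover length, and the insert size that would make a vector an exact number of repeats long. Both helper names are hypothetical; the vector and insert lengths are those given in the text.

```python
def nucleosome_fit(circle_bp: int, repeat_bp: int = 210) -> tuple[int, int]:
    """Nearest whole number of nucleosomes on a circle, and the residual length (bp).

    A negative residual means the circle is shorter than n * repeat_bp.
    """
    n = round(circle_bp / repeat_bp)
    return n, circle_bp - n * repeat_bp


def insert_needed(vector_bp: int, n: int, repeat_bp: int = 210) -> int:
    """Insert size (bp) that would make vector + insert exactly n repeats long."""
    return n * repeat_bp - vector_bp


PBR327_BP = 3274  # vector length given in the text
print(nucleosome_fit(PBR327_BP))        # (16, -86): no neat fit for the vector alone
print(nucleosome_fit(PBR327_BP + 301))  # (17, 5): 17 x 210 = 3570 vs 3575 bp
print(insert_needed(PBR327_BP, 17))     # 296, within 5 bp of the 301-bp insert
```

This is the criterion the deletion series was designed to satisfy; as the text notes, meeting it was not sufficient on its own without the COR.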
These data suggest that the COR nucleated nucleosome-array formation with a 210-bp periodicity, which then spread around the plasmid. Small plasmids (200 kbp) open chromatin domain flanking the locus control region of the human β-globin gene cluster is highly asymmetric, beginning just upstream of the hypersensitive sites and extending far downstream (86). It is as if proteins interacting with the locus control region had prevented the long-range spreading of an inactive structure from a region upstream, rather than merely perturbing the chromatin structure around the factor binding sites by breaking cooperative interactions. These findings suggest the possibility that the histone H1-induced spreading of nucleosome alignment is a cellular mechanism, and also that, when isolated from the influence of neighboring chromatin, signals within a DNA domain might be involved in generating a distinctive chromatin structure conducive to the regulation of genes in that domain.
The chick β-globin gene cluster is a good system for studying the effects of chromatin structure on gene regulation during development. It is contained within a large DNase I-sensitive active chromatin domain in erythroid cells (87, 88), and nucleosome arrays throughout this domain have distinctively shortened internucleosome spacings (9). It is plausible that the significantly shortened spacings, relative to bulk or inactive chromatin (approximately 180-bp repeat versus 200-bp repeat), could facilitate the formation of a distinctive higher order structure that is better suited to regulation by erythroid factors (89) than the bulk chromatin structure, and could also contribute to the generalized DNase I sensitivity observed. Therefore, it is important to understand how the shortened spacings arise specifically in this DNA domain. A variety of mechanisms can be imagined. For example, shortened internucleosome spacings could conceivably arise from histone modifications (90), an altered mode of DNA replication (91), association with special nonhistone proteins, or signals in the DNA. Such mechanisms are difficult to distinguish or to demonstrate clearly by in vivo studies. Recently, using the in vitro system described in Section III,B, it was shown that the tendency to form a nucleosome array with the characteristic 180-bp repeat is encoded in β-globin DNA (92). Figure 12 shows a map of the cloned 6.2-kbp chicken β-globin DNA fragment. The three exons of the βA gene, about 1 kb of upstream sequence, and the first two exons of the ε gene are contained in this fragment (93, 94).
The region between the two genes contains an enhancer that controls transcription of both genes and possibly controls the whole domain (95-98). In the presence of linker histone H5, the chromatin assembled in vitro had a 180 ± 5-bp nucleosome spacing periodicity over the whole 6.2-kbp insert (92). Moreover, around the enhancer (probe E) and the intergenic region (probe I), the 180-bp periodicity could be detected even in the absence of linker histone, as could nucleosome positioning with respect to the base sequence. Interestingly, the cloned 2-kbp subfragment extending from the upstream EcoRI site to the HindIII site about two-thirds of the way into the second exon (Fig. 12) aligned nucleosomes poorly. This result suggests that, in the 6.2-kbp fragment, the intergenic sequences facilitated the regular packaging of the βA gene. Control experiments using a construct with β-globin DNA inserted next to the fortuitous chromatin-organizing region of pBR327 (which encodes a 210-bp repeat) clearly demonstrated that the 180-bp nucleosome spacing periodicity observed was encoded in β-globin DNA, because a 210-bp periodicity could be detected on the adjacent plasmid sequences in the same construct.
FIG. 12. Map of the 6.2-kb chicken β-globin EcoRI fragment. Restriction sites for EcoRI (R), HindIII (H), and BamHI (B) are indicated. The three exons of the βA gene (β) and two exons of the ε gene are represented as heavy lines. Dashed lines represent plasmid sequence. Locations of the three large and three small hybridization probes used are indicated below the map. (Reprinted with permission from Ref. 92.)
It might be significant that nucleosome ordering in vitro on the βA gene promoter region requires cis-acting signals present on downstream flanking DNA. Such signals appear to exist in the intergenic region within and just downstream of the enhancer. It is reasonable to suppose that the binding of protein factors to the enhancer could block the spreading of nucleosome alignment toward the βA promoter. The poor nucleosome alignment inherent in the promoter region could make it more accessible to transcription factors. Alternatively, poor nucleosome alignment throughout the region upstream of the enhancer might prevent the formation of a typical inactive higher order structure and thereby facilitate looping out (96) of the chromatin between the enhancer and the βA promoter. The fact that β-globin DNA appears to be packaged into chromatin with nucleosome repeats typical of the bulk chromatin in nonerythroid cells (78) is not inconsistent with the presence of a 180-bp periodic signal in β-globin DNA. It is plausible that the 180-bp periodicity present in β-globin DNA could be overridden by the spreading influence of adjacent chromatin. Limited spreading influences have been demonstrated in vitro (Section III,B and here), and such effects are likely to be much more effective in vivo. The results obtained in vitro for chicken β-globin DNA and for the chicken ovalbumin gene (Section V,A) suggest that the nucleosome spacing periodicity obtained for a gene in vitro might correlate with the spacing periodicity in vivo for cell types in which the gene is expressed. For example, the 180-bp β-globin spacing in chick erythroid cells (9), in which the gene is expressed, corresponds to the 180-bp spacing obtained in vitro. Similarly, the 196-bp spacing for the chicken ovalbumin gene in hen oviduct, in which this gene is expressed (78), corresponds to the in vitro value (Section V,A). The chicken ovalbumin gene is not restricted to this value because, in erythrocytes, it has a 207-bp nucleosome spacing periodicity, the bulk chromatin repeat (9). These considerations suggest that a principle of tissue-specific chromatin organization might apply. Thus, DNA domains of active or potentially active genes may be, in essence, isolated from flanking DNA. There is evidence for specialized sequences that serve as boundaries and presumably function to isolate transcriptional domains in appropriate cell types (19-21). This isolation should then permit relatively weak nucleosome alignment signals inherent in the DNA of the domain to be realized, as they are when the (isolated) DNA is assembled into chromatin in vitro. The domain-specific packaging that would then occur might facilitate transcriptional regulation of the genes in that domain.
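Repeat lengths such as the 180 ± 5-bp, 196-bp, 207-bp, and 210-bp values compared above are read from MNase ladders. A common way to do this (the chapter does not state which fitting procedure was used) is to fit band size against band number by least squares and take the slope as the nucleosome repeat. The sketch below assumes band sizes have already been estimated from marker positions; the function name and the example band sizes are hypothetical.

```python
from statistics import mean


def ladder_repeat(band_sizes_bp: list[float]) -> tuple[float, float]:
    """Least-squares fit of size = repeat * k + intercept for ladder bands k = 1..N.

    Returns (repeat, intercept) in bp; the slope is taken as the nucleosome repeat.
    """
    ks = range(1, len(band_sizes_bp) + 1)
    k_mean = mean(ks)
    size_mean = mean(band_sizes_bp)
    sxy = sum((k - k_mean) * (s - size_mean) for k, s in zip(ks, band_sizes_bp))
    sxx = sum((k - k_mean) ** 2 for k in ks)
    repeat = sxy / sxx
    return repeat, size_mean - repeat * k_mean


# Hypothetical band sizes (bp) for the first five bands of a ladder.
repeat, intercept = ladder_repeat([200, 392, 585, 782, 974])
print(f"repeat = {repeat:.1f} bp, intercept = {intercept:.1f} bp")  # repeat = 193.8 bp, intercept = 5.2 bp
```

Using every resolved band, rather than dividing a single band size by its band number, averages out small errors in reading individual band positions against the markers.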
C. Rat Growth Hormone Gene Introns Stimulate Nucleosome Alignment in Vitro and in Transgenic Mice and Increase Transcription Efficiency
The use of cDNA constructs or heterologous promoters in transgenic mice often leads to poor gene expression, even for constructs that permit efficient expression when transfected into cultured cells (99, 100). The rat growth hormone (rGH) gene fused to the mouse metallothionein (mMT) promoter has been studied in some detail. In this case, the effect of introns on expression is at the level of transcription. Transcriptional efficiency in transgenic mouse liver increased 10- to 100-fold when the natural rGH introns were included (100). Both the average expression level and the number of mice that gave detectable expression improved. Ordinary enhancers within the introns appear to be ruled out, because introns placed at various unnatural locations with respect to the promoter generally lacked a marked stimulatory effect on transcription, and because no stimulatory effect was seen when the constructs were transfected into cultured cells. Moreover, although the first intron (intron A) alone, in its natural location, rescued expression in transgenic mice to a level of 50% of that of the natural gene, the presence of both introns A and B curiously led to lower average expression than when no introns were present (101).
A possible explanation for these findings is that genomic DNA might contain sequence arrangements that facilitate the packaging of some genes into chromatin (14, 76, 92, 100, 101). Thus, unnatural sequence arrangements might lead to less well-defined chromatin structures that may be deleterious to either transcription initiation or elongation. To test this idea, the same set of mMT-rGH constructs (Fig. 13) initially used in the transgenic mouse studies (101) was assembled into chromatin in vitro. Additionally, high-copy-number transgenic mice were made for chromatin analysis. The in vitro system, which contains purified histones as the only cellular components (Section III,B), assesses the inherent tendency of DNA sequences to assemble into chromatin. The natural rGH genomic sequence, compared with an intronless version, stimulated the formation of an ordered nucleosome array both in chromatin assembled in vitro and in transgenic mice. There was also a good correspondence between the nature of the chromatin assembled in vitro from constructs containing particular combinations of introns and the expression results in transgenic mice (13).
FIG. 13. Descriptions of mMT-rGH gene constructs (a, a', b-f) containing various combinations of introns. The thin lines and filled boxes (rGH gene exons and fused exons) denote rat sequences; exons are numbered 1-5, and introns are identified by letters A-D. The thicker lines denote the 1.8-kb EcoRI-XhoI mMT-I promoter fragment, and dashed lines denote plasmid sequences. Construct a' is the same as a except that sequences within the brackets are deleted. Relevant restriction sites are indicated; in a, the distance between the XhoI and BamHI sites is 5.0 kb. The arrows below construct a correspond to MNase cutting sites between positioned nucleosomes. Hybridization probes I-VI are indicated in a and b. (Reprinted with permission from Ref. 13.)
FIG. 14. Effect of rGH gene introns on nucleosome alignment for chromatin assembled in vitro. (A) Comparison of the constructs (Fig. 13) lacking all introns (b) or containing all introns (a). Lanes labeled M contained HaeIII + AccI φX174 RF DNA fragments as size standards; fragment sizes are indicated. Lanes labeled D show naked DNA digested with 2.5 units of MNase for 30 seconds; lanes labeled C show chromatin digested with 5.0 units of MNase for 1 minute. Hybridization probe I was used. (B) Densitometer scans of the autoradiogram. Lanes C for constructs a and b are shown. (Reprinted with permission from Ref. 13.)
Figure 14A shows that when introns are present, a highly regular 195 ± 4-bp ladder was detected with probe I (see the Fig. 13 map). The seventh band of the ladder runs slightly slower than the 1353-bp marker, consistent with 7 × 195 bp = 1365 bp, and the twelfth band runs slightly faster than the 2352-bp marker, consistent with 12 × 195 bp = 2340 bp. A densitometer scan resolved 13 peaks (Fig. 14B, tracing a). Significantly, several strong bands arising from preferred MNase cutting sites on the naked DNA control, in the region of the gel above 1078 bp, are well protected in the chromatin sample. This provides strong evidence that the native-like chromatin ladder arose from a highly ordered nucleosome array and not simply from the preferred cleavage sites in the DNA. Virtually the same ladder was detected with probe III, the rGH cDNA (not shown). Interestingly, appreciable nucleosome ordering did not occur on the approximately 3 kb of rat sequence flanking the rGH gene on the 3' side, as assessed with probes IV-VI (data not shown). When no introns were present (Fig. 14A), a less regular nucleosome ladder was detected. The densitometer scan (Fig. 14B, tracing b) shows that, although the first four peaks are very similar to those generated with the natural rGH gene, the peaks after the fourth are in spurious positions or are not well resolved; overall, they are not at multiples of 195 bp. Also, the gel photograph shows that, in this case, many of the bands that appear in the chromatin sample (lane C) are also present in the naked DNA control (lane D). This is what would be expected for a low degree of nucleosome order, where the preferred cutting sites in the DNA dominate the pattern. Mice with high transgene copy numbers were necessary for the chromatin analysis so that the major hybridization signal would derive from the transgenes rather than from endogenous genes.
Thus, the DNA fragments were ligated to generate head-to-tail arrays before microinjection. Table II summarizes the mice, the transgene copy numbers, and the amounts of rGH mRNA in liver. Transcription was not induced with zinc, because the very active transcription expected for the natural rGH gene might interfere with analysis of the chromatin structure. The six transgenic mice with the natural MT-rGH gene averaged 354 mRNA molecules/transgene/cell, compared with 24 for the seven transgenic mice with the intronless version. Sample 6 gave an mRNA level 1/30th of the mean, probably because of its extremely high copy number, which might make transcription factors limiting. If this sample is not included, the average rGH mRNA level
TABLE II
EXPRESSION OF MT-rGH TRANSGENE ± INTRONS IN MOUSE LIVER

Sample   Mouse      Transgenes^a   rGH mRNA^b    rGH mRNA^c
no.      no./sex    (no./cell)     (mol/cell)    (mol/cell/gene)
+Introns
1        254-3 F    17             1305          76.9
2        256-1 F    6              1440          240
3        259-4 M    1              1030          1030
4        259-5 M    13             8100          623
5        261-5 F    8              1140          142
6        262-1 M    163            1970          12
−Introns
7        266-1 M    23             120           5.2
8        266-9 F    18             1170          65
9        271-5 F    25             975           39
10       273-7 F    43             1110          25.7
11       274-2 F    6              0             0
12       274-4 M    49             621           12.6
13       274-5 M    10             227           22.7

^a Transgene copy number was measured by dot hybridization, comparing the relative intensity of duplicate dots hybridized with the MT promoter probe with that of a reference gene (HOX locus). The ratio for normal liver was 1.4, which was assumed to equal two genes/cell.
^b rGH mRNA was determined by solution hybridization using oligo 150 (18). The amount of mRNA per cell was determined using M13 standards and assuming that 1 μg TNA = 0.15 μg DNA = 2.3 × 10⁴ cells.
^c The average value for the six +Introns mice is 354 mol/cell/gene; excluding mouse 6, it is 422 mol/cell/gene. The average value for the seven −Introns mice is 24 mol/cell/gene.
for the natural gene becomes 422 molecules/cell/gene, about 18 times that obtained for the intronless version. To perform the chromatin analysis, nuclei prepared from adult mouse livers were digested with MNase and the DNA was examined by Southern analysis using the same mMT fragment probe as in Fig. 14. To minimize the contribution to the hybridization signal from the endogenous mMT gene, and to better detect the transgenes, only mice that gave significantly stronger hybridization signals for their MNase digests (using probe I), compared to a nontransgenic control, were selected. In general, there was a good correspondence between the measured transgene copy number (Table 11) and the intensity of the MNase hybridization signal. As an important control for comparing the chromatin structures of the natural and intronless transgenes, the total chromatin was examined by ethidium bromide staining for the different mouse livers to demonstrate that no perturbations of the bulk chromatin had occurred during sample preparation. MNase ladders of high quality were obtained in all cases; nucleosome repeats were 195 k 5 bp. In Fig. 15, scans of autoradiograms from Southern blots corresponding to
371
SIGNALS IN EUKARYOTIC DNA
ELECTROPHORESIS
-
1
ELECTROPHORESIS
------
FIG.15. Effect of rCH gene introns on nucleosome alignment in transgenic mouse liver chromatin. Assessment of transgene chromatin structure by Southern hybridization. Hybridization probe I was used. Densitometer scans of autoradiograms are shown. Upper scans: mouse numbers 271-5 (b) and 262-1 (a) are 3-minute digests, 273-7 (b) is a 2.5-minute digest. Lower scans: 2-minute digests. (Reprinted with permission from Ref. 13.)
samples containing natural (a) or intronless (b) transgenes are compared. Samples compared were run on the same gel, and were blotted and hybridized to the same radiolabeled probe (probe I). This procedure largely eliminated variations arising from D N A transfer, probe labeling, and membrane washing. It is clear from Fig. 15 (top) that the periodic peaks obtained are more intense for the intron-containing transgene (trace a) than for the
372
ARNOLD STEIN
two intronless transgenes (tracings labeled b). Digests from two other transgenic mice are also compared (Fig. 15, bottom). Here, about 13 well-resolved periodic peaks were obtained for the intron-containing sample, mouse 254-3 (a), whereas fewer and less intense peaks were resolved for the intronless sample, mouse 274-4 (b). These results are similar to those obtained in the in vitro experiment (Fig. 14). The DNA fragment distribution from the nuclear digests is skewed toward higher molecular weights than that from the in vitro-assembled chromatin because the chromatin in nuclei starts with a much higher molecular weight than the plasmid construct. Although introns did not phase nucleosomes with respect to promoter sequences, their presence promoted the formation of regularly spaced nucleosomes over the rGH gene and promoter, both in chromatin from transgenic mice and in chromatin assembled in vitro. Without introns, the nucleosome arrangement over the promoter and rGH gene was irregular and haphazard. It is reasonable to suppose that irregularly arranged nucleosomes might reduce transcription by inhibiting either initiation or elongation. For example, closely packed nucleosomes in the promoter region might be more difficult for transcription factors to displace. Alternatively, irregular nucleosome arrangements might lead to aberrant higher order structures that occlude transcription factors or interfere with the progression of RNA polymerase. The effect of introns on transcription was clear: the average level of rGH mRNA per cell per transgene was about 15-fold higher in mice carrying the natural rGH gene than in those carrying the intronless construct.
More detailed in vitro analysis revealed an array of 5 or 6 strongly positioned nucleosomes over the 3' end of the natural rGH gene, including exons 3, 4, and 5 (Fig. 1). Generation of this positioned array depended on the presence of linker histone in the assembly reaction. The 1-kb region where this array forms may constitute part of the nucleosome alignment signal responsible for the apparent spreading of the 195-bp repeat throughout the rGH gene and the mMT promoter. Because this kilobase of DNA includes exons 3 to 5 and introns C and D, it is easy to understand why removing the introns from this region could impair proper nucleosomal organization. Although the idea of a chromatin-organizing region (Section III,B,4) located at the 3' end of the gene is attractive, other sequences can also influence nucleosome alignment. For example, if we start with the intronless construct b, which is poorly expressed, and insert only intron A to yield construct c, nucleosome alignment in the promoter region improves (data
SIGNALS IN EUKARYOTIC DNA
not shown) and expression increases to about half that of the natural gene (101). Over the course of these experiments, we observed that the 1.8-kb mMT sequence alone has a weak tendency to align nucleosomes, and also that a positioned nucleosome formed over intron A (data not included). It seems likely that intron A fortuitously strengthened nucleosome alignment over the mMT sequence by adding a positioned nucleosome in phase with the mMT signal. Adding both introns A and B (construct d) reduced expression and nucleosome alignment compared with intron A alone. A possible reason is that intron B (718 bp), which contains a 195-bp tandem repeat sequence (102), tends to position nucleosomes in phase with those that form downstream on the natural rGH gene (data not shown) but out of phase with the mMT signal; the promoter region therefore receives two weak, conflicting nucleosome alignment signals. It is thus not surprising that complex effects can occur when unnatural sequences are juxtaposed. In vitro, nucleosome alignment appears to be directed over the rGH gene (with spreading over the promoter region), but not over the 3’ flanking sequences. Nucleosome alignment in vitro exclusively on the transcribed region was also observed for the chicken ovalbumin gene (Section V,A). In light of these observations, it is plausible that strong nucleosome alignment signals are present in transcribed regions. There is now strong evidence that transcription disrupts nucleosome arrays (103, 104). It would therefore make sense for a replication-independent mechanism, one that does not rely on chromatin assembly factors, to have evolved to realign nucleosome arrays on transcribed regions of DNA after the disruption caused by the passage of RNA polymerase.
Irregular arrays might otherwise condense into tightly compacted, irregular higher order structures that would interfere with subsequent rounds of transcription, as appears to happen in mice carrying intronless constructs, in which the nucleosome alignment signals present in the genomic DNA are perturbed by intron removal. To summarize this section, vertebrate genomic DNA has been shown to contain nucleosome-aligning signals. Such signals are not found in Escherichia coli DNA (K. Liu and A. Stein, unpublished observations). The overall density and arrangement of these signals in vertebrate genomic DNA are not yet known; analysis of several large continuous regions of genomic DNA and cellular chromatin should provide this information. Preliminary data suggest a “mosaic” model of chromatin organization, in which well-ordered regions of cellular chromatin (with varying periodicities), generally containing no more than 10 nucleosomes, alternate with less-ordered regions. This arrangement appears to result from nucleosome-aligning signals present in genomic DNA.
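Whether two positioning signals are "in phase" can be made concrete with modular arithmetic: two positioned nucleosomes can belong to the same regular array only if the distance between them is close to a whole multiple of the repeat length. The sketch below assumes the 195-bp repeat discussed above; the function names, the 20-bp tolerance, and the example coordinates are hypothetical, chosen only for illustration and not taken from the experiments.

```python
def phase_offset(pos_a: int, pos_b: int, repeat: int = 195) -> int:
    """Signed offset (bp) of position pos_b from the nearest site that is in
    register with pos_a on a ladder of period `repeat`.

    0 means perfectly in phase; values near +/- repeat/2 mean maximally
    out of phase.
    """
    d = (pos_b - pos_a) % repeat  # Python's % gives a value in [0, repeat)
    if d > repeat / 2:
        d -= repeat  # report the shorter way round as a negative offset
    return d


def in_phase(pos_a: int, pos_b: int, repeat: int = 195, tolerance: int = 20) -> bool:
    """Hypothetical criterion: treat two positioned nucleosomes as belonging
    to one regular array if their phase offset is within `tolerance` bp."""
    return abs(phase_offset(pos_a, pos_b, repeat)) <= tolerance


# Hypothetical coordinates (bp), for illustration only.
upstream_signal = 0
downstream_in_register = 4 * 195 + 5       # 785 bp
downstream_out_of_register = 4 * 195 + 95  # 875 bp

print(phase_offset(upstream_signal, downstream_in_register))      # 5
print(phase_offset(upstream_signal, downstream_out_of_register))  # 95
print(in_phase(upstream_signal, downstream_in_register))          # True
print(in_phase(upstream_signal, downstream_out_of_register))      # False
```

In this toy picture, a promoter lying between one signal with an offset near 0 and another with an offset near ±97 bp receives the kind of conflicting alignment cues proposed above for construct d.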
VI. Chromatin Assembly on Plasmids in Transfected Cells
Although transient transfection assays have been widely used to study gene regulation, little attention has been paid to the chromatin structure of the transfected DNA template. Because chromatin structure can influence gene expression, it is important to know how transfected DNA is packaged once it enters the cell nucleus. It has been reported, for example, that calcium phosphate-transfected DNA is sometimes assembled into nonnucleosomal material of unknown nature (105, 106). This method is known to produce large concatemers (107), which apparently facilitate incorporation into genomic DNA, generally required for stable transfection (107-109). By contrast, DEAE-dextran-transfected DNA remains episomal (110) and has been reported to be efficiently assembled into “typical” chromatin structures (111). Recent studies using the DEAE-dextran method have provided some additional insights.
A. DNA Sequence Affects Nucleosome Ordering on Replicating Plasmids in Transfected COS-1 Cells and in Vitro
Plasmids that contain the SV40 replication origin replicate (112) and are assembled into minichromosomes when transfected into COS-1 cells using the DEAE-dextran technique (111). Chromatin assembly on replicating plasmid DNA in the nuclei of these cultured monkey kidney cells should resemble, to some extent, that of the cellular DNA. Moreover, histone H1 is abundant in the nuclei of these cells and would be expected to interact with, and exert its influence on, the nucleosomes of the plasmid chromatin. However, in the one transfection study in which MNase ladders were reported (111), the ladders were rather poor, suggesting that the nucleosome arrangement was not very regular. This result is in apparent conflict with results obtained with SV40 minichromosomes (113, 114), which exhibit extended MNase ladders. Because of this apparent difference in the regularity of nucleosome spacing, a number of constructs containing the SV40 replication origin, and in some cases additional SV40 sequences, were transfected into COS-1 cells, and the chromatin structures of the transfected DNA were examined by MNase digestion. Constructs containing the SV40 early region (approximately base pairs 2600-5000 on the 5243-bp circular map) formed nucleosome arrays significantly more ordered than constructs lacking this region. Moreover, this region of SV40 DNA assembled into a highly ordered nucleosome array in vitro, with the same 200-bp repeat observed in transfected cells (14). These results suggest that the SV40 early region contains nucleosome alignment signals that are largely responsible for forming the well-ordered nucleosome arrays found on SV40 minichromosomes assembled in cell nuclei.
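Coordinates on a circular genome such as the 5243-bp SV40 map need wrap-around arithmetic whenever a region crosses position 1. The helper below is a hypothetical sketch (1-based, inclusive coordinates are our assumption) that measures a region and gives a rough count of how many nucleosomes a regular array with a given repeat would place on it. Applied to the approximate early-region limits quoted above (bp 2600 to 5000) with the 200-bp repeat, it gives about 12 nucleosomes, ignoring end effects.

```python
SV40_LENGTH = 5243  # bp, size of the circular map quoted in the text


def circular_span(start: int, end: int, genome_length: int = SV40_LENGTH) -> int:
    """Length in bp of the region start..end (1-based, inclusive) on a circular
    map, read in the direction of increasing coordinates and wrapping past
    the last base back to position 1 when end < start."""
    if not (1 <= start <= genome_length and 1 <= end <= genome_length):
        raise ValueError("coordinates must lie on the map")
    if end >= start:
        return end - start + 1
    return (genome_length - start + 1) + end


def approx_nucleosomes(span_bp: int, repeat_bp: int = 200) -> float:
    """Rough number of nucleosomes a regular array with this repeat would
    place on the region (no correction for end effects)."""
    return span_bp / repeat_bp


early = circular_span(2600, 5000)
print(early)                                # 2401
print(round(approx_nucleosomes(early), 1))  # 12.0
print(circular_span(5000, 100))             # 344 (wraps past bp 5243)
```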
B. Nucleosome Ladders Having Anomalous DNA Lengths Are Generated from Chromatin Assembled on Nonreplicating Plasmids in Transfected Cells
Unexpected and curious results were obtained when nonreplicating plasmids were transfected into a variety of cell types. Highly ordered nucleosome arrays were detected irrespective of the construct used, but the nucleosome ladders generated were anomalous (Fig. 16). Instead of the typical 180- to 190-bp multiples generated from bulk cellular chromatin (B), ladders of DNA fragments with lengths of approximately 300, 500, 700, 900 bp, etc., were generated (A). Analysis of such ladders (C) shows that mononucleosome bands were absent and all other oligomer lengths were shortened by about 116 bp (115). These anomalous ladders bear a marked similarity to those observed in some studies of active chromatin (116, 117). It has been suggested (116) that active nucleosomes have an altered protein composition and are consequently much more susceptible to exonucleolytic trimming by MNase, which shortens oligomers by about 50 bp at each end. Although it is tempting to attribute the differences between replicating and nonreplicating plasmids in transfected cells directly to the replication process, this may not be the best explanation. The same study also showed that nonreplicating plasmid chromatin in transfected cell nuclei is largely insoluble after fragmentation by MNase, in contrast with bulk cellular chromatin or replicating plasmid chromatin (115). Such insolubility suggests that nonreplicating plasmid chromatin is associated with nuclear structures, which is also a characteristic of active chromatin (117-120). An interesting hypothesis (115) is that most of the transfected DNA that enters the nucleus through nuclear pores remains associated with nuclear structures, and that assembly into chromatin under these conditions leads to altered chromatin structures with properties similar to those of active chromatin.
In the case of replicating plasmid chromatin, the DNA molecules produced by replication are not associated with nuclear structures and are present in large excess. This idea is consistent with the hypothesis of Blobel (121) that chromatin regions destined for activation associate, in a cell-type-specific fashion, with the nuclear pore complex and related nuclear structures.
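The repeat length and any systematic shortening of a nucleosome ladder are commonly read from a plot of band length against oligomer number, as in panel C of Fig. 16: the slope estimates the repeat, and the intercept shows how far each band falls short of n × repeat. The sketch below fits an ordinary least-squares line to the approximate band sizes quoted above (300, 500, 700, 900 bp). Assigning these bands to oligomers 2 through 5 is our assumption, based on the statement that the mononucleosome band is missing; the numbers are the rounded values from the text, not measurements from the gel.

```python
def fit_ladder(oligomer_numbers, lengths_bp):
    """Ordinary least-squares fit of length = slope * n + intercept.

    Returns (slope, intercept); the slope estimates the nucleosome repeat in bp.
    """
    n = len(oligomer_numbers)
    if n != len(lengths_bp) or n < 2:
        raise ValueError("need at least two (oligomer, length) pairs")
    mean_x = sum(oligomer_numbers) / n
    mean_y = sum(lengths_bp) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(oligomer_numbers, lengths_bp))
    sxx = sum((x - mean_x) ** 2 for x in oligomer_numbers)
    if sxx == 0:
        raise ValueError("oligomer numbers must not all be equal")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Approximate band sizes from the text; the oligomer assignment is assumed.
oligomers = [2, 3, 4, 5]
lengths = [300, 500, 700, 900]
repeat, offset = fit_ladder(oligomers, lengths)
print(repeat, offset)  # 200.0 -100.0
```

With these rounded numbers, an intercept of about -100 bp means each band is roughly 100 bp shorter than n × 200 bp, which matches in magnitude the roughly 50 bp per end of exonucleolytic trimming proposed in (116). The value of about 116 bp given in the text comes from the analysis of the actual ladders and is not reproduced by these rounded sizes.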
[Fig. 16 panels: (A) and (B) gel lanes M, 1, and 2; (C) band length plotted against oligomer number]
VII. Perspective
It has been shown theoretically that simply by specifying that a nucleosome (generally containing histone H1) occupies 166 bp and that the average nucleosome repeat is 196 bp, nucleosomes with entirely random positions and linker lengths would generate MNase ladders essentially indistinguishable from those of prototype rat liver chromatin (16). It should be kept in mind that MNase ladders alone contain only a superposition of average excised nucleosome oligomer DNA lengths, which provides a limited amount of information. Moreover, this “statistical positioning” model does not agree well with experiment when applied to cellular chromatin with average nucleosome repeats shorter or longer than that of rat liver (Fig. 4). Nevertheless, this simple model might satisfactorily describe the nucleosome arrays formed in Xenopus oocytes or Drosophila embryos and in the in vitro systems prepared from extracts of these cells. Random nucleosome formation significantly limits the amount of information that chromatin can contain. However, there is reason to believe that chromatin can be assembled in more than one way, each with the same net result: apparently ordered nucleosome arrays. The results obtained with the histone H1-dependent in vitro chromatin assembly system described here indicate that the DNA base sequence can influence chromatin assembly. Formation of highly ordered nucleosome arrays requires particular DNA sequences that appear to have a tendency to position some nucleosomes about the right distance apart. When reconstituted chromatin was incubated with histone H1 and polyglutamate, nucleosome positioning became stronger and ordered arrays of nucleosomes with physiological spacings formed.
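The statistical-positioning argument can be reproduced in a few lines. In the standard hard-particle picture, nucleosomes placed at random on a long DNA at a fixed average density leave free linkers whose lengths are exponentially distributed, with a mean equal to the repeat minus the footprint (here 196 - 166 = 30 bp). The sketch below further assumes that MNase cuts only within linkers and removes the linker DNA flanking each excised oligomer, so an n-mer consists of n 166-bp footprints plus the n - 1 linkers between them. It is a toy model under those stated assumptions, not the calculation of (16).

```python
import random
import statistics

FOOTPRINT = 166                   # bp occupied per nucleosome (with H1), from the text
REPEAT = 196                      # bp, average nucleosome repeat, from the text
MEAN_LINKER = REPEAT - FOOTPRINT  # 30 bp mean free linker


def oligomer_lengths(n: int, samples: int, rng: random.Random) -> list:
    """Simulated DNA lengths of n-mers: n footprints plus n - 1 random linkers.

    Each linker is drawn independently from an exponential distribution with
    mean MEAN_LINKER (the gap distribution of randomly placed hard particles).
    """
    lengths = []
    for _ in range(samples):
        linkers = sum(rng.expovariate(1 / MEAN_LINKER) for _ in range(n - 1))
        lengths.append(n * FOOTPRINT + linkers)
    return lengths


rng = random.Random(1)  # fixed seed so the run is reproducible
for n in range(1, 7):
    lens = oligomer_lengths(n, 5000, rng)
    # Expected: mean close to 196n - 30 bp, SD close to 30 * sqrt(n - 1) bp.
    print(f"{n}-mer: mean {statistics.fmean(lens):6.1f} bp, "
          f"SD {statistics.pstdev(lens):5.1f} bp")
```

In this toy model the band means fall on a line with a slope of 196 bp even though no nucleosome is positioned, illustrating why a ladder alone carries limited information; the growth of the spread as roughly 30√(n - 1) bp is what progressively blurs the higher-order bands.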
Interestingly, the nucleosome spacing periodicity appears to be encoded in the DNA, and in some cases, a nucleosome array with a well-defined periodicity was found to spread from a nucleating region of DNA onto adjacent DNA sequences that
FIG. 16. Nucleosome ladders generated by MNase digestion of nuclei from transfected mouse Ltk- cells. The cells were transfected with pBR327 containing a 1.9-kbp chicken ovalbumin gene PvuII fragment. Nuclei were digested for 2 minutes (lane 1) or 4 minutes (lane 2). Lane M contained ³²P-labeled (plus unlabeled) φX174 RF HaeIII + AccI fragments as size markers; lengths in base pairs are indicated. (A) Southern blot specifically detecting transfected DNA; pBR327 DNA was used as the hybridization probe. A mononucleosome band is not present, and the other bands of the ladder appear to be about 100 bp shorter than expected. The arrowhead identifies heterogeneous length DNA